Here’s a simple, practical way to score vulnerabilities that’s more auditable than plain CVSS: build a **deterministic score** from three reproducible inputs—**Reachability**, **Evidence**, and **Provenance**—so every number is explainable and replayable.

---

### Why move beyond CVSS?

* **CVSS is context-light**: it rarely knows *your* call paths, configs, or runtime.
* **Audits need proof**: regulators and customers increasingly ask, “show me how you got this score.”
* **Teams need consistency**: the same image should get the same score across environments when inputs are identical.

---
### The scoring idea (plain English)

Score = a weighted function of:

1. **Reachability Depth (R)** — how close the vulnerable function is to a real entry point in *your* app (e.g., public HTTP route → handler → library call).
2. **Evidence Density (E)** — how much concrete proof you have (stack traces, symbol hits, config toggles, feature flags, SCA vs. SAST vs. DAST vs. runtime).
3. **Provenance Integrity (P)** — how trustworthy the artifact chain is (signed SBOM, DSSE attestations, SLSA/Rekor entries, reproducible build match).

A compact, auditable formula you can start with:

```
NormalizedScore = W_R * f(R) + W_E * g(E) + W_P * h(P)
```

* Pick monotonic, bounded transforms (e.g., map to 0..1):

  * f(R): inverse of hops (shorter path ⇒ higher value)
  * g(E): weighted sum of evidence types (runtime > DAST > SAST > SCA, with decay for stale data)
  * h(P): cryptographic/provenance checks (unsigned < signed < signed+attested < signed+attested+reproducible)

Keep **W_R + W_E + W_P = 1** (e.g., 0.5, 0.35, 0.15 for reachability-first triage).
---

### What makes this “deterministic”?

* Inputs are **machine-replayable**: call-graph JSON, evidence bundle (hashes + timestamps), provenance attestations.
* The score is **purely a function of those inputs**, so anyone can recompute it later and match your result byte-for-byte.

---

### Minimal rubric (ready to implement)

* **Reachability (R, 0..1)**

  * 1.00 = vulnerable symbol called on a hot path from a public route (≤3 hops)
  * 0.66 = reachable but behind uncommon feature flag or deep path (4–7 hops)
  * 0.33 = only theoretically reachable (code present, no discovered path)
  * 0.00 = dead/unreferenced code in this build
* **Evidence (E, 0..1)** (sum, capped at 1.0)

  * +0.6 runtime trace hitting the symbol
  * +0.3 DAST/integ test activating vulnerable behavior
  * +0.2 SAST precise sink match
  * +0.1 SCA presence only (no call evidence)
  * (Apply 10–30% decay if older than N days)
* **Provenance (P, 0..1)**

  * 0.0 unsigned/unknown origin
  * 0.3 signed image only
  * 0.6 signed + SBOM (hash-linked)
  * 1.0 signed + SBOM + DSSE attestations + reproducible build match

Example weights: `W_R=0.5, W_E=0.35, W_P=0.15`.
---

### How this plugs into **Stella Ops**

* **Scanner** produces call-graphs & symbol maps (R).
* **Vexer**/Evidence store aggregates SCA/SAST/DAST/runtime proofs with timestamps (E).
* **Authority/Proof‑Graph** verifies signatures, SBOM↔image hash links, DSSE/Rekor (P).
* **Policy Engine** applies the scoring formula (YAML policy) and emits a signed VEX note with the score + input hashes.
* **Replay**: any audit can re-run the same policy with the same inputs and get the same score.

---

### Developer checklist (do this first)

* Emit a **Reachability JSON** per build: entrypoints, hops, functions, edges, timestamps, hashes.
* Normalize **Evidence Types** with IDs, confidence, freshness, and content hashes.
* Record **Provenance Facts** (signing certs, SBOM digest, DSSE bundle, reproducible-build fingerprint).
* Implement the **score function as a pure library** (no I/O), version it (e.g., `score.v1`), and include the version + inputs’ hashes in every VEX note.
* Add a **30‑sec “Time‑to‑Evidence” UI**: click a score → see the exact call path, evidence list, and provenance checks.

---

### Why this helps compliance & sales

* Every number is **auditable** (inputs + function are transparent).
* Scores remain **consistent across air‑gapped sites** (deterministic, no hidden heuristics).
* You can **prove reduction** after a fix (paths disappear, evidence decays, provenance improves).

If you want, I can draft the YAML policy schema and a tiny .NET 10 library stub for `score.v1` so you can drop it into Stella Ops today.
Below is an extended, **developer-ready implementation plan** to build the deterministic vulnerability score into **Stella Ops** (Scanner → Evidence/Vexer → Authority/Proof‑Graph → Policy Engine → UI/VEX output). I’m assuming a .NET-centric stack (since you mentioned .NET 10 earlier), but everything is laid out so the scoring core stays language-agnostic.

---

## 1) Extend the scoring model into a stable, “auditable primitive”

### 1.1 Outputs you should standardize on

Produce **two** signed artifacts per finding (plus optional UI views):

1. **ScoreResult** (primary):

   * `riskScore` (0–100 integer)
   * `subscores` (each 0–100 integer): `baseSeverity`, `reachability`, `evidence`, `provenance`
   * `explain[]` (structured reasons, ordered deterministically)
   * `inputs` (digests of all upstream inputs)
   * `policy` (policy version + digest)
   * `engine` (engine version + digest)
   * `asOf` timestamp (the only “time” allowed to affect the result)
2. **VEX note** (OpenVEX/CSAF-compatible wrapper):

   * references the ScoreResult digest
   * embeds the score (optional) + the input digests
   * signed by Stella Ops Authority

> Key audit requirement: anyone can recompute the score **offline** from the input bundle + policy + engine version.

---
## 2) Make determinism non-negotiable

### 2.1 Determinism rules (implement as “engineering constraints”)

These rules close off the common ways deterministic systems drift into nondeterminism:

* **No floating point** in scoring math. Use integer “basis points” and integer bucket tables.
* **No implicit time**. Scoring takes `asOf` as an explicit input. Evidence “freshness” is computed as `asOf - evidence.timestamp`.
* **Canonical serialization** for hashing:

  * Use RFC-style canonical JSON (e.g., JCS) or a strict canonical CBOR profile.
  * Sort keys and arrays deterministically.
* **Stable ordering** for explanation lists:

  * Always sort factors by `(factorId, contributingObjectDigest)`.

### 2.2 Fixed-point scoring approach (recommended)

Represent weights and multipliers as **basis points** (bps):

* 100% = 10,000 bps
* 1% = 100 bps

Example: `totalScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
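In C#, a minimal sketch of that computation, assuming subscores are already 0–100 integers (names here are illustrative, not an existing Stella Ops API):

```csharp
using System;

// Minimal sketch of fixed-point (basis-point) scoring.
// Subscores b, r, e, p are 0–100 integers; weights must sum to 10,000 bps.
public static class FixedPoint
{
    public const int FullBps = 10_000;

    public static int WeightedTotal(int b, int r, int e, int p,
                                    int wB, int wR, int wE, int wP)
    {
        if (wB + wR + wE + wP != FullBps)
            throw new ArgumentException("weights must sum to 10,000 bps");

        // All intermediates fit comfortably in int (max 100 * 10_000 * 4).
        return (wB * b + wR * r + wE * e + wP * p) / FullBps;
    }

    // Apply a bps multiplier (e.g., a gate or freshness factor) to a 0–100 score.
    public static int ApplyBps(int score, int multiplierBps)
        => score * multiplierBps / FullBps;
}
```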
---

## 3) Extended score definition (v1)

### 3.1 Subscores (0–100 integers)

#### BaseSeverity (B)

* Source: CVSS if present, else vendor severity, else default.
* Normalize to 0–100:

  * CVSS 0.0–10.0 → 0–100 by `B = round(CVSS * 10)`

Keep its weight small so you’re “beyond CVSS” but still anchored.

#### Reachability (R)

Computed from the reachability report (call-path depth + gating conditions).

**Hop buckets** (example):

* 0–2 hops: 100
* 3 hops: 85
* 4 hops: 70
* 5 hops: 55
* 6 hops: 45
* 7 hops: 35
* 8+ hops: 20
* unreachable: 0

**Gate multipliers** (apply multiplicatively in bps):

* behind feature flag: ×7000
* auth required: ×8000
* only admin role: ×8500
* non-default config: ×7500

Final: `R = bucketScore * gateMultiplier / 10000`
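A sketch of that R computation under the example tables (gate names match the starter policy in §11; sorting the gates gives a stable composition order):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: reachability subscore from hop depth plus gate multipliers (bps).
// Bucket and gate values mirror the examples above; the class name is illustrative.
public static class Reachability
{
    public static int Score(int? hops, IEnumerable<string> gates)
    {
        if (hops is null) return 0; // unreachable

        int bucket = hops.Value switch
        {
            <= 2 => 100, 3 => 85, 4 => 70, 5 => 55,
            6 => 45, 7 => 35, _ => 20
        };

        var multipliers = new Dictionary<string, int>
        {
            ["featureFlag"] = 7000, ["authRequired"] = 8000,
            ["adminOnly"] = 8500, ["nonDefaultConfig"] = 7500
        };

        long r = bucket;
        foreach (var gate in gates.OrderBy(g => g, StringComparer.Ordinal)) // stable order
            if (multipliers.TryGetValue(gate, out var bps))
                r = r * bps / 10_000;

        return (int)r;
    }
}
```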
#### Evidence (E)

Sum evidence “points” capped at 100, then apply a freshness multiplier.

Evidence points (example):

* runtime trace hitting vulnerable symbol: +60
* DAST / integration test triggers behavior: +30
* SAST precise sink match: +20
* SCA presence only: +10

Freshness bucket multiplier (example):

* age ≤ 7 days: ×10000
* ≤ 30 days: ×9000
* ≤ 90 days: ×7500
* ≤ 180 days: ×6000
* ≤ 365 days: ×4000
* > 365: ×2000

Final: `E = min(100, sum(points)) * freshness / 10000`
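A sketch of E, with freshness taken from the explicit `asOf` input; here the newest item's age picks the bucket, which is one possible policy, not the only one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: evidence subscore. `asOf` is the only time source, per the determinism rules.
public static class Evidence
{
    public static int Score(IReadOnlyList<(int Points, DateTimeOffset Timestamp)> items,
                            DateTimeOffset asOf)
    {
        if (items.Count == 0) return 0;

        int raw = Math.Min(100, items.Sum(i => i.Points));

        // Integer day arithmetic keeps the "no floats" constraint.
        long ageDays = (asOf - items.Max(i => i.Timestamp)).Ticks / TimeSpan.TicksPerDay;

        int freshnessBps = ageDays switch
        {
            <= 7 => 10_000, <= 30 => 9_000, <= 90 => 7_500,
            <= 180 => 6_000, <= 365 => 4_000, _ => 2_000
        };

        return raw * freshnessBps / 10_000;
    }
}
```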
#### Provenance (P)

Based on verified supply-chain checks.

Levels:

* unsigned/unknown: 0
* signed image: 30
* signed + SBOM hash-linked to image: 60
* signed + SBOM + DSSE attestations verified: 80
* above + reproducible build match: 100

### 3.2 Total score and overrides

Weights (example):

* `wB=1000` (10%)
* `wR=4500` (45%)
* `wE=3000` (30%)
* `wP=1500` (15%)

Total:

* `riskScore = (wB*B + wR*R + wE*E + wP*P) / 10000`

Override examples (still deterministic, because they depend only on recorded flags and subscores; a sketch combining them follows):

* If `knownExploited=true` AND `R >= 70` → force score to 95+
* If unreachable (`R=0`) AND only SCA evidence (`E<=10`) → clamp score ≤ 25
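Combining the subscores with the example weights, plus the two overrides in a fixed order (a sketch; the `knownExploited` flag would come from the evidence bundle):

```csharp
using System;

// Sketch: total score with the example weights and the two override rules above.
public static class Scoring
{
    public static int RiskScore(int b, int r, int e, int p, bool knownExploited)
    {
        const int wB = 1_000, wR = 4_500, wE = 3_000, wP = 1_500;

        int score = (wB * b + wR * r + wE * e + wP * p) / 10_000;

        // Overrides apply in a fixed order so the result stays replayable.
        if (knownExploited && r >= 70)
            score = Math.Max(score, 95);   // "force score to 95+"

        if (r == 0 && e <= 10)
            score = Math.Min(score, 25);   // unreachable + SCA-only clamp

        return score;
    }
}
```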
---

## 4) Canonical schemas (what to build first)

### 4.1 ReachabilityReport (per artifact + vuln)

Minimum fields:

* `artifactDigest` (sha256 of image or build artifact)
* `graphDigest` (sha256 of canonical call-graph representation)
* `vulnId` (CVE/OSV/etc.)
* `vulnerableSymbol` (fully qualified)
* `entrypoints[]` (HTTP routes, queue consumers, CLI commands, cron handlers)
* `shortestPath`:

  * `hops` (int)
  * `nodes[]` (ordered list of symbols)
  * `edges[]` (optional)
* `gates[]`:

  * `type` (“featureFlag” | “authRequired” | “configNonDefault” | …)
  * `detail` (string)
* `computedAt` (timestamp)
* `toolVersion`

### 4.2 EvidenceBundle (per artifact + vuln)

Evidence items are immutable and deduped by content hash.

* `evidenceId` (content hash)
* `artifactDigest`
* `vulnId`
* `type` (“SCA” | “SAST” | “DAST” | “RUNTIME” | “ADVISORY”)
* `tool` (name/version)
* `timestamp`
* `confidence` (0–100)
* `subject` (package, symbol, endpoint)
* `payloadDigest` (hash of raw payload stored separately)

### 4.3 ProvenanceReport (per artifact)

* `artifactDigest`
* `signatureChecks[]` (who signed, what key, result)
* `sbomDigest` + `sbomType`
* `attestations[]` (DSSE digests + verification result)
* `transparencyLogRefs[]` (optional)
* `reproducibleMatch` (bool)
* `computedAt`
* `toolVersion`
* `verificationLogDigest`

### 4.4 ScoreInput + ScoreResult

**ScoreInput** should include:

* `asOf`
* `policyVersion`
* digests for reachability/evidence/provenance/base severity source

**ScoreResult** should include (both types are sketched as records below):

* `riskScore`, `subscores`
* `explain[]` (deterministic)
* `engineVersion`, `policyDigest`
* `inputs[]` (digests)
* `resultDigest` (hash of canonical ScoreResult)
* `signature` (Authority signs the digest)
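Sketched as immutable C# records (field names follow the lists above; nested detail types are elided):

```csharp
using System;
using System.Collections.Generic;

// Sketch: the two scoring types as immutable records. Detail types omitted.
public sealed record ScoreInput(
    DateTimeOffset AsOf,
    string PolicyVersion,
    string ReachabilityDigest,
    string EvidenceDigest,
    string ProvenanceDigest,
    string BaseSeveritySourceDigest);

public sealed record ScoreResult(
    int RiskScore,
    IReadOnlyDictionary<string, int> Subscores,   // baseSeverity, reachability, ...
    IReadOnlyList<string> Explain,                // deterministically ordered
    string EngineVersion,
    string PolicyDigest,
    IReadOnlyList<string> Inputs,                 // input digests
    string ResultDigest,                          // hash of canonical ScoreResult
    string? Signature);                           // Authority signs the digest
```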
---

## 5) Development implementation plan (phased, with deliverables + acceptance criteria)

### Phase A — Foundations: schemas, hashing, policy format, test harness

**Deliverables**

* Canonical JSON format rules + hashing utilities (shared lib; sketched below)
* JSON Schemas for: ReachabilityReport, EvidenceItem, ProvenanceReport, ScoreInput, ScoreResult
* “Golden fixture” repo: a set of input bundles and expected ScoreResults
* Policy format `score.v1` (YAML or JSON) using **integer bps**

**Acceptance criteria**

* Same input bundle → identical `resultDigest` across:

  * OS (Linux/Windows)
  * CPU (x64/ARM64)
  * runtime versions (supported .NET versions)
* Fixtures run in CI and fail on any byte-level diff
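For the hashing utility, a minimal sketch that sorts object keys before hashing. This is a simplified stand-in for full JCS: scalar normalization is left to the serializer here.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;

// Sketch: hash a JSON document after recursively sorting object keys.
public static class CanonicalHash
{
    public static string Sha256Hex(string json)
    {
        using var doc = JsonDocument.Parse(json);
        using var ms = new MemoryStream();
        using (var w = new Utf8JsonWriter(ms)) // default options: compact output
            WriteCanonical(doc.RootElement, w);
        return Convert.ToHexString(SHA256.HashData(ms.ToArray()));
    }

    private static void WriteCanonical(JsonElement e, Utf8JsonWriter w)
    {
        switch (e.ValueKind)
        {
            case JsonValueKind.Object:
                w.WriteStartObject();
                foreach (var p in e.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    w.WritePropertyName(p.Name);
                    WriteCanonical(p.Value, w);
                }
                w.WriteEndObject();
                break;
            case JsonValueKind.Array:
                w.WriteStartArray();
                foreach (var item in e.EnumerateArray()) WriteCanonical(item, w);
                w.WriteEndArray();
                break;
            default:
                e.WriteTo(w); // scalars pass through unchanged
                break;
        }
    }
}
```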
---

### Phase B — Scoring engine (pure function library)

**Deliverables**

* `Stella.ScoreEngine` as a pure library:

  * `ComputeScore(ScoreInputBundle) -> ScoreResult`
  * `Explain(ScoreResult) -> structured explanation` (already embedded)
* Policy parser + validator:

  * weights sum to 10,000
  * bucket tables are monotonic
  * override rules are deterministic and totally ordered

**Acceptance criteria**

* 100% deterministic tests passing (golden fixtures)
* “Explain” always includes:

  * subscores
  * applied buckets
  * applied gate multipliers
  * freshness bucket selected
  * provenance level selected
* No non-deterministic dependencies (time, random, locale, float)

---
### Phase C — Evidence pipeline (Vexer / Evidence Store)

**Deliverables**

* Normalized evidence ingestion adapters:

  * SCA ingest (from your existing scanner output)
  * SAST ingest
  * DAST ingest
  * runtime trace ingest (optional MVP → “symbol hit” events)
* Evidence Store service:

  * immutability (append-only)
  * dedupe by `evidenceId`
  * query by `(artifactDigest, vulnId)`

**Acceptance criteria**

* Ingesting the same evidence twice yields identical state (idempotent)
* Every evidence record can be exported as a bundle with content hashes
* Evidence timestamps preserved; `asOf` drives freshness deterministically

---
### Phase D — Reachability analyzer (Scanner extension)

**Deliverables**

* Call-graph builder and symbol resolver:

  * for .NET: IL-level call graph + ASP.NET route discovery
* Reachability computation:

  * compute shortest-path hops from entrypoints to the vulnerable symbol (see the BFS sketch below)
  * attach gating detections (config/feature/auth heuristics)
* Reachability report emitter:

  * emits ReachabilityReport with stable digests

**Acceptance criteria**

* Given the same build artifact, the reachability report digest is stable
* Paths are replayable and visualizable (nodes are resolvable)
* Unreachable findings are explicitly marked and explainable
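The hop computation itself is a plain multi-source BFS over the call graph; a sketch (the adjacency-list shape is an assumption, not the Scanner's actual model):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: shortest-path hops from any entrypoint to the vulnerable symbol.
// `graph` maps caller symbol -> callee symbols (assumed shape).
public static class HopSearch
{
    public static int? HopsToSymbol(
        IReadOnlyDictionary<string, IReadOnlyList<string>> graph,
        IEnumerable<string> entrypoints,
        string vulnerableSymbol)
    {
        var queue = new Queue<(string Node, int Hops)>();
        var seen = new HashSet<string>(StringComparer.Ordinal);

        foreach (var e in entrypoints.OrderBy(x => x, StringComparer.Ordinal)) // stable start order
            if (seen.Add(e)) queue.Enqueue((e, 0));

        while (queue.Count > 0)
        {
            var (node, hops) = queue.Dequeue();
            if (node == vulnerableSymbol) return hops;

            if (graph.TryGetValue(node, out var callees))
                foreach (var c in callees)
                    if (seen.Add(c)) queue.Enqueue((c, hops + 1));
        }

        return null; // unreachable in this build
    }
}
```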
---

### Phase E — Provenance verification (Authority / Proof‑Graph)

**Deliverables**

* Verification pipeline:

  * signature verification for artifact digest
  * SBOM hash linking
  * attestation verification (DSSE/in‑toto style)
  * optional transparency log reference capture
  * optional reproducible-build comparison input
* ProvenanceReport emitter (signed verification log digest)

**Acceptance criteria**

* Verification is offline-capable if given the necessary bundles
* Any failed check is captured with a deterministic error code + message
* ProvenanceReport digest is stable for the same inputs

---
### Phase F — Orchestration: “score a finding” workflow + VEX output

**Deliverables**

* Orchestrator service (or existing pipeline step) that:

  1. receives a vulnerability finding
  2. fetches reachability/evidence/provenance bundles
  3. builds ScoreInput with `asOf`
  4. computes ScoreResult
  5. signs the ScoreResult digest
  6. emits a VEX note referencing the ScoreResult digest
* Storage for ScoreResult + VEX note (immutable, versioned)

**Acceptance criteria**

* “Recompute” produces the same ScoreResult digest if inputs are unchanged
* VEX note includes:

  * policy version + digest
  * engine version
  * input digests
  * score + subscores
* End-to-end API returns the “why” data in a single round trip (cached)

---
### Phase G — UI: “Why this score?” and replay/export

**Deliverables**

* Findings view enhancements:

  * score badge + risk bucket (Low/Med/High/Critical)
  * click-through “Why this score”
* “Why this score” panel:

  * call path visualization (at least as an ordered list for MVP)
  * evidence list with freshness + confidence
  * provenance checks list (pass/fail)
* Export bundle (inputs + policy + engine version) for audit replay

**Acceptance criteria**

* Any score is explainable in <30 seconds by a human reviewer
* Exported bundle can reproduce the score offline

---
### Phase H — Governance: policy-as-code, versioning, calibration, rollout

**Deliverables**

* Policy registry:

  * store `score.v1` policies by org/project/environment
  * approvals + change log
* Versioning strategy:

  * engine semantic versioning
  * policy digest pinned in ScoreResult
  * migration tooling (e.g., score.v1 → score.v2)
* Rollout mechanics:

  * shadow mode: compute score but don’t enforce
  * enforcement gates: block deploy if score ≥ threshold

**Acceptance criteria**

* Policy changes never rewrite past scores
* You can backfill new scores with a new policy version without ambiguity
* Audit log shows who changed policy, when, and why (optional but recommended)

---
## 6) Engineering backlog (epics → stories → DoD)

### Epic 1: Deterministic core

* Story: implement canonical JSON + hashing
* Story: implement fixed-point math helpers (bps)
* Story: implement score.v1 buckets + overrides
* DoD:

  * no floats
  * golden test suite
  * deterministic explain ordering

### Epic 2: Evidence normalization

* Story: evidence schema + dedupe
* Story: adapters (SCA/SAST/DAST/runtime)
* Story: evidence query API
* DoD:

  * idempotent ingest
  * bundle export with digests

### Epic 3: Reachability

* Story: entrypoint discovery for target frameworks
* Story: call graph extraction
* Story: shortest-path computation
* Story: gating heuristics
* DoD:

  * stable digests
  * replayable paths

### Epic 4: Provenance

* Story: verify signatures
* Story: verify SBOM link
* Story: verify attestations
* Story: reproducible match input support
* DoD:

  * deterministic error codes
  * stable provenance scoring

### Epic 5: End-to-end score + VEX

* Story: orchestration
* Story: ScoreResult signing
* Story: VEX generation and storage
* DoD:

  * recompute parity
  * verifiable signatures

### Epic 6: UI

* Story: score badge + buckets
* Story: “why” panel
* Story: export bundle + recompute button
* DoD:

  * human explainability
  * offline replay works

---
## 7) APIs to implement (minimal but complete)

### 7.1 Compute score (internal)

* `POST /api/score/compute`

  * input: `ScoreInput` + references or inline bundles
  * output: `ScoreResult`

### 7.2 Get score (product)

* `GET /api/findings/{findingId}/score`

  * returns latest ScoreResult + VEX reference

### 7.3 Explain score

* `GET /api/findings/{findingId}/score/explain`

  * returns `explain[]` + call path + evidence list + provenance checks

### 7.4 Export replay bundle

* `GET /api/findings/{findingId}/score/bundle`

  * returns a tar/zip containing:

    * ScoreInput
    * policy file
    * reachability/evidence/provenance reports
    * engine version manifest

---
## 8) Testing strategy (what to automate early)

### Unit tests

* bucket selection correctness
* gate multiplier composition
* evidence freshness bucketing
* provenance level mapping
* override rule ordering

### Golden fixtures

* fixed input bundles → fixed ScoreResult digest
* run on every supported platform/runtime

### Property-based tests

* monotonicity (see the sketch after this list):

  * fewer hops should not reduce R
  * more evidence points should not reduce E
  * stronger provenance should not reduce P
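A sketch of the first property as a plain exhaustive check over the small hop domain, reusing the `Reachability.Score` sketch from §3.1 (shown with an xUnit `[Fact]`, but any framework works):

```csharp
using System;
using Xunit;

// Sketch: exhaustive monotonicity check for the hop-bucket table.
public class MonotonicityTests
{
    [Fact]
    public void FewerHopsNeverReduceR()
    {
        var noGates = Array.Empty<string>();
        for (int hops = 0; hops < 20; hops++)
        {
            int closer = Reachability.Score(hops, noGates);
            int farther = Reachability.Score(hops + 1, noGates);
            Assert.True(closer >= farther, $"R({hops}) < R({hops + 1})");
        }
    }
}
```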
### Integration tests

* full pipeline: finding → bundles → score → VEX
* “recompute” parity tests

---

## 9) Operational concerns and hardening

### Performance

* Cache reachability per `(artifactDigest, vulnId, symbol)`
* Cache provenance per `artifactDigest`
* Evidence queries should be indexed by `(artifactDigest, vulnId, type)`

### Security

* Treat evidence ingestion as untrusted input:

  * strict schema validation
  * content-hash dedupe prevents tampering via overwrite
* Sign ScoreResults and VEX notes
* RBAC:

  * who can change policy
  * who can override scores (if allowed at all)

### Data retention

* Evidence payloads can be large; keep digests + store raw payloads in object storage
* Keep a “minimal replay bundle” always (schemas + digests + policy + engine)

---

## 10) Concrete “MVP first” slice (smallest valuable product)

If you want a crisp MVP that still satisfies “auditable determinism”:

1. Scoring engine (`B + R + E + P`), fixed-point, golden tests
2. Evidence store (SCA + runtime optional)
3. Reachability: only hop depth from HTTP routes to symbol (no fancy gates)
4. Provenance: signed image + SBOM link only
5. UI: score + “why” panel showing:

   * hops/path list
   * evidence list
   * provenance checklist
6. Emit a signed VEX note containing the score + input digests

That MVP already proves the core differentiator: **deterministic, replayable risk scoring**.

---
## 11) Starter policy file (score.v1) using basis points

Here’s a good “real implementation” starting point (int-only):

```yaml
policyVersion: score.v1
weightsBps:
  baseSeverity: 1000
  reachability: 4500
  evidence: 3000
  provenance: 1500

reachability:
  hopBuckets:
    - { maxHops: 2, score: 100 }
    - { maxHops: 3, score: 85 }
    - { maxHops: 4, score: 70 }
    - { maxHops: 5, score: 55 }
    - { maxHops: 6, score: 45 }
    - { maxHops: 7, score: 35 }
    - { maxHops: 9999, score: 20 }
  unreachableScore: 0
  gateMultipliersBps:
    featureFlag: 7000
    authRequired: 8000
    adminOnly: 8500
    nonDefaultConfig: 7500

evidence:
  points:
    runtime: 60
    dast: 30
    sast: 20
    sca: 10
  freshnessBuckets:
    - { maxAgeDays: 7, multiplierBps: 10000 }
    - { maxAgeDays: 30, multiplierBps: 9000 }
    - { maxAgeDays: 90, multiplierBps: 7500 }
    - { maxAgeDays: 180, multiplierBps: 6000 }
    - { maxAgeDays: 365, multiplierBps: 4000 }
    - { maxAgeDays: 99999, multiplierBps: 2000 }

provenance:
  levels:
    unsigned: 0
    signed: 30
    signedWithSbom: 60
    signedWithSbomAndAttestations: 80
    reproducible: 100

overrides:
  - name: knownExploitedAndReachable
    when:
      flags:
        knownExploited: true
      minReachability: 70
    setScore: 95

  - name: unreachableAndOnlySca
    when:
      maxReachability: 0
      maxEvidence: 10
    clampMaxScore: 25
```

---

If you want, I can also include a **repo layout + CI “golden fixture” test runner** (dotnet test + cross-platform determinism checks) and a **.NET 10 ScoreEngine skeleton** that enforces: no floats, canonical JSON hashing, and stable explanation ordering.
---
Here’s a simple, high‑leverage UX metric to add to your pipeline run view that will immediately make DevOps feel faster and calmer:

# Time‑to‑First‑Signal (TTFS)

**What it is:** the time from opening a run’s details page until the UI renders the **first actionable insight** (e.g., “Stage `build` failed – `dotnet restore` 401 – token expired”).
**Why it matters:** engineers don’t need *all* data instantly—just the first trustworthy clue to start acting. Lower TTFS = quicker triage, lower stress, tighter MTTR.

---

## What counts as a “first signal”

* Failed stage + reason (exit code, key log line, failing test name)
* Degraded but actionable status (e.g., flaky test signature)
* Policy gate block with the specific rule that failed
* Reachability‑aware security finding that blocks deploy (one concrete example, not the whole list)

> Not a signal: spinners, generic “loading…”, or unactionable counts.

---
## How to optimize TTFS (practical steps)

1. **Deferred loading (prioritize critical panes):**

   * Render header + failing stage card first; lazy‑load artifacts, full logs, and graphs after.
   * Pre‑expand the *first failing node* in the stage graph.

2. **Log pre‑indexing at ingest:**

   * During CI, stream logs into chunks keyed by `[jobId, phase, severity, firstErrorLine]`.
   * Extract the **first error tuple** (timestamp, step, message) and store it next to the job record.
   * On UI open, fetch only that tuple (sub‑100 ms) before fetching the rest.

3. **Cached summaries:**

   * Persist a tiny JSON “run.summary.v1” (status, first failing stage, first error line, blocking policies) in Redis/Postgres.
   * Invalidate on new job events; always serve this summary first.

4. **Edge prefetch:**

   * When the runs table is visible, prefetch summaries for rows in the viewport so details pages open “warm”.

5. **Compress + cap the first log burst:**

   * Send the first **5–10 error lines** (already extracted) immediately; stream the rest.

---
## Instrumentation (so you can prove it)

Emit these points as telemetry:

* `ttfs_start`: when the run details route is entered (or when the tab becomes visible)
* `ttfs_signal_rendered`: when the first actionable card is in the DOM
* `ttfs_ms = signal_rendered - start`
* Dimensions: `pipeline_provider`, `repo`, `branch`, `run_type` (PR/main), `device`, `release`, `network_state`

**SLO:** *P50 ≤ 700 ms, P95 ≤ 2.5 s* (adjust to your infra).

**Dashboards to track:**

* TTFS distribution (P50/P90/P95) by release
* Correlate TTFS with bounce rate and “open → rerun” delay
* Error budget: % of views with TTFS > 3 s

---
## Minimal backend contract (example)

```json
GET /api/runs/{runId}/first-signal
{
  "runId": "123",
  "firstSignal": {
    "type": "stage_failed",
    "stage": "build",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "at": "2025-12-11T09:22:31Z",
    "artifact": { "kind": "log", "range": { "start": 1880, "end": 1896 } }
  },
  "summaryEtag": "W/\"a1b2c3\""
}
```
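Server-side, that contract is small enough for a minimal-API sketch with ETag handling (`ISignalStore` is a hypothetical stand-in for the cached summary store; assumes the ASP.NET Core web SDK):

```csharp
// Sketch: first-signal endpoint with ETag/304 handling (ASP.NET Core minimal API).
public interface ISignalStore
{
    Task<(string Etag, string Json)?> GetFirstSignalAsync(string runId);
}

public static class FirstSignalEndpoint
{
    public static void Map(WebApplication app) =>
        app.MapGet("/api/runs/{runId}/first-signal",
            async (string runId, HttpRequest req, ISignalStore store) =>
            {
                var found = await store.GetFirstSignalAsync(runId);
                if (found is null) return Results.NoContent(); // signal not computed yet

                var (etag, json) = found.Value;
                if (req.Headers.IfNoneMatch == etag)
                    return Results.StatusCode(StatusCodes.Status304NotModified);

                // A real handler would also set ETag/Cache-Control response headers.
                return Results.Text(json, "application/json");
            });
}
```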
---

## Frontend pattern (Angular 17, signal‑first)

* Fire the `first-signal` request in a route resolver.
* Render `FirstSignalCard` immediately.
* Lazy‑load the stage graph, full logs, and security panes.
* Fire `ttfs_signal_rendered` when `FirstSignalCard` enters the viewport.

---

## CI adapter hints (GitLab/GitHub/Azure)

* Hook on job status webhooks to compute & store the first error tuple.
* For GitLab: scan the `trace` stream for the first `ERRO|FATAL|##[error]` match; store it in a DB table `ci_run_first_signal(run_id, stage, step, message, t)`.

---
## “Good TTFS” acceptance tests

* Run with early fail → first signal < 1 s, shows exact command + exit code.
* Run with policy gate fail → rule name + fix hint visible first.
* Offline/slow network → cached summary still renders an actionable hint.

---

## Copy to put in your UX guidelines

> “Optimize **Time‑to‑First‑Signal (TTFS)** above all. Users must see one trustworthy, actionable clue within 1 second on a warm path—even if the rest of the UI is still loading.”

If you want, I can sketch the exact DB schema for the pre‑indexed log tuples and the Angular resolver + telemetry hooks next.

Below is an extended, end‑to‑end implementation plan for **Time‑to‑First‑Signal (TTFS)** that you can drop into your backlog. It includes architecture, data model, API contracts, frontend work, observability, QA, and rollout—structured as epics/phases with “definition of done” and acceptance criteria.

---
# Scope extension

## What we’re building

A run details experience that renders **one actionable clue** fast—before loading heavy UI like full logs, graphs, artifacts.

**“First signal”** is a small payload derived from run/job events and the earliest meaningful error evidence (stage/step + key log line(s) + reason/classification).

## What we’re extending beyond the initial idea

1. **First‑Signal Quality** (not just speed)

   * Classify error type (auth, dependency, compilation, test, infra, policy, timeout).
   * Identify the “culprit step” and a stable “signature” for dedupe and search.
2. **Progressive disclosure UX**

   * Summary → First signal card → expanded context (stage graph, logs, artifacts).
3. **Provider‑agnostic ingestion**

   * Adapters for GitLab/GitHub/Azure (or your CI provider).
4. **Caching + prefetch**

   * Warm open from the list/table, with ETags and stale‑while‑revalidate.
5. **Observability & SLOs**

   * TTFS metrics, dashboards, alerting, and quality metrics (false signals).
6. **Rollout safety**

   * Feature flags, canary, A/B gating, and a guaranteed fallback path.

---
# Success criteria

## Primary metric

* **TTFS (ms)**: time from details page route enter → first actionable signal rendered.

## Targets (example SLOs)

* **P50 ≤ 700 ms**, **P95 ≤ 2500 ms** on the warm path.
* **Cold path**: P95 ≤ 4000 ms (depends on infra).

## Secondary outcome metrics

* **Open→Action time**: time from opening the run to the first user action (rerun, cancel, assign, open failing log line).
* **Bounce rate**: close page within 10 seconds without interaction.
* **MTTR proxy**: time from failure to first rerun or fix commit.

## Quality metrics

* **Signal availability rate**: % of run views that show a first signal card within 3 s.
* **Signal accuracy score** (sampled): engineer confirms “helpful vs not”.
* **Extractor failure rate**: parsing errors / missing mappings / timeouts.

---
# Architecture overview

## Data flow

1. **CI provider events** (job started, job finished, stage failed, log appended) land in your backend.
2. **Run summarizer** maintains:

   * `run_summary` (small JSON)
   * `first_signal` (small, actionable payload)
3. **UI opens run details**

   * Immediately calls `GET /runs/{id}/first-signal` (or `/summary`).
   * Renders FirstSignalCard as soon as the payload arrives.
4. Background fetches:

   * Stage graph, full logs, artifacts, security scans, trends.

## Key decision: where to compute the first signal

* **Option A: at ingest time (recommended)**
  Compute the first signal when logs/events arrive, store it, serve it instantly.
* **Option B: on demand**
  Compute when the user opens run details (simpler initially, worse TTFS and load).

---
# Data model

## Tables (relational example)

### `ci_run`

* `run_id (pk)`
* `provider`
* `repo_id`
* `branch`
* `status`
* `created_at`, `updated_at`

### `ci_job`

* `job_id (pk)`
* `run_id (fk)`
* `stage_name`
* `job_name`
* `status`
* `started_at`, `finished_at`

### `ci_log_chunk`

* `chunk_id (pk)`
* `job_id (fk)`
* `seq` (monotonic)
* `byte_start`, `byte_end` (range into blob)
* `first_error_line_no` (nullable)
* `first_error_excerpt` (nullable, short)
* `severity_max` (info/warn/error)

### `ci_run_summary`

* `run_id (pk)`
* `version` (e.g., `1`)
* `etag` (hash)
* `summary_json` (small, 1–5 KB)
* `updated_at`

### `ci_first_signal`

* `run_id (pk)`
* `etag`
* `signal_json` (small, 0.5–2 KB)
* `quality_flags` (bitmask or json)
* `updated_at`

## Cache layer

* Redis keys:

  * `run:{runId}:summary:v1`
  * `run:{runId}:first-signal:v1`
* TTL: generous but safe (e.g., 24h) with “write‑through” on event updates.

---
# First signal definition

## `FirstSignal` object (recommended shape)

```json
{
  "runId": "123",
  "computedAt": "2025-12-12T09:22:31Z",
  "status": "failed",
  "firstSignal": {
    "type": "stage_failed",
    "classification": "dependency_auth",
    "stage": "build",
    "job": "build-linux-x64",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "signature": "dotnet-restore-401-unauthorized",
    "log": {
      "jobId": "job-789",
      "lines": [
        "error : Response status code does not indicate success: 401 (Unauthorized).",
        "error : The token is expired."
      ],
      "range": { "start": 1880, "end": 1896 }
    },
    "suggestedActions": [
      { "label": "Rotate token", "type": "doc", "target": "internal://docs/tokens" },
      { "label": "Rerun job", "type": "action", "target": "rerun-job:job-789" }
    ]
  },
  "etag": "W/\"a1b2c3\""
}
```

### Notes

* `signature` should be stable for grouping.
* `suggestedActions` is optional but hugely valuable (even 1–2 actions).

---
# APIs

## 1) First signal endpoint

**GET** `/api/runs/{runId}/first-signal`

Headers:

* `If-None-Match: W/"..."` supported
* Response includes `ETag` and `Cache-Control`

Responses:

* `200`: full first signal object
* `304`: not modified
* `404`: run not found
* `204`: run exists but signal not available yet (rare; should degrade gracefully)

## 2) Summary endpoint (optional but useful)

**GET** `/api/runs/{runId}/summary`

* Includes: status, first failing stage/job, timestamps, blocking policies, artifact counts.

## 3) SSE / WebSocket updates (nice-to-have)

**GET** `/api/runs/{runId}/events` (SSE)

* Push new signal or summary updates in near real-time while the user is on the page.

---
# Frontend implementation plan (Angular 17)

## UX behavior

1. **Route enter**

   * Start the TTFS timer.
2. Render instantly:

   * Title, status badge, pipeline metadata (run id, commit, branch).
   * Skeleton for the details area.
3. Fetch first signal:

   * Render `FirstSignalCard` immediately when available.
   * Fire the telemetry event when the card is **in the DOM and visible**.
4. Lazy-load:

   * Stage graph
   * Full logs viewer
   * Artifacts list
   * Security findings
   * Trends, flaky tests, etc.

## Angular structure

* `RunDetailsResolver` (or `resolveFn`) requests the first signal.
* `RunDetailsComponent` uses signals to render quickly.
* `FirstSignalCardComponent` is standalone + minimal deps.

## Prefetch strategy from the runs list view

* When the runs table is visible, prefetch summaries/first signals for items in the viewport:

  * Use `IntersectionObserver` to prefetch only visible rows.
  * Store results in an in-memory cache (e.g., `Map<runId, FirstSignal>`).
  * Respect ETags to avoid redundant payloads.

## Telemetry hooks

* `ttfs_start`: route activation + tab visible
* `ttfs_signal_rendered`: FirstSignalCard attached and visible
* Dimensions: provider, repo, branch, run_type, release_version, network_state

---
# Backend implementation plan

## Summarizer / First-signal service

A service or module that:

* subscribes to run/job events
* receives log chunks (or pointers)
* computes and stores:

  * `run_summary`
  * `first_signal`
* publishes updates (optional) to an event stream for SSE

### Concurrency rule

The first signal should be set once per run unless a “better” signal appears (sketched below):

* if the current signal is missing → set
* if the current signal is “generic” and the new one is “specific” → replace
* otherwise keep (avoid churn)
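That rule reduces to a tiny pure function; a sketch (the `Specificity` ranking is an assumption; a real build might rank classifications instead):

```csharp
// Sketch: "better signal replaces generic" rule as a pure, order-stable decision.
// Specificity ranking is illustrative, not an existing Stella Ops type.
public enum Specificity { Generic = 0, Classified = 1, ExactStep = 2 }

public static class FirstSignalPolicy
{
    public static bool ShouldReplace(Specificity? current, Specificity candidate)
    {
        if (current is null) return true;   // missing → set
        return candidate > current.Value;   // more specific → replace
        // equal or less specific → keep (avoid churn)
    }
}
```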
---

# Extraction & classification logic

## Minimum viable extractor (Phase 1)

* Heuristics (see the sketch after this section):

  * first match among patterns: `FATAL`, `ERROR`, `##[error]`, `panic:`, `Unhandled exception`, `npm ERR!`, `BUILD FAILED`, etc.
  * plus provider-specific fail markers
* Pull:

  * stage/job/step context (from job metadata or step boundaries)
  * 5–10 log lines around the first error line
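A sketch of that Phase-1 heuristic in C# (the marker list mirrors the bullets above; the window size implements the 5–10-line excerpt):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch: first-error extraction over buffered log lines. Tune markers per provider.
public static class FirstErrorExtractor
{
    private static readonly Regex Marker = new(
        @"FATAL|ERROR|##\[error\]|panic:|Unhandled exception|npm ERR!|BUILD FAILED",
        RegexOptions.Compiled);

    // Returns (1-based line number, surrounding excerpt) for the first match, or null.
    public static (int LineNo, string[] Excerpt)? Extract(IReadOnlyList<string> lines)
    {
        for (int i = 0; i < lines.Count; i++)
        {
            if (!Marker.IsMatch(lines[i])) continue;

            int start = Math.Max(0, i - 2);
            int end = Math.Min(lines.Count, i + 8);   // ~5–10 lines of context
            return (i + 1, lines.Skip(start).Take(end - start).ToArray());
        }
        return null; // no error marker seen yet; keep scanning future chunks
    }
}
```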
## Improved extractor (Phase 2+)

* Language/tool-specific rules:

  * dotnet, maven/gradle, npm/yarn/pnpm, python/pytest, go test, docker build, terraform, helm
* Add `classification` and `signature`:

  * normalize common errors:

    * auth expired/forbidden
    * missing dependency / DNS / TLS
    * compilation error
    * test failure (include test name)
    * infra capacity / agent lost
    * policy gate failure

## Guardrails

* **Secret redaction**: before storing excerpts, run your existing redaction pipeline.
* **Payload cap**: cap message length and excerpt lines.
* **PII discipline**: avoid including arbitrary stack traces if they contain sensitive paths; include only key lines.

---
# Development plan by phases (epics)

Each phase below includes deliverables + acceptance criteria. You can treat each as a sprint/iteration.

---

## Phase 0 — Baseline and alignment

### Deliverables

* Baseline TTFS measurement (current behavior)
* Definition of “actionable signal” and priority rules
* Performance budget for the run details view

### Tasks

* Add client-side telemetry for current page load steps:

  * route enter, summary loaded, logs loaded, graph loaded
* Measure a TTFS proxy today (likely “time to status shown”)
* Identify the top 20 failure modes in your CI (from historical logs)

### Acceptance criteria

* Dashboard shows baseline P50/P95 for the current experience.
* “First signal” contract signed off with UI + backend teams.

---
## Phase 1 — Data model and storage

### Deliverables

* DB migrations for `ci_run_summary` and `ci_first_signal`
* Redis cache keys and invalidation strategy
* ADR: where summaries live and how they update

### Tasks

* Create tables and indices:

  * index on `run_id`, `updated_at`, `provider`
* Add serializer/deserializer for `summary_json` and `signal_json`
* Implement ETag generation (hash of the JSON payload)

### Acceptance criteria

* Can store and retrieve summary + first signal for a run in < 50 ms (DB) and < 10 ms (cache).
* ETag works end-to-end.

---
## Phase 2 — Ingestion and first signal computation

### Deliverables

* First-signal computation module
* Provider adapter integration points (webhook consumers)
* “First error tuple” extraction from logs

### Tasks

* On job log append:

  * scan incrementally for first error markers
  * store excerpt + line range + job/stage/step mapping
* On job finish/fail:

  * finalize the first signal with the best known context
* Implement the “better signal replaces generic” rule

### Acceptance criteria

* For a known failing run, the API returns the first signal without reading the full log blob.
* Computation does not exceed a small CPU budget per log chunk (guard with limits).
* Extraction failure rate < 1% for sampled runs (initial).

---
## Phase 3 — API endpoints and caching

### Deliverables

* `/runs/{id}/first-signal` endpoint
* Optional `/runs/{id}/summary`
* Cache-Control + ETag support
* Access control checks consistent with existing run authorization

### Tasks

* Serve the cached first signal first; fall back to the DB
* If missing:

  * return `204` (or a “pending” object) and allow UI fallback
* Add server-side metrics:

  * endpoint latency, cache hit rate, payload size

### Acceptance criteria

* Endpoint P95 latency meets target (e.g., < 200 ms internal).
* Cache hit rate is high for active runs (after prefetch).

---
## Phase 4 — Frontend progressive rendering

### Deliverables

* FirstSignalCard component
* Route resolver + local cache
* Prefetch on the runs list view
* Telemetry for TTFS

### Tasks

* Render the shell immediately
* Fetch and render the first signal
* Lazy-load heavy panels using `@defer` / dynamic imports
* Implement “open failing stage” default behavior

### Acceptance criteria

* In a throttled network test, the first signal card appears significantly earlier than logs and graphs.
* `ttfs_signal_rendered` fires exactly once per view, with correct dimensions.

---
## Phase 5 — Observability, dashboards, and alerting

### Deliverables

* TTFS dashboards by:

  * provider, repo, run type, release version
* Alerts:

  * P95 regression threshold
* Quality dashboard:

  * availability rate, extraction failures, “generic signal rate”

### Tasks

* Create an event pipeline for telemetry into your analytics system
* Define SLO/error budget alerts
* Add tracing (OpenTelemetry) for the endpoint and summarizer

### Acceptance criteria

* You can correlate TTFS with:

  * bounce rate
  * open→action time
* You can pinpoint whether regressions are backend, frontend, or provider‑specific.

---
## Phase 6 — QA, performance testing, rollout

### Deliverables

* Automated tests
* Feature flag + gradual rollout
* A/B experiment (optional)

### Tasks

**Testing**

* Unit tests:

  * extractor patterns
  * classification rules
* Integration tests:

  * simulated job logs with known outcomes
* E2E (Playwright/Cypress):

  * verify the first signal appears before logs
  * verify the fallback path works if the endpoint fails
* Performance tests:

  * cold cache vs. warm cache
  * throttled CPU/network profiles

**Rollout**

* Feature flag:

  * enabled for internal users first
  * ramp by repo or percentage
* Monitor key metrics during the ramp:

  * TTFS P95
  * API error rate
  * UI error rate
  * cache miss spikes

### Acceptance criteria

* No increase in overall error rates.
* TTFS improves at least X% for a meaningful slice of users (define X from the baseline).
* Fallback UX remains usable when signals are unavailable.

---
# Backlog examples (ready-to-create Jira tickets)

## Epic: Run summary and first signal storage

* Create `ci_first_signal` table
* Create `ci_run_summary` table
* Implement ETag hashing
* Implement Redis caching layer
* Add admin/debug endpoint (internal only) to inspect computed signals

## Epic: Log chunk extraction

* Implement incremental log scanning
* Store first error excerpt + range
* Map excerpt to job + step
* Add redaction pass to excerpts

## Epic: Run details progressive UI

* FirstSignalCard UI component
* Lazy-load logs viewer
* Default to opening the failing stage
* Prefetch signals in the runs list

## Epic: Telemetry and dashboards

* Add `ttfs_start` and `ttfs_signal_rendered`
* Add endpoint latency metrics
* Build dashboards + alerts
* Add sampling for “signal helpfulness” feedback

---
# Risk register and mitigations

## Risk: First signal is wrong/misleading

* Mitigation:

  * track “generic signal rate” and “corrected by user” feedback
  * classification confidence scoring
  * always provide quick access to full logs as a fallback

## Risk: Logs contain secrets

* Mitigation:

  * redact excerpts before storing/serving
  * cap excerpt lines and length
  * keep raw logs behind existing permissions

## Risk: Increased ingest CPU cost

* Mitigation:

  * incremental scanning with early stop after the first error is captured
  * limit scanning per chunk
  * sample/skip overly large logs until job completion

## Risk: Cache invalidation bugs

* Mitigation:

  * ETag-based correctness
  * versioned keys: `:v1`
  * “write-through” cache updates from the summarizer

---
# Definition of Done checklist

A phase is “done” when:

* ✅ TTFS measured with reliable client events
* ✅ FirstSignalCard renders from the `/first-signal` endpoint
* ✅ ETag caching works
* ✅ Fallback path is solid (no blank screens)
* ✅ Dashboards exist and are actively watched during rollout
* ✅ Security review completed for log excerpts/redaction
* ✅ Load tests show no unacceptable backend regressions

---
# Optional enhancements after initial launch

1. **Next-step recommendations**
   Add action suggestions and deep links (rotate token, open failing test, open doc).
2. **Flaky test / known issue detection**
   Show “this matches a known flaky signature” with last-seen frequency.
3. **“Compare to last green”**
   Summarize what changed since the last successful run (commit diff, dependency bump).
4. **SSE live updates**
   Update the first signal as soon as a failure occurs while the user watches.

---

If you tell me your current backend stack (Node/Go/.NET), log storage (S3/Elastic/Loki), and which CI providers you support, I can translate this into a concrete set of modules/classes, exact schema migrations, and the Angular routing + signals code structure you’d implement.

---
Here’s a simple, practical idea to make your scans provably repeatable over time and catch drift fast.

# Replay Fidelity (what, why, how)

**What it is:** the share of historical scans that reproduce **bit‑for‑bit** when re‑run using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.

**Why you want it:** it exposes hidden nondeterminism (feed drift, time‑dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.

---

## The metric

* **Per‑scan:** `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`.
* **Windowed:** `Replay Fidelity = (Σ replay_match) / (# historical replays in window)`.
* **Breakdown:** also track by scanner, language, image base, feed version, and environment.
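
A minimal sketch of the windowed metric, assuming replay results are already recorded as booleans (the `ReplayResult` type and its `Bucket` field are illustrative, not an existing API):

```go
package fidelity

// ReplayResult is a hypothetical record of one historical replay.
type ReplayResult struct {
	Bucket  string // e.g., scanner, language, or base image
	Matched bool   // bit-for-bit artifact match
}

// Fidelity returns overall and per-bucket replay fidelity in [0,1].
func Fidelity(results []ReplayResult) (overall float64, byBucket map[string]float64) {
	total, matched := 0, 0
	bTotal, bMatched := map[string]int{}, map[string]int{}
	for _, r := range results {
		total++
		bTotal[r.Bucket]++
		if r.Matched {
			matched++
			bMatched[r.Bucket]++
		}
	}
	byBucket = make(map[string]float64, len(bTotal))
	for b, n := range bTotal {
		byBucket[b] = float64(bMatched[b]) / float64(n)
	}
	if total == 0 {
		return 1, byBucket // no replays in the window: vacuously green
	}
	return float64(matched) / float64(total), byBucket
}
```

The per-bucket view is what the pass/fail rules below gate on.
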
---

## What must be captured in the scan manifest

* Exact source refs (image digest / repo SHA), container layers’ digests
* Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
* Feed snapshots (CVE DB, OVAL, vendor advisories) as **content‑addressed** bundles
* Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
* Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy

---

## Pass/Fail rules you can ship

* **Green:** fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
* **Warn:** any bucket drops by ≥ 2% week‑over‑week
* **Fail the pipeline:** fidelity < 0.90 overall, or any regulated project < 0.95

---

## Minimal replay harness (outline)

1. Pick N historical scans (e.g., the last 200, or stratified by image language).
2. Restore their **frozen** manifest (scanner binary, feed bundle, policy lattice, seeds).
3. Re‑run in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale).
4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA‑256.
5. Emit: pass/fail, a diff summary, and a “cause” tag if there is a mismatch (feed, policy, runtime, code).
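
A sketch of steps 4–5, assuming each artifact’s canonical SHA‑256 was stored at scan time (the `Hashes` type is illustrative):

```go
package replay

// Hashes maps artifact type ("sbom", "vex", "findings", "evidence") to its canonical SHA-256.
type Hashes map[string]string

// Compare implements steps 4-5: per-artifact hash equality, returning which artifacts drifted.
func Compare(original, replayed Hashes) (pass bool, mismatched []string) {
	for kind, want := range original {
		if replayed[kind] != want {
			mismatched = append(mismatched, kind)
		}
	}
	return len(mismatched) == 0, mismatched
}
```
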
---

## Dashboard (what to show)

* Fidelity % (30/90‑day) + sparkline
* Top offenders (by language/scanner/policy set)
* “Cause of mismatch” histogram (feed vs runtime vs code vs policy)
* Click‑through: deterministic diff (e.g., which CVEs flipped and why)

---

## Quick wins for Stella Ops

* Treat **feeds as immutable snapshots** (content‑addressed tar.zst) and record their digest in each scan.
* Run the scanner in a **repro shell** (OCI image digest + fixed TZ/locale + no network).
* Normalize SBOM/VEX (key order, whitespace, float precision) before hashing.
* Add a `stella replay --from MANIFEST.json` command + a nightly cron to sample replays.
* Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`.

Want me to draft the `stella replay` CLI spec and the DB table (DDL) you can drop into Postgres?

---

Below is an extended “Replay Fidelity” design **plus a concrete development implementation plan** you can hand to engineering. I’m assuming Stella Ops is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability “feeds” and policy/lattice/rules.

---

## 1) Extend the concept: Replay Fidelity as a product capability

### 1.1 Fidelity levels (so you can be strict without being brittle)

Instead of a single yes/no, define **tiers** that you can report and gate on:

1. **Bitwise Fidelity (BF)**

   * *Definition:* all primary artifacts (SBOM, findings, VEX, evidence) match **byte-for-byte** after canonicalization.
   * *Use:* strongest auditability; catches ordering/nondeterminism.

2. **Semantic Fidelity (SF)**

   * *Definition:* the *meaning* matches even if formatting differs (e.g., key order, whitespace, timestamps).
   * *How:* compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
   * *Use:* protects you from “cosmetic diffs” and helps triage.

3. **Policy Fidelity (PF)**

   * *Definition:* the final policy decision (pass/fail + reason codes) matches.
   * *Use:* useful when outputs may evolve but the governance outcome must remain stable.

**Recommended reporting:**

* Dashboard shows BF, SF, and PF together.
* Default engineering SLO: **BF ≥ 0.98**; compliance SLO: **BF ≥ 0.95** for regulated projects; PF should be ~1.0 unless policy changed intentionally.

---

### 1.2 “Why did it drift?”—Mismatch classification taxonomy

When a replay fails, auto-tag the cause so humans don’t diff JSON by hand.

**Primary mismatch classes**

* **Feed drift:** CVE/OVAL/vendor advisory snapshot differs.
* **Policy drift:** policy/lattice/rules differ (or the default rule set changed).
* **Runtime drift:** base image / libc / kernel / locale / tz / CPU arch differences.
* **Scanner drift:** scanner binary build differs or dependency versions changed.
* **Nondeterminism:** ordering instability, concurrency race, unseeded RNG, time-based logic.
* **External IO:** network calls, “latest” resolution, remote package registry changes.

**Output:** a `mismatch_reason` plus a short `diff_summary`.
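
A sketch of the auto-tagger as an ordered rule chain, assuming the scan-envelope fields from §1.3 are available on both sides (the `ManifestFacts` type is illustrative):

```go
package replay

// ManifestFacts is a hypothetical slice of the scan envelope used for triage.
type ManifestFacts struct {
	FeedDigest     string
	PolicyDigest   string
	ScannerDigest  string
	EnvFingerprint string // arch + tz + locale + clock mode, hashed
}

// ClassifyMismatch implements the taxonomy above. If every pinned input
// matches, the residual cause must be nondeterminism in the scanner itself.
func ClassifyMismatch(orig, replay ManifestFacts) string {
	switch {
	case orig.FeedDigest != replay.FeedDigest:
		return "feed_drift"
	case orig.PolicyDigest != replay.PolicyDigest:
		return "policy_drift"
	case orig.ScannerDigest != replay.ScannerDigest:
		return "scanner_drift"
	case orig.EnvFingerprint != replay.EnvFingerprint:
		return "runtime_drift"
	default:
		return "nondeterminism" // sub-tag ordering/time/RNG in a second pass
	}
}
```
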
---

### 1.3 Deterministic “scan envelope” design

A replay only works if the scan is fully specified.

**Scan envelope components**

* **Inputs:** image digest, repo commit SHA, build provenance, layer digests.
* **Scanner:** scanner OCI image digest (or binary digest), config flags, feature toggles.
* **Feeds:** content-addressed feed bundle digests (see §2.3).
* **Policy/rules:** git commit SHA + content digest of the compiled rules.
* **Environment:** OS/arch, tz/locale, “clock mode”, network mode, CPU count.
* **Normalization:** the “canonicalization version” for SBOM/VEX/findings.

---

### 1.4 Canonicalization so “bitwise” is meaningful

To make BF achievable:

* Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
* Strip/normalize volatile fields (timestamps, `scan_duration_ms`, hostnames)
* Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)`
* Deterministic IDs: if you generate internal IDs, derive them from stable hashes of content (not UUID4)
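
A minimal canonical-hash sketch: Go’s `encoding/json` already emits map keys in sorted order, so round-tripping through a generic value plus stripping volatile fields yields a stable digest (the field names in `volatile` are examples):

```go
package canonical

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// Volatile fields stripped before hashing (example list; extend per schema).
var volatile = map[string]bool{"timestamp": true, "scan_duration_ms": true, "hostname": true}

// Hash returns a canonical SHA-256 of a JSON document: sorted keys
// (json.Marshal sorts map keys), compact output, volatile fields removed.
// Caveat: numbers round-trip through float64, which normalizes their
// representation but can lose precision on very large integers.
func Hash(doc []byte) (string, error) {
	var v interface{}
	if err := json.Unmarshal(doc, &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(strip(v))
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(out)
	return hex.EncodeToString(sum[:]), nil
}

func strip(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, val := range t {
			if volatile[k] {
				delete(t, k)
				continue
			}
			t[k] = strip(val)
		}
	case []interface{}:
		for i, val := range t {
			t[i] = strip(val) // array order is preserved; pre-sort lists upstream
		}
	}
	return t
}
```
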
---

### 1.5 Sampling strategy

You don’t need to replay everything.

**Nightly sample:** stratified by:

* language ecosystem (npm, pip, maven, go, rust…)
* scanner engine
* base OS
* “regulatory tier”
* image size/complexity

**Plus:** always replay “golden canaries” (a fixed set of reference images) after every scanner release and after every feed-ingestion pipeline change.

---

## 2) Technical architecture blueprint

### 2.1 System components

1. **Manifest Writer (in the scan pipeline)**

   * Produces `ScanManifest v1` JSON
   * Records all digests and versions

2. **Artifact Store**

   * Stores SBOM, findings, VEX, evidence blobs
   * Stores canonical hashes for BF checks

3. **Feed Snapshotter**

   * Periodically builds immutable feed bundles
   * Content-addressed (digest-keyed)
   * Stores metadata (source URLs, generation timestamp, signature)

4. **Replay Orchestrator**

   * Chooses historical scans to replay
   * Launches “replay executor” jobs

5. **Replay Executor**

   * Runs the scanner in a pinned container image
   * Network off, tz fixed, clock policy applied
   * Produces new artifacts + hashes

6. **Diff & Scoring Engine**

   * Computes BF/SF/PF
   * Generates mismatch classification + diff summary

7. **Metrics + UI Dashboard**

   * Prometheus metrics
   * UI for drill-down diffs

---

### 2.2 Data model (Postgres-friendly)

**Core tables**

* `scan_manifests`

  * `scan_id (pk)`
  * `manifest_json`
  * `manifest_sha256`
  * `created_at`

* `scan_artifacts`

  * `scan_id (fk)`
  * `artifact_type` (sbom|findings|vex|evidence)
  * `artifact_uri`
  * `canonical_sha256`
  * `schema_version`

* `feed_snapshots`

  * `feed_digest (pk)`
  * `bundle_uri`
  * `sources_json`
  * `generated_at`
  * `signature`

* `replay_runs`

  * `replay_id (pk)`
  * `original_scan_id (fk)`
  * `status` (queued|running|passed|failed)
  * `bf_match bool`, `sf_match bool`, `pf_match bool`
  * `mismatch_reason`
  * `diff_summary_json`
  * `started_at`, `finished_at`
  * `executor_env_json` (arch, tz, cpu, image digest)

**Indexes**

* `(created_at)` for sampling windows
* `(mismatch_reason, finished_at)` for triage
* `(scanner_version, ecosystem)` for breakdown dashboards
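
A sketch of the `replay_runs` row as a Go struct (column names follow the list above; this is an illustration, not a committed schema):

```go
package replay

import "time"

// ReplayRun mirrors one replay_runs row from the data model above.
type ReplayRun struct {
	ReplayID        string    `db:"replay_id"`
	OriginalScanID  string    `db:"original_scan_id"`
	Status          string    `db:"status"` // queued|running|passed|failed
	BFMatch         bool      `db:"bf_match"`
	SFMatch         bool      `db:"sf_match"`
	PFMatch         bool      `db:"pf_match"`
	MismatchReason  string    `db:"mismatch_reason"`
	DiffSummaryJSON []byte    `db:"diff_summary_json"`
	StartedAt       time.Time `db:"started_at"`
	FinishedAt      time.Time `db:"finished_at"`
	ExecutorEnvJSON []byte    `db:"executor_env_json"`
}
```
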
---

### 2.3 Feed Snapshotting (the key to long-term replay)

**Feed bundle format**

* `feeds/<source>/<date>/...` inside a tar.zst
* A manifest file inside the bundle, `feed_bundle_manifest.json`, containing:

  * source URLs
  * retrieval commit/etag (if any)
  * file hashes
  * generated_by version

**Content addressing**

* The digest of the entire bundle (`sha256(tar.zst)`) is the reference.
* Scans record only the digest + URI.

**Immutability**

* Store bundles in object storage with WORM / retention if you need compliance.
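
Computing the bundle reference is just a streaming file hash; a minimal sketch:

```go
package feeds

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
)

// BundleDigest returns "sha256:<hex>" for a feed bundle (tar.zst) on disk.
// This digest is what each scan manifest records.
func BundleDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil)), nil
}
```
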
---

### 2.4 Replay execution sandbox

For determinism, enforce:

* **No network** (K8s NetworkPolicy, firewall rules, or container runtime flags)
* **Fixed TZ/locale**
* **Pinned container image digest**
* **Clock policy**

  * either “real time but recorded” or “frozen time at the original scan timestamp”
  * if scanner logic uses the current date for severity windows, freeze time

---

## 3) Development implementation plan

I’ll lay this out as **workstreams** plus **a sprint plan**. Compress or expand depending on team size.

### Workstream A — Scan Manifest & Canonical Artifacts

**Goal:** every scan is replayable on paper, even before replays run.

**Deliverables**

* `ScanManifest v1` schema + writer integrated into the scan pipeline
* Canonicalization library + canonical hashing for all artifacts

**Acceptance criteria**

* Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
* Artifact hashes are stable across repeated runs in the same environment

---

### Workstream B — Feed Snapshotting & Policy Versioning

**Goal:** eliminate “feed drift” by pinning immutable inputs.

**Deliverables**

* Feed bundle builder + signer + uploader
* Policy/rules bundler (compiled rules bundle, digest recorded)

**Acceptance criteria**

* New scans reference feed bundle digests (not “latest”)
* A scan can be re-run with the same feed bundle and policy bundle

---

### Workstream C — Replay Runner & Diff Engine

**Goal:** execute historical scans and score BF/SF/PF with actionable diffs.

**Deliverables**

* `stella replay --from manifest.json`
* Orchestrator job to schedule replays
* Diff engine + mismatch classifier
* Storage of replay results

**Acceptance criteria**

* Replay produces deterministic artifacts in a pinned environment
* Dashboard/CLI shows BF/SF/PF + a diff summary for failures

---

### Workstream D — Observability, Dashboard, and CI Gates

**Goal:** make fidelity visible and enforceable.

**Deliverables**

* Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf` (see the sketch after this section)
* Breakdown labels (scanner, ecosystem, policy_set, base_os)
* Alerts for drop thresholds
* CI gate option: “block release if BF < threshold on the canary set”

**Acceptance criteria**

* Engineering can see drift within 24h
* Releases are blocked when fidelity regressions occur
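
A sketch of the metrics contract using the Prometheus Go client (`prometheus/client_golang`); the metric and label names follow the deliverables above but are not finalized:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Labels mirror the breakdown dimensions from the deliverables list.
var labels = []string{"scanner", "ecosystem", "policy_set", "base_os"}

var (
	ReplayFidelityBF = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_bf",
		Help: "Windowed bitwise replay fidelity (0..1).",
	}, labels)
	ReplayFidelitySF = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_sf",
		Help: "Windowed semantic replay fidelity (0..1).",
	}, labels)
	ReplayFidelityPF = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_pf",
		Help: "Windowed policy replay fidelity (0..1).",
	}, labels)
)

// Example update after a nightly window is scored:
//   ReplayFidelityBF.WithLabelValues("stella", "npm", "prod-default", "debian").Set(0.992)
```
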
---

## 4) Suggested sprint plan with concrete tasks

### Sprint 0 — Design lock + baseline

**Tasks**

* Define the manifest schema: `ScanManifest v1` fields + versioning rules
* Decide canonicalization rules (what is normalized vs preserved)
* Choose the initial “golden canary” scan set (10–20 representative targets)
* Add a “replay-fidelity” epic with ownership & SLIs/SLOs

**Exit criteria**

* Approved schema + canonicalization spec
* Canary set stored and tagged

---

### Sprint 1 — Manifest writer + artifact hashing (MVP)

**Tasks**

* Implement the manifest writer in the scan pipeline
* Store `manifest_json` + `manifest_sha256`
* Implement canonicalization + hashing for:

  * the findings list (sorted)
  * the SBOM (normalized)
  * VEX (if present)

* Persist canonical hashes in `scan_artifacts`

**Exit criteria**

* Two identical scans in the same environment yield identical artifact hashes
* A “manifest export” endpoint/CLI works:

  * `stella scan --emit-manifest out.json`

---

### Sprint 2 — Feed snapshotter + policy bundling

**Tasks**

* Build the feed bundler job:

  * pull raw sources
  * normalize layout
  * generate `feed_bundle_manifest.json`
  * tar.zst + sha256
  * upload + record in `feed_snapshots`

* Update the scan pipeline:

  * resolve the feed bundle digest at scan start
  * record the digest in the scan manifest

* Bundle policy/lattice:

  * compile rules into an immutable artifact
  * record the policy bundle digest in the manifest

**Exit criteria**

* Scans reference immutable feed + policy digests
* You can fetch a feed bundle by digest and reproduce the same feed inputs

---

### Sprint 3 — Replay executor + “no network” sandbox

**Tasks**

* Create the replay container image / runtime wrapper
* Implement `stella replay --from MANIFEST.json`:

  * pulls the scanner image by digest
  * mounts the feed bundle + policy bundle
  * runs in network-off mode
  * applies tz/locale + clock mode

* Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage)

**Exit criteria**

* Replay runs end-to-end for canary scans
* Deterministic runtime controls verified (no DNS egress, fixed tz)

---

### Sprint 4 — Diff engine + mismatch classification

**Tasks**

* Implement the BF compare (canonical hashes)
* Implement the SF compare (semantic JSON/object comparison)
* Implement the PF compare (policy decision equivalence)
* Implement mismatch classification rules:

  * if the feed digest differs → feed drift
  * if the scanner digest differs → scanner drift
  * if the environment differs → runtime drift
  * else → nondeterminism (with sub-tags for ordering/time/RNG)

* Generate `diff_summary_json`:

  * top N changed CVEs
  * packages added/removed
  * policy verdict changes

**Exit criteria**

* Every failed replay has a cause tag and a diff summary that’s useful in under 2 minutes
* Engineers can reproduce failures locally with the manifest

---

### Sprint 5 — Dashboard + alerts + CI gate

**Tasks**

* Expose Prometheus metrics from the replay service
* Build the dashboard:

  * BF/SF/PF trends
  * breakdown by ecosystem/scanner/policy
  * mismatch-cause histogram

* Add alerting rules (drop threshold, bucket regression)
* Add a CI gate mode:

  * “run replays on the canary set for this release candidate”
  * block the merge if BF < target

**Exit criteria**

* Fidelity is visible to leadership and engineering
* The release process is protected by canary replays

---

### Sprint 6 — Hardening + compliance polish

**Tasks**

* Backward-compatible manifest upgrades:

  * `manifest_version` bump rules
  * migration support

* Artifact signing / integrity:

  * sign the manifest hash
  * optional transparency log later

* Storage & retention policies (cost controls)
* Runbook + on-call playbook

**Exit criteria**

* The audit story is complete: “show me exactly how scan X was produced”
* Operational load is manageable and cost-bounded

---

## 5) Engineering specs you can start implementing immediately

### 5.1 `ScanManifest v1` skeleton (example)

```json
{
  "manifest_version": "1.0",
  "scan_id": "scan_123",
  "created_at": "2025-12-12T10:15:30Z",

  "input": {
    "type": "oci_image",
    "image_ref": "registry/app@sha256:...",
    "layers": ["sha256:...", "sha256:..."],
    "source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
  },

  "scanner": {
    "engine": "stella",
    "scanner_image_digest": "sha256:...",
    "scanner_version": "2025.12.0",
    "config_digest": "sha256:...",
    "flags": ["--deep", "--vex"]
  },

  "feeds": {
    "vuln_feed_bundle_digest": "sha256:...",
    "license_db_digest": "sha256:..."
  },

  "policy": {
    "policy_bundle_digest": "sha256:...",
    "policy_set": "prod-default"
  },

  "environment": {
    "arch": "amd64",
    "os": "linux",
    "tz": "UTC",
    "locale": "C",
    "network": "disabled",
    "clock_mode": "frozen",
    "clock_value": "2025-12-12T10:15:30Z"
  },

  "normalization": {
    "canonicalizer_version": "1.2.0",
    "sbom_schema": "cyclonedx-1.6",
    "vex_schema": "cyclonedx-vex-1.0"
  }
}
```

---

### 5.2 CLI spec (minimal)

* `stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/`
* `stella replay --from MANIFEST.json --out-dir replay_out/`
* `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json`

|

## 6) Testing strategy (to prevent determinism regressions)

### Unit tests

* Canonicalization: same object → same bytes
* Sorting stability: randomized input order → stable output
* Hash determinism

### Integration tests

* Golden canaries:

  * run a scan twice in the same runner → BF match
  * replay from the manifest → BF match

* “Network leak” test:

  * DNS requests must be zero

* “Clock leak” test:

  * freeze time; ensure outputs do not include real timestamps

### Chaos tests

* Vary CPU count, run concurrency, and run order → still a BF match
* Randomized scheduling / thread interleavings to find races

---

## 7) Operational policies (so it stays useful)

### Retention & cost controls

* Keep full artifacts for regulated scans (e.g., 1–7 years)
* For non-regulated scans:

  * keep manifests + canonical hashes long-term
  * expire heavy evidence blobs after N days

* Compress large artifacts and dedupe by digest

### Alerting examples

* BF drops by ≥2% week-over-week (in any major bucket) → warn
* BF < 0.90 overall, or regulated BF < 0.95 → page / block release

### Triage workflow

* A failed replay auto-creates a ticket with:

  * manifest link
  * mismatch_reason
  * diff_summary
  * reproduction command

---

## 8) What “done” looks like (definition of success)

* Any customer/auditor can pick a scan from 6 months ago and you can:

  1. retrieve the manifest + feed bundle + policy bundle by digest
  2. replay it in a pinned sandbox
  3. show BF/SF/PF results and diffs

* Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.

---

If you want, I can also provide:

* a **Postgres DDL** for the tables above,
* a **Prometheus metrics contract** (names + labels + example queries),
* and a **diff_summary_json schema** that supports a UI “diff view” without reprocessing artifacts.

---

Here’s a quick, plain‑English idea you can use right away: **not all code diffs are equal**—some actually change what’s *reachable* at runtime (and thus security posture), while others just refactor internals. A “**Smart‑Diff**” pipeline flags only the diffs that open or close attack paths by combining (1) call‑stack traces, (2) dependency graphs, and (3) dataflow.

---

### Why this matters (background)

* Text diffs ≠ behavior diffs. A rename or refactor can look big in Git but do nothing to the flows reachable from external entry points (HTTP, gRPC, CLI, message consumers).
* Security triage gets noisy because scanners attach CVEs to all present packages, not to the code paths you can actually hit.
* **Dataflow‑aware diffs** shrink the noise and make VEX generation honest: “vuln present but **not exploitable** because the sink is unreachable from any policy‑defined entrypoint.”

---

### Minimal architecture (fits Stella Ops)

1. **Entrypoint map** (per service): controllers, handlers, consumers.
2. **Call graph + dataflow** (per commit): Roslyn for C#, `golang.org/x/tools/go/callgraph` for Go, plus taint rules (source→sink).
3. **Reachability cache** keyed by (commit, entrypoint, package@version).
4. **Smart‑Diff** = `reachable_paths(commit_B) – reachable_paths(commit_A)`.

   * If a path to a sensitive sink is newly reachable → **High**.
   * If a path disappears → auto‑generate **VEX “not affected (no reachable path)”**.

---

### Tiny working seeds

**C# (.NET 10) — Roslyn skeleton to diff call‑reachability**

```csharp
// SmartDiff.csproj targets net10.0
// NuGet: Microsoft.CodeAnalysis.Workspaces.MSBuild, Microsoft.Build.Locator
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.FindSymbols;
using Microsoft.CodeAnalysis.MSBuild;

public static class SmartDiff
{
    public static async Task<HashSet<string>> ReachableSinks(string solutionPath, string[] entrypoints, string[] sinks)
    {
        // Call Microsoft.Build.Locator.MSBuildLocator.RegisterDefaults() once at startup.
        var workspace = MSBuildWorkspace.Create();
        var solution = await workspace.OpenSolutionAsync(solutionPath);
        var index = new HashSet<string>();

        foreach (var proj in solution.Projects)
        {
            var comp = await proj.GetCompilationAsync();
            if (comp is null) continue;

            // Resolve entrypoints & sinks by fully-qualified symbol name
            var epSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => entrypoints.Contains(m.ToDisplayString())).ToList();
            var sinkSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => sinks.Contains(m.ToDisplayString())).ToList();

            foreach (var ep in epSymbols)
            foreach (var sink in sinkSymbols)
            {
                // Heuristic reachability: cheap reference-existence check via SymbolFinder
                var refs = await SymbolFinder.FindReferencesAsync(sink, solution);
                if (refs.SelectMany(r => r.Locations).Any()) // replace with a real graph walk
                    index.Add($"{ep.ToDisplayString()} -> {sink.ToDisplayString()}");
            }
        }
        return index;

        static IEnumerable<ISymbol> Descend(INamespaceOrTypeSymbol sym)
        {
            foreach (var m in sym.GetMembers())
            {
                yield return m;
                if (m is INamespaceOrTypeSymbol nt) foreach (var x in Descend(nt)) yield return x;
            }
        }
    }
}
```

**Go — SSA & callgraph seed**

```go
// go.mod: require golang.org/x/tools latest
package main

import (
	"fmt"

	"golang.org/x/tools/go/callgraph/cha"
	"golang.org/x/tools/go/packages"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/ssa/ssautil"
)

func main() {
	cfg := &packages.Config{Mode: packages.LoadAllSyntax, Tests: false}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		panic(err)
	}
	// Build SSA form for all loaded packages.
	prog, _ := ssautil.AllPackages(pkgs, ssa.BuilderMode(0))
	prog.Build()

	// CHA: a fast, conservative whole-program call graph.
	cg := cha.CallGraph(prog)
	// TODO: map entrypoints & sinks, then walk cg from EPs to sinks
	fmt.Println("nodes:", len(cg.Nodes))
}
```

---

### How to use it in your pipeline (fast win)

* **Pre‑merge job**:

  1. Build the call graph for `HEAD` and `HEAD^`.
  2. Compute the Smart‑Diff.
  3. If any *new* EP→sink path appears, fail with a short, proof‑linked note:
     “New reachable path: `POST /Invoices -> PdfExporter.Save(string path)` (writes outside sandbox).”

* **Post‑scan VEX**:

  * For each CVE on a package, mark it **Affected** only if some EP can reach a symbol that uses that package’s vulnerable surface.

---

### Evidence to show in the UI

* “**Path card**”: EP → … → sink, with a file:line hop list and the commit hash.
* “**What changed**”: before/after path diff (green removed, red added).
* “**Why it matters**”: sink classification (network write, file write, deserialization, SQL, crypto).

---

### Developer checklist (Stella Ops style)

* [ ] Define entrypoints per service (attribute or YAML).
* [ ] Define a sink taxonomy (FS, NET, DESER, SQL, CRYPTO).
* [ ] Implement language adapters: `.NET (Roslyn)`, `Go (SSA)`, later `Java (Soot/WALA)`.
* [ ] Add a **ReachabilityCache** (Postgres table keyed by commit+lang+service).
* [ ] Wire a `SmartDiffJob` in CI; emit SARIF + the CycloneDX `vulnerability-assertions` extension or OpenVEX.
* [ ] Gate merges on **newly‑reachable sensitive sinks**; auto‑VEX when paths disappear.

If you want, I can turn this into a small repo scaffold (Roslyn + Go adapters, Postgres schema, a GitLab/GitHub pipeline, and a minimal UI “path card”).

---

Below is a concrete **development implementation plan** to take the “Smart‑Diff” idea (reachability + dataflow + dependency/vuln context) into a shippable product integrated into your pipeline (Stella Ops style). I’ll assume the initial languages are **.NET (C#)** and **Go**, and the initial goal is **PR gating + VEX automation** with strong evidence (paths + file/line hops).
---

## 1) Product definition

### Problem you’re solving

Security noise comes from:

* “vuln exists in dependency” ≠ “vuln exploitable from any entrypoint”
* Git diffs look big even when behavior is unchanged
* teams struggling to triage “is this change actually risky?”

### What Smart‑Diff should do (core behavior)

Given **base commit A** and **head commit B**:

1. Identify **entrypoints** (web handlers, RPC methods, message consumers, CLI commands).
2. Identify **sinks** (file write, command exec, SQL, SSRF, deserialization, crypto misuse, templating, etc.).
3. Compute **reachable paths** from entrypoints → sinks (call graph + dataflow/taint).
4. Emit the **Smart‑Diff**:

   * **newly reachable** EP→sink paths (risk ↑)
   * **removed** EP→sink paths (risk ↓)
   * **changed** paths (same sink but different sanitization/guards)

5. Attach **dependency vulnerability context**:

   * if a vulnerable API surface is reachable (or data reaches it), mark “affected/exploitable”
   * otherwise generate **VEX**: “not affected” / “not exploitable” with evidence

### MVP definition (minimum shippable)

A PR check that:

* Flags **new** reachable paths to a small set of high‑risk sinks (e.g., command exec, unsafe deserialization, filesystem write, SSRF/network dial, raw SQL).
* Produces:

  * a SARIF report (for the code scanning UI)
  * a JSON artifact containing proof paths (EP → … → sink with file:line)
  * an optional VEX statement for dependency vulnerabilities (if you already have an SCA feed)

---

## 2) Architecture you can actually build

### High‑level components

1. **Policy & Taxonomy Service**

   * Defines entrypoints, sources, sinks, sanitizers, confidence rules
   * Versioned and centrally managed (but supports repo overrides)

2. **Analyzer Workers (language adapters)**

   * .NET analyzer (Roslyn + control flow)
   * Go analyzer (SSA + callgraph)
   * Outputs a standardized IR (intermediate representation)

3. **Graph Store + Reachability Engine**

   * Stores symbol nodes + call edges + dataflow edges
   * Computes reachable sinks per entrypoint
   * Computes the diff between commits A and B

4. **Vulnerability Mapper + VEX Generator**

   * Maps vulnerable packages/functions → “surfaces”
   * Joins them with reachability results
   * Emits OpenVEX (or CycloneDX VEX) with evidence links

5. **CI/PR Integrations**

   * CLI that runs in CI
   * Optional server mode (cache + incremental processing)

6. **UI/API**

   * Path cards: “what changed”, “why it matters”, “proof”
   * Filters by sink class, confidence, service, entrypoint

### Data contracts (standardized IR)

Make every analyzer output the same shapes so the rest of the pipeline is language‑agnostic (a `symbol_id` hashing sketch follows this list):

* **Symbols**

  * `symbol_id`: stable hash of (lang, module, fully-qualified name, signature)
  * metadata: file, line ranges, kind (method/function), accessibility

* **Edges**

  * call edge: `caller_symbol_id -> callee_symbol_id`
  * dataflow edge: `source_symbol_id -> sink_symbol_id` with variable/parameter traces
  * edge metadata: type, confidence, reason (static, reflection guess, interface dispatch, etc.)

* **Entrypoints / Sources / Sinks**

  * entrypoint: (symbol_id, route/topic/command metadata)
  * sink: (symbol_id, sink_type, severity, optional CWE mapping)

* **Paths**

  * `entrypoint -> ... -> sink`
  * hop list: symbol_id + file:line, plus “dataflow step evidence” when relevant
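
A sketch of the stable `symbol_id` derivation (the separator and truncation length are arbitrary choices here, not a fixed spec):

```go
package ir

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// SymbolID derives a stable, language-agnostic symbol identifier from the
// fields listed above. Any change to one component changes the ID.
func SymbolID(lang, module, fqn, signature string) string {
	// "\x1f" (unit separator) avoids ambiguity between concatenated fields.
	key := strings.Join([]string{lang, module, fqn, signature}, "\x1f")
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:16] // truncated for readability
}
```
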
---

## 3) Workstreams and deliverables

### Workstream A — Policy, taxonomy, configuration

**Deliverables**

* `smartdiff.policy.yaml` schema and validator
* A default sink taxonomy:

  * `CMD_EXEC`, `UNSAFE_DESER`, `SQL_RAW`, `SSRF`, `FILE_WRITE`, `PATH_TRAVERSAL`, `TEMPLATE_INJECTION`, `CRYPTO_WEAK`, `AUTHZ_BYPASS` (expand later)

* Initial sanitizer patterns:

  * for example: parameter validation, safe deserialization wrappers, ORM parameterized APIs, path normalization, allowlists

**Implementation notes**

* Start strict and small: 10–20 sinks, 10 sources, 10 sanitizers.
* Provide repo-level overrides:

  * `smartdiff.policy.yaml` in the repo root
  * central policies referenced by version tag

**Acceptance criteria**

* A service can onboard by configuring:

  * entrypoint discovery mode (auto + manual)
  * sink classes to enforce
  * the severity threshold that fails a PR

---

### Workstream B — .NET analyzer (Roslyn)

**Deliverables**

* Build pipeline that produces:

  * a call graph (methods and invocations)
  * basic control-flow guards for reachability (optional for MVP)
  * taint propagation for common patterns (MVP: parameter → sink)

* Entrypoint discovery for:

  * ASP.NET controllers (`[HttpGet]`, `[HttpPost]`)
  * minimal APIs (`MapGet`/`MapPost`)
  * gRPC service methods
  * message consumers (configurable attributes/interfaces)

**Implementation notes (practical path)**

* MVP static callgraph:

  * use the Roslyn semantic model to resolve invocation targets
  * for virtual/interface calls: conservative resolution to possible implementations within the compilation

* MVP taint:

  * “sources”: request params/body, headers, query string, message payloads
  * “sinks”: wrappers around `Process.Start`, `SqlCommand`, `File.WriteAllText`, `HttpClient.Send`, deserializers, etc.
  * propagate taint across:

    * parameter → local → argument
    * return values
    * simple assignments and concatenations (heuristic)

* Confidence scoring:

  * direct static call resolution: high
  * reflection/dynamic: low (flag separately)

**Acceptance criteria**

* On a demo ASP.NET service, if a PR adds
  `HttpPost /upload` → `File.WriteAllBytes(userPath, ...)`,
  Smart‑Diff flags a **new EP→FILE_WRITE path** and shows the hops with file/line.

---

### Workstream C — Go analyzer (SSA)

**Deliverables**

* SSA build + callgraph extraction
* Entrypoint discovery for:

  * `net/http` handlers
  * common routers (Gin/Echo/Chi) via adapter rules
  * gRPC methods
  * consumers (Kafka/NATS/etc.) by config

**Implementation notes**

* Use `golang.org/x/tools/go/packages` + the `ssa` builder
* Callgraph:

  * start with CHA (Class Hierarchy Analysis) for speed
  * later add pointer analysis for precision on interfaces

* Taint:

  * sources: `http.Request`, router params, message payloads
  * sinks: `os/exec`, raw `database/sql` queries, file I/O, outbound `net/http`, unsafe deserialization libs

**Acceptance criteria**

* A PR that adds `exec.Command(req.FormValue("cmd"))` becomes a **new EP→CMD_EXEC** finding.

---

### Workstream D — Graph store + reachability computation

**Deliverables**

* Schema in Postgres (recommended first) for:

  * commits, services, languages
  * symbols, edges, entrypoints, sinks
  * computed reachability “facts” (entrypoint→sink with shortest path(s))

* Reachability engine:

  * BFS/DFS per entrypoint with early cutoffs (sketched below)
  * path-reconstruction storage (store a predecessor map, or store k-shortest paths)

**Implementation notes**

* Don’t start with a graph DB unless you must.
* Use Postgres tables + indexes:

  * `edges(from_symbol, to_symbol, commit_id, kind)`
  * `symbols(symbol_id, lang, module, fqn, file, line_start, line_end)`
  * `reachability(entrypoint_id, sink_id, commit_id, path_hash, confidence, severity, evidence_json)`

* Cache:

  * keyed by (commit, policy_version, analyzer_version)
  * avoids recompute on re-runs
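
A sketch of the BFS with predecessor-based path reconstruction, over an in-memory adjacency map (loading edges from the tables above is left out):

```go
package reach

// Graph maps a symbol_id to its callees (loaded from the edges table).
type Graph map[string][]string

// PathTo returns one entrypoint→sink proof path as a hop list,
// or nil if the sink is unreachable. maxDepth is the early cutoff.
func PathTo(g Graph, entrypoint, sink string, maxDepth int) []string {
	type item struct {
		node  string
		depth int
	}
	pred := map[string]string{entrypoint: ""} // predecessor map doubles as "visited"
	queue := []item{{entrypoint, 0}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur.node == sink {
			// Reconstruct the hop list by walking predecessors backwards.
			var rev []string
			for n := sink; n != ""; n = pred[n] {
				rev = append(rev, n)
			}
			path := make([]string, len(rev))
			for i, n := range rev {
				path[len(rev)-1-i] = n
			}
			return path
		}
		if cur.depth == maxDepth {
			continue
		}
		for _, next := range g[cur.node] {
			if _, seen := pred[next]; !seen {
				pred[next] = cur.node
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return nil
}
```
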
**Acceptance criteria**

* For any analyzed commit, you can answer:

  * “Which sinks are reachable from these entrypoints?”
  * “Show me one proof path per (entrypoint, sink_type).”

---

### Workstream E — Smart‑Diff engine (the “diff” part)

**Deliverables**

* Diff algorithm producing three buckets:

  * `added_paths`, `removed_paths`, `changed_paths`

* “Changed” means:

  * same entrypoint + sink type, but the path differs, OR taint/sanitization differs, OR the confidence changes

**Implementation notes**

* Identify a path by a stable fingerprint (sketched below):

  * `path_id = hash(entrypoint_symbol + sink_symbol + sink_type + policy_version + analyzer_version)`

* Store:

  * the top-k paths for each pair as evidence (k=1 for MVP; add more later)

* Severity gating rules, for example:

  * new path to `CMD_EXEC` = fail
  * new path to `FILE_WRITE` = warn unless under a `/tmp` allowlist
  * new path to `SQL_RAW` = fail unless a parameterized sanitizer is present
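
A sketch of the fingerprint plus the added/removed bucketing as a set difference (the struct shape is illustrative):

```go
package smartdiff

import (
	"crypto/sha256"
	"encoding/hex"
)

// Path is the (entrypoint, sink) pair plus the versions that scope the result.
type Path struct {
	Entrypoint, Sink, SinkType     string
	PolicyVersion, AnalyzerVersion string
}

// ID is the stable fingerprint described above.
func (p Path) ID() string {
	sum := sha256.Sum256([]byte(p.Entrypoint + "|" + p.Sink + "|" + p.SinkType +
		"|" + p.PolicyVersion + "|" + p.AnalyzerVersion))
	return hex.EncodeToString(sum[:])
}

// Diff buckets head-vs-base reachable paths into added and removed sets.
func Diff(base, head []Path) (added, removed []Path) {
	baseIDs := map[string]bool{}
	for _, p := range base {
		baseIDs[p.ID()] = true
	}
	headIDs := map[string]bool{}
	for _, p := range head {
		headIDs[p.ID()] = true
		if !baseIDs[p.ID()] {
			added = append(added, p)
		}
	}
	for _, p := range base {
		if !headIDs[p.ID()] {
			removed = append(removed, p)
		}
	}
	return added, removed
}
```
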
**Acceptance criteria**

* Given commits A and B, if B introduces a newly reachable sink, CI fails with a single actionable card:

  * **EP**: route / handler
  * **Sink**: type + symbol
  * **Proof**: hop list
  * **Why**: the policy rule triggered

---

### Workstream F — Vulnerability mapping + VEX

**Deliverables**

* Ingest the dependency inventory (SBOM or lockfiles)
* Map vulnerabilities to “surfaces”:

  * package → vulnerable module/function patterns
  * minimal version/range matching (from your existing vuln feed)

* Decision logic:

  * **Affected** if any reachable path intersects the vulnerable surface OR dataflow reaches a vulnerable sink
  * else **Not affected / Not exploitable** with justification

**Implementation notes**

* Start with a pragmatic approach:

  * package‑level reachability: “is any symbol in that package reachable?”
  * then iterate toward function‑level surfaces

* VEX output:

  * include the commit hash, policy version, and evidence paths
  * embed links to internal “path card” URLs if available

**Acceptance criteria**

* For a known vulnerable dependency, the system emits a VEX “not affected” if the package’s code is never reached from any entrypoint, with proof references.

|

### Workstream G — CI integration + developer UX

**Deliverables**

* A single CLI:

  * `smartdiff analyze --commit <sha> --service <svc> --lang <dotnet|go>`
  * `smartdiff diff --base <shaA> --head <shaB> --out sarif`

* CI templates for:

  * GitHub Actions / GitLab CI

* Outputs:

  * SARIF
  * a JSON evidence bundle
  * an optional OpenVEX file

**Acceptance criteria**

* Teams can enable Smart‑Diff by adding a CI job + config file, with no additional infra required for MVP (local artifacts mode).
* When infra is available, enable server caching mode for speed.

|

### Workstream H — UI “Path Cards”

**Deliverables**

* UI components:

  * a path-card list with filters (sink type, severity, confidence)
  * a “what changed” diff view:

    * red = added hops
    * green = removed hops

  * an “evidence” panel:

    * file:line for each hop
    * code snippets (optional)

* APIs:

  * `GET /smartdiff/{repo}/{pr}/findings`
  * `GET /smartdiff/{repo}/{commit}/path/{path_id}`

**Acceptance criteria**

* A developer can click one finding and understand:

  * how the data got there
  * exactly what line introduced the risk
  * how to fix it (sanitize/guard/allowlist)

|

## 4) Milestone plan (sequenced, no time promises)

### Milestone 0 — Foundation

* Repo scaffolding:

  * `smartdiff-cli/`
  * `analyzers/dotnet/`
  * `analyzers/go/`
  * `core-ir/` (schemas + validation)
  * `server/` (optional; can come later)

* Define the IR JSON schema + versioning rules
* Implement the policy YAML + validator + sample policies
* Implement “local mode” artifact output

**Exit criteria**

* You can run `smartdiff analyze` and get a valid IR file for at least one trivial repo.

---

### Milestone 1 — Callgraph reachability MVP

* .NET: build call edges + basic entrypoint discovery
* Go: build call edges + basic entrypoint discovery
* Graph store: in-memory, or local SQLite/Postgres
* Compute reachable sinks (callgraph only, no taint)

**Exit criteria**

* On a demo repo, you can list:

  * entrypoints
  * reachable sinks (callgraph reachability only)
  * a proof path (hop list)

---

### Milestone 2 — Smart‑Diff MVP (PR gating)

* Compute the diff between base/head reachable-sink sets
* Produce SARIF (a minimal emission sketch follows) with:

  * rule id = sink type
  * a message that includes the entrypoint + sink + a link to the evidence JSON

* CI templates + documentation

**Exit criteria**

* In PR checks, the job fails on new EP→sink paths and links to a proof.
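
A sketch of emitting one minimal SARIF 2.1.0 result in Go (the structs cover only the fields used here; the real schema has many more, including locations for file/line):

```go
package sarif

import "encoding/json"

// Minimal SARIF 2.1.0 shapes for one result.
type Log struct {
	Version string `json:"version"`
	Schema  string `json:"$schema"`
	Runs    []Run  `json:"runs"`
}
type Run struct {
	Tool    Tool     `json:"tool"`
	Results []Result `json:"results"`
}
type Tool struct {
	Driver struct {
		Name string `json:"name"`
	} `json:"driver"`
}
type Result struct {
	RuleID  string `json:"ruleId"` // sink type, e.g. "CMD_EXEC"
	Level   string `json:"level"`  // "error" or "warning"
	Message struct {
		Text string `json:"text"`
	} `json:"message"`
}

// NewFinding renders one "new EP→sink path" finding as SARIF JSON.
func NewFinding(sinkType, entrypoint, sink, evidenceURL string) ([]byte, error) {
	var res Result
	res.RuleID = sinkType
	res.Level = "error"
	res.Message.Text = "New reachable path: " + entrypoint + " -> " + sink +
		" (evidence: " + evidenceURL + ")"
	log := Log{
		Version: "2.1.0",
		Schema:  "https://json.schemastore.org/sarif-2.1.0.json",
		Runs:    []Run{{Results: []Result{res}}},
	}
	log.Runs[0].Tool.Driver.Name = "smartdiff"
	return json.MarshalIndent(log, "", "  ")
}
```
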
---

### Milestone 3 — Taint/dataflow MVP (high-value sinks only)

* Add taint propagation to reduce false positives:

  * differentiate “sink reachable” vs “untrusted data reaches sink”

* Add sanitizer recognition
* Add confidence scoring + suppression mechanisms (policy allowlists)

**Exit criteria**

* A sink is only “high severity” if it is both reachable and tainted (or policy says otherwise).

---

### Milestone 4 — VEX integration MVP

* Join reachability with dependency vulnerabilities
* Emit OpenVEX (and/or CycloneDX VEX)
* Store evidence references (paths) inside the VEX justification

**Exit criteria**

* For a repo with a vulnerable dependency, you can automatically produce an affected/not-affected decision with evidence.

---

### Milestone 5 — Scale and precision improvements

* Incremental analysis (only analyze changed projects/packages)
* Better dynamic-dispatch handling (Go pointer analysis, .NET interface-dispatch expansion)
* Optional runtime telemetry integration:

  * import production traces to prioritize “actually observed” entrypoints

**Exit criteria**

* Works on large services with acceptable run time and stable noise levels.

---

## 5) Backlog you can paste into Jira (epics + key stories)

### Epic: Policy & taxonomy

* Story: Define the `smartdiff.policy.yaml` schema and validator
  **AC:** invalid configs fail with clear errors; configs are versioned.
* Story: Provide a default sink list and severities
  **AC:** at least 10 sink rules with test cases.

### Epic: .NET analyzer

* Story: Resolve method invocations to symbols (Roslyn)
  **AC:** correct targets for direct calls; conservative handling for virtual calls.
* Story: Discover ASP.NET routes and bind them to entrypoint symbols
  **AC:** entrypoints include route/method metadata.

### Epic: Go analyzer

* Story: SSA build and callgraph extraction
  **AC:** function nodes and edges generated for a multi-package repo.
* Story: net/http entrypoint discovery
  **AC:** handler functions recognized as entrypoints with path labels.

### Epic: Reachability engine

* Story: Compute reachable sinks per entrypoint
  **AC:** store at least one path with a hop list.
* Story: Smart‑Diff A vs B
  **AC:** added/removed paths computed deterministically.

### Epic: CI/SARIF

* Story: Emit SARIF results
  **AC:** findings appear in the code scanning UI and include file/line.

### Epic: Taint analysis

* Story: Propagate taint from request to sink for 3 sink classes
  **AC:** produces “tainted” evidence with a variable/argument trace.
* Story: Sanitizer recognition
  **AC:** path marked “sanitized” and downgraded per policy.

### Epic: VEX

* Story: Generate OpenVEX statements from reachability + the vuln feed
  **AC:** “not affected” statements include justification and evidence references.

---

## 6) Key engineering decisions (recommended defaults)

### Storage

* Start with **Postgres** (or even local SQLite for MVP) for simplicity.
* Introduce a graph DB only if:

  * you need very large multi-commit graph queries at low latency
  * Postgres performance becomes a hard blocker

### Confidence model

Every edge/path should carry:

* `confidence`: High/Med/Low
* `reasons`: e.g., `DirectCall`, `InterfaceDispatch`, `ReflectionGuess`, `RouterHeuristic`

This lets you:

* gate only on high-confidence paths in the early rollout
* keep low-confidence paths as “informational”

### Suppression model

* Local suppressions:

  * `smartdiff.suppress.yaml` with rule id + symbol id + reason + expiry

* Policy allowlists:

  * allow file writes only under certain directories
  * allow outbound network only to configured domains

---

## 7) Testing strategy (to avoid “cool demo, unusable tool”)

### Unit tests

* Symbol-hashing stability tests
* Call-resolution tests:

  * overloads, generics, interfaces, lambdas

* Policy parsing/validation tests

### Integration tests (must-have)

* Golden repos in `testdata/`:

  * one ASP.NET minimal API
  * one MVC controller app
  * one Go net/http app + one Gin app

* Golden outputs:

  * expected entrypoints
  * expected reachable sinks
  * expected diff between commits

### Regression tests

* A curated corpus of “known issues”:

  * false positives you fixed should never return
  * false negatives: ensure a known risky path is always found

### Performance tests

* Measure:

  * analysis time per 50k LOC
  * peak memory
  * graph size

* Budget enforcement:

  * if over budget, degrade gracefully (lower precision, mark low confidence)

---

## 8) Example configs and outputs (to make onboarding easy)

### Example policy YAML (minimal)

```yaml
version: 1
service: invoices-api
entrypoints:
  autodiscover:
    dotnet:
      aspnet: true
    go:
      net_http: true

sinks:
  - type: CMD_EXEC
    severity: high
    match:
      dotnet:
        symbols:
          - "System.Diagnostics.Process.Start(string)"
      go:
        symbols:
          - "os/exec.Command"
  - type: FILE_WRITE
    severity: medium
    match:
      dotnet:
        namespaces: ["System.IO"]
      go:
        symbols: ["os.WriteFile"]

gating:
  fail_on:
    - sink_type: CMD_EXEC
      when: "added && confidence >= medium"
    - sink_type: FILE_WRITE
      when: "added && tainted && confidence >= medium"
```

### Evidence JSON shape (what the UI consumes)

```json
{
  "commit": "abc123",
  "entrypoint": {"symbol": "InvoicesController.Upload()", "route": "POST /upload"},
  "sink": {"type": "FILE_WRITE", "symbol": "System.IO.File.WriteAllBytes"},
  "confidence": "high",
  "tainted": true,
  "path": [
    {"symbol": "InvoicesController.Upload()", "file": "Controllers/InvoicesController.cs", "line": 42},
    {"symbol": "UploadService.Save()", "file": "Services/UploadService.cs", "line": 18},
    {"symbol": "System.IO.File.WriteAllBytes", "file": null, "line": null}
  ]
}
```
---

## 9) Risks and mitigations (explicit)

1. **Dynamic behavior (reflection, DI, router magic)**

   * Mitigation: conservative fallbacks + confidence labels + optional runtime traces later

2. **Noise from huge callgraphs**

   * Mitigation: sink-first slicing (compute reachability backwards from sinks), entrypoint scoping, k‑shortest paths only

3. **Large-repo build failures**

   * Mitigation: run analyzers inside build containers; allow partial analysis with an explicit “incomplete” result flag

4. **Teams rejecting gating**

   * Mitigation: staged rollout: observe-only mode → warn-only → fail only on high-confidence CMD_EXEC/UNSAFE_DESER

---

## 10) Definition of done (what “implemented” means)

You should consider Smart‑Diff “implemented” when:

* A repo can enable it with one config + one CI job.
* PRs get:

  * a small number of **actionable** findings (not hundreds)
  * a proof path with file/line hops for each finding

* It reliably detects at least:

  * new command-execution paths
  * new unsafe-deserialization paths
  * new tainted filesystem-write paths

* It can optionally emit VEX decisions backed by reachability evidence.

---

If you want the next step, I can also give you:

* a **concrete repo layout** with module boundaries,
* the **Postgres schema** (tables + indexes),
* and a **language adapter interface** (so adding Java/Python later is straightforward).