Merge branch 'main' of https://git.stella-ops.org/stella-ops.org/git.stella-ops.org
@@ -0,0 +1,750 @@
Here’s a simple, practical way to score vulnerabilities that’s more auditable than plain CVSS: build a **deterministic score** from three reproducible inputs—**Reachability**, **Evidence**, and **Provenance**—so every number is explainable and replayable.
|
||||
|
||||
---
|
||||
|
||||
### Why move beyond CVSS?
|
||||
|
||||
* **CVSS is context-light**: it rarely knows *your* call paths, configs, or runtime.
|
||||
* **Audits need proof**: regulators and customers increasingly ask, “show me how you got this score.”
|
||||
* **Teams need consistency**: the same image should get the same score across environments when inputs are identical.
|
||||
|
||||
---
|
||||
|
||||
### The scoring idea (plain English)
|
||||
|
||||
Score = a weighted function of:
|
||||
|
||||
1. **Reachability Depth (R)** — how close the vulnerable function is to a real entry point in *your* app (e.g., public HTTP route → handler → library call).
|
||||
2. **Evidence Density (E)** — how much concrete proof you have (stack traces, symbol hits, config toggles, feature flags, SCA vs. SAST vs. DAST vs. runtime).
|
||||
3. **Provenance Integrity (P)** — how trustworthy the artifact chain is (signed SBOM, DSSE attestations, SLSA/Rekor entries, reproducible build match).
|
||||
|
||||
A compact, auditable formula you can start with:
|
||||
|
||||
```
NormalizedScore = W_R * f(R) + W_E * g(E) + W_P * h(P)
```
|
||||
|
||||
* Pick monotonic, bounded transforms (e.g., map to 0..1):
|
||||
|
||||
* f(R): inverse of hops (shorter path ⇒ higher value)
|
||||
* g(E): weighted sum of evidence types (runtime>DAST>SAST>SCA, with decay for stale data)
|
||||
* h(P): cryptographic/provenance checks (unsigned < signed < signed+attested < signed+attested+reproducible)
|
||||
|
||||
Keep **W_R + W_E + W_P = 1** (e.g., 0.5, 0.35, 0.15 for reachability-first triage).
|
||||
|
||||
---
|
||||
|
||||
### What makes this “deterministic”?
|
||||
|
||||
* Inputs are **machine-replayable**: call-graph JSON, evidence bundle (hashes + timestamps), provenance attestations.
|
||||
* The score is **purely a function of those inputs**, so anyone can recompute it later and match your result byte-for-byte.
|
||||
|
||||
---
|
||||
|
||||
### Minimal rubric (ready to implement)
|
||||
|
||||
* **Reachability (R, 0..1)**
|
||||
|
||||
* 1.00 = vulnerable symbol called on a hot path from a public route (≤3 hops)
|
||||
* 0.66 = reachable but behind uncommon feature flag or deep path (4–7 hops)
|
||||
* 0.33 = only theoretically reachable (code present, no discovered path)
|
||||
* 0.00 = dead/unreferenced code in this build
|
||||
* **Evidence (E, 0..1)** (sum, capped at 1.0)
|
||||
|
||||
* +0.6 runtime trace hitting the symbol
|
||||
* +0.3 DAST/integ test activating vulnerable behavior
|
||||
* +0.2 SAST precise sink match
|
||||
* +0.1 SCA presence only (no call evidence)
|
||||
* (Apply 10–30% decay if older than N days)
|
||||
* **Provenance (P, 0..1)**
|
||||
|
||||
* 0.0 unsigned/unknown origin
|
||||
* 0.3 signed image only
|
||||
* 0.6 signed + SBOM (hash-linked)
|
||||
* 1.0 signed + SBOM + DSSE attestations + reproducible build match
|
||||
|
||||
Example weights: `W_R=0.5, W_E=0.35, W_P=0.15`.
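
To make the arithmetic concrete, a hypothetical finding with a runtime-confirmed hit three hops from a public route might score f(R)=1.00, g(E)=0.70 (runtime +0.6, SCA +0.1), h(P)=0.60 (signed + hash-linked SBOM), giving `NormalizedScore = 0.5*1.00 + 0.35*0.70 + 0.15*0.60 = 0.835`.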
|
||||
|
||||
---
|
||||
|
||||
### How this plugs into **Stella Ops**
|
||||
|
||||
* **Scanner** produces call-graphs & symbol maps (R).
|
||||
* **Vexer**/Evidence store aggregates SCA/SAST/DAST/runtime proofs with timestamps (E).
|
||||
* **Authority/Proof‑Graph** verifies signatures, SBOM↔image hash links, DSSE/Rekor (P).
|
||||
* **Policy Engine** applies the scoring formula (YAML policy) and emits a signed VEX note with the score + input hashes.
|
||||
* **Replay**: any audit can re-run the same policy with the same inputs and get the same score.
|
||||
|
||||
---
|
||||
|
||||
### Developer checklist (do this first)
|
||||
|
||||
* Emit a **Reachability JSON** per build: entrypoints, hops, functions, edges, timestamps, hashes.
|
||||
* Normalize **Evidence Types** with IDs, confidence, freshness, and content hashes.
|
||||
* Record **Provenance Facts** (signing certs, SBOM digest, DSSE bundle, reproducible-build fingerprint).
|
||||
* Implement the **score function as a pure library** (no I/O), version it (e.g., `score.v1`), and include the version + inputs’ hashes in every VEX note.
|
||||
* Add a **30‑sec “Time‑to‑Evidence” UI**: click a score → see the exact call path, evidence list, and provenance checks.
|
||||
|
||||
---
|
||||
|
||||
### Why this helps compliance & sales
|
||||
|
||||
* Every number is **auditable** (inputs + function are transparent).
|
||||
* Scores remain **consistent across air‑gapped sites** (deterministic, no hidden heuristics).
|
||||
* You can **prove reduction** after a fix (paths disappear, evidence decays, provenance improves).
|
||||
|
||||
If you want, I can draft the YAML policy schema and a tiny .NET 10 library stub for `score.v1` so you can drop it into Stella Ops today.
|
||||
Below is an extended, **developer-ready implementation plan** to build the deterministic vulnerability score into **Stella Ops** (Scanner → Evidence/Vexer → Authority/Proof‑Graph → Policy Engine → UI/VEX output). I’m assuming a .NET-centric stack (since you mentioned .NET 10 earlier), but everything is laid out so the scoring core stays language-agnostic.
|
||||
|
||||
---
|
||||
|
||||
## 1) Extend the scoring model into a stable, “auditable primitive”
|
||||
|
||||
### 1.1 Outputs you should standardize on
|
||||
|
||||
Produce **two** signed artifacts per finding (plus optional UI views):
|
||||
|
||||
1. **ScoreResult** (primary):
|
||||
|
||||
* `riskScore` (0–100 integer)
|
||||
* `subscores` (each 0–100 integer): `baseSeverity`, `reachability`, `evidence`, `provenance`
|
||||
* `explain[]` (structured reasons, ordered deterministically)
|
||||
* `inputs` (digests of all upstream inputs)
|
||||
* `policy` (policy version + digest)
|
||||
* `engine` (engine version + digest)
|
||||
* `asOf` timestamp (the only “time” allowed to affect the result)
|
||||
|
||||
2. **VEX note** (OpenVEX/CSAF-compatible wrapper):
|
||||
|
||||
* references ScoreResult digest
|
||||
* embeds the score (optional) + the input digests
|
||||
* signed by Stella Ops Authority
|
||||
|
||||
> Key audit requirement: anyone can recompute the score **offline** from the input bundle + policy + engine version.
|
||||
|
||||
---
|
||||
|
||||
## 2) Make determinism non-negotiable
|
||||
|
||||
### 2.1 Determinism rules (implement as “engineering constraints”)
|
||||
|
||||
These rules guard against the most common ways deterministic systems become non-deterministic:
|
||||
|
||||
* **No floating point** in scoring math. Use integer “basis points” and integer bucket tables.
|
||||
* **No implicit time**. Scoring takes `asOf` as an explicit input. Evidence “freshness” is computed as `asOf - evidence.timestamp`.
|
||||
* **Canonical serialization** for hashing:
|
||||
|
||||
* Use RFC-style canonical JSON (e.g., JCS) or a strict canonical CBOR profile.
|
||||
* Sort keys and arrays deterministically.
|
||||
* **Stable ordering** for explanation lists:
|
||||
|
||||
* Always sort factors by `(factorId, contributingObjectDigest)`.
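
As a rough illustration of the canonical-hashing rule, here is a simplified .NET sketch — not a full RFC 8785/JCS implementation (it only sorts object keys ordinally and serializes compactly before hashing); the helper name and shape are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

public static class CanonicalHash
{
    // Rebuild the tree with object keys sorted ordinally so logically equal
    // documents always serialize to identical bytes.
    private static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        JsonValue val => JsonNode.Parse(val.ToJsonString()), // fresh copy of the scalar
        _ => null
    };

    // SHA-256 over the compact canonical serialization, as lowercase hex.
    public static string DigestHex(string json)
    {
        string canonical = Canonicalize(JsonNode.Parse(json))!.ToJsonString();
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```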
|
||||
|
||||
### 2.2 Fixed-point scoring approach (recommended)
|
||||
|
||||
Represent weights and multipliers as **basis points** (bps):
|
||||
|
||||
* 100% = 10,000 bps
|
||||
* 1% = 100 bps
|
||||
|
||||
Example: `totalScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
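
For example, with `wR=4500` bps and `R=80`, the reachability term contributes `4500*80 = 360,000`, and the single division by 10,000 at the end turns that into 36 of the final 0–100 points — no floats anywhere.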
|
||||
|
||||
---
|
||||
|
||||
## 3) Extended score definition (v1)
|
||||
|
||||
### 3.1 Subscores (0–100 integers)
|
||||
|
||||
#### BaseSeverity (B)
|
||||
|
||||
* Source: CVSS if present, else vendor severity, else default.
|
||||
* Normalize to 0–100:
|
||||
|
||||
* CVSS 0.0–10.0 → 0–100 by `B = round(CVSS * 10)`
|
||||
|
||||
Keep its weight small so you’re “beyond CVSS” but still anchored.
|
||||
|
||||
#### Reachability (R)
|
||||
|
||||
Computed from reachability report (call-path depth + gating conditions).
|
||||
|
||||
**Hop buckets** (example):
|
||||
|
||||
* 0–2 hops: 100
|
||||
* 3 hops: 85
|
||||
* 4 hops: 70
|
||||
* 5 hops: 55
|
||||
* 6 hops: 45
|
||||
* 7 hops: 35
|
||||
* 8+ hops: 20
|
||||
* unreachable: 0
|
||||
|
||||
**Gate multipliers** (apply multiplicatively in bps):
|
||||
|
||||
* behind feature flag: ×7000
|
||||
* auth required: ×8000
|
||||
* only admin role: ×8500
|
||||
* non-default config: ×7500
|
||||
|
||||
Final: `R = bucketScore * gateMultiplier / 10000`
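
A minimal sketch of that computation, using the example buckets and multipliers above (type and method names are hypothetical, not existing Stella Ops APIs):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReachabilityScore
{
    // Example hop buckets from the table above: (maxHops, score), evaluated in order.
    private static readonly (int MaxHops, int Score)[] HopBuckets =
    {
        (2, 100), (3, 85), (4, 70), (5, 55), (6, 45), (7, 35), (int.MaxValue, 20)
    };

    // Example gate multipliers in basis points (10,000 = no reduction).
    private static readonly Dictionary<string, int> GateMultipliersBps = new()
    {
        ["featureFlag"] = 7000,
        ["authRequired"] = 8000,
        ["adminOnly"] = 8500,
        ["nonDefaultConfig"] = 7500,
    };

    // hops = null means no path was found (unreachable).
    public static int Compute(int? hops, IEnumerable<string> gates)
    {
        if (hops is null) return 0;

        int score = HopBuckets.First(b => hops.Value <= b.MaxHops).Score;

        // Apply each detected gate multiplicatively, staying in integer math.
        foreach (var gate in gates)
            if (GateMultipliersBps.TryGetValue(gate, out var bps))
                score = score * bps / 10000;

        return score;
    }
}

// Example: 3 hops behind a feature flag -> 85 * 7000 / 10000 = 59.
```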
|
||||
|
||||
#### Evidence (E)
|
||||
|
||||
Sum evidence “points” capped at 100, then apply freshness multiplier.
|
||||
|
||||
Evidence points (example):
|
||||
|
||||
* runtime trace hitting vulnerable symbol: +60
|
||||
* DAST / integration test triggers behavior: +30
|
||||
* SAST precise sink match: +20
|
||||
* SCA presence only: +10
|
||||
|
||||
Freshness bucket multiplier (example):
|
||||
|
||||
* age ≤ 7 days: ×10000
|
||||
* ≤ 30 days: ×9000
|
||||
* ≤ 90 days: ×7500
|
||||
* ≤ 180 days: ×6000
|
||||
* ≤ 365 days: ×4000
|
||||
* > 365: ×2000
|
||||
|
||||
Final: `E = min(100, sum(points)) * freshness / 10000`
|
||||
|
||||
#### Provenance (P)
|
||||
|
||||
Based on verified supply-chain checks.
|
||||
|
||||
Levels:
|
||||
|
||||
* unsigned/unknown: 0
|
||||
* signed image: 30
|
||||
* signed + SBOM hash-linked to image: 60
|
||||
* signed + SBOM + DSSE attestations verified: 80
|
||||
* above + reproducible build match: 100
|
||||
|
||||
### 3.2 Total score and overrides
|
||||
|
||||
Weights (example):
|
||||
|
||||
* `wB=1000` (10%)
|
||||
* `wR=4500` (45%)
|
||||
* `wE=3000` (30%)
|
||||
* `wP=1500` (15%)
|
||||
|
||||
Total:
|
||||
|
||||
* `riskScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
|
||||
|
||||
Override examples (still deterministic, because they depend on evidence flags):
|
||||
|
||||
* If `knownExploited=true` AND `R >= 70` → force score to 95+
|
||||
* If unreachable (`R=0`) AND only SCA evidence (`E<=10`) → clamp score ≤ 25
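
Sketching the total plus those two overrides in integer-only C# (names are illustrative; the real engine would read weights and override rules from the `score.v1` policy rather than hard-coding them):

```csharp
using System;

// Subscores are 0–100 integers; weights are basis points summing to 10,000.
public readonly record struct Subscores(int Base, int Reachability, int Evidence, int Provenance);

public static class RiskScore
{
    private const int WB = 1000, WR = 4500, WE = 3000, WP = 1500; // example weights

    public static int Compute(Subscores s, bool knownExploited)
    {
        int score = (WB * s.Base + WR * s.Reachability + WE * s.Evidence + WP * s.Provenance) / 10000;

        // Override 1: actively exploited and clearly reachable -> floor at 95.
        if (knownExploited && s.Reachability >= 70)
            score = Math.Max(score, 95);

        // Override 2: unreachable with only SCA-level evidence -> cap at 25.
        if (s.Reachability == 0 && s.Evidence <= 10)
            score = Math.Min(score, 25);

        return score;
    }
}
```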
|
||||
|
||||
---
|
||||
|
||||
## 4) Canonical schemas (what to build first)
|
||||
|
||||
### 4.1 ReachabilityReport (per artifact + vuln)
|
||||
|
||||
Minimum fields:
|
||||
|
||||
* `artifactDigest` (sha256 of image or build artifact)
|
||||
* `graphDigest` (sha256 of canonical call-graph representation)
|
||||
* `vulnId` (CVE/OSV/etc)
|
||||
* `vulnerableSymbol` (fully-qualified)
|
||||
* `entrypoints[]` (HTTP routes, queue consumers, CLI commands, cron handlers)
|
||||
* `shortestPath`:
|
||||
|
||||
* `hops` (int)
|
||||
* `nodes[]` (ordered list of symbols)
|
||||
* `edges[]` (optional)
|
||||
* `gates[]`:
|
||||
|
||||
* `type` (“featureFlag” | “authRequired” | “configNonDefault” | …)
|
||||
* `detail` (string)
|
||||
* `computedAt` (timestamp)
|
||||
* `toolVersion`
|
||||
|
||||
### 4.2 EvidenceBundle (per artifact + vuln)
|
||||
|
||||
Evidence items are immutable and deduped by content hash.
|
||||
|
||||
* `evidenceId` (content hash)
|
||||
* `artifactDigest`
|
||||
* `vulnId`
|
||||
* `type` (“SCA” | “SAST” | “DAST” | “RUNTIME” | “ADVISORY”)
|
||||
* `tool` (name/version)
|
||||
* `timestamp`
|
||||
* `confidence` (0–100)
|
||||
* `subject` (package, symbol, endpoint)
|
||||
* `payloadDigest` (hash of raw payload stored separately)
|
||||
|
||||
### 4.3 ProvenanceReport (per artifact)
|
||||
|
||||
* `artifactDigest`
|
||||
* `signatureChecks[]` (who signed, what key, result)
|
||||
* `sbomDigest` + `sbomType`
|
||||
* `attestations[]` (DSSE digests + verification result)
|
||||
* `transparencyLogRefs[]` (optional)
|
||||
* `reproducibleMatch` (bool)
|
||||
* `computedAt`
|
||||
* `toolVersion`
|
||||
* `verificationLogDigest`
|
||||
|
||||
### 4.4 ScoreInput + ScoreResult
|
||||
|
||||
**ScoreInput** should include:
|
||||
|
||||
* `asOf`
|
||||
* `policyVersion`
|
||||
* digests for reachability/evidence/provenance/base severity source
|
||||
|
||||
**ScoreResult** should include:
|
||||
|
||||
* `riskScore`, `subscores`
|
||||
* `explain[]` (deterministic)
|
||||
* `engineVersion`, `policyDigest`
|
||||
* `inputs[]` (digests)
|
||||
* `resultDigest` (hash of canonical ScoreResult)
|
||||
* `signature` (Authority signs the digest)
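
If it helps to see the shapes as code, here is one possible C# rendering of those two records — field names follow the bullets above, but the exact schema should come out of the Phase A work below:

```csharp
using System;
using System.Collections.Generic;

// Digests are lowercase hex SHA-256 strings over canonical serializations.
public sealed record ScoreInput(
    DateTimeOffset AsOf,
    string PolicyVersion,
    string ReachabilityDigest,
    string EvidenceDigest,
    string ProvenanceDigest,
    string BaseSeveritySourceDigest);

public sealed record ScoreResult(
    int RiskScore,
    IReadOnlyDictionary<string, int> Subscores,   // baseSeverity, reachability, evidence, provenance
    IReadOnlyList<string> Explain,                // deterministically ordered reasons
    string EngineVersion,
    string PolicyDigest,
    IReadOnlyList<string> Inputs,                 // input digests
    string ResultDigest,                          // hash of the canonical ScoreResult
    string Signature);                            // Authority signs ResultDigest
```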
|
||||
|
||||
---
|
||||
|
||||
## 5) Development implementation plan (phased, with deliverables + acceptance criteria)
|
||||
|
||||
### Phase A — Foundations: schemas, hashing, policy format, test harness
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Canonical JSON format rules + hashing utilities (shared lib)
|
||||
* JSON Schemas for: ReachabilityReport, EvidenceItem, ProvenanceReport, ScoreInput, ScoreResult
|
||||
* “Golden fixture” repo: a set of input bundles and expected ScoreResults
|
||||
* Policy format `score.v1` (YAML or JSON) using **integer bps**
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Same input bundle → identical `resultDigest` across:
|
||||
|
||||
* OS (Linux/Windows)
|
||||
* CPU (x64/ARM64)
|
||||
* runtime versions (supported .NET versions)
|
||||
* Fixtures run in CI and fail on any byte-level diff
|
||||
|
||||
---
|
||||
|
||||
### Phase B — Scoring engine (pure function library)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `Stella.ScoreEngine` as a pure library:
|
||||
|
||||
* `ComputeScore(ScoreInputBundle) -> ScoreResult`
|
||||
* `Explain(ScoreResult) -> structured explanation` (already embedded)
|
||||
* Policy parser + validator:
|
||||
|
||||
* weights sum to 10,000
|
||||
* bucket tables monotonic
|
||||
  * override rules deterministic and totally ordered
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* 100% deterministic tests passing (golden fixtures)
|
||||
* “Explain” always includes:
|
||||
|
||||
* subscores
|
||||
* applied buckets
|
||||
* applied gate multipliers
|
||||
* freshness bucket selected
|
||||
* provenance level selected
|
||||
* No non-deterministic dependencies (time, random, locale, float)
|
||||
|
||||
---
|
||||
|
||||
### Phase C — Evidence pipeline (Vexer / Evidence Store)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Normalized evidence ingestion adapters:
|
||||
|
||||
* SCA ingest (from your existing scanner output)
|
||||
* SAST ingest
|
||||
* DAST ingest
|
||||
* runtime trace ingest (optional MVP → “symbol hit” events)
|
||||
* Evidence Store service:
|
||||
|
||||
* immutability (append-only)
|
||||
* dedupe by `evidenceId`
|
||||
* query by `(artifactDigest, vulnId)`
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Ingesting the same evidence twice yields identical state (idempotent)
|
||||
* Every evidence record can be exported as a bundle with content hashes
|
||||
* Evidence timestamps preserved; `asOf` drives freshness deterministically
|
||||
|
||||
---
|
||||
|
||||
### Phase D — Reachability analyzer (Scanner extension)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Call-graph builder and symbol resolver:
|
||||
|
||||
* for .NET: IL-level call graph + ASP.NET route discovery
|
||||
* Reachability computation:
|
||||
|
||||
* compute shortest path hops from entrypoints to vulnerable symbol
|
||||
* attach gating detections (config/feature/auth heuristics)
|
||||
* Reachability report emitter:
|
||||
|
||||
* emits ReachabilityReport with stable digests
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Given the same build artifact, reachability report digest is stable
|
||||
* Paths are replayable and visualizable (nodes are resolvable)
|
||||
* Unreachable findings are explicitly marked and explainable
|
||||
|
||||
---
|
||||
|
||||
### Phase E — Provenance verification (Authority / Proof‑Graph)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Verification pipeline:
|
||||
|
||||
* signature verification for artifact digest
|
||||
* SBOM hash linking
|
||||
* attestation verification (DSSE/in‑toto style)
|
||||
* optional transparency log reference capture
|
||||
* optional reproducible-build comparison input
|
||||
* ProvenanceReport emitter (signed verification log digest)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Verification is offline-capable if given the necessary bundles
|
||||
* Any failed check is captured with a deterministic error code + message
|
||||
* ProvenanceReport digest is stable for same inputs
|
||||
|
||||
---
|
||||
|
||||
### Phase F — Orchestration: “score a finding” workflow + VEX output
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Orchestrator service (or existing pipeline step) that:
|
||||
|
||||
1. receives a vulnerability finding
|
||||
2. fetches reachability/evidence/provenance bundles
|
||||
3. builds ScoreInput with `asOf`
|
||||
4. computes ScoreResult
|
||||
5. signs ScoreResult digest
|
||||
6. emits VEX note referencing ScoreResult digest
|
||||
* Storage for ScoreResult + VEX note (immutable, versioned)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* “Recompute” produces same ScoreResult digest if inputs unchanged
|
||||
* VEX note includes:
|
||||
|
||||
* policy version + digest
|
||||
* engine version
|
||||
* input digests
|
||||
* score + subscores
|
||||
* End-to-end API returns “why” data in at most one round trip (served from cache)
|
||||
|
||||
---
|
||||
|
||||
### Phase G — UI: “Why this score?” and replay/export
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Findings view enhancements:
|
||||
|
||||
* score badge + risk bucket (Low/Med/High/Critical)
|
||||
* click-through “Why this score”
|
||||
* “Why this score” panel:
|
||||
|
||||
* call path visualization (at least as an ordered list for MVP)
|
||||
* evidence list with freshness + confidence
|
||||
* provenance checks list (pass/fail)
|
||||
* export bundle (inputs + policy + engine version) for audit replay
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Any score is explainable in <30 seconds by a human reviewer
|
||||
* Exported bundle can reproduce score offline
|
||||
|
||||
---
|
||||
|
||||
### Phase H — Governance: policy-as-code, versioning, calibration, rollout
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Policy registry:
|
||||
|
||||
* store `score.v1` policies by org/project/environment
|
||||
* approvals + change log
|
||||
* Versioning strategy:
|
||||
|
||||
* engine semantic versioning
|
||||
* policy digest pinned in ScoreResult
|
||||
* migration tooling (e.g., score.v1 → score.v2)
|
||||
* Rollout mechanics:
|
||||
|
||||
* shadow mode: compute score but don’t enforce
|
||||
* enforcement gates: block deploy if score ≥ threshold
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Policy changes never rewrite past scores
|
||||
* You can backfill new scores with a new policy version without ambiguity
|
||||
* Audit log shows: who changed policy, when, why (optional but recommended)
|
||||
|
||||
---
|
||||
|
||||
## 6) Engineering backlog (epics → stories → DoD)
|
||||
|
||||
### Epic 1: Deterministic core
|
||||
|
||||
* Story: implement canonical JSON + hashing
|
||||
* Story: implement fixed-point math helpers (bps)
|
||||
* Story: implement score.v1 buckets + overrides
|
||||
* DoD:
|
||||
|
||||
* no floats
|
||||
* golden test suite
|
||||
* deterministic explain ordering
|
||||
|
||||
### Epic 2: Evidence normalization
|
||||
|
||||
* Story: evidence schema + dedupe
|
||||
* Story: adapters (SCA/SAST/DAST/runtime)
|
||||
* Story: evidence query API
|
||||
* DoD:
|
||||
|
||||
* idempotent ingest
|
||||
* bundle export with digests
|
||||
|
||||
### Epic 3: Reachability
|
||||
|
||||
* Story: entrypoint discovery for target frameworks
|
||||
* Story: call graph extraction
|
||||
* Story: shortest-path computation
|
||||
* Story: gating heuristics
|
||||
* DoD:
|
||||
|
||||
* stable digests
|
||||
* replayable paths
|
||||
|
||||
### Epic 4: Provenance
|
||||
|
||||
* Story: verify signatures
|
||||
* Story: verify SBOM link
|
||||
* Story: verify attestations
|
||||
* Story: reproducible match input support
|
||||
* DoD:
|
||||
|
||||
* deterministic error codes
|
||||
* stable provenance scoring
|
||||
|
||||
### Epic 5: End-to-end score + VEX
|
||||
|
||||
* Story: orchestration
|
||||
* Story: ScoreResult signing
|
||||
* Story: VEX generation and storage
|
||||
* DoD:
|
||||
|
||||
* recompute parity
|
||||
* verifiable signatures
|
||||
|
||||
### Epic 6: UI
|
||||
|
||||
* Story: score badge + buckets
|
||||
* Story: why panel
|
||||
* Story: export bundle + recompute button
|
||||
* DoD:
|
||||
|
||||
* human explainability
|
||||
* offline replay works
|
||||
|
||||
---
|
||||
|
||||
## 7) APIs to implement (minimal but complete)
|
||||
|
||||
### 7.1 Compute score (internal)
|
||||
|
||||
* `POST /api/score/compute`
|
||||
|
||||
* input: `ScoreInput` + references or inline bundles
|
||||
* output: `ScoreResult`
|
||||
|
||||
### 7.2 Get score (product)
|
||||
|
||||
* `GET /api/findings/{findingId}/score`
|
||||
|
||||
* returns latest ScoreResult + VEX reference
|
||||
|
||||
### 7.3 Explain score
|
||||
|
||||
* `GET /api/findings/{findingId}/score/explain`
|
||||
|
||||
* returns `explain[]` + call path + evidence list + provenance checks
|
||||
|
||||
### 7.4 Export replay bundle
|
||||
|
||||
* `GET /api/findings/{findingId}/score/bundle`
|
||||
|
||||
* returns a tar/zip containing:
|
||||
|
||||
* ScoreInput
|
||||
* policy file
|
||||
* reachability/evidence/provenance reports
|
||||
* engine version manifest
|
||||
|
||||
---
|
||||
|
||||
## 8) Testing strategy (what to automate early)
|
||||
|
||||
### Unit tests
|
||||
|
||||
* bucket selection correctness
|
||||
* gate multiplier composition
|
||||
* evidence freshness bucketing
|
||||
* provenance level mapping
|
||||
* override rule ordering
|
||||
|
||||
### Golden fixtures
|
||||
|
||||
* fixed input bundles → fixed ScoreResult digest
|
||||
* run on every supported platform/runtime
|
||||
|
||||
### Property-based tests
|
||||
|
||||
* monotonicity:
|
||||
|
||||
* fewer hops should not reduce R
|
||||
* more evidence points should not reduce E
|
||||
* stronger provenance should not reduce P
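
Even without a property-testing library, a hand-rolled exhaustive check over a bounded domain covers the hop case; for example (assuming a bucket-lookup function such as the earlier `ReachabilityScore` sketch):

```csharp
using System;
using System.Linq;

public static class MonotonicityChecks
{
    // Verify "fewer hops never lowers R" over a bounded domain.
    // bucketScore stands in for the engine's real hop-bucket lookup.
    public static void AssertHopMonotonic(Func<int, int> bucketScore, int maxHops = 64)
    {
        foreach (var hops in Enumerable.Range(0, maxHops))
        {
            if (bucketScore(hops) < bucketScore(hops + 1))
                throw new Exception($"R increased when hops grew from {hops} to {hops + 1}");
        }
    }
}

// e.g. MonotonicityChecks.AssertHopMonotonic(h => ReachabilityScore.Compute(h, Array.Empty<string>()));
```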
|
||||
|
||||
### Integration tests
|
||||
|
||||
* full pipeline: finding → bundles → score → VEX
|
||||
* “recompute” parity tests
|
||||
|
||||
---
|
||||
|
||||
## 9) Operational concerns and hardening
|
||||
|
||||
### Performance
|
||||
|
||||
* Cache reachability per `(artifactDigest, vulnId, symbol)`
|
||||
* Cache provenance per `artifactDigest`
|
||||
* Evidence queries should be indexed by `(artifactDigest, vulnId, type)`
|
||||
|
||||
### Security
|
||||
|
||||
* Treat evidence ingestion as untrusted input:
|
||||
|
||||
* strict schema validation
|
||||
* content-hash dedupe prevents tampering via overwrite
|
||||
* Sign ScoreResults and VEX notes
|
||||
* RBAC:
|
||||
|
||||
* who can change policy
|
||||
* who can override scores (if allowed at all)
|
||||
|
||||
### Data retention
|
||||
|
||||
* Evidence payloads can be large; keep digests + store raw payloads in object storage
|
||||
* Keep a “minimal replay bundle” always (schemas + digests + policy + engine)
|
||||
|
||||
---
|
||||
|
||||
## 10) Concrete “MVP first” slice (smallest valuable product)
|
||||
|
||||
If you want a crisp MVP that still satisfies “auditable determinism”:
|
||||
|
||||
1. Scoring engine (`B + R + E + P`), fixed-point, golden tests
|
||||
2. Evidence store (SCA + runtime optional)
|
||||
3. Reachability: only hop depth from HTTP routes to symbol (no fancy gates)
|
||||
4. Provenance: signed image + SBOM link only
|
||||
5. UI: score + “why” panel showing:
|
||||
|
||||
* hops/path list
|
||||
* evidence list
|
||||
* provenance checklist
|
||||
6. Emit a signed VEX note containing the score + input digests
|
||||
|
||||
That MVP already proves the core differentiator: **deterministic, replayable risk scoring**.
|
||||
|
||||
---
|
||||
|
||||
## 11) Starter policy file (score.v1) using basis points
|
||||
|
||||
Here’s a good “real implementation” starting point (int-only):
|
||||
|
||||
```yaml
policyVersion: score.v1
weightsBps:
  baseSeverity: 1000
  reachability: 4500
  evidence: 3000
  provenance: 1500

reachability:
  hopBuckets:
    - { maxHops: 2, score: 100 }
    - { maxHops: 3, score: 85 }
    - { maxHops: 4, score: 70 }
    - { maxHops: 5, score: 55 }
    - { maxHops: 6, score: 45 }
    - { maxHops: 7, score: 35 }
    - { maxHops: 9999, score: 20 }
  unreachableScore: 0
  gateMultipliersBps:
    featureFlag: 7000
    authRequired: 8000
    adminOnly: 8500
    nonDefaultConfig: 7500

evidence:
  points:
    runtime: 60
    dast: 30
    sast: 20
    sca: 10
  freshnessBuckets:
    - { maxAgeDays: 7, multiplierBps: 10000 }
    - { maxAgeDays: 30, multiplierBps: 9000 }
    - { maxAgeDays: 90, multiplierBps: 7500 }
    - { maxAgeDays: 180, multiplierBps: 6000 }
    - { maxAgeDays: 365, multiplierBps: 4000 }
    - { maxAgeDays: 99999, multiplierBps: 2000 }

provenance:
  levels:
    unsigned: 0
    signed: 30
    signedWithSbom: 60
    signedWithSbomAndAttestations: 80
    reproducible: 100

overrides:
  - name: knownExploitedAndReachable
    when:
      flags:
        knownExploited: true
      minReachability: 70
    setScore: 95

  - name: unreachableAndOnlySca
    when:
      maxReachability: 0
      maxEvidence: 10
    clampMaxScore: 25
```
|
||||
|
||||
---
|
||||
|
||||
If you want, I can also include a **repo layout + CI “golden fixture” test runner** (dotnet test + cross-platform determinism checks) and a **.NET 10 ScoreEngine skeleton** that enforces: no floats, canonical JSON hashing, and stable explanation ordering.
|
||||
@@ -0,0 +1,744 @@
Here’s a simple, high‑leverage UX metric to add to your pipeline run view that will immediately make DevOps feel faster and calmer:
|
||||
|
||||
# Time‑to‑First‑Signal (TTFS)
|
||||
|
||||
**What it is:** the time from opening a run’s details page until the UI renders the **first actionable insight** (e.g., “Stage `build` failed – `dotnet restore` 401 – token expired”).
|
||||
**Why it matters:** engineers don’t need *all* data instantly—just the first trustworthy clue to start acting. Lower TTFS = quicker triage, lower stress, tighter MTTR.
|
||||
|
||||
---
|
||||
|
||||
## What counts as a “first signal”
|
||||
|
||||
* Failed stage + reason (exit code, key log line, failing test name)
|
||||
* Degraded but actionable status (e.g., flaky test signature)
|
||||
* Policy gate block with the specific rule that failed
|
||||
* Reachability‑aware security finding that blocks deploy (one concrete example, not the whole list)
|
||||
|
||||
> Not a signal: spinners, generic “loading…”, or unactionable counts.
|
||||
|
||||
---
|
||||
|
||||
## How to optimize TTFS (practical steps)
|
||||
|
||||
1. **Deferred loading (prioritize critical panes):**
|
||||
|
||||
* Render header + failing stage card first; lazy‑load artifacts, full logs, and graphs after.
|
||||
* Pre‑expand the *first failing node* in the stage graph.
|
||||
|
||||
2. **Log pre‑indexing at ingest:**
|
||||
|
||||
* During CI, stream logs into chunks keyed by `[jobId, phase, severity, firstErrorLine]`.
|
||||
* Extract the **first error tuple** (timestamp, step, message) and store it next to the job record.
|
||||
* On UI open, fetch only that tuple (sub‑100 ms) before fetching the rest.
|
||||
|
||||
3. **Cached summaries:**
|
||||
|
||||
* Persist a tiny JSON “run.summary.v1” (status, first failing stage, first error line, blocking policies) in Redis/Postgres.
|
||||
* Invalidate on new job events; always serve this summary first.
|
||||
|
||||
4. **Edge prefetch:**
|
||||
|
||||
* When the runs table is visible, prefetch summaries for rows in viewport so details pages open “warm”.
|
||||
|
||||
5. **Compress + cap first log burst:**
|
||||
|
||||
* Send the first **5–10 error lines** (already extracted) immediately; stream the rest.
|
||||
|
||||
---
|
||||
|
||||
## Instrumentation (so you can prove it)
|
||||
|
||||
Emit these points as telemetry:
|
||||
|
||||
* `ttfs_start`: when the run details route is entered (or when tab becomes visible)
|
||||
* `ttfs_signal_rendered`: when the first actionable card is in the DOM
|
||||
* `ttfs_ms = signal_rendered - start`
|
||||
* Dimensions: `pipeline_provider`, `repo`, `branch`, `run_type` (PR/main), `device`, `release`, `network_state`
|
||||
|
||||
**SLO:** *P50 ≤ 700 ms, P95 ≤ 2.5 s* (adjust to your infra).
|
||||
|
||||
**Dashboards to track:**
|
||||
|
||||
* TTFS distribution (P50/P90/P95) by release
|
||||
* Correlate TTFS with bounce rate and “open → rerun” delay
|
||||
* Error budget: % of views with TTFS > 3 s
|
||||
|
||||
---
|
||||
|
||||
## Minimal backend contract (example)
|
||||
|
||||
```json
GET /api/runs/{runId}/first-signal
{
  "runId": "123",
  "firstSignal": {
    "type": "stage_failed",
    "stage": "build",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "at": "2025-12-11T09:22:31Z",
    "artifact": { "kind": "log", "range": { "start": 1880, "end": 1896 } }
  },
  "summaryEtag": "W/\"a1b2c3\""
}
```
|
||||
|
||||
---
|
||||
|
||||
## Frontend pattern (Angular 17, signal‑first)
|
||||
|
||||
* Fire `first-signal` request in route resolver.
|
||||
* Render `FirstSignalCard` immediately.
|
||||
* Lazy‑load stage graph, full logs, security panes.
|
||||
* Fire `ttfs_signal_rendered` when `FirstSignalCard` enters viewport.
|
||||
|
||||
---
|
||||
|
||||
## CI adapter hints (GitLab/GitHub/Azure)
|
||||
|
||||
* Hook on job status webhooks to compute & store the first error tuple.
|
||||
* For GitLab: scan `trace` stream for first `ERRO|FATAL|##[error]` match; store to DB table `ci_run_first_signal(run_id, stage, step, message, t)`.
|
||||
|
||||
---
|
||||
|
||||
## “Good TTFS” acceptance tests
|
||||
|
||||
* Run with early fail → first signal < 1 s, shows exact command + exit code.
|
||||
* Run with policy gate fail → rule name + fix hint visible first.
|
||||
* Offline/slow network → cached summary still renders an actionable hint.
|
||||
|
||||
---
|
||||
|
||||
## Copy to put in your UX guidelines
|
||||
|
||||
> “Optimize **Time‑to‑First‑Signal (TTFS)** above all. Users must see one trustworthy, actionable clue within 1 second on a warm path—even if the rest of the UI is still loading.”
|
||||
|
||||
If you want, I can sketch the exact DB schema for the pre‑indexed log tuples and the Angular resolver + telemetry hooks next.
|
||||
Below is an extended, end‑to‑end implementation plan for **Time‑to‑First‑Signal (TTFS)** that you can drop into your backlog. It includes architecture, data model, API contracts, frontend work, observability, QA, and rollout—structured as epics/phases with “definition of done” and acceptance criteria.
|
||||
|
||||
---
|
||||
|
||||
# Scope extension
|
||||
|
||||
## What we’re building
|
||||
|
||||
A run details experience that renders **one actionable clue** fast—before loading heavy UI like full logs, graphs, artifacts.
|
||||
|
||||
**“First signal”** is a small payload derived from run/job events and the earliest meaningful error evidence (stage/step + key log line(s) + reason/classification).
|
||||
|
||||
## What we’re extending beyond the initial idea
|
||||
|
||||
1. **First‑Signal Quality** (not just speed)
|
||||
|
||||
* Classify error type (auth, dependency, compilation, test, infra, policy, timeout).
|
||||
* Identify “culprit step” and a stable “signature” for dedupe and search.
|
||||
2. **Progressive disclosure UX**
|
||||
|
||||
* Summary → First signal card → expanded context (stage graph, logs, artifacts).
|
||||
3. **Provider‑agnostic ingestion**
|
||||
|
||||
* Adapters for GitLab/GitHub/Azure (or your CI provider).
|
||||
4. **Caching + prefetch**
|
||||
|
||||
* Warm open from list/table, with ETags and stale‑while‑revalidate.
|
||||
5. **Observability & SLOs**
|
||||
|
||||
* TTFS metrics, dashboards, alerting, and quality metrics (false signals).
|
||||
6. **Rollout safety**
|
||||
|
||||
* Feature flags, canary, A/B gating, and a guaranteed fallback path.
|
||||
|
||||
---
|
||||
|
||||
# Success criteria
|
||||
|
||||
## Primary metric
|
||||
|
||||
* **TTFS (ms)**: time from details page route enter → first actionable signal rendered.
|
||||
|
||||
## Targets (example SLOs)
|
||||
|
||||
* **P50 ≤ 700 ms**, **P95 ≤ 2500 ms** on warm path.
|
||||
* **Cold path**: P95 ≤ 4000 ms (depends on infra).
|
||||
|
||||
## Secondary outcome metrics
|
||||
|
||||
* **Open→Action time**: time from opening run to first user action (rerun, cancel, assign, open failing log line).
|
||||
* **Bounce rate**: close page within 10 seconds without interaction.
|
||||
* **MTTR proxy**: time from failure to first rerun or fix commit.
|
||||
|
||||
## Quality metrics
|
||||
|
||||
* **Signal availability rate**: % of run views that show a first signal card within 3s.
|
||||
* **Signal accuracy score** (sampled): engineer confirms “helpful vs not”.
|
||||
* **Extractor failure rate**: parsing errors / missing mappings / timeouts.
|
||||
|
||||
---
|
||||
|
||||
# Architecture overview
|
||||
|
||||
## Data flow
|
||||
|
||||
1. **CI provider events** (job started, job finished, stage failed, log appended) land in your backend.
|
||||
2. **Run summarizer** maintains:
|
||||
|
||||
* `run_summary` (small JSON)
|
||||
* `first_signal` (small, actionable payload)
|
||||
3. **UI opens run details**
|
||||
|
||||
* Immediately calls `GET /runs/{id}/first-signal` (or `/summary`).
|
||||
* Renders FirstSignalCard as soon as payload arrives.
|
||||
4. Background fetches:
|
||||
|
||||
* Stage graph, full logs, artifacts, security scans, trends.
|
||||
|
||||
## Key decision: where to compute first signal
|
||||
|
||||
* **Option A: at ingest time (recommended)**
|
||||
Compute first signal when logs/events arrive, store it, serve it instantly.
|
||||
* **Option B: on demand**
|
||||
Compute when user opens run details (simpler initially, worse TTFS and load).
|
||||
|
||||
---
|
||||
|
||||
# Data model
|
||||
|
||||
## Tables (relational example)
|
||||
|
||||
### `ci_run`
|
||||
|
||||
* `run_id (pk)`
|
||||
* `provider`
|
||||
* `repo_id`
|
||||
* `branch`
|
||||
* `status`
|
||||
* `created_at`, `updated_at`
|
||||
|
||||
### `ci_job`
|
||||
|
||||
* `job_id (pk)`
|
||||
* `run_id (fk)`
|
||||
* `stage_name`
|
||||
* `job_name`
|
||||
* `status`
|
||||
* `started_at`, `finished_at`
|
||||
|
||||
### `ci_log_chunk`
|
||||
|
||||
* `chunk_id (pk)`
|
||||
* `job_id (fk)`
|
||||
* `seq` (monotonic)
|
||||
* `byte_start`, `byte_end` (range into blob)
|
||||
* `first_error_line_no` (nullable)
|
||||
* `first_error_excerpt` (nullable, short)
|
||||
* `severity_max` (info/warn/error)
|
||||
|
||||
### `ci_run_summary`
|
||||
|
||||
* `run_id (pk)`
|
||||
* `version` (e.g., `1`)
|
||||
* `etag` (hash)
|
||||
* `summary_json` (small, 1–5 KB)
|
||||
* `updated_at`
|
||||
|
||||
### `ci_first_signal`
|
||||
|
||||
* `run_id (pk)`
|
||||
* `etag`
|
||||
* `signal_json` (small, 0.5–2 KB)
|
||||
* `quality_flags` (bitmask or json)
|
||||
* `updated_at`
|
||||
|
||||
## Cache layer
|
||||
|
||||
* Redis keys:
|
||||
|
||||
* `run:{runId}:summary:v1`
|
||||
* `run:{runId}:first-signal:v1`
|
||||
* TTL: generous but safe (e.g., 24h) with “write‑through” on event updates.
|
||||
|
||||
---
|
||||
|
||||
# First signal definition
|
||||
|
||||
## `FirstSignal` object (recommended shape)
|
||||
|
||||
```json
{
  "runId": "123",
  "computedAt": "2025-12-12T09:22:31Z",
  "status": "failed",
  "firstSignal": {
    "type": "stage_failed",
    "classification": "dependency_auth",
    "stage": "build",
    "job": "build-linux-x64",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "signature": "dotnet-restore-401-unauthorized",
    "log": {
      "jobId": "job-789",
      "lines": [
        "error : Response status code does not indicate success: 401 (Unauthorized).",
        "error : The token is expired."
      ],
      "range": { "start": 1880, "end": 1896 }
    },
    "suggestedActions": [
      { "label": "Rotate token", "type": "doc", "target": "internal://docs/tokens" },
      { "label": "Rerun job", "type": "action", "target": "rerun-job:job-789" }
    ]
  },
  "etag": "W/\"a1b2c3\""
}
```
|
||||
|
||||
### Notes
|
||||
|
||||
* `signature` should be stable for grouping.
|
||||
* `suggestedActions` is optional but hugely valuable (even 1–2 actions).
|
||||
|
||||
---
|
||||
|
||||
# APIs
|
||||
|
||||
## 1) First signal endpoint
|
||||
|
||||
**GET** `/api/runs/{runId}/first-signal`
|
||||
|
||||
Headers:
|
||||
|
||||
* `If-None-Match: W/"..."` supported
|
||||
* Response includes `ETag` and `Cache-Control`
|
||||
|
||||
Responses:
|
||||
|
||||
* `200`: full first signal object
|
||||
* `304`: not modified
|
||||
* `404`: run not found
|
||||
* `204`: run exists but signal not available yet (rare; should degrade gracefully)
|
||||
|
||||
## 2) Summary endpoint (optional but useful)
|
||||
|
||||
**GET** `/api/runs/{runId}/summary`
|
||||
|
||||
* Includes: status, first failing stage/job, timestamps, blocking policies, artifact counts.
|
||||
|
||||
## 3) SSE / WebSocket updates (nice-to-have)
|
||||
|
||||
**GET** `/api/runs/{runId}/events` (SSE)
|
||||
|
||||
* Push new signal or summary updates in near real-time while user is on the page.
|
||||
|
||||
---
|
||||
|
||||
# Frontend implementation plan (Angular 17)
|
||||
|
||||
## UX behavior
|
||||
|
||||
1. **Route enter**
|
||||
|
||||
* Start TTFS timer.
|
||||
2. Render instantly:
|
||||
|
||||
* Title, status badge, pipeline metadata (run id, commit, branch).
|
||||
* Skeleton for details area.
|
||||
3. Fetch first signal:
|
||||
|
||||
* Render `FirstSignalCard` immediately when available.
|
||||
* Fire telemetry event when card is **in DOM and visible**.
|
||||
4. Lazy-load:
|
||||
|
||||
* Stage graph
|
||||
* Full logs viewer
|
||||
* Artifacts list
|
||||
* Security findings
|
||||
* Trends, flaky tests, etc.
|
||||
|
||||
## Angular structure
|
||||
|
||||
* `RunDetailsResolver` (or `resolveFn`) requests first signal.
|
||||
* `RunDetailsComponent` uses signals to render quickly.
|
||||
* `FirstSignalCardComponent` is standalone + minimal deps.
|
||||
|
||||
## Prefetch strategy from runs list view
|
||||
|
||||
* When the runs table is visible, prefetch summaries/first signals for items in viewport:
|
||||
|
||||
* Use `IntersectionObserver` to prefetch only visible rows.
|
||||
* Store results in an in-memory cache (e.g., `Map<runId, FirstSignal>`).
|
||||
* Respect ETag to avoid redundant payloads.
|
||||
|
||||
## Telemetry hooks
|
||||
|
||||
* `ttfs_start`: route activation + tab visible
|
||||
* `ttfs_signal_rendered`: FirstSignalCard attached and visible
|
||||
* Dimensions: provider, repo, branch, run_type, release_version, network_state
|
||||
|
||||
---
|
||||
|
||||
# Backend implementation plan
|
||||
|
||||
## Summarizer / First-signal service
|
||||
|
||||
A service or module that:
|
||||
|
||||
* subscribes to run/job events
|
||||
* receives log chunks (or pointers)
|
||||
* computes and stores:
|
||||
|
||||
* `run_summary`
|
||||
* `first_signal`
|
||||
* publishes updates (optional) to an event stream for SSE
|
||||
|
||||
### Concurrency rule
|
||||
|
||||
First signal should be set once per run unless a “better” signal appears:
|
||||
|
||||
* if current signal is missing → set
|
||||
* if current signal is “generic” and new one is “specific” → replace
|
||||
* otherwise keep (avoid churn)
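
Expressed as code (C# here only to match the other sketches in this note — the backend language is still open, and the types are hypothetical):

```csharp
// A signal counts as "generic" here when it lacks a classification or an extracted message.
public sealed record FirstSignal(string Type, string? Classification, string? Message)
{
    public bool IsGeneric => Classification is null || Message is null;
}

public static class FirstSignalStore
{
    // Returns the signal that should be persisted for the run.
    public static FirstSignal Merge(FirstSignal? current, FirstSignal incoming)
    {
        if (current is null) return incoming;                          // nothing yet -> set
        if (current.IsGeneric && !incoming.IsGeneric) return incoming; // specific beats generic
        return current;                                                // otherwise keep (avoid churn)
    }
}
```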
|
||||
|
||||
---
|
||||
|
||||
# Extraction & classification logic
|
||||
|
||||
## Minimum viable extractor (Phase 1)
|
||||
|
||||
* Heuristics:
|
||||
|
||||
* first match among patterns: `FATAL`, `ERROR`, `##[error]`, `panic:`, `Unhandled exception`, `npm ERR!`, `BUILD FAILED`, etc.
|
||||
* plus provider-specific fail markers
|
||||
* Pull:
|
||||
|
||||
* stage/job/step context (from job metadata or step boundaries)
|
||||
* 5–10 log lines around first error line
|
||||
|
||||
## Improved extractor (Phase 2+)
|
||||
|
||||
* Language/tool specific rules:
|
||||
|
||||
* dotnet, maven/gradle, npm/yarn/pnpm, python/pytest, go test, docker build, terraform, helm
|
||||
* Add `classification` and `signature`:
|
||||
|
||||
* normalize common errors:
|
||||
|
||||
* auth expired/forbidden
|
||||
* missing dependency / DNS / TLS
|
||||
* compilation error
|
||||
* test failure (include test name)
|
||||
* infra capacity / agent lost
|
||||
* policy gate failure
|
||||
|
||||
## Guardrails
|
||||
|
||||
* **Secret redaction**: before storing excerpts, run your existing redaction pipeline.
|
||||
* **Payload cap**: cap message length and excerpt lines.
|
||||
* **PII discipline**: avoid including arbitrary stack traces if they contain sensitive paths; include only key lines.
|
||||
|
||||
---
|
||||
|
||||
# Development plan by phases (epics)
|
||||
|
||||
Each phase below includes deliverables + acceptance criteria. You can treat each as a sprint/iteration.
|
||||
|
||||
---
|
||||
|
||||
## Phase 0 — Baseline and alignment
|
||||
|
||||
### Deliverables
|
||||
|
||||
* Baseline TTFS measurement (current behavior)
|
||||
* Definition of “actionable signal” and priority rules
|
||||
* Performance budget for run details view
|
||||
|
||||
### Tasks
|
||||
|
||||
* Add client-side telemetry for current page load steps:
|
||||
|
||||
* route enter, summary loaded, logs loaded, graph loaded
|
||||
* Measure TTFS proxy today (likely “time to status shown”)
|
||||
* Identify top 20 failure modes in your CI (from historical logs)
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* Dashboard shows baseline P50/P95 for current experience.
|
||||
* “First signal” contract signed off with UI + backend teams.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Data model and storage
|
||||
|
||||
### Deliverables
|
||||
|
||||
* DB migrations for `ci_run_summary` and `ci_first_signal`
|
||||
* Redis cache keys and invalidation strategy
|
||||
* ADR: where summaries live and how they update
|
||||
|
||||
### Tasks
|
||||
|
||||
* Create tables and indices:
|
||||
|
||||
* index on `run_id`, `updated_at`, `provider`
|
||||
* Add serializer/deserializer for `summary_json` and `signal_json`
|
||||
* Implement ETag generation (hash of JSON payload)
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* Can store and retrieve summary + first signal for a run in < 50ms (DB) and < 10ms (cache).
|
||||
* ETag works end-to-end.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — Ingestion and first signal computation
|
||||
|
||||
### Deliverables
|
||||
|
||||
* First-signal computation module
|
||||
* Provider adapter integration points (webhook consumers)
|
||||
* “first error tuple” extraction from logs
|
||||
|
||||
### Tasks
|
||||
|
||||
* On job log append:
|
||||
|
||||
* scan incrementally for first error markers
|
||||
* store excerpt + line range + job/stage/step mapping
|
||||
* On job finish/fail:
|
||||
|
||||
* finalize first signal with best known context
|
||||
* Implement the “better signal replaces generic” rule
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* For a known failing run, API returns first signal without reading full log blob.
|
||||
* Computation does not exceed a small CPU budget per log chunk (guard with limits).
|
||||
* Extraction failure rate < 1% for sampled runs (initial).
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — API endpoints and caching
|
||||
|
||||
### Deliverables
|
||||
|
||||
* `/runs/{id}/first-signal` endpoint
|
||||
* Optional `/runs/{id}/summary`
|
||||
* Cache-control + ETag support
|
||||
* Access control checks consistent with existing run authorization
|
||||
|
||||
### Tasks
|
||||
|
||||
* Serve cached first signal first; fallback to DB
|
||||
* If missing:
|
||||
|
||||
* return `204` (or a “pending” object) and allow UI fallback
|
||||
* Add server-side metrics:
|
||||
|
||||
* endpoint latency, cache hit rate, payload size
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* Endpoint P95 latency meets target (e.g., < 200ms internal).
|
||||
* Cache hit rate is high for active runs (after prefetch).
|
||||
|
||||
---
|
||||
|
||||
## Phase 4 — Frontend progressive rendering
|
||||
|
||||
### Deliverables
|
||||
|
||||
* FirstSignalCard component
|
||||
* Route resolver + local cache
|
||||
* Prefetch on runs list view
|
||||
* Telemetry for TTFS
|
||||
|
||||
### Tasks
|
||||
|
||||
* Render shell immediately
|
||||
* Fetch and render first signal
|
||||
* Lazy-load heavy panels using `@defer` / dynamic imports
|
||||
* Implement “open failing stage” default behavior
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* In throttled network test, first signal card appears significantly earlier than logs and graphs.
|
||||
* `ttfs_signal_rendered` fires exactly once per view, with correct dimensions.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5 — Observability, dashboards, and alerting
|
||||
|
||||
### Deliverables
|
||||
|
||||
* TTFS dashboards by:
|
||||
|
||||
* provider, repo, run type, release version
|
||||
* Alerts:
|
||||
|
||||
* P95 regression threshold
|
||||
* Quality dashboard:
|
||||
|
||||
* availability rate, extraction failures, “generic signal rate”
|
||||
|
||||
### Tasks
|
||||
|
||||
* Create event pipeline for telemetry into your analytics system
|
||||
* Define SLO/error budget alerts
|
||||
* Add tracing (OpenTelemetry) for endpoint and summarizer
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* You can correlate TTFS with:
|
||||
|
||||
* bounce rate
|
||||
* open→action time
|
||||
* You can pinpoint whether regressions are backend, frontend, or provider‑specific.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6 — QA, performance testing, rollout
|
||||
|
||||
### Deliverables
|
||||
|
||||
* Automated tests
|
||||
* Feature flag + gradual rollout
|
||||
* A/B experiment (optional)
|
||||
|
||||
### Tasks
|
||||
|
||||
**Testing**
|
||||
|
||||
* Unit tests:
|
||||
|
||||
* extractor patterns
|
||||
* classification rules
|
||||
* Integration tests:
|
||||
|
||||
* simulated job logs with known outcomes
|
||||
* E2E (Playwright/Cypress):
|
||||
|
||||
* verify first signal appears before logs
|
||||
* verify fallback path works if endpoint fails
|
||||
* Performance tests:
|
||||
|
||||
* cold cache vs warm cache
|
||||
* throttled CPU/network profiles
|
||||
|
||||
**Rollout**
|
||||
|
||||
* Feature flag:
|
||||
|
||||
* enabled for internal users first
|
||||
* ramp by repo or percentage
|
||||
* Monitor key metrics during ramp:
|
||||
|
||||
* TTFS P95
|
||||
* API error rate
|
||||
* UI error rate
|
||||
* cache miss spikes
|
||||
|
||||
### Acceptance criteria
|
||||
|
||||
* No increase in overall error rates.
|
||||
* TTFS improves at least X% for a meaningful slice of users (define X from baseline).
|
||||
* Fallback UX remains usable when signals are unavailable.
|
||||
|
||||
---
|
||||
|
||||
# Backlog examples (ready-to-create Jira tickets)
|
||||
|
||||
## Epic: Run summary and first signal storage
|
||||
|
||||
* Create `ci_first_signal` table
|
||||
* Create `ci_run_summary` table
|
||||
* Implement ETag hashing
|
||||
* Implement Redis caching layer
|
||||
* Add admin/debug endpoint (internal only) to inspect computed signals
|
||||
|
||||
## Epic: Log chunk extraction
|
||||
|
||||
* Implement incremental log scanning
|
||||
* Store first error excerpt + range
|
||||
* Map excerpt to job + step
|
||||
* Add redaction pass to excerpts
|
||||
|
||||
## Epic: Run details progressive UI
|
||||
|
||||
* FirstSignalCard UI component
|
||||
* Lazy-load logs viewer
|
||||
* Default to opening failing stage
|
||||
* Prefetch signals in runs list
|
||||
|
||||
## Epic: Telemetry and dashboards
|
||||
|
||||
* Add `ttfs_start` and `ttfs_signal_rendered`
|
||||
* Add endpoint latency metrics
|
||||
* Build dashboards + alerts
|
||||
* Add sampling for “signal helpfulness” feedback
|
||||
|
||||
---
|
||||
|
||||
# Risk register and mitigations
|
||||
|
||||
## Risk: First signal is wrong/misleading
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* track “generic signal rate” and “corrected by user” feedback
|
||||
* classification confidence scoring
|
||||
* always provide quick access to full logs as fallback
|
||||
|
||||
## Risk: Logs contain secrets
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* redact excerpts before storing/serving
|
||||
* cap excerpt lines and length
|
||||
* keep raw logs behind existing permissions
|
||||
|
||||
## Risk: Increased ingest CPU cost
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* incremental scanning with early stop after first error captured
|
||||
* limit scanning per chunk
|
||||
* sample/skip overly large logs until job completion
|
||||
|
||||
## Risk: Cache invalidation bugs
|
||||
|
||||
* Mitigation:
|
||||
|
||||
* ETag-based correctness
|
||||
* versioned keys: `:v1`
|
||||
* “write-through” cache updates from summarizer
|
||||
|
||||
---
|
||||
|
||||
# Definition of Done checklist
|
||||
|
||||
A phase is “done” when:
|
||||
|
||||
* ✅ TTFS measured with reliable client events
|
||||
* ✅ FirstSignalCard renders from `/first-signal` endpoint
|
||||
* ✅ ETag caching works
|
||||
* ✅ Fallback path is solid (no blank screens)
|
||||
* ✅ Dashboards exist and are actively watched during rollout
|
||||
* ✅ Security review completed for log excerpts/redaction
|
||||
* ✅ Load tests show no unacceptable backend regressions
|
||||
|
||||
---
|
||||
|
||||
# Optional enhancements after initial launch
|
||||
|
||||
1. **Next-step recommendations**
|
||||
Add action suggestions and deep links (rotate token, open failing test, open doc).
|
||||
2. **Flaky test / known issue detection**
|
||||
Show “this matches known flaky signature” with last-seen frequency.
|
||||
3. **“Compare to last green”**
|
||||
Summarize what changed since last successful run (commit diff, dependency bump).
|
||||
4. **SSE live updates**
|
||||
Update first signal as soon as failure occurs while user watches.
|
||||
|
||||
---
|
||||
|
||||
If you tell me your current backend stack (Node/Go/.NET), log storage (S3/Elastic/Loki), and which CI providers you support, I can translate this into a concrete set of modules/classes, exact schema migrations, and the Angular routing + signals code structure you’d implement.
|
||||
@@ -0,0 +1,643 @@
Here’s a simple, practical idea to make your scans provably repeatable over time and catch drift fast.
|
||||
|
||||
# Replay Fidelity (what, why, how)
|
||||
|
||||
**What it is:** the share of historical scans that reproduce **bit‑for‑bit** when re‑run using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.
|
||||
|
||||
**Why you want it:** it exposes hidden nondeterminism (feed drift, time‑dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.
|
||||
|
||||
---
|
||||
|
||||
## The metric
|
||||
|
||||
* **Per‑scan:** `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`.
|
||||
* **Windowed:** `Replay Fidelity = (Σ replay_match) / (# historical replays in window)`.
|
||||
* **Breakdown:** also track by scanner, language, image base, feed version, and environment.
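
For example, if a nightly job replays 200 historical scans and 196 reproduce bit-for-bit, the 30-day Replay Fidelity is 196/200 = 0.98 — right at the “Green” threshold defined below.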
|
||||
|
||||
---
|
||||
|
||||
## What must be captured in the scan manifest
|
||||
|
||||
* Exact source refs (image digest / repo SHA), container layers’ digests
|
||||
* Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
|
||||
* Feed snapshots (CVE DB, OVAL, vendor advisories) as **content‑addressed** bundles
|
||||
* Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
|
||||
* Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy
|
||||
|
||||
---
|
||||
|
||||
## Pass/Fail rules you can ship
|
||||
|
||||
* **Green:** Fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
|
||||
* **Warn:** Any bucket drops by ≥ 2% week‑over‑week
|
||||
* **Fail the pipeline:** If fidelity < 0.90 or any regulated project < 0.95
|
||||
|
||||
---
|
||||
|
||||
## Minimal replay harness (outline)
|
||||
|
||||
1. Pick N historical scans (e.g., last 200 or stratified by image language).
|
||||
2. Restore their **frozen** manifest (scanner binary, feed bundle, policy lattice, seeds).
|
||||
3. Re‑run in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale).
|
||||
4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA‑256.
|
||||
5. Emit: pass/fail, diff summary, and the “cause” tag if mismatch (feed, policy, runtime, code).
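
Step 4 reduces to hashing and comparing the normalized artifacts; a tiny C# sketch (it assumes the artifacts were already canonicalized as described under “Quick wins” below):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ReplayCompare
{
    // SHA-256 of a (pre-normalized) artifact file, as lowercase hex.
    public static string Sha256Hex(string path)
    {
        using var stream = File.OpenRead(path);
        return Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
    }

    // Step 4 of the harness: original vs. replayed artifact must hash identically.
    public static bool Matches(string originalPath, string replayedPath) =>
        Sha256Hex(originalPath) == Sha256Hex(replayedPath);
}
```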
|
||||
|
||||
---
|
||||
|
||||
## Dashboard (what to show)
|
||||
|
||||
* Fidelity % (30/90‑day) + sparkline
|
||||
* Top offenders (by language/scanner/policy set)
|
||||
* “Cause of mismatch” histogram (feed vs runtime vs code vs policy)
|
||||
* Click‑through: deterministic diff (e.g., which CVEs flipped and why)
|
||||
|
||||
---
|
||||
|
||||
## Quick wins for Stella Ops
|
||||
|
||||
* Treat **feeds as immutable snapshots** (content‑addressed tar.zst) and record their digest in each scan.
|
||||
* Run scanner in a **repro shell** (OCI image digest + fixed TZ/locale + no network).
|
||||
* Normalize SBOM/VEX (key order, whitespace, float precision) before hashing.
|
||||
* Add a `stella replay --from MANIFEST.json` command + nightly cron to sample replays.
|
||||
* Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`.
|
||||
|
||||
Want me to draft the `stella replay` CLI spec and the DB table (DDL) you can drop into Postgres?
|
||||
Below is an extended “Replay Fidelity” design **plus a concrete development implementation plan** you can hand to engineering. I’m assuming Stella Ops is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability “feeds” and policy/lattice/rules.
|
||||
|
||||
---
|
||||
|
||||
## 1) Extend the concept: Replay Fidelity as a product capability
|
||||
|
||||
### 1.1 Fidelity levels (so you can be strict without being brittle)
|
||||
|
||||
Instead of a single yes/no, define **tiers** that you can report and gate on:
|
||||
|
||||
1. **Bitwise Fidelity (BF)**
|
||||
|
||||
* *Definition:* All primary artifacts (SBOM, findings, VEX, evidence) match **byte-for-byte** after canonicalization.
|
||||
* *Use:* strongest auditability, catch ordering/nondeterminism.
|
||||
|
||||
2. **Semantic Fidelity (SF)**
|
||||
|
||||
* *Definition:* The *meaning* matches even if formatting differs (e.g., key order, whitespace, timestamps).
|
||||
* *How:* compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
|
||||
* *Use:* protects you from “cosmetic diffs” and helps triage.
|
||||
|
||||
3. **Policy Fidelity (PF)**
|
||||
|
||||
* *Definition:* Final policy decision (pass/fail + reason codes) matches.
|
||||
* *Use:* useful when outputs may evolve but governance outcome must remain stable.
|
||||
|
||||
**Recommended reporting:**
|
||||
|
||||
* Dashboard shows BF, SF, PF together.
|
||||
* Default engineering SLO: **BF ≥ 0.98**; compliance SLO: **BF ≥ 0.95** for regulated projects; PF should be ~1.0 unless policy changed intentionally.
|
||||
|
||||
---
|
||||
|
||||
### 1.2 “Why did it drift?”—Mismatch classification taxonomy
|
||||
|
||||
When a replay fails, auto-tag the cause so humans don’t diff JSON by hand.
|
||||
|
||||
**Primary mismatch classes**
|
||||
|
||||
* **Feed drift:** CVE/OVAL/vendor advisory snapshot differs.
|
||||
* **Policy drift:** policy/lattice/rules differ (or default rule set changed).
|
||||
* **Runtime drift:** base image / libc / kernel / locale / tz / CPU arch differences.
|
||||
* **Scanner drift:** scanner binary build differs or dependency versions changed.
|
||||
* **Nondeterminism:** ordering instability, concurrency race, unseeded RNG, time-based logic.
|
||||
* **External IO:** network calls, “latest” resolution, remote package registry changes.
|
||||
|
||||
**Output:** a `mismatch_reason` plus a short `diff_summary`.
|
||||
|
||||
---
|
||||
|
||||
### 1.3 Deterministic “scan envelope” design
|
||||
|
||||
A replay only works if the scan is fully specified.
|
||||
|
||||
**Scan envelope components**
|
||||
|
||||
* **Inputs:** image digest, repo commit SHA, build provenance, layers digests.
|
||||
* **Scanner:** scanner OCI image digest (or binary digest), config flags, feature toggles.
|
||||
* **Feeds:** content-addressed feed bundle digests (see §2.3).
|
||||
* **Policy/rules:** git commit SHA + content digest of compiled rules.
|
||||
* **Environment:** OS/arch, tz/locale, “clock mode”, network mode, CPU count.
|
||||
* **Normalization:** “canonicalization version” for SBOM/VEX/findings.
|
||||
|
||||
---
|
||||
|
||||
### 1.4 Canonicalization so “bitwise” is meaningful
|
||||
|
||||
To make BF achievable (a minimal canonicalization sketch follows this list):
|
||||
|
||||
* Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
|
||||
* Strip/normalize volatile fields (timestamps, “scan_duration_ms”, hostnames)
|
||||
* Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)`
|
||||
* Deterministic IDs: if you generate internal IDs, derive from stable hashes of content (not UUID4)
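A minimal Go sketch of the canonical-hash step under these rules, assuming artifacts are generic JSON documents; the `volatileKeys` list is illustrative, and domain-specific array sorting (by purl, CVE id, etc.) still has to be plugged in where marked.

```go
package canon

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// volatileKeys are fields stripped before hashing (illustrative; adjust per artifact type).
var volatileKeys = map[string]bool{"timestamp": true, "scan_duration_ms": true, "hostname": true}

// strip removes volatile fields recursively from a decoded JSON document.
func strip(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			if volatileKeys[k] {
				continue
			}
			out[k] = strip(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = strip(val) // NOTE: domain-specific sorting (by purl, cve_id, ...) goes here
		}
		return out
	default:
		return t
	}
}

// CanonicalSHA256 decodes raw JSON, strips volatile fields, re-encodes it with
// sorted keys (encoding/json sorts map keys), and hashes the result.
func CanonicalSHA256(raw []byte) (string, error) {
	var doc interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", err
	}
	canonical, err := json.Marshal(strip(doc))
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}
```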
|
||||
|
||||
---
|
||||
|
||||
### 1.5 Sampling strategy
|
||||
|
||||
You don’t need to replay everything.
|
||||
|
||||
**Nightly sample:** stratified by:
|
||||
|
||||
* language ecosystem (npm, pip, maven, go, rust…)
|
||||
* scanner engine
|
||||
* base OS
|
||||
* “regulatory tier”
|
||||
* image size/complexity
|
||||
|
||||
**Plus:** always replay “golden canaries” (a fixed set of reference images) after every scanner release and feed ingestion pipeline change.
|
||||
|
||||
---
|
||||
|
||||
## 2) Technical architecture blueprint
|
||||
|
||||
### 2.1 System components
|
||||
|
||||
1. **Manifest Writer (in the scan pipeline)**
|
||||
|
||||
* Produces `ScanManifest v1` JSON
|
||||
* Records all digests and versions
|
||||
|
||||
2. **Artifact Store**
|
||||
|
||||
* Stores SBOM, findings, VEX, evidence blobs
|
||||
* Stores canonical hashes for BF checks
|
||||
|
||||
3. **Feed Snapshotter**
|
||||
|
||||
* Periodically builds immutable feed bundles
|
||||
* Content-addressed (digest-keyed)
|
||||
* Stores metadata (source URLs, generation timestamp, signature)
|
||||
|
||||
4. **Replay Orchestrator**
|
||||
|
||||
* Chooses historical scans to replay
|
||||
* Launches “replay executor” jobs
|
||||
|
||||
5. **Replay Executor**
|
||||
|
||||
* Runs scanner in pinned container image
|
||||
* Network off, tz fixed, clock policy applied
|
||||
* Produces new artifacts + hashes
|
||||
|
||||
6. **Diff & Scoring Engine**
|
||||
|
||||
* Computes BF/SF/PF
|
||||
* Generates mismatch classification + diff summary
|
||||
|
||||
7. **Metrics + UI Dashboard**
|
||||
|
||||
* Prometheus metrics
|
||||
* UI for drill-down diffs
|
||||
|
||||
---
|
||||
|
||||
### 2.2 Data model (Postgres-friendly)
|
||||
|
||||
**Core tables**
|
||||
|
||||
* `scan_manifests`
|
||||
|
||||
* `scan_id (pk)`
|
||||
* `manifest_json`
|
||||
* `manifest_sha256`
|
||||
* `created_at`
|
||||
* `scan_artifacts`
|
||||
|
||||
* `scan_id (fk)`
|
||||
* `artifact_type` (sbom|findings|vex|evidence)
|
||||
* `artifact_uri`
|
||||
* `canonical_sha256`
|
||||
* `schema_version`
|
||||
* `feed_snapshots`
|
||||
|
||||
* `feed_digest (pk)`
|
||||
* `bundle_uri`
|
||||
* `sources_json`
|
||||
* `generated_at`
|
||||
* `signature`
|
||||
* `replay_runs`
|
||||
|
||||
* `replay_id (pk)`
|
||||
* `original_scan_id (fk)`
|
||||
* `status` (queued|running|passed|failed)
|
||||
* `bf_match bool`, `sf_match bool`, `pf_match bool`
|
||||
* `mismatch_reason`
|
||||
* `diff_summary_json`
|
||||
* `started_at`, `finished_at`
|
||||
* `executor_env_json` (arch, tz, cpu, image digest)
|
||||
|
||||
**Indexes**
|
||||
|
||||
* `(created_at)` for sampling windows
|
||||
* `(mismatch_reason, finished_at)` for triage
|
||||
* `(scanner_version, ecosystem)` for breakdown dashboards
|
||||
|
||||
---
|
||||
|
||||
### 2.3 Feed Snapshotting (the key to long-term replay)
|
||||
|
||||
**Feed bundle format**
|
||||
|
||||
* `feeds/<source>/<date>/...` inside a tar.zst
|
||||
* manifest file inside bundle: `feed_bundle_manifest.json` containing:
|
||||
|
||||
* source URLs
|
||||
* retrieval commit/etag (if any)
|
||||
* file hashes
|
||||
* generated_by version
|
||||
|
||||
**Content addressing**
|
||||
|
||||
* Digest of the entire bundle (`sha256(tar.zst)`) is the reference (see the digest sketch below).
|
||||
* Scans record only the digest + URI.
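A minimal Go sketch of that content addressing, assuming the bundle has already been written to disk as a `tar.zst`; the resulting `sha256:<hex>` string is what each scan manifest records alongside the storage URI.

```go
package feeds

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// BundleDigest streams the tar.zst and returns the content address
// ("sha256:<hex>") under which the bundle is stored and referenced by scans.
func BundleDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil)), nil
}
```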
|
||||
|
||||
**Immutability**
|
||||
|
||||
* Store bundles in object storage with WORM / retention if you need compliance.
|
||||
|
||||
---
|
||||
|
||||
### 2.4 Replay execution sandbox
|
||||
|
||||
For determinism, enforce:
|
||||
|
||||
* **No network** (K8s NetworkPolicy, firewall rules, or container runtime flags)
|
||||
* **Fixed TZ/locale**
|
||||
* **Pinned container image digest**
|
||||
* **Clock policy**
|
||||
|
||||
* Either “real time but recorded” or “frozen time at original scan timestamp”
|
||||
* If scanner logic uses current date for severity windows, freeze time
|
||||
|
||||
---
|
||||
|
||||
## 3) Development implementation plan
|
||||
|
||||
I’ll lay this out as **workstreams** plus **a sprint-by-sprint plan**. You can compress or expand it depending on team size.
|
||||
|
||||
### Workstream A — Scan Manifest & Canonical Artifacts
|
||||
|
||||
**Goal:** every scan is replayable on paper, even before replays run.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `ScanManifest v1` schema + writer integrated into scan pipeline
|
||||
* Canonicalization library + canonical hashing for all artifacts
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
|
||||
* Artifact hashes are stable across repeated runs in the same environment
|
||||
|
||||
---
|
||||
|
||||
### Workstream B — Feed Snapshotting & Policy Versioning
|
||||
|
||||
**Goal:** eliminate “feed drift” by pinning immutable inputs.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Feed bundle builder + signer + uploader
|
||||
* Policy/rules bundler (compiled rules bundle, digest recorded)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* New scans reference feed bundle digests (not “latest”)
|
||||
* A scan can be re-run with the same feed bundle and policy bundle
|
||||
|
||||
---
|
||||
|
||||
### Workstream C — Replay Runner & Diff Engine
|
||||
|
||||
**Goal:** execute historical scans and score BF/SF/PF with actionable diffs.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `stella replay --from manifest.json`
|
||||
* Orchestrator job to schedule replays
|
||||
* Diff engine + mismatch classifier
|
||||
* Storage of replay results
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Replay produces deterministic artifacts in a pinned environment
|
||||
* Dashboard/CLI shows BF/SF/PF + diff summary for failures
|
||||
|
||||
---
|
||||
|
||||
### Workstream D — Observability, Dashboard, and CI Gates
|
||||
|
||||
**Goal:** make fidelity visible and enforceable.
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf` (a metrics sketch follows this list)
|
||||
* Breakdown labels (scanner, ecosystem, policy_set, base_os)
|
||||
* Alerts for drop thresholds
|
||||
* CI gate option: “block release if BF < threshold on canary set”
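A minimal sketch of that metrics contract using `prometheus/client_golang`, assuming the replay service recomputes windowed fidelity per bucket and publishes it as gauges; metric and label names match the list above, everything else is illustrative.

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var labels = []string{"scanner", "ecosystem", "policy_set", "base_os"}

var (
	bf = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_bf", Help: "Bitwise replay fidelity (0..1) per bucket.",
	}, labels)
	sf = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_sf", Help: "Semantic replay fidelity (0..1) per bucket.",
	}, labels)
	pf = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "replay_fidelity_pf", Help: "Policy replay fidelity (0..1) per bucket.",
	}, labels)
)

func init() {
	prometheus.MustRegister(bf, sf, pf)
}

// Record publishes the latest windowed fidelity values for one bucket.
func Record(scanner, ecosystem, policySet, baseOS string, bfVal, sfVal, pfVal float64) {
	l := prometheus.Labels{"scanner": scanner, "ecosystem": ecosystem, "policy_set": policySet, "base_os": baseOS}
	bf.With(l).Set(bfVal)
	sf.With(l).Set(sfVal)
	pf.With(l).Set(pfVal)
}

// Serve exposes /metrics for Prometheus scraping.
func Serve(addr string) error {
	http.Handle("/metrics", promhttp.Handler())
	return http.ListenAndServe(addr, nil)
}
```

Alert rules and the CI badge can then key off these series (e.g., week-over-week drops per label set).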
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Engineering can see drift within 24h
|
||||
* Releases are blocked when fidelity regressions occur
|
||||
|
||||
---
|
||||
|
||||
## 4) Suggested sprint plan with concrete tasks
|
||||
|
||||
### Sprint 0 — Design lock + baseline
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Define manifest schema: `ScanManifest v1` fields + versioning rules
|
||||
* Decide canonicalization rules (what is normalized vs preserved)
|
||||
* Choose initial “golden canary” scan set (10–20 representative targets)
|
||||
* Add “replay-fidelity” epic with ownership & SLIs/SLOs
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Approved schema + canonicalization spec
|
||||
* Canary set stored and tagged
|
||||
|
||||
---
|
||||
|
||||
### Sprint 1 — Manifest writer + artifact hashing (MVP)
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Implement manifest writer in scan pipeline
|
||||
* Store `manifest_json` + `manifest_sha256`
|
||||
* Implement canonicalization + hashing for:
|
||||
|
||||
* findings list (sorted)
|
||||
* SBOM (normalized)
|
||||
* VEX (if present)
|
||||
* Persist canonical hashes in `scan_artifacts`
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Two identical scans in the same environment yield identical artifact hashes
|
||||
* A “manifest export” endpoint/CLI works:
|
||||
|
||||
* `stella scan --emit-manifest out.json`
|
||||
|
||||
---
|
||||
|
||||
### Sprint 2 — Feed snapshotter + policy bundling
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Build feed bundler job:
|
||||
|
||||
* pull raw sources
|
||||
* normalize layout
|
||||
* generate `feed_bundle_manifest.json`
|
||||
* tar.zst + sha256
|
||||
* upload + record in `feed_snapshots`
|
||||
* Update scan pipeline:
|
||||
|
||||
* resolve feed bundle digest at scan start
|
||||
* record digest in scan manifest
|
||||
* Bundle policy/lattice:
|
||||
|
||||
* compile rules into an immutable artifact
|
||||
* record policy bundle digest in manifest
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Scans reference immutable feed + policy digests
|
||||
* You can fetch feed bundle by digest and reproduce the same feed inputs
|
||||
|
||||
---
|
||||
|
||||
### Sprint 3 — Replay executor + “no network” sandbox
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Create replay container image / runtime wrapper
|
||||
* Implement `stella replay --from MANIFEST.json`
|
||||
|
||||
* pulls scanner image by digest
|
||||
* mounts feed bundle + policy bundle
|
||||
* runs in network-off mode
|
||||
* applies tz/locale + clock mode
|
||||
* Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage)
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Replay runs end-to-end for canary scans
|
||||
* Deterministic runtime controls verified (no DNS egress, fixed tz)
|
||||
|
||||
---
|
||||
|
||||
### Sprint 4 — Diff engine + mismatch classification
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Implement BF compare (canonical hashes)
|
||||
* Implement SF compare (semantic JSON/object comparison)
|
||||
* Implement PF compare (policy decision equivalence)
|
||||
* Implement mismatch classification rules (a classifier sketch follows this task list):
|
||||
|
||||
* if feed digest differs → feed drift
|
||||
* if scanner digest differs → scanner drift
|
||||
* if environment differs → runtime drift
|
||||
* else → nondeterminism (with sub-tags for ordering/time/RNG)
|
||||
* Generate `diff_summary_json`:
|
||||
|
||||
* top N changed CVEs
|
||||
* packages added/removed
|
||||
* policy verdict changes
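A minimal Go sketch of those classification rules, assuming the relevant manifest fields have already been extracted into a small struct; field names are illustrative, not a fixed schema.

```go
package diffengine

// ManifestFacts holds the handful of pinned inputs the classifier compares
// (illustrative field names).
type ManifestFacts struct {
	FeedDigest     string
	PolicyDigest   string
	ScannerDigest  string
	EnvFingerprint string // arch + tz + locale + image digest, hashed
}

// ClassifyMismatch applies the drift rules in priority order and falls back
// to "nondeterminism" when every pinned input matches but artifacts differ.
func ClassifyMismatch(orig, replay ManifestFacts) string {
	switch {
	case orig.FeedDigest != replay.FeedDigest:
		return "feed_drift"
	case orig.PolicyDigest != replay.PolicyDigest:
		return "policy_drift"
	case orig.ScannerDigest != replay.ScannerDigest:
		return "scanner_drift"
	case orig.EnvFingerprint != replay.EnvFingerprint:
		return "runtime_drift"
	default:
		return "nondeterminism" // sub-tags (ordering/time/RNG) need artifact-level diffing
	}
}
```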
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Every failed replay has a cause tag and a diff summary that’s useful in <2 minutes
|
||||
* Engineers can reproduce failures locally with the manifest
|
||||
|
||||
---
|
||||
|
||||
### Sprint 5 — Dashboard + alerts + CI gate
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Expose Prometheus metrics from replay service
|
||||
* Build dashboard:
|
||||
|
||||
* BF/SF/PF trends
|
||||
* breakdown by ecosystem/scanner/policy
|
||||
* mismatch cause histogram
|
||||
* Add alerting rules (drop threshold, bucket regression)
|
||||
* Add CI gate mode:
|
||||
|
||||
* “run replays on canary set for this release candidate”
|
||||
* block merge if BF < target
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Fidelity visible to leadership and engineering
|
||||
* Release process is protected by canary replays
|
||||
|
||||
---
|
||||
|
||||
### Sprint 6 — Hardening + compliance polish
|
||||
|
||||
**Tasks**
|
||||
|
||||
* Backward compatible manifest upgrades:
|
||||
|
||||
* `manifest_version` bump rules
|
||||
* migration support
|
||||
* Artifact signing / integrity:
|
||||
|
||||
* sign manifest hash
|
||||
* optional transparency log later
|
||||
* Storage & retention policies (cost controls)
|
||||
* Runbook + oncall playbook
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Audit story is complete: “show me exactly how scan X was produced”
|
||||
* Operational load is manageable and cost-bounded
|
||||
|
||||
---
|
||||
|
||||
## 5) Engineering specs you can start implementing immediately
|
||||
|
||||
### 5.1 `ScanManifest v1` skeleton (example)
|
||||
|
||||
```json
|
||||
{
|
||||
"manifest_version": "1.0",
|
||||
"scan_id": "scan_123",
|
||||
"created_at": "2025-12-12T10:15:30Z",
|
||||
|
||||
"input": {
|
||||
"type": "oci_image",
|
||||
"image_ref": "registry/app@sha256:...",
|
||||
"layers": ["sha256:...", "sha256:..."],
|
||||
"source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
|
||||
},
|
||||
|
||||
"scanner": {
|
||||
"engine": "stella",
|
||||
"scanner_image_digest": "sha256:...",
|
||||
"scanner_version": "2025.12.0",
|
||||
"config_digest": "sha256:...",
|
||||
"flags": ["--deep", "--vex"]
|
||||
},
|
||||
|
||||
"feeds": {
|
||||
"vuln_feed_bundle_digest": "sha256:...",
|
||||
"license_db_digest": "sha256:..."
|
||||
},
|
||||
|
||||
"policy": {
|
||||
"policy_bundle_digest": "sha256:...",
|
||||
"policy_set": "prod-default"
|
||||
},
|
||||
|
||||
"environment": {
|
||||
"arch": "amd64",
|
||||
"os": "linux",
|
||||
"tz": "UTC",
|
||||
"locale": "C",
|
||||
"network": "disabled",
|
||||
"clock_mode": "frozen",
|
||||
"clock_value": "2025-12-12T10:15:30Z"
|
||||
},
|
||||
|
||||
"normalization": {
|
||||
"canonicalizer_version": "1.2.0",
|
||||
"sbom_schema": "cyclonedx-1.6",
|
||||
"vex_schema": "cyclonedx-vex-1.0"
|
||||
}
|
||||
}
|
||||
```
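A minimal Go sketch of how a replay executor might load this manifest, covering only a subset of the fields above (enough to pin the scanner image, feeds, policy, and environment); it is not a full schema binding.

```go
package manifest

import (
	"encoding/json"
	"os"
)

// ScanManifest mirrors a subset of the ScanManifest v1 fields shown above.
type ScanManifest struct {
	ManifestVersion string `json:"manifest_version"`
	ScanID          string `json:"scan_id"`
	Scanner         struct {
		ScannerImageDigest string   `json:"scanner_image_digest"`
		Flags              []string `json:"flags"`
	} `json:"scanner"`
	Feeds struct {
		VulnFeedBundleDigest string `json:"vuln_feed_bundle_digest"`
	} `json:"feeds"`
	Policy struct {
		PolicyBundleDigest string `json:"policy_bundle_digest"`
	} `json:"policy"`
	Environment struct {
		TZ        string `json:"tz"`
		Locale    string `json:"locale"`
		Network   string `json:"network"`
		ClockMode string `json:"clock_mode"`
	} `json:"environment"`
}

// Load reads and parses a manifest file for `stella replay --from MANIFEST.json`.
func Load(path string) (*ScanManifest, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m ScanManifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}
```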
|
||||
|
||||
---
|
||||
|
||||
### 5.2 CLI spec (minimal)
|
||||
|
||||
* `stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/`
|
||||
* `stella replay --from MANIFEST.json --out-dir replay_out/`
|
||||
* `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json`
|
||||
|
||||
---
|
||||
|
||||
## 6) Testing strategy (to prevent determinism regressions)
|
||||
|
||||
### Unit tests
|
||||
|
||||
* Canonicalization: same object → same bytes
|
||||
* Sorting stability: randomized input order → stable output
|
||||
* Hash determinism
|
||||
|
||||
### Integration tests
|
||||
|
||||
* Golden canaries (a double-run test sketch follows this list):
|
||||
|
||||
* run scan twice in same runner → BF match
|
||||
* replay from manifest → BF match
|
||||
* “Network leak” test:
|
||||
|
||||
* DNS requests must be zero
|
||||
* “Clock leak” test:
|
||||
|
||||
* freeze time; ensure outputs do not include real timestamps
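A minimal Go test sketch of the double-run canary check, where `runScan` is a hypothetical helper that executes one canary in the pinned runtime and returns its canonical artifact hash; canary names are illustrative.

```go
package replay_test

import "testing"

// runScan is a placeholder for the harness call that scans one golden canary
// in the repro shell and returns the canonical SHA-256 of its artifacts.
func runScan(t *testing.T, canary string) string {
	t.Helper()
	// ... invoke the scanner in the pinned runtime and hash the canonical output ...
	return ""
}

// TestGoldenCanaryBitwiseMatch runs the same canary twice in the same runner
// and requires identical canonical hashes (BF match).
func TestGoldenCanaryBitwiseMatch(t *testing.T) {
	for _, canary := range []string{"alpine-base", "node-service"} { // illustrative canary names
		first := runScan(t, canary)
		second := runScan(t, canary)
		if first != second {
			t.Errorf("canary %s: hash mismatch: %s vs %s", canary, first, second)
		}
	}
}
```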
|
||||
|
||||
### Chaos tests
|
||||
|
||||
* Vary CPU count, run concurrency, run order → still BF match
|
||||
* Randomized scheduling / thread interleavings to find races
|
||||
|
||||
---
|
||||
|
||||
## 7) Operational policies (so it stays useful)
|
||||
|
||||
### Retention & cost controls
|
||||
|
||||
* Keep full artifacts for regulated scans (e.g., 1–7 years)
|
||||
* For non-regulated:
|
||||
|
||||
* keep manifests + canonical hashes long-term
|
||||
* expire heavy evidence blobs after N days
|
||||
* Compress large artifacts and dedupe by digest
|
||||
|
||||
### Alerting examples
|
||||
|
||||
* BF drops by ≥2% week-over-week (any major bucket) → warn
|
||||
* BF < 0.90 overall or regulated BF < 0.95 → page / block release
|
||||
|
||||
### Triage workflow
|
||||
|
||||
* Failed replay auto-creates a ticket with:
|
||||
|
||||
* manifest link
|
||||
* mismatch_reason
|
||||
* diff_summary
|
||||
* reproduction command
|
||||
|
||||
---
|
||||
|
||||
## 8) What “done” looks like (definition of success)
|
||||
|
||||
* Any customer/auditor can pick a scan from 6 months ago and you can:
|
||||
|
||||
1. retrieve manifest + feed bundle + policy bundle by digest
|
||||
2. replay in a pinned sandbox
|
||||
3. show BF/SF/PF results and diffs
|
||||
* Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.
|
||||
|
||||
---
|
||||
|
||||
If you want, I can also provide:
|
||||
|
||||
* a **Postgres DDL** for the tables above,
|
||||
* a **Prometheus metrics contract** (names + labels + example queries),
|
||||
* and a **diff_summary_json schema** that supports a UI “diff view” without reprocessing artifacts.
|
||||
|
||||
Here’s a quick, plain‑English idea you can use right away: **not all code diffs are equal**—some actually change what’s *reachable* at runtime (and thus security posture), while others just refactor internals. A “**Smart‑Diff**” pipeline flags only the diffs that open or close attack paths by combining (1) call‑stack traces, (2) dependency graphs, and (3) dataflow.
|
||||
|
||||
---
|
||||
|
||||
### Why this matters (background)
|
||||
|
||||
* Text diffs ≠ behavior diffs. A rename or refactor can look big in Git but do nothing to reachable flows from external entry points (HTTP, gRPC, CLI, message consumers).
|
||||
* Security triage gets noisy because scanners attach CVEs to all present packages, not to the code paths you can actually hit.
|
||||
* **Dataflow‑aware diffs** shrink noise and make VEX generation honest: “vuln present but **not exploitable** because the sink is unreachable from any policy‑defined entrypoint.”
|
||||
|
||||
---
|
||||
|
||||
### Minimal architecture (fits Stella Ops)
|
||||
|
||||
1. **Entrypoint map** (per service): controllers, handlers, consumers.
|
||||
2. **Call graph + dataflow** (per commit): Roslyn for C#, `golang.org/x/tools/go/callgraph` for Go, plus taint rules (source→sink).
|
||||
3. **Reachability cache** keyed by (commit, entrypoint, package@version).
|
||||
4. **Smart‑Diff** = `reachable_paths(commit_B) – reachable_paths(commit_A)`.
|
||||
|
||||
* If a path to a sensitive sink is newly reachable → **High**.
|
||||
* If a path disappears → auto‑generate **VEX “not affected (no reachable path)”**.
|
||||
|
||||
---
|
||||
|
||||
### Tiny working seeds
|
||||
|
||||
**C# (.NET 10) — Roslyn skeleton to diff call‑reachability**
|
||||
|
||||
```csharp
// SmartDiff.csproj targets net10.0; call MSBuildLocator.RegisterDefaults() at startup before opening solutions.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.FindSymbols;
using Microsoft.CodeAnalysis.MSBuild;

public static class SmartDiff
{
    public static async Task<HashSet<string>> ReachableSinks(string solutionPath, string[] entrypoints, string[] sinks)
    {
        var workspace = MSBuildWorkspace.Create();
        var solution = await workspace.OpenSolutionAsync(solutionPath);
        var index = new HashSet<string>();

        foreach (var proj in solution.Projects)
        {
            var comp = await proj.GetCompilationAsync();
            if (comp is null) continue;

            // Resolve entrypoints & sinks by fully-qualified display name
            var epSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => entrypoints.Contains(m.ToDisplayString())).ToList();
            var sinkSymbols = comp.GlobalNamespace.GetMembers().SelectMany(Descend)
                .OfType<IMethodSymbol>().Where(m => sinks.Contains(m.ToDisplayString())).ToList();

            foreach (var ep in epSymbols)
            foreach (var sink in sinkSymbols)
            {
                // Heuristic reachability: cheap reference check via SymbolFinder
                var refs = await SymbolFinder.FindReferencesAsync(sink, solution);
                if (refs.SelectMany(r => r.Locations).Any()) // replace with a real graph walk
                    index.Add($"{ep.ToDisplayString()} -> {sink.ToDisplayString()}");
            }
        }
        return index;

        // Walk all namespaces/types to enumerate candidate method symbols
        static IEnumerable<ISymbol> Descend(INamespaceOrTypeSymbol sym)
        {
            foreach (var m in sym.GetMembers())
            {
                yield return m;
                if (m is INamespaceOrTypeSymbol nt) foreach (var x in Descend(nt)) yield return x;
            }
        }
    }
}
```
|
||||
|
||||
**Go — SSA & callgraph seed**
|
||||
|
||||
```go
// go.mod: require golang.org/x/tools (latest)
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/callgraph/cha"
	"golang.org/x/tools/go/packages"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/ssa/ssautil"
)

func main() {
	// Load every package in the module with full syntax + type info.
	cfg := &packages.Config{Mode: packages.LoadAllSyntax, Tests: false}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		log.Fatal(err)
	}

	// Build SSA for all loaded packages, then the whole program.
	prog, _ := ssautil.AllPackages(pkgs, ssa.BuilderMode(0))
	prog.Build()

	// Conservative CHA callgraph; swap in pointer analysis later for interface precision.
	cg := cha.CallGraph(prog)
	// TODO: map entrypoints & sinks, then walk cg from EPs to sinks
	fmt.Println("nodes:", len(cg.Nodes))
}
```
|
||||
|
||||
---
|
||||
|
||||
### How to use it in your pipeline (fast win)
|
||||
|
||||
* **Pre‑merge job**:
|
||||
|
||||
1. Build call graph for `HEAD` and `HEAD^`.
|
||||
2. Compute Smart‑Diff.
|
||||
3. If any *new* EP→sink path appears, fail with a short, proof‑linked note:
|
||||
“New reachable path: `POST /Invoices -> PdfExporter.Save(string path)` (writes outside sandbox).”
|
||||
* **Post‑scan VEX**:
|
||||
|
||||
* For each CVE on a package, mark **Affected** only if any EP can reach a symbol that uses that package’s vulnerable surface.
|
||||
|
||||
---
|
||||
|
||||
### Evidence to show in the UI
|
||||
|
||||
* “**Path card**”: EP → … → Sink, with file:line hop‑list and commit hash.
|
||||
* “**What changed**”: before/after path diff (green removed, red added).
|
||||
* “**Why it matters**”: sink classification (network write, file write, deserialization, SQL, crypto).
|
||||
|
||||
---
|
||||
|
||||
### Developer checklist (Stella Ops style)
|
||||
|
||||
* [ ] Define entrypoints per service (attribute or YAML).
|
||||
* [ ] Define sink taxonomy (FS, NET, DESER, SQL, CRYPTO).
|
||||
* [ ] Implement language adapters: `.NET (Roslyn)`, `Go (SSA)`, later `Java (Soot/WALA)`.
|
||||
* [ ] Add a **ReachabilityCache** (Postgres table keyed by commit+lang+service).
|
||||
* [ ] Wire a `SmartDiffJob` in CI; emit SARIF + CycloneDX `vulnerability-assertions` extension or OpenVEX.
|
||||
* [ ] Gate merges on **newly‑reachable sensitive sinks**; auto‑VEX when paths disappear.
|
||||
|
||||
If you want, I can turn this into a small repo scaffold (Roslyn + Go adapters, Postgres schema, a GitLab/GitHub pipeline, and a minimal UI “path card”).
|
||||
Below is a concrete **development implementation plan** to take the “Smart‑Diff” idea (reachability + dataflow + dependency/vuln context) into a shippable product integrated into your pipeline (Stella Ops style). I’ll assume the initial languages are **.NET (C#)** and **Go**, and the initial goal is **PR gating + VEX automation** with strong evidence (paths + file/line hops).
|
||||
|
||||
---
|
||||
|
||||
## 1) Product definition
|
||||
|
||||
### Problem you’re solving
|
||||
|
||||
Security noise comes from:
|
||||
|
||||
* “Vuln exists in dependency” ≠ “vuln exploitable from any entrypoint”
|
||||
* Git diffs look big even when behavior is unchanged
|
||||
* Teams struggle to triage “is this change actually risky?”
|
||||
|
||||
### What Smart‑Diff should do (core behavior)
|
||||
|
||||
Given **base commit A** and **head commit B**:
|
||||
|
||||
1. Identify **entrypoints** (web handlers, RPC methods, message consumers, CLI commands).
|
||||
2. Identify **sinks** (file write, command exec, SQL, SSRF, deserialization, crypto misuse, templating, etc.).
|
||||
3. Compute **reachable paths** from entrypoints → sinks (call graph + dataflow/taint).
|
||||
4. Emit **Smart‑Diff**:
|
||||
|
||||
* **Newly reachable** EP→sink paths (risk ↑)
|
||||
* **Removed** EP→sink paths (risk ↓)
|
||||
* **Changed** paths (same sink but different sanitization/guards)
|
||||
5. Attach **dependency vulnerability context**:
|
||||
|
||||
* If a vulnerable API surface is reachable (or data reaches it), mark “affected/exploitable”
|
||||
* Otherwise generate **VEX**: “not affected” / “not exploitable” with evidence
|
||||
|
||||
### MVP definition (minimum shippable)
|
||||
|
||||
A PR check that:
|
||||
|
||||
* Flags **new** reachable paths to a small set of high‑risk sinks (e.g., command exec, unsafe deserialization, filesystem write, SSRF/network dial, raw SQL).
|
||||
* Produces:
|
||||
|
||||
* SARIF report (for code scanning UI)
|
||||
* JSON artifact containing proof paths (EP → … → sink with file:line)
|
||||
* Optional VEX statement for dependency vulnerabilities (if you already have an SCA feed)
|
||||
|
||||
---
|
||||
|
||||
## 2) Architecture you can actually build
|
||||
|
||||
### High‑level components
|
||||
|
||||
1. **Policy & Taxonomy Service**
|
||||
|
||||
* Defines entrypoints, sources, sinks, sanitizers, confidence rules
|
||||
* Versioned and centrally managed (but supports repo overrides)
|
||||
|
||||
2. **Analyzer Workers (language adapters)**
|
||||
|
||||
* .NET analyzer (Roslyn + control flow)
|
||||
* Go analyzer (SSA + callgraph)
|
||||
* Outputs standardized IR (Intermediate Representation)
|
||||
|
||||
3. **Graph Store + Reachability Engine**
|
||||
|
||||
* Stores symbol nodes + call edges + dataflow edges
|
||||
* Computes reachable sinks per entrypoint
|
||||
* Computes diff between commits A and B
|
||||
|
||||
4. **Vulnerability Mapper + VEX Generator**
|
||||
|
||||
* Maps vulnerable packages/functions → “surfaces”
|
||||
* Joins with reachability results
|
||||
* Emits OpenVEX (or CycloneDX VEX) with evidence links
|
||||
|
||||
5. **CI/PR Integrations**
|
||||
|
||||
* CLI that runs in CI
|
||||
* Optional server mode (cache + incremental processing)
|
||||
|
||||
6. **UI/API**
|
||||
|
||||
* Path cards: “what changed”, “why it matters”, “proof”
|
||||
* Filters by sink class, confidence, service, entrypoint
|
||||
|
||||
### Data contracts (standardized IR)
|
||||
|
||||
Make every analyzer output the same shapes so the rest of the pipeline is language‑agnostic:
|
||||
|
||||
* **Symbols**
|
||||
|
||||
* `symbol_id`: stable hash of (lang, module, fully-qualified name, signature)
|
||||
* metadata: file, line ranges, kind (method/function), accessibility
|
||||
|
||||
* **Edges**
|
||||
|
||||
* Call edge: `caller_symbol_id -> callee_symbol_id`
|
||||
* Dataflow edge: `source_symbol_id -> sink_symbol_id` with variable/parameter traces
|
||||
* Edge metadata: type, confidence, reason (static, reflection guess, interface dispatch, etc.)
|
||||
|
||||
* **Entrypoints / Sources / Sinks**
|
||||
|
||||
* entrypoint: (symbol_id, route/topic/command metadata)
|
||||
* sink: (symbol_id, sink_type, severity, cwe mapping optional)
|
||||
|
||||
* **Paths**
|
||||
|
||||
* `entrypoint -> ... -> sink`
|
||||
* hop list: symbol_id + file:line, plus “dataflow step evidence” when relevant
|
||||
|
||||
---
|
||||
|
||||
## 3) Workstreams and deliverables
|
||||
|
||||
### Workstream A — Policy, taxonomy, configuration
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* `smartdiff.policy.yaml` schema and validator
|
||||
* A default sink taxonomy:
|
||||
|
||||
* `CMD_EXEC`, `UNSAFE_DESER`, `SQL_RAW`, `SSRF`, `FILE_WRITE`, `PATH_TRAVERSAL`, `TEMPLATE_INJECTION`, `CRYPTO_WEAK`, `AUTHZ_BYPASS` (expand later)
|
||||
* Initial sanitizer patterns:
|
||||
|
||||
* For example: parameter validation, safe deserialization wrappers, ORM parameterized APIs, path normalization, allowlists
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Start strict and small: 10–20 sinks, 10 sources, 10 sanitizers.
|
||||
* Provide repo-level overrides:
|
||||
|
||||
* `smartdiff.policy.yaml` in repo root
|
||||
* Central policies referenced by version tag
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* A service can onboard by configuring:
|
||||
|
||||
* entrypoint discovery mode (auto + manual)
|
||||
* sink classes to enforce
|
||||
* severity threshold to fail PR
|
||||
|
||||
---
|
||||
|
||||
### Workstream B — .NET analyzer (Roslyn)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Build pipeline that produces:
|
||||
|
||||
* call graph (methods and invocations)
|
||||
* basic control-flow guards for reachability (optional for MVP)
|
||||
* taint propagation for common patterns (MVP: parameter → sink)
|
||||
* Entry point discovery for:
|
||||
|
||||
* ASP.NET controllers (`[HttpGet]`, `[HttpPost]`)
|
||||
* Minimal APIs (`MapGet/MapPost`)
|
||||
* gRPC service methods
|
||||
* message consumers (configurable attributes/interfaces)
|
||||
|
||||
**Implementation notes (practical path)**
|
||||
|
||||
* MVP static callgraph:
|
||||
|
||||
* Use Roslyn semantic model to resolve invocation targets
|
||||
* For virtual/interface calls: conservative resolution to possible implementations within the compilation
|
||||
* MVP taint:
|
||||
|
||||
* “Sources”: request params/body, headers, query string, message payloads
|
||||
* “Sinks”: wrappers around `Process.Start`, `SqlCommand`, `File.WriteAllText`, `HttpClient.Send`, deserializers, etc.
|
||||
* Propagate taint across:
|
||||
|
||||
* parameter → local → argument
|
||||
* return values
|
||||
* simple assignments and concatenations (heuristic)
|
||||
* Confidence scoring:
|
||||
|
||||
* Direct static call resolution: high
|
||||
* Reflection/dynamic: low (flag separately)
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* On a demo ASP.NET service, if a PR adds:
|
||||
|
||||
* `HttpPost /upload` → `File.WriteAllBytes(userPath, ...)`
|
||||
Smart‑Diff flags **new EP→FILE_WRITE path** and shows hops with file/line.
|
||||
|
||||
---
|
||||
|
||||
### Workstream C — Go analyzer (SSA)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* SSA build + callgraph extraction
|
||||
* Entrypoint discovery for:
|
||||
|
||||
* `net/http` handlers
|
||||
* common routers (Gin/Echo/Chi) via adapter rules
|
||||
* gRPC methods
|
||||
* consumers (Kafka/NATS/etc.) by config
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Use `golang.org/x/tools/go/packages` + `ssa` build
|
||||
* Callgraph:
|
||||
|
||||
* start with CHA (Class Hierarchy Analysis) for speed
|
||||
* later add pointer analysis for precision on interfaces
|
||||
* Taint:
|
||||
|
||||
* sources: `http.Request`, router params, message payloads
|
||||
* sinks: `os/exec`, `database/sql` raw query, file I/O, `net/http` outbound, unsafe deserialization libs
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* A PR that adds `exec.Command(req.FormValue("cmd"))` becomes a **new EP→CMD_EXEC** finding.
|
||||
|
||||
---
|
||||
|
||||
### Workstream D — Graph store + reachability computation
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Schema in Postgres (recommended first) for:
|
||||
|
||||
* commits, services, languages
|
||||
* symbols, edges, entrypoints, sinks
|
||||
* computed reachable “facts” (entrypoint→sink with shortest path(s))
|
||||
* Reachability engine:
|
||||
|
||||
* BFS/DFS per entrypoint with early cutoffs (a BFS sketch follows this list)
|
||||
* path reconstruction storage (store predecessor map or store k-shortest paths)
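A minimal Go sketch of that BFS with predecessor tracking, assuming call/dataflow edges are loaded into an adjacency map keyed by `symbol_id`; the depth cutoff and single-path reconstruction are simplifications of the k-shortest-paths idea.

```go
package reach

// ReachableWithPath walks the call/dataflow graph breadth-first from one
// entrypoint and returns a proof path (hop list of symbol_ids) to the first
// sink found, or nil if no sink is reachable within maxDepth.
func ReachableWithPath(adj map[string][]string, entrypoint string, sinks map[string]bool, maxDepth int) []string {
	type item struct {
		node  string
		depth int
	}
	pred := map[string]string{} // child -> parent, for path reconstruction
	visited := map[string]bool{entrypoint: true}
	queue := []item{{entrypoint, 0}}

	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if sinks[cur.node] {
			// Reconstruct entrypoint -> ... -> sink from the predecessor map.
			var path []string
			for n := cur.node; n != ""; n = pred[n] {
				path = append([]string{n}, path...)
				if n == entrypoint {
					break
				}
			}
			return path
		}
		if cur.depth >= maxDepth { // early cutoff
			continue
		}
		for _, next := range adj[cur.node] {
			if !visited[next] {
				visited[next] = true
				pred[next] = cur.node
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return nil
}
```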
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Don’t start with a graph DB unless you must.
|
||||
* Use Postgres tables + indexes:
|
||||
|
||||
* `edges(from_symbol, to_symbol, commit_id, kind)`
|
||||
* `symbols(symbol_id, lang, module, fqn, file, line_start, line_end)`
|
||||
* `reachability(entrypoint_id, sink_id, commit_id, path_hash, confidence, severity, evidence_json)`
|
||||
* Cache:
|
||||
|
||||
* keyed by (commit, policy_version, analyzer_version)
|
||||
* avoids recompute on re-runs
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* For any analyzed commit, you can answer:
|
||||
|
||||
* “Which sinks are reachable from these entrypoints?”
|
||||
* “Show me one proof path per (entrypoint, sink_type).”
|
||||
|
||||
---
|
||||
|
||||
### Workstream E — Smart‑Diff engine (the “diff” part)
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Diff algorithm producing three buckets:
|
||||
|
||||
* `added_paths`, `removed_paths`, `changed_paths`
|
||||
* “Changed” means:
|
||||
|
||||
* same entrypoint + sink type, but path differs OR taint/sanitization differs OR confidence changes
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Identify a path by a stable fingerprint (a fingerprint/diff sketch follows these notes):
|
||||
|
||||
* `path_id = hash(entrypoint_symbol + sink_symbol + sink_type + policy_version + analyzer_version)`
|
||||
* Store:
|
||||
|
||||
* top-k paths for each pair for evidence (k=1 for MVP, add more later)
|
||||
* Severity gating rules:
|
||||
|
||||
* Example:
|
||||
|
||||
* New path to `CMD_EXEC` = fail
|
||||
* New path to `FILE_WRITE` = warn unless under `/tmp` allowlist
|
||||
* New path to `SQL_RAW` = fail unless parameterized sanitizer present
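A minimal Go sketch of the fingerprint and the added/removed bucketing, assuming reachability facts for base and head commits are already loaded; the `PathFact` shape is illustrative rather than the real schema.

```go
package smartdiff

import (
	"crypto/sha256"
	"fmt"
)

// PathFact is one reachable entrypoint->sink fact for a given commit (illustrative shape).
type PathFact struct {
	Entrypoint, Sink, SinkType     string
	PolicyVersion, AnalyzerVersion string
}

// Fingerprint gives the stable path_id described above.
func (p PathFact) Fingerprint() string {
	h := sha256.Sum256([]byte(p.Entrypoint + "|" + p.Sink + "|" + p.SinkType + "|" + p.PolicyVersion + "|" + p.AnalyzerVersion))
	return fmt.Sprintf("%x", h[:])
}

// Diff buckets head facts against base facts into added and removed paths.
func Diff(base, head []PathFact) (added, removed []PathFact) {
	baseIDs := map[string]bool{}
	for _, p := range base {
		baseIDs[p.Fingerprint()] = true
	}
	headIDs := map[string]bool{}
	for _, p := range head {
		headIDs[p.Fingerprint()] = true
		if !baseIDs[p.Fingerprint()] {
			added = append(added, p)
		}
	}
	for _, p := range base {
		if !headIDs[p.Fingerprint()] {
			removed = append(removed, p)
		}
	}
	return added, removed
}
```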
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Given commits A and B:
|
||||
|
||||
* If B introduces a new reachable sink, CI fails with a single actionable card:
|
||||
|
||||
* **EP**: route / handler
|
||||
* **Sink**: type + symbol
|
||||
* **Proof**: hop list
|
||||
* **Why**: policy rule triggered
|
||||
|
||||
---
|
||||
|
||||
### Workstream F — Vulnerability mapping + VEX
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* Ingest dependency inventory (SBOM or lockfiles)
|
||||
* Map vulnerabilities to “surfaces”
|
||||
|
||||
* package → vulnerable module/function patterns
|
||||
* minimal version/range matching (from your existing vuln feed)
|
||||
* Decision logic (a decision sketch follows this list):
|
||||
|
||||
* **Affected** if any reachable path intersects vulnerable surface OR dataflow reaches vulnerable sink
|
||||
* else **Not affected / Not exploitable** with justification
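A minimal Go sketch of that decision logic at package-level granularity, using a simplified statement shape rather than the full OpenVEX schema; `reachablePaths` is assumed to map a package purl to the path_ids that reach it.

```go
package vexgen

// Statement is a simplified VEX-style decision for one (vulnerability, package) pair;
// a real emitter would map this onto OpenVEX or CycloneDX VEX documents.
type Statement struct {
	VulnID        string
	Purl          string
	Status        string // "affected" | "not_affected"
	Justification string
	EvidencePaths []string // path_ids proving (or disproving) reachability
}

// Decide marks the vulnerability affected only when some symbol of the
// vulnerable package is reachable from an entrypoint (package-level MVP).
func Decide(vulnID, purl string, reachablePaths map[string][]string) Statement {
	if paths, ok := reachablePaths[purl]; ok && len(paths) > 0 {
		return Statement{VulnID: vulnID, Purl: purl, Status: "affected", EvidencePaths: paths}
	}
	return Statement{
		VulnID:        vulnID,
		Purl:          purl,
		Status:        "not_affected",
		Justification: "vulnerable_code_not_in_execute_path", // wording illustrative
	}
}
```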
|
||||
|
||||
**Implementation notes**
|
||||
|
||||
* Start with a pragmatic approach:
|
||||
|
||||
* package‑level reachability: “is any symbol in that package reachable?”
|
||||
* then iterate toward function‑level surfaces
|
||||
* VEX output:
|
||||
|
||||
* include commit hash, policy version, evidence paths
|
||||
* embed links to internal “path card” URLs if available
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* For a known vulnerable dependency, the system emits:
|
||||
|
||||
* VEX “not affected” if package code is never reached from any entrypoint, with proof references.
|
||||
|
||||
---
|
||||
|
||||
### Workstream G — CI integration + developer UX
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* A single CLI:
|
||||
|
||||
* `smartdiff analyze --commit <sha> --service <svc> --lang <dotnet|go>`
|
||||
* `smartdiff diff --base <shaA> --head <shaB> --out sarif`
|
||||
* CI templates for:
|
||||
|
||||
* GitHub Actions / GitLab CI
|
||||
* Outputs:
|
||||
|
||||
* SARIF
|
||||
* JSON evidence bundle
|
||||
* optional OpenVEX file
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* Teams can enable Smart‑Diff by adding:
|
||||
|
||||
* CI job + config file
|
||||
* no additional infra required for MVP (local artifacts mode)
|
||||
* When infra is available, enable server caching mode for speed.
|
||||
|
||||
---
|
||||
|
||||
### Workstream H — UI “Path Cards”
|
||||
|
||||
**Deliverables**
|
||||
|
||||
* UI components:
|
||||
|
||||
* Path card list with filters (sink type, severity, confidence)
|
||||
* “What changed” diff view:
|
||||
|
||||
* red = added hops
|
||||
* green = removed hops
|
||||
* “Evidence” panel:
|
||||
|
||||
* file:line for each hop
|
||||
* code snippets (optional)
|
||||
* APIs:
|
||||
|
||||
* `GET /smartdiff/{repo}/{pr}/findings`
|
||||
* `GET /smartdiff/{repo}/{commit}/path/{path_id}`
|
||||
|
||||
**Acceptance criteria**
|
||||
|
||||
* A developer can click one finding and understand:
|
||||
|
||||
* how the data got there
|
||||
* exactly what line introduced the risk
|
||||
* how to fix (sanitize/guard/allowlist)
|
||||
|
||||
---
|
||||
|
||||
## 4) Milestone plan (sequenced, no time promises)
|
||||
|
||||
### Milestone 0 — Foundation
|
||||
|
||||
* Repo scaffolding:
|
||||
|
||||
* `smartdiff-cli/`
|
||||
* `analyzers/dotnet/`
|
||||
* `analyzers/go/`
|
||||
* `core-ir/` (schemas + validation)
|
||||
* `server/` (optional; can come later)
|
||||
* Define IR JSON schema + versioning rules
|
||||
* Implement policy YAML + validator + sample policies
|
||||
* Implement “local mode” artifact output
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* You can run `smartdiff analyze` and get a valid IR file for at least one trivial repo.
|
||||
|
||||
---
|
||||
|
||||
### Milestone 1 — Callgraph reachability MVP
|
||||
|
||||
* .NET: build call edges + entrypoint discovery (basic)
|
||||
* Go: build call edges + entrypoint discovery (basic)
|
||||
* Graph store: in-memory or local sqlite/postgres
|
||||
* Compute reachable sinks (callgraph only, no taint)
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* On a demo repo, you can list:
|
||||
|
||||
* entrypoints
|
||||
* reachable sinks (callgraph reachability only)
|
||||
* a proof path (hop list)
|
||||
|
||||
---
|
||||
|
||||
### Milestone 2 — Smart‑Diff MVP (PR gating)
|
||||
|
||||
* Compute diff between base/head reachable sink sets
|
||||
* Produce SARIF with:
|
||||
|
||||
* rule id = sink type
|
||||
* message includes entrypoint + sink + link to evidence JSON
|
||||
* CI templates + documentation
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* In PR checks, the job fails on new EP→sink paths and links to a proof.
|
||||
|
||||
---
|
||||
|
||||
### Milestone 3 — Taint/dataflow MVP (high-value sinks only)
|
||||
|
||||
* Add taint propagation to reduce false positives:
|
||||
|
||||
* differentiate “sink reachable” vs “untrusted data reaches sink”
|
||||
* Add sanitizer recognition
|
||||
* Add confidence scoring + suppression mechanisms (policy allowlists)
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* A sink is only “high severity” if it is both reachable and tainted (or policy says otherwise).
|
||||
|
||||
---
|
||||
|
||||
### Milestone 4 — VEX integration MVP
|
||||
|
||||
* Join reachability with dependency vulnerabilities
|
||||
* Emit OpenVEX (and/or CycloneDX VEX)
|
||||
* Store evidence references (paths) inside VEX justification
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* For a repo with a vulnerable dependency, you can automatically produce:
|
||||
|
||||
* affected/not affected with evidence.
|
||||
|
||||
---
|
||||
|
||||
### Milestone 5 — Scale and precision improvements
|
||||
|
||||
* Incremental analysis (only analyze changed projects/packages)
|
||||
* Better dynamic dispatch handling (Go pointer analysis, .NET interface dispatch expansion)
|
||||
* Optional runtime telemetry integration:
|
||||
|
||||
* import production traces to prioritize “actually observed” entrypoints
|
||||
|
||||
**Exit criteria**
|
||||
|
||||
* Works on large services with acceptable run time and stable noise levels.
|
||||
|
||||
---
|
||||
|
||||
## 5) Backlog you can paste into Jira (epics + key stories)
|
||||
|
||||
### Epic: Policy & taxonomy
|
||||
|
||||
* Story: Define `smartdiff.policy.yaml` schema and validator
|
||||
**AC:** invalid configs fail with clear errors; configs are versioned.
|
||||
* Story: Provide default sink list and severities
|
||||
**AC:** at least 10 sink rules with test cases.
|
||||
|
||||
### Epic: .NET analyzer
|
||||
|
||||
* Story: Resolve method invocations to symbols (Roslyn)
|
||||
**AC:** correct targets for direct calls; conservative handling for virtual calls.
|
||||
* Story: Discover ASP.NET routes and bind to entrypoint symbols
|
||||
**AC:** entrypoints include route/method metadata.
|
||||
|
||||
### Epic: Go analyzer
|
||||
|
||||
* Story: SSA build and callgraph extraction
|
||||
**AC:** function nodes and edges generated for a multi-package repo.
|
||||
* Story: net/http entrypoint discovery
|
||||
**AC:** handler functions recognized as entrypoints with path labels.
|
||||
|
||||
### Epic: Reachability engine
|
||||
|
||||
* Story: Compute reachable sinks per entrypoint
|
||||
**AC:** store at least one path with hop list.
|
||||
* Story: Smart‑Diff A vs B
|
||||
**AC:** added/removed paths computed deterministically.
|
||||
|
||||
### Epic: CI/SARIF
|
||||
|
||||
* Story: Emit SARIF results
|
||||
**AC:** findings appear in code scanning UI; include file/line.
|
||||
|
||||
### Epic: Taint analysis
|
||||
|
||||
* Story: Propagate taint from request to sink for 3 sink classes
|
||||
**AC:** produces “tainted” evidence with a variable/argument trace.
|
||||
* Story: Sanitizer recognition
|
||||
**AC:** path marked “sanitized” and downgraded per policy.
|
||||
|
||||
### Epic: VEX
|
||||
|
||||
* Story: Generate OpenVEX statements from reachability + vuln feed
|
||||
**AC:** for “not affected” includes justification and evidence references.
|
||||
|
||||
---
|
||||
|
||||
## 6) Key engineering decisions (recommended defaults)
|
||||
|
||||
### Storage
|
||||
|
||||
* Start with **Postgres** (or even local sqlite for MVP) for simplicity.
|
||||
* Introduce a graph DB only if:
|
||||
|
||||
* you need very large multi-commit graph queries at low latency
|
||||
* Postgres performance becomes a hard blocker
|
||||
|
||||
### Confidence model
|
||||
|
||||
Every edge/path should carry:
|
||||
|
||||
* `confidence`: High/Med/Low
|
||||
* `reasons`: e.g., `DirectCall`, `InterfaceDispatch`, `ReflectionGuess`, `RouterHeuristic`
|
||||
This lets you:
|
||||
* gate only on high-confidence paths in early rollout
|
||||
* keep low-confidence as “informational”
|
||||
|
||||
### Suppression model
|
||||
|
||||
* Local suppressions:
|
||||
|
||||
* `smartdiff.suppress.yaml` with rule id + symbol id + reason + expiry
|
||||
* Policy allowlists:
|
||||
|
||||
* allow file writes only under certain directories
|
||||
* allow outbound network only to configured domains
|
||||
|
||||
---
|
||||
|
||||
## 7) Testing strategy (to avoid “cool demo, unusable tool”)
|
||||
|
||||
### Unit tests
|
||||
|
||||
* Symbol hashing stability tests
|
||||
* Call resolution tests:
|
||||
|
||||
* overloads, generics, interfaces, lambdas
|
||||
* Policy parsing/validation tests
|
||||
|
||||
### Integration tests (must-have)
|
||||
|
||||
* Golden repos in `testdata/`:
|
||||
|
||||
* one ASP.NET minimal API
|
||||
* one MVC controller app
|
||||
* one Go net/http + one Gin app
|
||||
* Golden outputs:
|
||||
|
||||
* expected entrypoints
|
||||
* expected reachable sinks
|
||||
* expected diff between commits
|
||||
|
||||
### Regression tests
|
||||
|
||||
* A curated corpus of “known issues”:
|
||||
|
||||
* false positives you fixed should never return
|
||||
* false negatives: ensure known risky path is always found
|
||||
|
||||
### Performance tests
|
||||
|
||||
* Measure:
|
||||
|
||||
* analysis time per 50k LOC
|
||||
* memory peak
|
||||
* graph size
|
||||
* Budget enforcement:
|
||||
|
||||
* if over budget, degrade gracefully (lower precision, mark low confidence)
|
||||
|
||||
---
|
||||
|
||||
## 8) Example configs and outputs (to make onboarding easy)
|
||||
|
||||
### Example policy YAML (minimal)
|
||||
|
||||
```yaml
|
||||
version: 1
|
||||
service: invoices-api
|
||||
entrypoints:
|
||||
autodiscover:
|
||||
dotnet:
|
||||
aspnet: true
|
||||
go:
|
||||
net_http: true
|
||||
|
||||
sinks:
|
||||
- type: CMD_EXEC
|
||||
severity: high
|
||||
match:
|
||||
dotnet:
|
||||
symbols:
|
||||
- "System.Diagnostics.Process.Start(string)"
|
||||
go:
|
||||
symbols:
|
||||
- "os/exec.Command"
|
||||
- type: FILE_WRITE
|
||||
severity: medium
|
||||
match:
|
||||
dotnet:
|
||||
namespaces: ["System.IO"]
|
||||
go:
|
||||
symbols: ["os.WriteFile"]
|
||||
|
||||
gating:
|
||||
fail_on:
|
||||
- sink_type: CMD_EXEC
|
||||
when: "added && confidence >= medium"
|
||||
- sink_type: FILE_WRITE
|
||||
when: "added && tainted && confidence >= medium"
|
||||
```
|
||||
|
||||
### Evidence JSON shape (what the UI consumes)
|
||||
|
||||
```json
|
||||
{
|
||||
"commit": "abc123",
|
||||
"entrypoint": {"symbol": "InvoicesController.Upload()", "route": "POST /upload"},
|
||||
"sink": {"type": "FILE_WRITE", "symbol": "System.IO.File.WriteAllBytes"},
|
||||
"confidence": "high",
|
||||
"tainted": true,
|
||||
"path": [
|
||||
{"symbol": "InvoicesController.Upload()", "file": "Controllers/InvoicesController.cs", "line": 42},
|
||||
{"symbol": "UploadService.Save()", "file": "Services/UploadService.cs", "line": 18},
|
||||
{"symbol": "System.IO.File.WriteAllBytes", "file": null, "line": null}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9) Risks and mitigations (explicit)
|
||||
|
||||
1. **Dynamic behavior (reflection, DI, router magic)**
|
||||
|
||||
* Mitigation: conservative fallbacks + confidence labels + optional runtime traces later
|
||||
|
||||
2. **Noise from huge callgraphs**
|
||||
|
||||
* Mitigation: sink-first slicing (compute reachability backwards from sinks), entrypoint scoping, k‑shortest paths only
|
||||
|
||||
3. **Large repo build failures**
|
||||
|
||||
* Mitigation: analyzer runs inside build containers; allow partial analysis with explicit “incomplete” result flag
|
||||
|
||||
4. **Teams rejecting gating**
|
||||
|
||||
* Mitigation: staged rollout:
|
||||
|
||||
* Observe-only mode → warn-only → fail-only for high-confidence CMD_EXEC/UNSAFE_DESER
|
||||
|
||||
---
|
||||
|
||||
## 10) Definition of done (what “implemented” means)
|
||||
|
||||
You should consider Smart‑Diff “implemented” when:
|
||||
|
||||
* A repo can enable it with one config + one CI job.
|
||||
* PRs get:
|
||||
|
||||
* a small number of **actionable** findings (not hundreds)
|
||||
* each finding has a proof path with file/line hops
|
||||
* It reliably detects at least:
|
||||
|
||||
* new command execution paths
|
||||
* new unsafe deserialization paths
|
||||
* new tainted filesystem write paths
|
||||
* It can optionally emit VEX decisions backed by reachability evidence.
|
||||
|
||||
---
|
||||
|
||||
If you want the next step, I can also give you:
|
||||
|
||||
* a **concrete repo layout** with module boundaries,
|
||||
* the **Postgres schema** (tables + indexes),
|
||||
* and a **language adapter interface** (so adding Java/Python later is straightforward).
|
||||