add advisories

2025-12-13 02:08:11 +02:00
parent 564df71bfb
commit d776e93b16
4 changed files with 2977 additions and 0 deletions

View File

@@ -0,0 +1,750 @@
Here's a simple, practical way to score vulnerabilities that's more auditable than plain CVSS: build a **deterministic score** from three reproducible inputs—**Reachability**, **Evidence**, and **Provenance**—so every number is explainable and replayable.
---
### Why move beyond CVSS?
* **CVSS is context-light**: it rarely knows *your* call paths, configs, or runtime.
* **Audits need proof**: regulators and customers increasingly ask, “show me how you got this score.”
* **Teams need consistency**: the same image should get the same score across environments when inputs are identical.
---
### The scoring idea (plain English)
Score = a weighted function of:
1. **Reachability Depth (R)** — how close the vulnerable function is to a real entry point in *your* app (e.g., public HTTP route → handler → library call).
2. **Evidence Density (E)** — how much concrete proof you have (stack traces, symbol hits, config toggles, feature flags, SCA vs. SAST vs. DAST vs. runtime).
3. **Provenance Integrity (P)** — how trustworthy the artifact chain is (signed SBOM, DSSE attestations, SLSA/Rekor entries, reproducible build match).
A compact, auditable formula you can start with:
```
NormalizedScore = W_R * f(R) + W_E * g(E) + W_P * h(P)
```
* Pick monotonic, bounded transforms (e.g., map to 0..1):
* f(R): inverse of hops (shorter path ⇒ higher value)
* g(E): weighted sum of evidence types (runtime>DAST>SAST>SCA, with decay for stale data)
* h(P): cryptographic/provenance checks (unsigned < signed < signed+attested < signed+attested+reproducible)
Keep **W_R + W_E + W_P = 1** (e.g., 0.5, 0.35, 0.15 for reachability-first triage).
---
### What makes this “deterministic”?
* Inputs are **machine-replayable**: call-graph JSON, evidence bundle (hashes + timestamps), provenance attestations.
* The score is **purely a function of those inputs**, so anyone can recompute it later and match your result byte-for-byte.
---
### Minimal rubric (ready to implement)
* **Reachability (R, 0..1)**
* 1.00 = vulnerable symbol called on a hot path from a public route (≤3 hops)
* 0.66 = reachable but behind uncommon feature flag or deep path (4–7 hops)
* 0.33 = only theoretically reachable (code present, no discovered path)
* 0.00 = dead/unreferenced code in this build
* **Evidence (E, 0..1)** (sum, capped at 1.0)
* +0.6 runtime trace hitting the symbol
* +0.3 DAST/integ test activating vulnerable behavior
* +0.2 SAST precise sink match
* +0.1 SCA presence only (no call evidence)
* (Apply 10–30% decay if older than N days)
* **Provenance (P, 0..1)**
* 0.0 unsigned/unknown origin
* 0.3 signed image only
* 0.6 signed + SBOM (hash-linked)
* 1.0 signed + SBOM + DSSE attestations + reproducible build match
Example weights: `W_R=0.5, W_E=0.35, W_P=0.15`.
---
### How this plugs into **StellaOps**
* **Scanner** produces call-graphs & symbol maps (R).
* **Vexer**/Evidence store aggregates SCA/SAST/DAST/runtime proofs with timestamps (E).
* **Authority/ProofGraph** verifies signatures, SBOM↔image hash links, DSSE/Rekor (P).
* **Policy Engine** applies the scoring formula (YAML policy) and emits a signed VEX note with the score + input hashes.
* **Replay**: any audit can re-run the same policy with the same inputs and get the same score.
---
### Developer checklist (do this first)
* Emit a **Reachability JSON** per build: entrypoints, hops, functions, edges, timestamps, hashes.
* Normalize **Evidence Types** with IDs, confidence, freshness, and content hashes.
* Record **Provenance Facts** (signing certs, SBOM digest, DSSE bundle, reproducible-build fingerprint).
* Implement the **score function as a pure library** (no I/O), version it (e.g., `score.v1`), and include the version + inputs hashes in every VEX note.
* Add a **30-sec “Time-to-Evidence” UI**: click a score → see the exact call path, evidence list, and provenance checks.
---
### Why this helps compliance & sales
* Every number is **auditable** (inputs + function are transparent).
* Scores remain **consistent across air-gapped sites** (deterministic, no hidden heuristics).
* You can **prove reduction** after a fix (paths disappear, evidence decays, provenance improves).
If you want, I can draft the YAML policy schema and a tiny .NET 10 library stub for `score.v1` so you can drop it into StellaOps today.
Below is an extended, **developer-ready implementation plan** to build the deterministic vulnerability score into **StellaOps** (Scanner → Evidence/Vexer → Authority/ProofGraph → Policy Engine → UI/VEX output). I'm assuming a .NET-centric stack (since you mentioned .NET 10 earlier), but everything is laid out so the scoring core stays language-agnostic.
---
## 1) Extend the scoring model into a stable, “auditable primitive”
### 1.1 Outputs you should standardize on
Produce **two** signed artifacts per finding (plus optional UI views):
1. **ScoreResult** (primary):
* `riskScore` (0–100 integer)
* `subscores` (each 0–100 integer): `baseSeverity`, `reachability`, `evidence`, `provenance`
* `explain[]` (structured reasons, ordered deterministically)
* `inputs` (digests of all upstream inputs)
* `policy` (policy version + digest)
* `engine` (engine version + digest)
* `asOf` timestamp (the only time allowed to affect the result)
2. **VEX note** (OpenVEX/CSAF-compatible wrapper):
* references ScoreResult digest
* embeds the score (optional) + the input digests
* signed by StellaOps Authority
> Key audit requirement: anyone can recompute the score **offline** from the input bundle + policy + engine version.
---
## 2) Make determinism non-negotiable
### 2.1 Determinism rules (implement as “engineering constraints”)
These are the common ways deterministic systems become non-deterministic:
* **No floating point** in scoring math. Use integer basis points and integer bucket tables.
* **No implicit time**. Scoring takes `asOf` as an explicit input. Evidence freshness is computed as `asOf - evidence.timestamp`.
* **Canonical serialization** for hashing:
* Use RFC-style canonical JSON (e.g., JCS) or a strict canonical CBOR profile.
* Sort keys and arrays deterministically.
* **Stable ordering** for explanation lists:
* Always sort factors by `(factorId, contributingObjectDigest)`.
### 2.2 Fixed-point scoring approach (recommended)
Represent weights and multipliers as **basis points** (bps):
* 100% = 10,000 bps
* 1% = 100 bps
Example: `totalScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
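A minimal sketch of that fixed-point total in C#, assuming the example weights later in §3.2; the `FixedPoint.WeightedTotal` name is illustrative, not an existing StellaOps API. Everything stays in integers so the result is identical across OS, CPU, and runtime:
```csharp
using System;

public static class FixedPoint
{
    public const int FullBps = 10_000; // 100% expressed in basis points

    // B, R, E, P are subscores in 0..100; weights are in bps and must sum to 10,000.
    public static int WeightedTotal(int b, int r, int e, int p,
                                    int wB, int wR, int wE, int wP)
    {
        if (wB + wR + wE + wP != FullBps)
            throw new ArgumentException("Weights must sum to 10,000 bps.");

        // Integer division truncates; do it once, at the end, so rounding is deterministic.
        return (wB * b + wR * r + wE * e + wP * p) / FullBps;
    }
}

// Example: B=50, R=85, E=60, P=60 with weights 10/45/30/15%
// => (1000*50 + 4500*85 + 3000*60 + 1500*60) / 10000 = 70
```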
---
## 3) Extended score definition (v1)
### 3.1 Subscores (0–100 integers)
#### BaseSeverity (B)
* Source: CVSS if present, else vendor severity, else default.
* Normalize to 0–100:
* CVSS 0.0–10.0 → 0–100 by `B = round(CVSS * 10)`
Keep its weight small so you're “beyond CVSS” but still anchored.
#### Reachability (R)
Computed from reachability report (call-path depth + gating conditions).
**Hop buckets** (example):
* 0–2 hops: 100
* 3 hops: 85
* 4 hops: 70
* 5 hops: 55
* 6 hops: 45
* 7 hops: 35
* 8+ hops: 20
* unreachable: 0
**Gate multipliers** (apply multiplicatively in bps):
* behind feature flag: ×7000
* auth required: ×8000
* only admin role: ×8500
* non-default config: ×7500
Final: `R = bucketScore * gateMultiplier / 10000`
#### Evidence (E)
Sum evidence points capped at 100, then apply freshness multiplier.
Evidence points (example):
* runtime trace hitting vulnerable symbol: +60
* DAST / integration test triggers behavior: +30
* SAST precise sink match: +20
* SCA presence only: +10
Freshness bucket multiplier (example):
* age ≤ 7 days: ×10000
* ≤ 30 days: ×9000
* ≤ 90 days: ×7500
* ≤ 180 days: ×6000
* ≤ 365 days: ×4000
* > 365 days: ×2000
Final: `E = min(100, sum(points)) * freshness / 10000`
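A sketch of the R and E subscores under the example tables above; the bucket values and multipliers are the illustrative numbers from this section, not fixed constants, and the `Subscores` type is hypothetical:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Subscores
{
    // Hop buckets from the example table above; 8+ hops fall through to 20.
    private static readonly (int MaxHops, int Score)[] HopBuckets =
        { (2, 100), (3, 85), (4, 70), (5, 55), (6, 45), (7, 35), (int.MaxValue, 20) };

    // hops == null means no discovered path: the finding is unreachable in this build.
    // gateMultiplierBps defaults to 10_000 (no gate); e.g., a feature-flag gate passes 7_000.
    public static int Reachability(int? hops, int gateMultiplierBps = 10_000)
    {
        if (hops is null) return 0;
        foreach (var (maxHops, score) in HopBuckets)
            if (hops <= maxHops)
                return score * gateMultiplierBps / 10_000;
        return 0; // unreachable fallback (the int.MaxValue bucket normally catches everything)
    }

    // Evidence: sum of points (runtime 60, DAST 30, SAST 20, SCA 10), capped at 100,
    // then scaled by the freshness bucket multiplier in basis points.
    public static int Evidence(IEnumerable<int> points, int freshnessBps)
        => Math.Min(100, points.Sum()) * freshnessBps / 10_000;
}
```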
#### Provenance (P)
Based on verified supply-chain checks.
Levels:
* unsigned/unknown: 0
* signed image: 30
* signed + SBOM hash-linked to image: 60
* signed + SBOM + DSSE attestations verified: 80
* above + reproducible build match: 100
### 3.2 Total score and overrides
Weights (example):
* `wB=1000` (10%)
* `wR=4500` (45%)
* `wE=3000` (30%)
* `wP=1500` (15%)
Total:
* `riskScore = (wB*B + wR*R + wE*E + wP*P) / 10000`
Override examples (still deterministic, because they depend on evidence flags):
* If `knownExploited=true` AND `R >= 70` → force score to 95+
* If unreachable (`R=0`) AND only SCA evidence (`E<=10`) → clamp score ≤ 25
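The overrides stay deterministic as long as they run after the weighted total and in a fixed order; a small sketch under that assumption (names are illustrative):
```csharp
using System;

public static class Overrides
{
    // Applied after the weighted total, in a fixed order, so results stay reproducible.
    public static int Apply(int score, int r, int e, bool knownExploited)
    {
        if (knownExploited && r >= 70)
            score = Math.Max(score, 95);   // known-exploited and reachable: floor at 95

        if (r == 0 && e <= 10)
            score = Math.Min(score, 25);   // unreachable, SCA-only evidence: cap at 25

        return score;
    }
}
```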
---
## 4) Canonical schemas (what to build first)
### 4.1 ReachabilityReport (per artifact + vuln)
Minimum fields:
* `artifactDigest` (sha256 of image or build artifact)
* `graphDigest` (sha256 of canonical call-graph representation)
* `vulnId` (CVE/OSV/etc)
* `vulnerableSymbol` (fully-qualified)
* `entrypoints[]` (HTTP routes, queue consumers, CLI commands, cron handlers)
* `shortestPath`:
* `hops` (int)
* `nodes[]` (ordered list of symbols)
* `edges[]` (optional)
* `gates[]`:
* `type` (“featureFlag” | “authRequired” | “configNonDefault” | …)
* `detail` (string)
* `computedAt` (timestamp)
* `toolVersion`
### 4.2 EvidenceBundle (per artifact + vuln)
Evidence items are immutable and deduped by content hash.
* `evidenceId` (content hash)
* `artifactDigest`
* `vulnId`
* `type` (“SCA” | “SAST” | “DAST” | “RUNTIME” | “ADVISORY”)
* `tool` (name/version)
* `timestamp`
* `confidence` (0100)
* `subject` (package, symbol, endpoint)
* `payloadDigest` (hash of raw payload stored separately)
### 4.3 ProvenanceReport (per artifact)
* `artifactDigest`
* `signatureChecks[]` (who signed, what key, result)
* `sbomDigest` + `sbomType`
* `attestations[]` (DSSE digests + verification result)
* `transparencyLogRefs[]` (optional)
* `reproducibleMatch` (bool)
* `computedAt`
* `toolVersion`
* `verificationLogDigest`
### 4.4 ScoreInput + ScoreResult
**ScoreInput** should include:
* `asOf`
* `policyVersion`
* digests for reachability/evidence/provenance/base severity source
**ScoreResult** should include:
* `riskScore`, `subscores`
* `explain[]` (deterministic)
* `engineVersion`, `policyDigest`
* `inputs[]` (digests)
* `resultDigest` (hash of canonical ScoreResult)
* `signature` (Authority signs the digest)
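As a starting point, the two records might look like this in .NET; the field names mirror the bullets above, while the digest and signature representations are placeholders to adapt to your canonical-JSON and signing stack:
```csharp
using System;
using System.Collections.Generic;

// Illustrative shapes only.
public sealed record ScoreInput(
    DateTimeOffset AsOf,                          // the only time allowed to affect the result
    string PolicyVersion,
    string ReachabilityDigest,
    string EvidenceBundleDigest,
    string ProvenanceDigest,
    string BaseSeveritySourceDigest);

public sealed record ScoreResult(
    int RiskScore,
    IReadOnlyDictionary<string, int> Subscores,   // baseSeverity, reachability, evidence, provenance
    IReadOnlyList<string> Explain,                // deterministically ordered reasons
    string EngineVersion,
    string PolicyDigest,
    IReadOnlyList<string> Inputs,                 // digests of all upstream inputs
    string ResultDigest,                          // hash of the canonical ScoreResult (minus this field)
    string Signature);                            // Authority signs ResultDigest
```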
---
## 5) Development implementation plan (phased, with deliverables + acceptance criteria)
### Phase A — Foundations: schemas, hashing, policy format, test harness
**Deliverables**
* Canonical JSON format rules + hashing utilities (shared lib)
* JSON Schemas for: ReachabilityReport, EvidenceItem, ProvenanceReport, ScoreInput, ScoreResult
* “Golden fixture” repo: a set of input bundles and expected ScoreResults
* Policy format `score.v1` (YAML or JSON) using **integer bps**
**Acceptance criteria**
* Same input bundle → identical `resultDigest` across:
* OS (Linux/Windows)
* CPU (x64/ARM64)
* runtime versions (supported .NET versions)
* Fixtures run in CI and fail on any byte-level diff
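The golden-fixture check can be as simple as recomputing each stored bundle and comparing digests; a hedged xUnit-style sketch, assuming the `ComputeScore(ScoreInputBundle)` entry point defined in Phase B and a `fixtures/` folder layout of paired input/expected files:
```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using Xunit;

public class GoldenFixtureTests
{
    public static IEnumerable<object[]> Fixtures()
        => Directory.GetFiles("fixtures", "*.input.json").Select(f => new object[] { f });

    [Theory]
    [MemberData(nameof(Fixtures))]
    public void Recomputed_digest_matches_golden(string inputPath)
    {
        var expectedPath = inputPath.Replace(".input.json", ".expected.json");
        var input = JsonSerializer.Deserialize<ScoreInputBundle>(File.ReadAllText(inputPath))!;
        var expectedDigest = JsonDocument.Parse(File.ReadAllText(expectedPath))
                                         .RootElement.GetProperty("resultDigest").GetString();

        // Pure function from Phase B: no I/O, no clock, no floats.
        var actual = ScoreEngine.ComputeScore(input);

        // Byte-level equivalence is asserted via the canonical result digest.
        Assert.Equal(expectedDigest, actual.ResultDigest);
    }
}
```
Running the same suite on Linux/Windows and x64/ARM64 agents is what turns the determinism rules into an enforced CI property.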
---
### Phase B — Scoring engine (pure function library)
**Deliverables**
* `Stella.ScoreEngine` as a pure library:
* `ComputeScore(ScoreInputBundle) -> ScoreResult`
* `Explain(ScoreResult) -> structured explanation` (already embedded)
* Policy parser + validator:
* weights sum to 10,000
* bucket tables monotonic
* override rules deterministic and total order
**Acceptance criteria**
* 100% deterministic tests passing (golden fixtures)
* “Explain” always includes:
* subscores
* applied buckets
* applied gate multipliers
* freshness bucket selected
* provenance level selected
* No non-deterministic dependencies (time, random, locale, float)
---
### Phase C — Evidence pipeline (Vexer / Evidence Store)
**Deliverables**
* Normalized evidence ingestion adapters:
* SCA ingest (from your existing scanner output)
* SAST ingest
* DAST ingest
* runtime trace ingest (optional MVP → “symbol hit” events)
* Evidence Store service:
* immutability (append-only)
* dedupe by `evidenceId`
* query by `(artifactDigest, vulnId)`
**Acceptance criteria**
* Ingesting the same evidence twice yields identical state (idempotent)
* Every evidence record can be exported as a bundle with content hashes
* Evidence timestamps preserved; `asOf` drives freshness deterministically
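Deriving `evidenceId` from a content hash is what makes ingestion idempotent; a minimal sketch (which fields participate in the hash is a design choice, and timestamps are deliberately excluded here so re-observations dedupe):
```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EvidenceId
{
    // evidenceId = sha256 over a canonical join of the stable fields, so re-ingesting
    // the same observation maps to the same record in the append-only store.
    public static string Compute(string artifactDigest, string vulnId, string type,
                                 string tool, string subject, string payloadDigest)
    {
        var canonical = string.Join("\n", artifactDigest, vulnId, type, tool, subject, payloadDigest);
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```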
---
### Phase D — Reachability analyzer (Scanner extension)
**Deliverables**
* Call-graph builder and symbol resolver:
* for .NET: IL-level call graph + ASP.NET route discovery
* Reachability computation:
* compute shortest path hops from entrypoints to vulnerable symbol
* attach gating detections (config/feature/auth heuristics)
* Reachability report emitter:
* emits ReachabilityReport with stable digests
**Acceptance criteria**
* Given the same build artifact, reachability report digest is stable
* Paths are replayable and visualizable (nodes are resolvable)
* Unreachable findings are explicitly marked and explainable
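The hop count itself can come from a plain breadth-first search over the call graph, from every entrypoint to the vulnerable symbol; a minimal sketch over an adjacency-list representation (graph construction and symbol resolution are out of scope here):
```csharp
using System.Collections.Generic;

public static class ReachabilitySearch
{
    // Returns the minimum number of call edges from any entrypoint to the target symbol,
    // or null when no path exists (the finding is then marked unreachable, R = 0).
    public static int? ShortestHops(
        IReadOnlyDictionary<string, IReadOnlyList<string>> callGraph,
        IEnumerable<string> entrypoints,
        string vulnerableSymbol)
    {
        var queue = new Queue<(string Node, int Hops)>();
        var visited = new HashSet<string>();
        foreach (var ep in entrypoints)
            if (visited.Add(ep)) queue.Enqueue((ep, 0));

        while (queue.Count > 0)
        {
            var (node, hops) = queue.Dequeue();
            if (node == vulnerableSymbol) return hops;
            if (!callGraph.TryGetValue(node, out var callees)) continue;
            foreach (var callee in callees)
                if (visited.Add(callee)) queue.Enqueue((callee, hops + 1));
        }
        return null; // unreachable in this build
    }
}
```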
---
### Phase E — Provenance verification (Authority / ProofGraph)
**Deliverables**
* Verification pipeline:
* signature verification for artifact digest
* SBOM hash linking
* attestation verification (DSSE/intoto style)
* optional transparency log reference capture
* optional reproducible-build comparison input
* ProvenanceReport emitter (signed verification log digest)
**Acceptance criteria**
* Verification is offline-capable if given the necessary bundles
* Any failed check is captured with a deterministic error code + message
* ProvenanceReport digest is stable for same inputs
---
### Phase F — Orchestration: “score a finding” workflow + VEX output
**Deliverables**
* Orchestrator service (or existing pipeline step) that:
1. receives a vulnerability finding
2. fetches reachability/evidence/provenance bundles
3. builds ScoreInput with `asOf`
4. computes ScoreResult
5. signs ScoreResult digest
6. emits VEX note referencing ScoreResult digest
* Storage for ScoreResult + VEX note (immutable, versioned)
**Acceptance criteria**
* “Recompute” produces same ScoreResult digest if inputs unchanged
* VEX note includes:
* policy version + digest
* engine version
* input digests
* score + subscores
* End-to-end API returns “why” data in <1 round trip (cached)
---
### Phase G — UI: “Why this score?” and replay/export
**Deliverables**
* Findings view enhancements:
* score badge + risk bucket (Low/Med/High/Critical)
* click-through → “Why this score”
* “Why this score” panel:
* call path visualization (at least as an ordered list for MVP)
* evidence list with freshness + confidence
* provenance checks list (pass/fail)
* export bundle (inputs + policy + engine version) for audit replay
**Acceptance criteria**
* Any score is explainable in <30 seconds by a human reviewer
* Exported bundle can reproduce score offline
---
### Phase H — Governance: policy-as-code, versioning, calibration, rollout
**Deliverables**
* Policy registry:
* store `score.v1` policies by org/project/environment
* approvals + change log
* Versioning strategy:
* engine semantic versioning
* policy digest pinned in ScoreResult
* migration tooling (e.g., score.v1 → score.v2)
* Rollout mechanics:
* shadow mode: compute score but don't enforce
* enforcement gates: block deploy if score ≥ threshold
**Acceptance criteria**
* Policy changes never rewrite past scores
* You can backfill new scores with a new policy version without ambiguity
* Audit log shows: who changed policy, when, why (optional but recommended)
---
## 6) Engineering backlog (epics → stories → DoD)
### Epic 1: Deterministic core
* Story: implement canonical JSON + hashing
* Story: implement fixed-point math helpers (bps)
* Story: implement score.v1 buckets + overrides
* DoD:
* no floats
* golden test suite
* deterministic explain ordering
### Epic 2: Evidence normalization
* Story: evidence schema + dedupe
* Story: adapters (SCA/SAST/DAST/runtime)
* Story: evidence query API
* DoD:
* idempotent ingest
* bundle export with digests
### Epic 3: Reachability
* Story: entrypoint discovery for target frameworks
* Story: call graph extraction
* Story: shortest-path computation
* Story: gating heuristics
* DoD:
* stable digests
* replayable paths
### Epic 4: Provenance
* Story: verify signatures
* Story: verify SBOM link
* Story: verify attestations
* Story: reproducible match input support
* DoD:
* deterministic error codes
* stable provenance scoring
### Epic 5: End-to-end score + VEX
* Story: orchestration
* Story: ScoreResult signing
* Story: VEX generation and storage
* DoD:
* recompute parity
* verifiable signatures
### Epic 6: UI
* Story: score badge + buckets
* Story: why panel
* Story: export bundle + recompute button
* DoD:
* human explainability
* offline replay works
---
## 7) APIs to implement (minimal but complete)
### 7.1 Compute score (internal)
* `POST /api/score/compute`
* input: `ScoreInput` + references or inline bundles
* output: `ScoreResult`
### 7.2 Get score (product)
* `GET /api/findings/{findingId}/score`
* returns latest ScoreResult + VEX reference
### 7.3 Explain score
* `GET /api/findings/{findingId}/score/explain`
* returns `explain[]` + call path + evidence list + provenance checks
### 7.4 Export replay bundle
* `GET /api/findings/{findingId}/score/bundle`
* returns a tar/zip containing:
* ScoreInput
* policy file
* reachability/evidence/provenance reports
* engine version manifest
---
## 8) Testing strategy (what to automate early)
### Unit tests
* bucket selection correctness
* gate multiplier composition
* evidence freshness bucketing
* provenance level mapping
* override rule ordering
### Golden fixtures
* fixed input bundles → fixed ScoreResult digest
* run on every supported platform/runtime
### Property-based tests
* monotonicity:
* fewer hops should not reduce R
* more evidence points should not reduce E
* stronger provenance should not reduce P
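Even without a property-testing framework, these monotonicity properties can be spot-checked by sweeping inputs; a sketch for the reachability axis, reusing the illustrative `Subscores.Reachability` helper from §3.1 (the other two axes follow the same pattern):
```csharp
using Xunit;

public class MonotonicityTests
{
    [Fact]
    public void Fewer_hops_never_lowers_reachability()
    {
        int previous = -1;
        // Sweep from deep paths down to direct calls; the subscore must never decrease.
        for (int hops = 12; hops >= 0; hops--)
        {
            int r = Subscores.Reachability(hops);
            Assert.True(r >= previous, $"R dropped when hops fell to {hops}");
            previous = r;
        }
    }
}
```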
### Integration tests
* full pipeline: finding → bundles → score → VEX
* recompute parity tests
---
## 9) Operational concerns and hardening
### Performance
* Cache reachability per `(artifactDigest, vulnId, symbol)`
* Cache provenance per `artifactDigest`
* Evidence queries should be indexed by `(artifactDigest, vulnId, type)`
### Security
* Treat evidence ingestion as untrusted input:
* strict schema validation
* content-hash dedupe prevents tampering via overwrite
* Sign ScoreResults and VEX notes
* RBAC:
* who can change policy
* who can override scores (if allowed at all)
### Data retention
* Evidence payloads can be large; keep digests + store raw payloads in object storage
* Keep a minimal replay bundle always (schemas + digests + policy + engine)
---
## 10) Concrete “MVP first” slice (smallest valuable product)
If you want a crisp MVP that still satisfies auditable determinism:
1. Scoring engine (`B + R + E + P`), fixed-point, golden tests
2. Evidence store (SCA + runtime optional)
3. Reachability: only hop depth from HTTP routes to symbol (no fancy gates)
4. Provenance: signed image + SBOM link only
5. UI: score + why panel showing:
* hops/path list
* evidence list
* provenance checklist
6. Emit a signed VEX note containing the score + input digests
That MVP already proves the core differentiator: **deterministic, replayable risk scoring**.
---
## 11) Starter policy file (score.v1) using basis points
Here's a good starting point for a real implementation (int-only):
```yaml
policyVersion: score.v1
weightsBps:
baseSeverity: 1000
reachability: 4500
evidence: 3000
provenance: 1500
reachability:
hopBuckets:
- { maxHops: 2, score: 100 }
- { maxHops: 3, score: 85 }
- { maxHops: 4, score: 70 }
- { maxHops: 5, score: 55 }
- { maxHops: 6, score: 45 }
- { maxHops: 7, score: 35 }
- { maxHops: 9999, score: 20 }
unreachableScore: 0
gateMultipliersBps:
featureFlag: 7000
authRequired: 8000
adminOnly: 8500
nonDefaultConfig: 7500
evidence:
points:
runtime: 60
dast: 30
sast: 20
sca: 10
freshnessBuckets:
- { maxAgeDays: 7, multiplierBps: 10000 }
- { maxAgeDays: 30, multiplierBps: 9000 }
- { maxAgeDays: 90, multiplierBps: 7500 }
- { maxAgeDays: 180, multiplierBps: 6000 }
- { maxAgeDays: 365, multiplierBps: 4000 }
- { maxAgeDays: 99999, multiplierBps: 2000 }
provenance:
levels:
unsigned: 0
signed: 30
signedWithSbom: 60
signedWithSbomAndAttestations: 80
reproducible: 100
overrides:
- name: knownExploitedAndReachable
when:
flags:
knownExploited: true
minReachability: 70
setScore: 95
- name: unreachableAndOnlySca
when:
maxReachability: 0
maxEvidence: 10
clampMaxScore: 25
```
---
If you want, I can also include a **repo layout + CI “golden fixture” test runner** (dotnet test + cross-platform determinism checks) and a **.NET 10 ScoreEngine skeleton** that enforces: no floats, canonical JSON hashing, and stable explanation ordering.

View File

@@ -0,0 +1,744 @@
Here's a simple, high-leverage UX metric to add to your pipeline run view that will immediately make DevOps feel faster and calmer:
# Time-to-First-Signal (TTFS)
**What it is:** the time from opening a run's details page until the UI renders the **first actionable insight** (e.g., “Stage `build` failed → `dotnet restore` → 401 token expired”).
**Why it matters:** engineers don't need *all* data instantly—just the first trustworthy clue to start acting. Lower TTFS = quicker triage, lower stress, tighter MTTR.
---
## What counts as a “first signal”
* Failed stage + reason (exit code, key log line, failing test name)
* Degraded but actionable status (e.g., flaky test signature)
* Policy gate block with the specific rule that failed
* Reachability-aware security finding that blocks deploy (one concrete example, not the whole list)
> Not a signal: spinners, generic “loading…”, or unactionable counts.
---
## How to optimize TTFS (practical steps)
1. **Deferred loading (prioritize critical panes):**
* Render header + failing stage card first; lazy-load artifacts, full logs, and graphs after.
* Pre-expand the *first failing node* in the stage graph.
2. **Log pre-indexing at ingest:**
* During CI, stream logs into chunks keyed by `[jobId, phase, severity, firstErrorLine]`.
* Extract the **first error tuple** (timestamp, step, message) and store it next to the job record.
* On UI open, fetch only that tuple (sub-100ms) before fetching the rest.
3. **Cached summaries:**
* Persist a tiny JSON “run.summary.v1” (status, first failing stage, first error line, blocking policies) in Redis/Postgres.
* Invalidate on new job events; always serve this summary first.
4. **Edge prefetch:**
* When the runs table is visible, prefetch summaries for rows in viewport so details pages open “warm”.
5. **Compress + cap first log burst:**
* Send the first **5–10 error lines** (already extracted) immediately; stream the rest.
---
## Instrumentation (so you can prove it)
Emit these points as telemetry:
* `ttfs_start`: when the run details route is entered (or when tab becomes visible)
* `ttfs_signal_rendered`: when the first actionable card is in the DOM
* `ttfs_ms = signal_rendered - start`
* Dimensions: `pipeline_provider`, `repo`, `branch`, `run_type` (PR/main), `device`, `release`, `network_state`
**SLO:** *P50 ≤ 700ms, P95 ≤ 2.5s* (adjust to your infra).
**Dashboards to track:**
* TTFS distribution (P50/P90/P95) by release
* Correlate TTFS with bounce rate and “open → rerun” delay
* Error budget: % of views with TTFS > 3s
---
## Minimal backend contract (example)
```json
GET /api/runs/{runId}/first-signal
{
"runId": "123",
"firstSignal": {
"type": "stage_failed",
"stage": "build",
"step": "dotnet restore",
"message": "401 Unauthorized: token expired",
"at": "2025-12-11T09:22:31Z",
"artifact": { "kind": "log", "range": {"start": 1880, "end": 1896} }
},
"summaryEtag": "W/\"a1b2c3\""
}
```
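A minimal ASP.NET Core sketch of that contract, assuming the summarizer has already stored the first-signal JSON and its ETag at ingest time; `IFirstSignalStore` and the in-memory stand-in are hypothetical abstractions over your Redis/Postgres summary:
```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<IFirstSignalStore, InMemoryFirstSignalStore>();
var app = builder.Build();

app.MapGet("/api/runs/{runId}/first-signal", async (string runId, HttpContext http, IFirstSignalStore store) =>
{
    var record = await store.GetAsync(runId);                // tiny cached payload, not the full log
    if (record is null) return Results.NotFound();

    // ETag short-circuit: the warm path returns 304 with no body.
    if (http.Request.Headers.IfNoneMatch == record.ETag)
        return Results.StatusCode(StatusCodes.Status304NotModified);

    http.Response.Headers.ETag = record.ETag;                // clients revalidate cheaply next time
    return Results.Text(record.Json, "application/json");
});

app.Run();

public sealed record FirstSignalRecord(string Json, string ETag);

public interface IFirstSignalStore
{
    Task<FirstSignalRecord?> GetAsync(string runId);
}

// Stand-in store; production would read the summary written by the ingest pipeline.
public sealed class InMemoryFirstSignalStore : IFirstSignalStore
{
    private readonly Dictionary<string, FirstSignalRecord> _records = new();
    public Task<FirstSignalRecord?> GetAsync(string runId)
        => Task.FromResult(_records.TryGetValue(runId, out var r) ? r : null);
}
```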
---
## Frontend pattern (Angular 17, signal-first)
* Fire `first-signal` request in route resolver.
* Render `FirstSignalCard` immediately.
* Lazy-load stage graph, full logs, security panes.
* Fire `ttfs_signal_rendered` when `FirstSignalCard` enters viewport.
---
## CI adapter hints (GitLab/GitHub/Azure)
* Hook on job status webhooks to compute & store the first error tuple.
* For GitLab: scan `trace` stream for first `ERRO|FATAL|##[error]` match; store to DB table `ci_run_first_signal(run_id, stage, step, message, t)`.
---
## “Good TTFS” acceptance tests
* Run with early fail → first signal < 1s, shows exact command + exit code.
* Run with policy gate fail → rule name + fix hint visible first.
* Offline/slow network → cached summary still renders an actionable hint.
---
## Copy to put in your UX guidelines
> “Optimize **Time-to-First-Signal (TTFS)** above all. Users must see one trustworthy, actionable clue within 1 second on a warm path—even if the rest of the UI is still loading.”
If you want, I can sketch the exact DB schema for the pre-indexed log tuples and the Angular resolver + telemetry hooks next.
Below is an extended, end-to-end implementation plan for **Time-to-First-Signal (TTFS)** that you can drop into your backlog. It includes architecture, data model, API contracts, frontend work, observability, QA, and rollout, structured as epics/phases with definition of done and acceptance criteria.
---
# Scope extension
## What we're building
A run details experience that renders **one actionable clue** fast, before loading heavy UI like full logs, graphs, artifacts.
**“First signal”** is a small payload derived from run/job events and the earliest meaningful error evidence (stage/step + key log line(s) + reason/classification).
## What we're extending beyond the initial idea
1. **FirstSignal Quality** (not just speed)
* Classify error type (auth, dependency, compilation, test, infra, policy, timeout).
* Identify culprit step and a stable signature for dedupe and search.
2. **Progressive disclosure UX**
* Summary → First-signal card → expanded context (stage graph, logs, artifacts).
3. **Provider-agnostic ingestion**
* Adapters for GitLab/GitHub/Azure (or your CI provider).
4. **Caching + prefetch**
* Warm open from list/table, with ETags and stale-while-revalidate.
5. **Observability & SLOs**
* TTFS metrics, dashboards, alerting, and quality metrics (false signals).
6. **Rollout safety**
* Feature flags, canary, A/B gating, and a guaranteed fallback path.
---
# Success criteria
## Primary metric
* **TTFS (ms)**: time from details page route enter first actionable signal rendered.
## Targets (example SLOs)
* **P50 ≤ 700 ms**, **P95 ≤ 2500 ms** on warm path.
* **Cold path**: P95 ≤ 4000 ms (depends on infra).
## Secondary outcome metrics
* **Open→Action time**: time from opening run to first user action (rerun, cancel, assign, open failing log line).
* **Bounce rate**: close page within 10 seconds without interaction.
* **MTTR proxy**: time from failure to first rerun or fix commit.
## Quality metrics
* **Signal availability rate**: % of run views that show a first signal card within 3s.
* **Signal accuracy score** (sampled): engineer confirms “helpful vs not”.
* **Extractor failure rate**: parsing errors / missing mappings / timeouts.
---
# Architecture overview
## Data flow
1. **CI provider events** (job started, job finished, stage failed, log appended) land in your backend.
2. **Run summarizer** maintains:
* `run_summary` (small JSON)
* `first_signal` (small, actionable payload)
3. **UI opens run details**
* Immediately calls `GET /runs/{id}/first-signal` (or `/summary`).
* Renders FirstSignalCard as soon as payload arrives.
4. Background fetches:
* Stage graph, full logs, artifacts, security scans, trends.
## Key decision: where to compute first signal
* **Option A: at ingest time (recommended)**
Compute first signal when logs/events arrive, store it, serve it instantly.
* **Option B: on demand**
Compute when user opens run details (simpler initially, worse TTFS and load).
---
# Data model
## Tables (relational example)
### `ci_run`
* `run_id (pk)`
* `provider`
* `repo_id`
* `branch`
* `status`
* `created_at`, `updated_at`
### `ci_job`
* `job_id (pk)`
* `run_id (fk)`
* `stage_name`
* `job_name`
* `status`
* `started_at`, `finished_at`
### `ci_log_chunk`
* `chunk_id (pk)`
* `job_id (fk)`
* `seq` (monotonic)
* `byte_start`, `byte_end` (range into blob)
* `first_error_line_no` (nullable)
* `first_error_excerpt` (nullable, short)
* `severity_max` (info/warn/error)
### `ci_run_summary`
* `run_id (pk)`
* `version` (e.g., `1`)
* `etag` (hash)
* `summary_json` (small, 1–5 KB)
* `updated_at`
### `ci_first_signal`
* `run_id (pk)`
* `etag`
* `signal_json` (small, 0.5–2 KB)
* `quality_flags` (bitmask or json)
* `updated_at`
## Cache layer
* Redis keys:
* `run:{runId}:summary:v1`
* `run:{runId}:first-signal:v1`
* TTL: generous but safe (e.g., 24h) with write-through on event updates.
---
# First signal definition
## `FirstSignal` object (recommended shape)
```json
{
"runId": "123",
"computedAt": "2025-12-12T09:22:31Z",
"status": "failed",
"firstSignal": {
"type": "stage_failed",
"classification": "dependency_auth",
"stage": "build",
"job": "build-linux-x64",
"step": "dotnet restore",
"message": "401 Unauthorized: token expired",
"signature": "dotnet-restore-401-unauthorized",
"log": {
"jobId": "job-789",
"lines": [
"error : Response status code does not indicate success: 401 (Unauthorized).",
"error : The token is expired."
],
"range": { "start": 1880, "end": 1896 }
},
"suggestedActions": [
{ "label": "Rotate token", "type": "doc", "target": "internal://docs/tokens" },
{ "label": "Rerun job", "type": "action", "target": "rerun-job:job-789" }
]
},
"etag": "W/\"a1b2c3\""
}
```
### Notes
* `signature` should be stable for grouping.
* `suggestedActions` is optional but hugely valuable (even 1–2 actions).
---
# APIs
## 1) First signal endpoint
**GET** `/api/runs/{runId}/first-signal`
Headers:
* `If-None-Match: W/"..."` supported
* Response includes `ETag` and `Cache-Control`
Responses:
* `200`: full first signal object
* `304`: not modified
* `404`: run not found
* `204`: run exists but signal not available yet (rare; should degrade gracefully)
## 2) Summary endpoint (optional but useful)
**GET** `/api/runs/{runId}/summary`
* Includes: status, first failing stage/job, timestamps, blocking policies, artifact counts.
## 3) SSE / WebSocket updates (nice-to-have)
**GET** `/api/runs/{runId}/events` (SSE)
* Push new signal or summary updates in near real-time while user is on the page.
---
# Frontend implementation plan (Angular 17)
## UX behavior
1. **Route enter**
* Start TTFS timer.
2. Render instantly:
* Title, status badge, pipeline metadata (run id, commit, branch).
* Skeleton for details area.
3. Fetch first signal:
* Render `FirstSignalCard` immediately when available.
* Fire telemetry event when card is **in DOM and visible**.
4. Lazy-load:
* Stage graph
* Full logs viewer
* Artifacts list
* Security findings
* Trends, flaky tests, etc.
## Angular structure
* `RunDetailsResolver` (or `resolveFn`) requests first signal.
* `RunDetailsComponent` uses signals to render quickly.
* `FirstSignalCardComponent` is standalone + minimal deps.
## Prefetch strategy from runs list view
* When the runs table is visible, prefetch summaries/first signals for items in viewport:
* Use `IntersectionObserver` to prefetch only visible rows.
* Store results in an in-memory cache (e.g., `Map<runId, FirstSignal>`).
* Respect ETag to avoid redundant payloads.
## Telemetry hooks
* `ttfs_start`: route activation + tab visible
* `ttfs_signal_rendered`: FirstSignalCard attached and visible
* Dimensions: provider, repo, branch, run_type, release_version, network_state
---
# Backend implementation plan
## Summarizer / First-signal service
A service or module that:
* subscribes to run/job events
* receives log chunks (or pointers)
* computes and stores:
* `run_summary`
* `first_signal`
* publishes updates (optional) to an event stream for SSE
### Concurrency rule
First signal should be set once per run unless a better signal appears:
* if current signal is missing → set
* if current signal is generic and new one is specific → replace
* otherwise → keep (avoid churn)
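A sketch of that replacement rule as a pure decision function; the `Specificity` ranking is an assumption to tune against your classification taxonomy:
```csharp
public enum Specificity { None = 0, Generic = 1, Classified = 2, ClassifiedWithStep = 3 }

public static class FirstSignalPolicy
{
    // Returns true when the candidate should replace the currently stored signal.
    // Ties keep the existing signal, so repeated events do not cause churn.
    public static bool ShouldReplace(Specificity current, Specificity candidate)
        => candidate > current;
}

// Usage: on each extracted signal, compute its specificity, compare with the stored one,
// and only write (and bump the ETag) when ShouldReplace(...) returns true.
```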
---
# Extraction & classification logic
## Minimum viable extractor (Phase 1)
* Heuristics:
* first match among patterns: `FATAL`, `ERROR`, `##[error]`, `panic:`, `Unhandled exception`, `npm ERR!`, `BUILD FAILED`, etc.
* plus provider-specific fail markers
* Pull:
* stage/job/step context (from job metadata or step boundaries)
* 5–10 log lines around first error line
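The Phase 1 extractor can be a single pass over incoming log lines with a regex over the failure markers above; a sketch (the pattern list mirrors the examples here and is meant to be extended per provider):
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FirstErrorExtractor
{
    // Common failure markers from the heuristics above; provider-specific ones can be appended.
    private static readonly Regex ErrorMarker = new(
        @"(FATAL|ERROR|##\[error\]|panic:|Unhandled exception|npm ERR!|BUILD FAILED)",
        RegexOptions.Compiled);

    // Scans a chunk of lines and returns the first matching line with a short excerpt
    // following it, or null if the chunk contains no error marker.
    public static (int LineNo, IReadOnlyList<string> Excerpt)? FindFirstError(
        IReadOnlyList<string> lines, int contextLines = 5)
    {
        for (int i = 0; i < lines.Count; i++)
        {
            if (!ErrorMarker.IsMatch(lines[i])) continue;
            int end = Math.Min(lines.Count, i + contextLines);
            return (i, lines.Skip(i).Take(end - i).ToList());
        }
        return null;
    }
}
```
Excerpts should still go through the redaction and length-cap guardrails below before being stored.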
## Improved extractor (Phase 2+)
* Language/tool specific rules:
* dotnet, maven/gradle, npm/yarn/pnpm, python/pytest, go test, docker build, terraform, helm
* Add `classification` and `signature`:
* normalize common errors:
* auth expired/forbidden
* missing dependency / DNS / TLS
* compilation error
* test failure (include test name)
* infra capacity / agent lost
* policy gate failure
## Guardrails
* **Secret redaction**: before storing excerpts, run your existing redaction pipeline.
* **Payload cap**: cap message length and excerpt lines.
* **PII discipline**: avoid including arbitrary stack traces if they contain sensitive paths; include only key lines.
---
# Development plan by phases (epics)
Each phase below includes deliverables + acceptance criteria. You can treat each as a sprint/iteration.
---
## Phase 0 — Baseline and alignment
### Deliverables
* Baseline TTFS measurement (current behavior)
* Definition of actionable signal and priority rules
* Performance budget for run details view
### Tasks
* Add client-side telemetry for current page load steps:
* route enter, summary loaded, logs loaded, graph loaded
* Measure TTFS proxy today (likely “time to status shown”)
* Identify top 20 failure modes in your CI (from historical logs)
### Acceptance criteria
* Dashboard shows baseline P50/P95 for current experience.
* First signal contract signed off with UI + backend teams.
---
## Phase 1 — Data model and storage
### Deliverables
* DB migrations for `ci_run_summary` and `ci_first_signal`
* Redis cache keys and invalidation strategy
* ADR: where summaries live and how they update
### Tasks
* Create tables and indices:
* index on `run_id`, `updated_at`, `provider`
* Add serializer/deserializer for `summary_json` and `signal_json`
* Implement ETag generation (hash of JSON payload)
### Acceptance criteria
* Can store and retrieve summary + first signal for a run in < 50ms (DB) and < 10ms (cache).
* ETag works end-to-end.
---
## Phase 2 — Ingestion and first signal computation
### Deliverables
* First-signal computation module
* Provider adapter integration points (webhook consumers)
* first error tuple extraction from logs
### Tasks
* On job log append:
* scan incrementally for first error markers
* store excerpt + line range + job/stage/step mapping
* On job finish/fail:
* finalize first signal with best known context
* Implement the “better signal replaces generic” rule
### Acceptance criteria
* For a known failing run, API returns first signal without reading full log blob.
* Computation does not exceed a small CPU budget per log chunk (guard with limits).
* Extraction failure rate < 1% for sampled runs (initial).
---
## Phase 3 — API endpoints and caching
### Deliverables
* `/runs/{id}/first-signal` endpoint
* Optional `/runs/{id}/summary`
* Cache-control + ETag support
* Access control checks consistent with existing run authorization
### Tasks
* Serve cached first signal first; fallback to DB
* If missing:
* return `204` (or a pending object) and allow UI fallback
* Add server-side metrics:
* endpoint latency, cache hit rate, payload size
### Acceptance criteria
* Endpoint P95 latency meets target (e.g., < 200ms internal).
* Cache hit rate is high for active runs (after prefetch).
---
## Phase 4 — Frontend progressive rendering
### Deliverables
* FirstSignalCard component
* Route resolver + local cache
* Prefetch on runs list view
* Telemetry for TTFS
### Tasks
* Render shell immediately
* Fetch and render first signal
* Lazy-load heavy panels using `@defer` / dynamic imports
* Implement “open failing stage” default behavior
### Acceptance criteria
* In throttled network test, first signal card appears significantly earlier than logs and graphs.
* `ttfs_signal_rendered` fires exactly once per view, with correct dimensions.
---
## Phase 5 — Observability, dashboards, and alerting
### Deliverables
* TTFS dashboards by:
* provider, repo, run type, release version
* Alerts:
* P95 regression threshold
* Quality dashboard:
* availability rate, extraction failures, generic signal rate
### Tasks
* Create event pipeline for telemetry into your analytics system
* Define SLO/error budget alerts
* Add tracing (OpenTelemetry) for endpoint and summarizer
### Acceptance criteria
* You can correlate TTFS with:
* bounce rate
* open→action time
* You can pinpoint whether regressions are backend, frontend, or provider-specific.
---
## Phase 6 — QA, performance testing, rollout
### Deliverables
* Automated tests
* Feature flag + gradual rollout
* A/B experiment (optional)
### Tasks
**Testing**
* Unit tests:
* extractor patterns
* classification rules
* Integration tests:
* simulated job logs with known outcomes
* E2E (Playwright/Cypress):
* verify first signal appears before logs
* verify fallback path works if endpoint fails
* Performance tests:
* cold cache vs warm cache
* throttled CPU/network profiles
**Rollout**
* Feature flag:
* enabled for internal users first
* ramp by repo or percentage
* Monitor key metrics during ramp:
* TTFS P95
* API error rate
* UI error rate
* cache miss spikes
### Acceptance criteria
* No increase in overall error rates.
* TTFS improves at least X% for a meaningful slice of users (define X from baseline).
* Fallback UX remains usable when signals are unavailable.
---
# Backlog examples (ready-to-create Jira tickets)
## Epic: Run summary and first signal storage
* Create `ci_first_signal` table
* Create `ci_run_summary` table
* Implement ETag hashing
* Implement Redis caching layer
* Add admin/debug endpoint (internal only) to inspect computed signals
## Epic: Log chunk extraction
* Implement incremental log scanning
* Store first error excerpt + range
* Map excerpt to job + step
* Add redaction pass to excerpts
## Epic: Run details progressive UI
* FirstSignalCard UI component
* Lazy-load logs viewer
* Default to opening failing stage
* Prefetch signals in runs list
## Epic: Telemetry and dashboards
* Add `ttfs_start` and `ttfs_signal_rendered`
* Add endpoint latency metrics
* Build dashboards + alerts
* Add sampling for signal helpfulness feedback
---
# Risk register and mitigations
## Risk: First signal is wrong/misleading
* Mitigation:
* track the generic-signal rate and corrections from user feedback
* classification confidence scoring
* always provide quick access to full logs as fallback
## Risk: Logs contain secrets
* Mitigation:
* redact excerpts before storing/serving
* cap excerpt lines and length
* keep raw logs behind existing permissions
## Risk: Increased ingest CPU cost
* Mitigation:
* incremental scanning with early stop after first error captured
* limit scanning per chunk
* sample/skip overly large logs until job completion
## Risk: Cache invalidation bugs
* Mitigation:
* ETag-based correctness
* versioned keys: `:v1`
* write-through cache updates from summarizer
---
# Definition of Done checklist
A phase is done when:
* TTFS measured with reliable client events
* FirstSignalCard renders from `/first-signal` endpoint
* ETag caching works
* Fallback path is solid (no blank screens)
* Dashboards exist and are actively watched during rollout
* Security review completed for log excerpts/redaction
* Load tests show no unacceptable backend regressions
---
# Optional enhancements after initial launch
1. **Next-step recommendations**
Add action suggestions and deep links (rotate token, open failing test, open doc).
2. **Flaky test / known issue detection**
Show “this matches a known flaky signature” with last-seen frequency.
3. **“Compare to last green”**
Summarize what changed since last successful run (commit diff, dependency bump).
4. **SSE live updates**
Update first signal as soon as failure occurs while user watches.
---
If you tell me your current backend stack (Node/Go/.NET), log storage (S3/Elastic/Loki), and which CI providers you support, I can translate this into a concrete set of modules/classes, exact schema migrations, and the Angular routing + signals code structure you'd implement.

View File

@@ -0,0 +1,643 @@
Here's a simple, practical idea to make your scans provably repeatable over time and catch drift fast.
# Replay Fidelity (what, why, how)
**What it is:** the share of historical scans that reproduce **bit-for-bit** when rerun using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.
**Why you want it:** it exposes hidden nondeterminism (feed drift, time-dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.
---
## The metric
* **Per-scan:** `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`.
* **Windowed:** `Replay Fidelity = (Σ replay_match) / (# historical replays in window)`.
* **Breakdown:** also track by scanner, language, image base, feed version, and environment.
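Computing the windowed metric is then a small aggregation over stored replay results; a sketch assuming a `ReplayResult` row with a boolean match flag and a breakdown bucket:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record ReplayResult(string ScanId, bool Match, DateTimeOffset FinishedAt, string Bucket);

public static class Fidelity
{
    // Replay Fidelity over a window = matched replays / total replays in that window.
    public static double Windowed(IEnumerable<ReplayResult> results, DateTimeOffset from, DateTimeOffset to)
    {
        var inWindow = results.Where(r => r.FinishedAt >= from && r.FinishedAt < to).ToList();
        return inWindow.Count == 0 ? 1.0 : (double)inWindow.Count(r => r.Match) / inWindow.Count;
    }

    // Per-bucket breakdown (e.g., by scanner, language, base image, feed version) for the dashboard.
    public static IReadOnlyDictionary<string, double> ByBucket(IEnumerable<ReplayResult> results)
        => results.GroupBy(r => r.Bucket)
                  .ToDictionary(g => g.Key, g => (double)g.Count(r => r.Match) / g.Count());
}
```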
---
## What must be captured in the scan manifest
* Exact source refs (image digest / repo SHA), container layers digests
* Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
* Feed snapshots (CVE DB, OVAL, vendor advisories) as **content-addressed** bundles
* Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
* Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy
---
## Pass/Fail rules you can ship
* **Green:** Fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
* **Warn:** Any bucket drops by 2% week-over-week
* **Fail the pipeline:** If fidelity < 0.90 or any regulated project < 0.95
---
## Minimal replay harness (outline)
1. Pick N historical scans (e.g., last 200 or stratified by image language).
2. Restore their **frozen** manifest (scanner binary, feed bundle, policy lattice, seeds).
3. Rerun in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale).
4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs SHA256.
5. Emit: pass/fail, diff summary, and the cause tag if mismatch (feed, policy, runtime, code).
---
## Dashboard (what to show)
* Fidelity % (30/90-day) + sparkline
* Top offenders (by language/scanner/policy set)
* Cause of mismatch histogram (feed vs runtime vs code vs policy)
* Click-through: deterministic diff (e.g., which CVEs flipped and why)
---
## Quick wins for StellaOps
* Treat **feeds as immutable snapshots** (content-addressed tar.zst) and record their digest in each scan.
* Run scanner in a **repro shell** (OCI image digest + fixed TZ/locale + no network).
* Normalize SBOM/VEX (key order, whitespace, float precision) before hashing.
* Add a `stella replay --from MANIFEST.json` command + nightly cron to sample replays.
* Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`.
Want me to draft the `stella replay` CLI spec and the DB table (DDL) you can drop into Postgres?
Below is an extended Replay Fidelity design **plus a concrete development implementation plan** you can hand to engineering. I'm assuming StellaOps is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability feeds and policy/lattice/rules.
---
## 1) Extend the concept: Replay Fidelity as a product capability
### 1.1 Fidelity levels (so you can be strict without being brittle)
Instead of a single yes/no, define **tiers** that you can report and gate on:
1. **Bitwise Fidelity (BF)**
* *Definition:* All primary artifacts (SBOM, findings, VEX, evidence) match **byte-for-byte** after canonicalization.
* *Use:* strongest auditability, catch ordering/nondeterminism.
2. **Semantic Fidelity (SF)**
* *Definition:* The *meaning* matches even if formatting differs (e.g., key order, whitespace, timestamps).
* *How:* compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
* *Use:* protects you from cosmetic diffs and helps triage.
3. **Policy Fidelity (PF)**
* *Definition:* Final policy decision (pass/fail + reason codes) matches.
* *Use:* useful when outputs may evolve but governance outcome must remain stable.
**Recommended reporting:**
* Dashboard shows BF, SF, PF together.
* Default engineering SLO: **BF ≥ 0.98**; compliance SLO: **BF ≥ 0.95** for regulated projects; PF should be ~1.0 unless policy changed intentionally.
---
### 1.2 “Why did it drift?”—Mismatch classification taxonomy
When a replay fails, auto-tag the cause so humans don't diff JSON by hand.
**Primary mismatch classes**
* **Feed drift:** CVE/OVAL/vendor advisory snapshot differs.
* **Policy drift:** policy/lattice/rules differ (or default rule set changed).
* **Runtime drift:** base image / libc / kernel / locale / tz / CPU arch differences.
* **Scanner drift:** scanner binary build differs or dependency versions changed.
* **Nondeterminism:** ordering instability, concurrency race, unseeded RNG, time-based logic.
* **External IO:** network calls, “latest” resolution, remote package registry changes.
**Output:** a `mismatch_reason` plus a short `diff_summary`.
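A sketch of the auto-tagging step, assuming the replay record already carries digest comparisons between the original and replayed envelopes (type and enum names are illustrative):
```csharp
public enum MismatchReason { None, FeedDrift, PolicyDrift, ScannerDrift, RuntimeDrift, Nondeterminism }

public sealed record ReplayComparison(
    bool ArtifactsMatch,
    bool FeedDigestMatch,
    bool PolicyDigestMatch,
    bool ScannerDigestMatch,
    bool EnvironmentMatch);

public static class MismatchClassifier
{
    // Checked in a fixed order: pinned-input drift first, environment next,
    // and only if everything was pinned do we blame nondeterminism.
    public static MismatchReason Classify(ReplayComparison c)
    {
        if (c.ArtifactsMatch) return MismatchReason.None;
        if (!c.FeedDigestMatch) return MismatchReason.FeedDrift;
        if (!c.PolicyDigestMatch) return MismatchReason.PolicyDrift;
        if (!c.ScannerDigestMatch) return MismatchReason.ScannerDrift;
        if (!c.EnvironmentMatch) return MismatchReason.RuntimeDrift;
        return MismatchReason.Nondeterminism;
    }
}
```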
---
### 1.3 Deterministic “scan envelope” design
A replay only works if the scan is fully specified.
**Scan envelope components**
* **Inputs:** image digest, repo commit SHA, build provenance, layers digests.
* **Scanner:** scanner OCI image digest (or binary digest), config flags, feature toggles.
* **Feeds:** content-addressed feed bundle digests (see §2.3).
* **Policy/rules:** git commit SHA + content digest of compiled rules.
* **Environment:** OS/arch, tz/locale, “clock mode”, network mode, CPU count.
* **Normalization:** canonicalization version for SBOM/VEX/findings.
---
### 1.4 Canonicalization so “bitwise” is meaningful
To make BF achievable:
* Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
* Strip/normalize volatile fields (timestamps, “scan_duration_ms”, hostnames)
* Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)`
* Deterministic IDs: if you generate internal IDs, derive from stable hashes of content (not UUID4)
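A minimal canonicalizer sketch using System.Text.Json.Nodes (.NET 8+): sort object keys, drop volatile fields, and hash the resulting bytes. The volatile-field list is an example, and this is intentionally simpler than full RFC 8785 JCS (number normalization is omitted):
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class Canonicalizer
{
    // Example volatile fields; extend per artifact type.
    private static readonly HashSet<string> Volatile = new() { "timestamp", "scan_duration_ms", "hostname" };

    public static string CanonicalSha256(string json)
    {
        var canonical = Canonicalize(JsonNode.Parse(json));
        var bytes = JsonSerializer.SerializeToUtf8Bytes(canonical);
        return "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
    }

    private static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        // Rebuild objects with volatile keys removed and remaining keys in ordinal order.
        JsonObject obj => new JsonObject(obj.Where(kv => !Volatile.Contains(kv.Key))
                                            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
                                            .Select(kv => KeyValuePair.Create(kv.Key, Canonicalize(kv.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Canonicalize).ToArray()),
        _ => node?.DeepClone()   // scalars pass through; float normalization is left out here
    };
}
```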
---
### 1.5 Sampling strategy
You don't need to replay everything.
**Nightly sample:** stratified by:
* language ecosystem (npm, pip, maven, go, rust…)
* scanner engine
* base OS
* regulatory tier
* image size/complexity
**Plus:** always replay golden canaries (a fixed set of reference images) after every scanner release and feed ingestion pipeline change.
---
## 2) Technical architecture blueprint
### 2.1 System components
1. **Manifest Writer (in the scan pipeline)**
* Produces `ScanManifest v1` JSON
* Records all digests and versions
2. **Artifact Store**
* Stores SBOM, findings, VEX, evidence blobs
* Stores canonical hashes for BF checks
3. **Feed Snapshotter**
* Periodically builds immutable feed bundles
* Content-addressed (digest-keyed)
* Stores metadata (source URLs, generation timestamp, signature)
4. **Replay Orchestrator**
* Chooses historical scans to replay
* Launches replay executor jobs
5. **Replay Executor**
* Runs scanner in pinned container image
* Network off, tz fixed, clock policy applied
* Produces new artifacts + hashes
6. **Diff & Scoring Engine**
* Computes BF/SF/PF
* Generates mismatch classification + diff summary
7. **Metrics + UI Dashboard**
* Prometheus metrics
* UI for drill-down diffs
---
### 2.2 Data model (Postgres-friendly)
**Core tables**
* `scan_manifests`
* `scan_id (pk)`
* `manifest_json`
* `manifest_sha256`
* `created_at`
* `scan_artifacts`
* `scan_id (fk)`
* `artifact_type` (sbom|findings|vex|evidence)
* `artifact_uri`
* `canonical_sha256`
* `schema_version`
* `feed_snapshots`
* `feed_digest (pk)`
* `bundle_uri`
* `sources_json`
* `generated_at`
* `signature`
* `replay_runs`
* `replay_id (pk)`
* `original_scan_id (fk)`
* `status` (queued|running|passed|failed)
* `bf_match bool`, `sf_match bool`, `pf_match bool`
* `mismatch_reason`
* `diff_summary_json`
* `started_at`, `finished_at`
* `executor_env_json` (arch, tz, cpu, image digest)
**Indexes**
* `(created_at)` for sampling windows
* `(mismatch_reason, finished_at)` for triage
* `(scanner_version, ecosystem)` for breakdown dashboards
---
### 2.3 Feed Snapshotting (the key to long-term replay)
**Feed bundle format**
* `feeds/<source>/<date>/...` inside a tar.zst
* manifest file inside bundle: `feed_bundle_manifest.json` containing:
* source URLs
* retrieval commit/etag (if any)
* file hashes
* generated_by version
**Content addressing**
* Digest of the entire bundle (`sha256(tar.zst)`) is the reference.
* Scans record only the digest + URI.
**Immutability**
* Store bundles in object storage with WORM / retention if you need compliance.
---
### 2.4 Replay execution sandbox
For determinism, enforce:
* **No network** (K8s NetworkPolicy, firewall rules, or container runtime flags)
* **Fixed TZ/locale**
* **Pinned container image digest**
* **Clock policy**
* Either real time but recorded or frozen time at original scan timestamp
* If scanner logic uses current date for severity windows, freeze time
---
## 3) Development implementation plan
I'll lay this out as **workstreams** + **a sprinted plan**. You can compress/expand depending on team size.
### Workstream A — Scan Manifest & Canonical Artifacts
**Goal:** every scan is replayable on paper, even before replays run.
**Deliverables**
* `ScanManifest v1` schema + writer integrated into scan pipeline
* Canonicalization library + canonical hashing for all artifacts
**Acceptance criteria**
* Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
* Artifact hashes are stable across repeated runs in the same environment
---
### Workstream B — Feed Snapshotting & Policy Versioning
**Goal:** eliminate feed drift by pinning immutable inputs.
**Deliverables**
* Feed bundle builder + signer + uploader
* Policy/rules bundler (compiled rules bundle, digest recorded)
**Acceptance criteria**
* New scans reference feed bundle digests (not “latest”)
* A scan can be re-run with the same feed bundle and policy bundle
---
### Workstream C — Replay Runner & Diff Engine
**Goal:** execute historical scans and score BF/SF/PF with actionable diffs.
**Deliverables**
* `stella replay --from manifest.json`
* Orchestrator job to schedule replays
* Diff engine + mismatch classifier
* Storage of replay results
**Acceptance criteria**
* Replay produces deterministic artifacts in a pinned environment
* Dashboard/CLI shows BF/SF/PF + diff summary for failures
---
### Workstream D — Observability, Dashboard, and CI Gates
**Goal:** make fidelity visible and enforceable.
**Deliverables**
* Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf`
* Breakdown labels (scanner, ecosystem, policy_set, base_os)
* Alerts for drop thresholds
* CI gate option: block release if BF < threshold on canary set
**Acceptance criteria**
* Engineering can see drift within 24h
* Releases are blocked when fidelity regressions occur
---
## 4) Suggested sprint plan with concrete tasks
### Sprint 0 — Design lock + baseline
**Tasks**
* Define manifest schema: `ScanManifest v1` fields + versioning rules
* Decide canonicalization rules (what is normalized vs preserved)
* Choose initial golden canary scan set (10–20 representative targets)
* Add replay-fidelity epic with ownership & SLIs/SLOs
**Exit criteria**
* Approved schema + canonicalization spec
* Canary set stored and tagged
---
### Sprint 1 — Manifest writer + artifact hashing (MVP)
**Tasks**
* Implement manifest writer in scan pipeline
* Store `manifest_json` + `manifest_sha256`
* Implement canonicalization + hashing for:
* findings list (sorted)
* SBOM (normalized)
* VEX (if present)
* Persist canonical hashes in `scan_artifacts`
**Exit criteria**
* Two identical scans in the same environment yield identical artifact hashes
* A manifest export endpoint/CLI works:
* `stella scan --emit-manifest out.json`
---
### Sprint 2 — Feed snapshotter + policy bundling
**Tasks**
* Build feed bundler job:
* pull raw sources
* normalize layout
* generate `feed_bundle_manifest.json`
* tar.zst + sha256
* upload + record in `feed_snapshots`
* Update scan pipeline:
* resolve feed bundle digest at scan start
* record digest in scan manifest
* Bundle policy/lattice:
* compile rules into an immutable artifact
* record policy bundle digest in manifest
**Exit criteria**
* Scans reference immutable feed + policy digests
* You can fetch feed bundle by digest and reproduce the same feed inputs
---
### Sprint 3 — Replay executor + “no network” sandbox
**Tasks**
* Create replay container image / runtime wrapper
* Implement `stella replay --from MANIFEST.json`
* pulls scanner image by digest
* mounts feed bundle + policy bundle
* runs in network-off mode
* applies tz/locale + clock mode
* Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage)
**Exit criteria**
* Replay runs end-to-end for canary scans
* Deterministic runtime controls verified (no DNS egress, fixed tz)
---
### Sprint 4 — Diff engine + mismatch classification
**Tasks**
* Implement BF compare (canonical hashes)
* Implement SF compare (semantic JSON/object comparison)
* Implement PF compare (policy decision equivalence)
* Implement mismatch classification rules:
* if feed digest differs → feed drift
* if scanner digest differs → scanner drift
* if environment differs → runtime drift
* else → nondeterminism (with sub-tags for ordering/time/RNG)
* Generate `diff_summary_json`:
* top N changed CVEs
* packages added/removed
* policy verdict changes
**Exit criteria**
* Every failed replay has a cause tag and a diff summary that's useful in <2 minutes
* Engineers can reproduce failures locally with the manifest
---
### Sprint 5 — Dashboard + alerts + CI gate
**Tasks**
* Expose Prometheus metrics from replay service
* Build dashboard:
* BF/SF/PF trends
* breakdown by ecosystem/scanner/policy
* mismatch cause histogram
* Add alerting rules (drop threshold, bucket regression)
* Add CI gate mode:
* run replays on canary set for this release candidate
* block merge if BF < target
**Exit criteria**
* Fidelity visible to leadership and engineering
* Release process is protected by canary replays
---
### Sprint 6 — Hardening + compliance polish
**Tasks**
* Backward compatible manifest upgrades:
* `manifest_version` bump rules
* migration support
* Artifact signing / integrity:
* sign manifest hash
* optional transparency log later
* Storage & retention policies (cost controls)
* Runbook + oncall playbook
**Exit criteria**
* Audit story is complete: show me exactly how scan X was produced
* Operational load is manageable and cost-bounded
---
## 5) Engineering specs you can start implementing immediately
### 5.1 `ScanManifest v1` skeleton (example)
```json
{
"manifest_version": "1.0",
"scan_id": "scan_123",
"created_at": "2025-12-12T10:15:30Z",
"input": {
"type": "oci_image",
"image_ref": "registry/app@sha256:...",
"layers": ["sha256:...", "sha256:..."],
"source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
},
"scanner": {
"engine": "stella",
"scanner_image_digest": "sha256:...",
"scanner_version": "2025.12.0",
"config_digest": "sha256:...",
"flags": ["--deep", "--vex"]
},
"feeds": {
"vuln_feed_bundle_digest": "sha256:...",
"license_db_digest": "sha256:..."
},
"policy": {
"policy_bundle_digest": "sha256:...",
"policy_set": "prod-default"
},
"environment": {
"arch": "amd64",
"os": "linux",
"tz": "UTC",
"locale": "C",
"network": "disabled",
"clock_mode": "frozen",
"clock_value": "2025-12-12T10:15:30Z"
},
"normalization": {
"canonicalizer_version": "1.2.0",
"sbom_schema": "cyclonedx-1.6",
"vex_schema": "cyclonedx-vex-1.0"
}
}
```
---
### 5.2 CLI spec (minimal)
* `stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/`
* `stella replay --from MANIFEST.json --out-dir replay_out/`
* `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json`
---
## 6) Testing strategy (to prevent determinism regressions)
### Unit tests
* Canonicalization: same object → same bytes
* Sorting stability: randomized input order → stable output
* Hash determinism
### Integration tests
* Golden canaries:
* run scan twice in same runner → BF match
* replay from manifest → BF match
* Network leak test:
* DNS requests must be zero
* Clock leak test:
* freeze time; ensure outputs do not include real timestamps
### Chaos tests
* Vary CPU count, run concurrency, run order → still BF match
* Randomized scheduling / thread interleavings to find races
---
## 7) Operational policies (so it stays useful)
### Retention & cost controls
* Keep full artifacts for regulated scans (e.g., 1–7 years)
* For non-regulated:
* keep manifests + canonical hashes long-term
* expire heavy evidence blobs after N days
* Compress large artifacts and dedupe by digest
### Alerting examples
* BF drops by 2% week-over-week (any major bucket) → warn
* BF < 0.90 overall or regulated BF < 0.95 → page / block release
### Triage workflow
* Failed replay auto-creates a ticket with:
* manifest link
* mismatch_reason
* diff_summary
* reproduction command
---
## 8) What “done” looks like (definition of success)
* Any customer/auditor can pick a scan from 6 months ago and you can:
1. retrieve manifest + feed bundle + policy bundle by digest
2. replay in a pinned sandbox
3. show BF/SF/PF results and diffs
* Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.
---
If you want, I can also provide:
* a **Postgres DDL** for the tables above,
* a **Prometheus metrics contract** (names + labels + example queries),
* and a **diff_summary_json schema** that supports a UI diff view without reprocessing artifacts.

---
Here's a quick, plain-English idea you can use right away: **not all code diffs are equal**—some actually change what's *reachable* at runtime (and thus security posture), while others just refactor internals. A “**SmartDiff**” pipeline flags only the diffs that open or close attack paths by combining (1) call-stack traces, (2) dependency graphs, and (3) dataflow.
---
### Why this matters (background)
* Text diffs ≠ behavior diffs. A rename or refactor can look big in Git but do nothing to reachable flows from external entry points (HTTP, gRPC, CLI, message consumers).
* Security triage gets noisy because scanners attach CVEs to all present packages, not to the code paths you can actually hit.
* **Dataflow-aware diffs** shrink noise and make VEX generation honest: “vuln present but **not exploitable** because the sink is unreachable from any policy-defined entrypoint.”
---
### Minimal architecture (fits StellaOps)
1. **Entrypoint map** (per service): controllers, handlers, consumers.
2. **Call graph + dataflow** (per commit): Roslyn for C#, `golang.org/x/tools/go/callgraph` for Go, plus taint rules (source→sink).
3. **Reachability cache** keyed by (commit, entrypoint, package@version).
4. **SmartDiff** = `reachable_paths(commit_B) − reachable_paths(commit_A)` (and the reverse for removed paths).
* If a path to a sensitive sink is newly reachable → **High**.
* If a path disappears → auto-generate **VEX “not affected (no reachable path)”**.
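A minimal Go sketch of that set difference, assuming each reachable path has already been reduced to a stable fingerprint string (`"EP -> ... -> sink"`):
```go
package smartdiff

// PathSet holds reachable-path fingerprints for one commit.
type PathSet map[string]struct{}

// Diff returns paths that became reachable in B (opened) and paths that are no
// longer reachable (closed); closed paths are the auto-VEX candidates.
func Diff(a, b PathSet) (opened, closed []string) {
	for p := range b {
		if _, ok := a[p]; !ok {
			opened = append(opened, p)
		}
	}
	for p := range a {
		if _, ok := b[p]; !ok {
			closed = append(closed, p)
		}
	}
	return opened, closed
}
```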
---
### Tiny working seeds
**C# (.NET 10) — Roslyn skeleton to diff call-reachability**
```csharp
// SmartDiff.csproj targets net10.0; references Microsoft.CodeAnalysis.Workspaces.MSBuild + Microsoft.Build.Locator
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Build.Locator;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.FindSymbols;
public static class SmartDiff
{
    public static async Task<HashSet<string>> ReachableSinks(string solutionPath, string[] entrypoints, string[] sinks)
    {
        // Locate an MSBuild instance before opening the solution (ideally once, in the host's entry point).
        if (!MSBuildLocator.IsRegistered) MSBuildLocator.RegisterDefaults();
        using var workspace = MSBuild.MSBuildWorkspace.Create();
        var solution = await workspace.OpenSolutionAsync(solutionPath);
        var index = new HashSet<string>();
        foreach (var proj in solution.Projects)
        {
            var comp = await proj.GetCompilationAsync();
            if (comp is null) continue;
            // Resolve entrypoints & sinks by fully-qualified display name
            var methods = comp.GlobalNamespace.GetMembers().SelectMany(Descend).OfType<IMethodSymbol>().ToList();
            var epSymbols = methods.Where(m => entrypoints.Contains(m.ToDisplayString())).ToList();
            var sinkSymbols = methods.Where(m => sinks.Contains(m.ToDisplayString())).ToList();
            foreach (var sink in sinkSymbols)
            {
                // Heuristic reachability: any reference to the sink anywhere in the solution
                // counts for every entrypoint; replace with a real call-graph walk for proof paths.
                var refs = await SymbolFinder.FindReferencesAsync(sink, solution);
                if (!refs.SelectMany(r => r.Locations).Any()) continue;
                foreach (var ep in epSymbols)
                    index.Add($"{ep.ToDisplayString()} -> {sink.ToDisplayString()}");
            }
        }
        return index;
        static IEnumerable<ISymbol> Descend(INamespaceOrTypeSymbol sym)
        {
            foreach (var m in sym.GetMembers())
            {
                yield return m;
                if (m is INamespaceOrTypeSymbol nt)
                    foreach (var x in Descend(nt)) yield return x;
            }
        }
    }
}
```
**Go — SSA & callgraph seed**
```go
// go.mod: require golang.org/x/tools (latest)
package main
import (
	"fmt"
	"log"
	"golang.org/x/tools/go/callgraph/cha"
	"golang.org/x/tools/go/packages"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/ssa/ssautil"
)
func main() {
	cfg := &packages.Config{Mode: packages.LoadAllSyntax, Tests: false}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		log.Fatal(err)
	}
	// Build SSA form for every loaded package (ssautil wraps the CreatePackage bookkeeping).
	prog, _ := ssautil.AllPackages(pkgs, ssa.BuilderMode(0))
	prog.Build()
	// CHA call graph: fast and conservative (over-approximates dynamic dispatch).
	cg := cha.CallGraph(prog)
	// TODO: map entrypoints & sinks, then walk cg from EPs to sinks
	fmt.Println("nodes:", len(cg.Nodes))
}
```
---
### How to use it in your pipeline (fast win)
* **Pre-merge job**:
1. Build call graph for `HEAD` and `HEAD^`.
2. Compute SmartDiff.
3. If any *new* EP→sink path appears, fail with a short, proof-linked note:
“New reachable path: `POST /Invoices -> PdfExporter.Save(string path)` (writes outside sandbox).”
* **Post-scan VEX**:
* For each CVE on a package, mark **Affected** only if any EP can reach a symbol that uses that package's vulnerable surface.
---
### Evidence to show in the UI
* “**Path card**”: EP → … → Sink, with file:line hop list and commit hash.
* “**What changed**”: before/after path diff (green removed, red added).
* “**Why it matters**”: sink classification (network write, file write, deserialization, SQL, crypto).
---
### Developer checklist (StellaOps style)
* [ ] Define entrypoints per service (attribute or YAML).
* [ ] Define sink taxonomy (FS, NET, DESER, SQL, CRYPTO).
* [ ] Implement language adapters: `.NET (Roslyn)`, `Go (SSA)`, later `Java (Soot/WALA)`.
* [ ] Add a **ReachabilityCache** (Postgres table keyed by commit+lang+service).
* [ ] Wire a `SmartDiffJob` in CI; emit SARIF + CycloneDX `vulnerability-assertions` extension or OpenVEX.
* [ ] Gate merges on **newly reachable sensitive sinks**; auto-VEX when paths disappear.
If you want, I can turn this into a small repo scaffold (Roslyn + Go adapters, Postgres schema, a GitLab/GitHub pipeline, and a minimal UI “path card”).
Below is a concrete **development implementation plan** to take the “SmartDiff” idea (reachability + dataflow + dependency/vuln context) into a shippable product integrated into your pipeline (Stella Ops style). I'll assume the initial languages are **.NET (C#)** and **Go**, and the initial goal is **PR gating + VEX automation** with strong evidence (paths + file/line hops).
---
## 1) Product definition
### Problem you're solving
Security noise comes from:
* “Vuln exists in dependency” ≠ “vuln exploitable from any entrypoint”
* Git diffs look big even when behavior is unchanged
* Teams struggle to triage “is this change actually risky?”
### What SmartDiff should do (core behavior)
Given **base commit A** and **head commit B**:
1. Identify **entrypoints** (web handlers, RPC methods, message consumers, CLI commands).
2. Identify **sinks** (file write, command exec, SQL, SSRF, deserialization, crypto misuse, templating, etc.).
3. Compute **reachable paths** from entrypoints → sinks (call graph + dataflow/taint).
4. Emit **SmartDiff**:
* **Newly reachable** EP→sink paths (risk ↑)
* **Removed** EP→sink paths (risk ↓)
* **Changed** paths (same sink but different sanitization/guards)
5. Attach **dependency vulnerability context**:
* If a vulnerable API surface is reachable (or data reaches it), mark “affected/exploitable”
* Otherwise generate **VEX**: “not affected” / “not exploitable” with evidence
### MVP definition (minimum shippable)
A PR check that:
* Flags **new** reachable paths to a small set of high-risk sinks (e.g., command exec, unsafe deserialization, filesystem write, SSRF/network dial, raw SQL).
* Produces:
* SARIF report (for code scanning UI)
* JSON artifact containing proof paths (EP → … → sink with file:line)
* Optional VEX statement for dependency vulnerabilities (if you already have an SCA feed)
---
## 2) Architecture you can actually build
### Highlevel components
1. **Policy & Taxonomy Service**
* Defines entrypoints, sources, sinks, sanitizers, confidence rules
* Versioned and centrally managed (but supports repo overrides)
2. **Analyzer Workers (language adapters)**
* .NET analyzer (Roslyn + control flow)
* Go analyzer (SSA + callgraph)
* Outputs standardized IR (Intermediate Representation)
3. **Graph Store + Reachability Engine**
* Stores symbol nodes + call edges + dataflow edges
* Computes reachable sinks per entrypoint
* Computes diff between commits A and B
4. **Vulnerability Mapper + VEX Generator**
* Maps vulnerable packages/functions → “surfaces”
* Joins with reachability results
* Emits OpenVEX (or CycloneDX VEX) with evidence links
5. **CI/PR Integrations**
* CLI that runs in CI
* Optional server mode (cache + incremental processing)
6. **UI/API**
* Path cards: “what changed”, “why it matters”, “proof”
* Filters by sink class, confidence, service, entrypoint
### Data contracts (standardized IR)
Make every analyzer output the same shapes so the rest of the pipeline is language-agnostic:
* **Symbols**
* `symbol_id`: stable hash of (lang, module, fully-qualified name, signature)
* metadata: file, line ranges, kind (method/function), accessibility
* **Edges**
* Call edge: `caller_symbol_id -> callee_symbol_id`
* Dataflow edge: `source_symbol_id -> sink_symbol_id` with variable/parameter traces
* Edge metadata: type, confidence, reason (static, reflection guess, interface dispatch, etc.)
* **Entrypoints / Sources / Sinks**
* entrypoint: (symbol_id, route/topic/command metadata)
* sink: (symbol_id, sink_type, severity, cwe mapping optional)
* **Paths**
* `entrypoint -> ... -> sink`
* hop list: symbol_id + file:line, plus “dataflow step evidence” when relevant
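One possible Go shape for this IR (field and tag names are illustrative, not a frozen schema):
```go
package ir

// Symbol is one node in the call/dataflow graph.
type Symbol struct {
	SymbolID  string `json:"symbol_id"` // stable hash of (lang, module, FQN, signature)
	Lang      string `json:"lang"`
	Module    string `json:"module"`
	FQN       string `json:"fqn"`
	Kind      string `json:"kind"` // method | function
	File      string `json:"file"`
	LineStart int    `json:"line_start"`
	LineEnd   int    `json:"line_end"`
}

// Edge is either a call edge or a dataflow edge between symbols.
type Edge struct {
	From       string `json:"from_symbol_id"`
	To         string `json:"to_symbol_id"`
	Kind       string `json:"kind"`       // call | dataflow
	Confidence string `json:"confidence"` // high | medium | low
	Reason     string `json:"reason"`     // static, reflection guess, interface dispatch, ...
}

// Path is one proof path from an entrypoint to a sink.
type Path struct {
	Entrypoint string   `json:"entrypoint_symbol_id"`
	Sink       string   `json:"sink_symbol_id"`
	SinkType   string   `json:"sink_type"`
	Hops       []Symbol `json:"hops"` // EP -> ... -> sink with file:line evidence
}
```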
---
## 3) Workstreams and deliverables
### Workstream A — Policy, taxonomy, configuration
**Deliverables**
* `smartdiff.policy.yaml` schema and validator
* A default sink taxonomy:
* `CMD_EXEC`, `UNSAFE_DESER`, `SQL_RAW`, `SSRF`, `FILE_WRITE`, `PATH_TRAVERSAL`, `TEMPLATE_INJECTION`, `CRYPTO_WEAK`, `AUTHZ_BYPASS` (expand later)
* Initial sanitizer patterns:
* For example: parameter validation, safe deserialization wrappers, ORM parameterized APIs, path normalization, allowlists
**Implementation notes**
* Start strict and small: 10–20 sinks, 10 sources, 10 sanitizers.
* Provide repo-level overrides:
* `smartdiff.policy.yaml` in repo root
* Central policies referenced by version tag
**Acceptance criteria**
* A service can onboard by configuring:
* entrypoint discovery mode (auto + manual)
* sink classes to enforce
* severity threshold to fail PR
---
### Workstream B — .NET analyzer (Roslyn)
**Deliverables**
* Build pipeline that produces:
* call graph (methods and invocations)
* basic control-flow guards for reachability (optional for MVP)
* taint propagation for common patterns (MVP: parameter → sink)
* Entry point discovery for:
* ASP.NET controllers (`[HttpGet]`, `[HttpPost]`)
* Minimal APIs (`MapGet/MapPost`)
* gRPC service methods
* message consumers (configurable attributes/interfaces)
**Implementation notes (practical path)**
* MVP static callgraph:
* Use Roslyn semantic model to resolve invocation targets
* For virtual/interface calls: conservative resolution to possible implementations within the compilation
* MVP taint:
* “Sources”: request params/body, headers, query string, message payloads
* “Sinks”: wrappers around `Process.Start`, `SqlCommand`, `File.WriteAllText`, `HttpClient.Send`, deserializers, etc.
* Propagate taint across:
* parameter → local → argument
* return values
* simple assignments and concatenations (heuristic)
* Confidence scoring:
* Direct static call resolution: high
* Reflection/dynamic: low (flag separately)
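The MVP taint propagation described above is essentially a worklist walk over dataflow edges; a language-neutral sketch in Go, treating symbols as opaque IR ids (the edge map is an assumption about how the analyzer hands over its facts):
```go
package taint

// Propagate returns every symbol reachable from the tainted sources via
// dataflow edges (assignment, argument passing, return). A sink is considered
// "tainted" if its id appears in the result.
func Propagate(sources []string, flowsTo map[string][]string) map[string]bool {
	tainted := map[string]bool{}
	work := append([]string(nil), sources...)
	for len(work) > 0 {
		cur := work[len(work)-1]
		work = work[:len(work)-1]
		if tainted[cur] {
			continue // already processed
		}
		tainted[cur] = true
		work = append(work, flowsTo[cur]...)
	}
	return tainted
}
```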
**Acceptance criteria**
* On a demo ASP.NET service, if a PR adds:
* `HttpPost /upload``File.WriteAllBytes(userPath, ...)`
SmartDiff flags **new EP→FILE_WRITE path** and shows hops with file/line.
---
### Workstream C — Go analyzer (SSA)
**Deliverables**
* SSA build + callgraph extraction
* Entrypoint discovery for:
* `net/http` handlers
* common routers (Gin/Echo/Chi) via adapter rules
* gRPC methods
* consumers (Kafka/NATS/etc.) by config
**Implementation notes**
* Use `golang.org/x/tools/go/packages` + `ssa` build
* Callgraph:
* start with CHA (Class Hierarchy Analysis) for speed
* later add pointer analysis for precision on interfaces
* Taint:
* sources: `http.Request`, router params, message payloads
* sinks: `os/exec`, `database/sql` raw query, file I/O, `net/http` outbound, unsafe deserialization libs
**Acceptance criteria**
* A PR that adds `exec.Command(req.FormValue("cmd"))` becomes a **new EP→CMD_EXEC** finding.
---
### Workstream D — Graph store + reachability computation
**Deliverables**
* Schema in Postgres (recommended first) for:
* commits, services, languages
* symbols, edges, entrypoints, sinks
* computed reachable “facts” (entrypoint→sink with shortest path(s))
* Reachability engine:
* BFS/DFS per entrypoint with early cutoffs
* path reconstruction storage (store predecessor map or store k-shortest paths)
**Implementation notes**
* Don't start with a graph DB unless you must.
* Use Postgres tables + indexes:
* `edges(from_symbol, to_symbol, commit_id, kind)`
* `symbols(symbol_id, lang, module, fqn, file, line_start, line_end)`
* `reachability(entrypoint_id, sink_id, commit_id, path_hash, confidence, severity, evidence_json)`
* Cache:
* keyed by (commit, policy_version, analyzer_version)
* avoids recompute on re-runs
**Acceptance criteria**
* For any analyzed commit, you can answer:
* “Which sinks are reachable from these entrypoints?”
* “Show me one proof path per (entrypoint, sink_type).”
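A minimal sketch of the reachability walk: BFS with an early hop cutoff and a predecessor map for path reconstruction, assuming the edge list for one commit has already been loaded into an adjacency map keyed by symbol id:
```go
package reach

// ShortestPath returns one proof path from entrypoint ep to sink, or nil if
// the sink is unreachable within maxHops.
func ShortestPath(adj map[string][]string, ep, sink string, maxHops int) []string {
	type item struct {
		sym  string
		hops int
	}
	prev := map[string]string{ep: ""} // predecessor map doubles as "visited"
	queue := []item{{ep, 0}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur.sym == sink {
			// Reconstruct the hop list by walking predecessors back to the entrypoint.
			var path []string
			for s := sink; s != ""; s = prev[s] {
				path = append([]string{s}, path...)
			}
			return path
		}
		if cur.hops == maxHops {
			continue // early cutoff keeps huge graphs bounded
		}
		for _, next := range adj[cur.sym] {
			if _, seen := prev[next]; !seen {
				prev[next] = cur.sym
				queue = append(queue, item{next, cur.hops + 1})
			}
		}
	}
	return nil
}
```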
---
### Workstream E — SmartDiff engine (the “diff” part)
**Deliverables**
* Diff algorithm producing three buckets:
* `added_paths`, `removed_paths`, `changed_paths`
* “Changed” means:
* same entrypoint + sink type, but path differs OR taint/sanitization differs OR confidence changes
**Implementation notes**
* Identify a path by a stable fingerprint:
* `path_id = hash(entrypoint_symbol + sink_symbol + sink_type + policy_version + analyzer_version)`
* Store:
* top-k paths for each pair for evidence (k=1 for MVP, add more later)
* Severity gating rules:
* Example:
* New path to `CMD_EXEC` = fail
* New path to `FILE_WRITE` = warn unless under `/tmp` allowlist
* New path to `SQL_RAW` = fail unless parameterized sanitizer present
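A sketch of the stable path fingerprint, hashing the same fields as the formula above (the separator and hex encoding are implementation choices, not a spec):
```go
package smartdiff

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// PathID fingerprints an (entrypoint, sink, sink type) pair under a given
// policy and analyzer version, so diffs stay stable across re-runs.
func PathID(entrypoint, sink, sinkType, policyVersion, analyzerVersion string) string {
	// A fixed separator keeps ("a","bc") and ("ab","c") from colliding.
	joined := strings.Join([]string{entrypoint, sink, sinkType, policyVersion, analyzerVersion}, "\x1f")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])
}
```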
**Acceptance criteria**
* Given commits A and B:
* If B introduces a new reachable sink, CI fails with a single actionable card:
* **EP**: route / handler
* **Sink**: type + symbol
* **Proof**: hop list
* **Why**: policy rule triggered
---
### Workstream F — Vulnerability mapping + VEX
**Deliverables**
* Ingest dependency inventory (SBOM or lockfiles)
* Map vulnerabilities to “surfaces”
* package → vulnerable module/function patterns
* minimal version/range matching (from your existing vuln feed)
* Decision logic:
* **Affected** if any reachable path intersects vulnerable surface OR dataflow reaches vulnerable sink
* else **Not affected / Not exploitable** with justification
**Implementation notes**
* Start with a pragmatic approach:
* package-level reachability: “is any symbol in that package reachable?”
* then iterate toward function-level surfaces
* VEX output:
* include commit hash, policy version, evidence paths
* embed links to internal “path card” URLs if available
**Acceptance criteria**
* For a known vulnerable dependency, the system emits:
* VEX “not affected” if package code is never reached from any entrypoint, with proof references.
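The decision logic boils down to a small join; a sketch at package-surface granularity, where the `Finding` shape and prefix matching are simplifications of whatever the vuln feed actually provides:
```go
package vexgen

import "strings"

// Finding is a vulnerability tied to a package and its vulnerable surface,
// expressed here as symbol prefixes (a simplification for the sketch).
type Finding struct {
	CVE     string
	Package string
	Surface []string // e.g. "gopkg.in/yaml.v2.Unmarshal"
}

// Status returns "affected" when any reachable symbol falls inside the
// vulnerable surface; otherwise "not_affected" with a standard justification.
func Status(f Finding, reachableSymbols []string) (status, justification string) {
	for _, sym := range reachableSymbols {
		for _, prefix := range f.Surface {
			if strings.HasPrefix(sym, prefix) {
				return "affected", "vulnerable surface " + prefix + " reachable via " + sym
			}
		}
	}
	return "not_affected", "vulnerable_code_not_in_execute_path"
}
```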
---
### Workstream G — CI integration + developer UX
**Deliverables**
* A single CLI:
* `smartdiff analyze --commit <sha> --service <svc> --lang <dotnet|go>`
* `smartdiff diff --base <shaA> --head <shaB> --out sarif`
* CI templates for:
* GitHub Actions / GitLab CI
* Outputs:
* SARIF
* JSON evidence bundle
* optional OpenVEX file
**Acceptance criteria**
* Teams can enable SmartDiff by adding:
* CI job + config file
* no additional infra required for MVP (local artifacts mode)
* When infra is available, enable server caching mode for speed.
---
### Workstream H — UI “Path Cards”
**Deliverables**
* UI components:
* Path card list with filters (sink type, severity, confidence)
* “What changed” diff view:
* red = added hops
* green = removed hops
* “Evidence” panel:
* file:line for each hop
* code snippets (optional)
* APIs:
* `GET /smartdiff/{repo}/{pr}/findings`
* `GET /smartdiff/{repo}/{commit}/path/{path_id}`
**Acceptance criteria**
* A developer can click one finding and understand:
* how the data got there
* exactly what line introduced the risk
* how to fix (sanitize/guard/allowlist)
---
## 4) Milestone plan (sequenced, no time promises)
### Milestone 0 — Foundation
* Repo scaffolding:
* `smartdiff-cli/`
* `analyzers/dotnet/`
* `analyzers/go/`
* `core-ir/` (schemas + validation)
* `server/` (optional; can come later)
* Define IR JSON schema + versioning rules
* Implement policy YAML + validator + sample policies
* Implement “local mode” artifact output
**Exit criteria**
* You can run `smartdiff analyze` and get a valid IR file for at least one trivial repo.
---
### Milestone 1 — Callgraph reachability MVP
* .NET: build call edges + entrypoint discovery (basic)
* Go: build call edges + entrypoint discovery (basic)
* Graph store: in-memory or local sqlite/postgres
* Compute reachable sinks (callgraph only, no taint)
**Exit criteria**
* On a demo repo, you can list:
* entrypoints
* reachable sinks (callgraph reachability only)
* a proof path (hop list)
---
### Milestone 2 — SmartDiff MVP (PR gating)
* Compute diff between base/head reachable sink sets
* Produce SARIF with:
* rule id = sink type
* message includes entrypoint + sink + link to evidence JSON
* CI templates + documentation
**Exit criteria**
* In PR checks, the job fails on new EP→sink paths and links to a proof.
---
### Milestone 3 — Taint/dataflow MVP (high-value sinks only)
* Add taint propagation to reduce false positives:
* differentiate “sink reachable” vs “untrusted data reaches sink”
* Add sanitizer recognition
* Add confidence scoring + suppression mechanisms (policy allowlists)
**Exit criteria**
* A sink is only “high severity” if it is both reachable and tainted (or policy says otherwise).
---
### Milestone 4 — VEX integration MVP
* Join reachability with dependency vulnerabilities
* Emit OpenVEX (and/or CycloneDX VEX)
* Store evidence references (paths) inside VEX justification
**Exit criteria**
* For a repo with a vulnerable dependency, you can automatically produce:
* affected/not affected with evidence.
---
### Milestone 5 — Scale and precision improvements
* Incremental analysis (only analyze changed projects/packages)
* Better dynamic dispatch handling (Go pointer analysis, .NET interface dispatch expansion)
* Optional runtime telemetry integration:
* import production traces to prioritize “actually observed” entrypoints
**Exit criteria**
* Works on large services with acceptable run time and stable noise levels.
---
## 5) Backlog you can paste into Jira (epics + key stories)
### Epic: Policy & taxonomy
* Story: Define `smartdiff.policy.yaml` schema and validator
**AC:** invalid configs fail with clear errors; configs are versioned.
* Story: Provide default sink list and severities
**AC:** at least 10 sink rules with test cases.
### Epic: .NET analyzer
* Story: Resolve method invocations to symbols (Roslyn)
**AC:** correct targets for direct calls; conservative handling for virtual calls.
* Story: Discover ASP.NET routes and bind to entrypoint symbols
**AC:** entrypoints include route/method metadata.
### Epic: Go analyzer
* Story: SSA build and callgraph extraction
**AC:** function nodes and edges generated for a multi-package repo.
* Story: net/http entrypoint discovery
**AC:** handler functions recognized as entrypoints with path labels.
### Epic: Reachability engine
* Story: Compute reachable sinks per entrypoint
**AC:** store at least one path with hop list.
* Story: SmartDiff A vs B
**AC:** added/removed paths computed deterministically.
### Epic: CI/SARIF
* Story: Emit SARIF results
**AC:** findings appear in code scanning UI; include file/line.
### Epic: Taint analysis
* Story: Propagate taint from request to sink for 3 sink classes
**AC:** produces “tainted” evidence with a variable/argument trace.
* Story: Sanitizer recognition
**AC:** path marked “sanitized” and downgraded per policy.
### Epic: VEX
* Story: Generate OpenVEX statements from reachability + vuln feed
**AC:** for “not affected” includes justification and evidence references.
---
## 6) Key engineering decisions (recommended defaults)
### Storage
* Start with **Postgres** (or even local sqlite for MVP) for simplicity.
* Introduce a graph DB only if:
* you need very large multi-commit graph queries at low latency
* Postgres performance becomes a hard blocker
### Confidence model
Every edge/path should carry:
* `confidence`: High/Med/Low
* `reasons`: e.g., `DirectCall`, `InterfaceDispatch`, `ReflectionGuess`, `RouterHeuristic`
This lets you:
* gate only on high-confidence paths in early rollout
* keep low-confidence as “informational”
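In code, the confidence model is just an ordered enum plus the filter the gate uses during early rollout; a sketch:
```go
package smartdiff

// Confidence is ordered so policy thresholds can compare against it.
type Confidence int

const (
	Low Confidence = iota
	Medium
	High
)

// PathFinding is one diffed path with its gating metadata.
type PathFinding struct {
	PathID     string
	SinkType   string
	Added      bool
	Confidence Confidence
	Reasons    []string // DirectCall, InterfaceDispatch, ReflectionGuess, RouterHeuristic
}

// Gate returns findings that should fail the PR under an early-rollout policy:
// only newly added paths at or above the minimum confidence block the merge;
// everything else stays informational.
func Gate(findings []PathFinding, min Confidence) []PathFinding {
	var blocking []PathFinding
	for _, f := range findings {
		if f.Added && f.Confidence >= min {
			blocking = append(blocking, f)
		}
	}
	return blocking
}
```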
### Suppression model
* Local suppressions:
* `smartdiff.suppress.yaml` with rule id + symbol id + reason + expiry
* Policy allowlists:
* allow file writes only under certain directories
* allow outbound network only to configured domains
---
## 7) Testing strategy (to avoid “cool demo, unusable tool”)
### Unit tests
* Symbol hashing stability tests
* Call resolution tests:
* overloads, generics, interfaces, lambdas
* Policy parsing/validation tests
### Integration tests (must-have)
* Golden repos in `testdata/`:
* one ASP.NET minimal API
* one MVC controller app
* one Go net/http + one Gin app
* Golden outputs:
* expected entrypoints
* expected reachable sinks
* expected diff between commits
### Regression tests
* A curated corpus of “known issues”:
* false positives you fixed should never return
* false negatives: ensure known risky path is always found
### Performance tests
* Measure:
* analysis time per 50k LOC
* memory peak
* graph size
* Budget enforcement:
* if over budget, degrade gracefully (lower precision, mark low confidence)
---
## 8) Example configs and outputs (to make onboarding easy)
### Example policy YAML (minimal)
```yaml
version: 1
service: invoices-api
entrypoints:
autodiscover:
dotnet:
aspnet: true
go:
net_http: true
sinks:
- type: CMD_EXEC
severity: high
match:
dotnet:
symbols:
- "System.Diagnostics.Process.Start(string)"
go:
symbols:
- "os/exec.Command"
- type: FILE_WRITE
severity: medium
match:
dotnet:
namespaces: ["System.IO"]
go:
symbols: ["os.WriteFile"]
gating:
fail_on:
- sink_type: CMD_EXEC
when: "added && confidence >= medium"
- sink_type: FILE_WRITE
when: "added && tainted && confidence >= medium"
```
### Evidence JSON shape (what the UI consumes)
```json
{
"commit": "abc123",
"entrypoint": {"symbol": "InvoicesController.Upload()", "route": "POST /upload"},
"sink": {"type": "FILE_WRITE", "symbol": "System.IO.File.WriteAllBytes"},
"confidence": "high",
"tainted": true,
"path": [
{"symbol": "InvoicesController.Upload()", "file": "Controllers/InvoicesController.cs", "line": 42},
{"symbol": "UploadService.Save()", "file": "Services/UploadService.cs", "line": 18},
{"symbol": "System.IO.File.WriteAllBytes", "file": null, "line": null}
]
}
```
---
## 9) Risks and mitigations (explicit)
1. **Dynamic behavior (reflection, DI, router magic)**
* Mitigation: conservative fallbacks + confidence labels + optional runtime traces later
2. **Noise from huge callgraphs**
* Mitigation: sink-first slicing (compute reachability backwards from sinks), entrypoint scoping, k-shortest paths only
3. **Large repo build failures**
* Mitigation: analyzer runs inside build containers; allow partial analysis with explicit “incomplete” result flag
4. **Teams rejecting gating**
* Mitigation: staged rollout:
* Observe-only mode → warn-only → fail-only for high-confidence CMD_EXEC/UNSAFE_DESER
---
## 10) Definition of done (what “implemented” means)
You should consider SmartDiff “implemented” when:
* A repo can enable it with one config + one CI job.
* PRs get:
* a small number of **actionable** findings (not hundreds)
* each finding has a proof path with file/line hops
* It reliably detects at least:
* new command execution paths
* new unsafe deserialization paths
* new tainted filesystem write paths
* It can optionally emit VEX decisions backed by reachability evidence.
---
If you want the next step, I can also give you:
* a **concrete repo layout** with module boundaries,
* the **Postgres schema** (tables + indexes),
* and a **language adapter interface** (so adding Java/Python later is straightforward).