Here’s a compact, practical way to add two high‑leverage capabilities to your scanner: **DSSE‑signed path witnesses** and **Smart‑Diff × Reachability**—what they are, why they matter, and exactly how to implement them in Stella Ops without ceremony.

---

# 1) DSSE‑signed path witnesses (entrypoint → calls → sink)

**What it is (in plain terms):**
When you flag a CVE as “reachable,” also emit a tiny, human‑readable proof: the **exact path** from a real entrypoint (e.g., HTTP route, CLI verb, cron) through functions/methods to the **vulnerable sink**. Wrap that proof in a **DSSE** envelope and sign it. Anyone can verify the witness later—offline—without rerunning analysis.

**Why it matters:**

* Turns red flags into **auditable evidence** (quiet‑by‑design).
* Lets CI/CD, auditors, and customers **verify** findings independently.
* Enables **deterministic replay** and provenance chains (ties nicely to in‑toto/SLSA).

**Minimal JSON witness (stable, vendor‑neutral):**

```json
{
  "witness_schema": "stellaops.witness.v1",
  "artifact": { "sbom_digest": "sha256:...", "component_purl": "pkg:nuget/Example@1.2.3" },
  "vuln": { "id": "CVE-2024-XXXX", "source": "NVD", "range": "≤1.2.3" },
  "entrypoint": { "kind": "http", "name": "GET /billing/pay" },
  "path": [
    {"symbol": "BillingController.Pay()", "file": "BillingController.cs", "line": 42},
    {"symbol": "PaymentsService.Authorize()", "file": "PaymentsService.cs", "line": 88},
    {"symbol": "LibXYZ.Parser.Parse()", "file": "Parser.cs", "line": 17}
  ],
  "sink": { "symbol": "LibXYZ.Parser.Parse()", "type": "deserialization" },
  "evidence": {
    "callgraph_digest": "sha256:...",
    "build_id": "dotnet:RID:linux-x64:sha256:...",
    "analysis_config_digest": "sha256:..."
  },
  "observed_at": "2025-12-18T00:00:00Z"
}
```

**Wrap in DSSE (payloadType & payload are required):**

```json
{
  "payloadType": "application/vnd.stellaops.witness+json",
  "payload": "base64(JSON_above)",
  "signatures": [{ "keyid": "attestor-stellaops-ed25519", "sig": "base64(...)" }]
}
```
**.NET 10 signing (Ed25519):**

```csharp
// .NET has no built-in Ed25519 signer; NSec.Cryptography is one common
// choice; swap in your own Ed25519 helper if you already have one.
using System.Linq;
using System.Text;
using System.Text.Json;
using NSec.Cryptography;

var payloadType = "application/vnd.stellaops.witness+json";
var payloadBytes = JsonSerializer.SerializeToUtf8Bytes(witnessJsonObj);
var dsse = new {
    payloadType,
    payload = Convert.ToBase64String(payloadBytes),
    signatures = new[] { new { keyid = keyId, sig = Convert.ToBase64String(Sign(Pae(payloadType, payloadBytes), privateKey)) } }
};

// Per the DSSE spec the signature covers PAE(payloadType, payload),
// not the raw payload bytes.
static byte[] Pae(string type, byte[] body) =>
    Encoding.UTF8.GetBytes($"DSSEv1 {Encoding.UTF8.GetByteCount(type)} {type} {body.Length} ")
        .Concat(body).ToArray();

static byte[] Sign(byte[] data, byte[] privateKey)
{
    var alg = SignatureAlgorithm.Ed25519;
    using var key = Key.Import(alg, privateKey, KeyBlobFormat.RawPrivateKey);
    return alg.Sign(key, data);
}
```
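A matching verifier for the CLI/UI (a sketch under the same NSec assumption; it expects a raw 32‑byte Ed25519 public key and reuses the `Pae` helper from the signing snippet):

```csharp
using NSec.Cryptography;

static bool VerifyWitness(string payloadType, byte[] payload, byte[] signature, byte[] publicKeyBytes)
{
    var alg = SignatureAlgorithm.Ed25519;
    var publicKey = PublicKey.Import(alg, publicKeyBytes, KeyBlobFormat.RawPublicKey);
    // Recompute the DSSE pre-authentication encoding and check the envelope signature offline.
    return alg.Verify(publicKey, Pae(payloadType, payload), signature);
}
```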
**Where to emit:**

* **Scanner.Worker**: after reachability confirms `reachable=true`, emit witness → **Attestor** signs → **Authority** stores (Postgres) → optional Rekor‑style mirror.
* Expose `/witness/{findingId}` for download & independent verification.

---

# 2) Smart‑Diff × Reachability (incremental, low‑noise updates)

**What it is:**
On **SBOM/VEX/dependency** deltas, don’t rescan everything. Update only **affected regions** of the call graph and recompute reachability **just for changed nodes/edges**.

**Why it matters:**

* **Order‑of‑magnitude faster** incremental scans.
* Fewer flaky diffs; triage stays focused on **meaningful risk change**.
* Perfect for PR gating: “what changed” → “what became reachable/unreachable.”

**Core idea (graph reachability):**

* Maintain a per‑service **call graph** `G = (V, E)` with **entrypoint set** `S`.
* On diff: compute changed nodes/edges ΔV/ΔE.
* Run **incremental BFS/DFS** from impacted nodes to sinks (forward or backward), reusing memoized results.
* Recompute only **frontiers** touched by Δ.

**Minimal tables (Postgres):**

```sql
-- Nodes (functions/methods)
CREATE TABLE cg_nodes(
  id BIGSERIAL PRIMARY KEY,
  service TEXT, symbol TEXT, file TEXT, line INT,
  hash TEXT, UNIQUE(service, hash)
);
-- Edges (calls)
CREATE TABLE cg_edges(
  src BIGINT REFERENCES cg_nodes(id),
  dst BIGINT REFERENCES cg_nodes(id),
  kind TEXT, PRIMARY KEY(src, dst)
);
-- Entrypoints & sinks
CREATE TABLE cg_entrypoints(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY);
CREATE TABLE cg_sinks(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY, sink_type TEXT);

-- Memoized reachability cache
CREATE TABLE cg_reach_cache(
  entry_id BIGINT, sink_id BIGINT,
  path JSONB, reachable BOOLEAN,
  updated_at TIMESTAMPTZ,
  PRIMARY KEY(entry_id, sink_id)
);
```

**Incremental algorithm (pseudocode):**

```text
Input: ΔSBOM, ΔDeps, ΔCode → ΔNodes, ΔEdges
1) Apply Δ to cg_nodes/cg_edges
2) ImpactSet = neighbors(ΔNodes ∪ endpoints(ΔEdges))
3) For each e ∈ Entrypoints ∩ ancestors(ImpactSet):
     Recompute forward search to affected sinks, stopping early on unchanged subgraphs
     Update cg_reach_cache; if the state flips, emit a new/updated DSSE witness
```

**.NET 10 reachability sketch (fast & local):**

```csharp
HashSet<int> impactSet = ComputeImpact(deltaNodes, deltaEdges);
foreach (var e in Intersect(Entrypoints, Ancestors(impactSet)))
{
    var res = BoundedReach(e, affectedSinks, graph, cache);
    foreach (var r in res.Changed)
    {
        cache.Upsert(e, r.Sink, r.Path, r.Reachable);
        if (r.Reachable) EmitDsseWitness(e, r.Sink, r.Path);
    }
}
```

**CI/PR flow:**

1. Build → SBOM diff → dependency diff → call‑graph delta.
2. Run incremental reachability.
3. If any `unreachable→reachable` transitions: **fail the gate**, attach DSSE witnesses.
4. If `reachable→unreachable`: auto‑close prior findings (and archive the prior witness).

---

# UX hooks (quick wins)

* In the findings list, add a **“Show Witness”** button → a modal renders the signed path (entrypoint→…→sink) plus a one‑click **“Verify Signature”**.
* In PR checks, summarize only **state flips** with tiny links: “+2 reachable (view witness)” / “−1 (now unreachable)”.

---

# Minimal tasks to get this live

* **Scanner.Worker**: build call‑graph extraction (per language), add an incremental graph store and reachability cache.
* **Attestor**: DSSE signing endpoint + key management (Ed25519 by default; PQC mode later).
* **Authority**: the tables above + witness storage + retrieval API.
* **Router/CI plugin**: PR annotation with **state flips** and links to witnesses.
* **UI**: witness modal + signature verify.

If you want, I can draft the exact Postgres migrations, the C# repositories, and a tiny verifier CLI that checks DSSE signatures and prints the call path.
---

Below is a concrete, buildable blueprint for an **advanced reachability analysis engine** inside Stella Ops. I’m going to assume your “Stella Ops” components are roughly:

* **Scanner.Worker**: runs analyses in CI / on artifacts
* **Authority**: stores graphs/findings/witnesses
* **Attestor**: signs DSSE envelopes (Ed25519)
* (optional) **SurfaceBuilder**: background worker that computes “vuln surfaces” for packages

The key advance is: **don’t treat a CVE as “a package”**. Treat it as a **set of trigger methods** (public API) that can reach the vulnerable code inside the dependency—computed by “Smart‑Diff” once, reused everywhere.

---

## 0) Define the contract (precision/soundness) up front

If you don’t write this down, you’ll fight false positives/negatives forever.

### What Stella Ops will guarantee (first release)

* **Whole-program static call graph** (app + selected dependency assemblies)
* **Context-insensitive** analysis (fast), with a **path witness** extracted (shortest path)
* **Dynamic dispatch handled** with CHA/RTA (+ DI hints), with explicit uncertainty flags
* **Reflection handled best-effort** (constant-string resolution), otherwise “unknown edge”

### What it will NOT guarantee (first release)

* Perfect handling of reflection / `dynamic` / runtime codegen
* Perfect delegate/event resolution across complex flows
* Full taint/dataflow reachability (you can add that later)

This is fine. The major value is: “**we can show you the call path**” and “**we can prove the vuln is triggered by calling these library APIs**”.

---

## 1) The big idea: “Vuln surfaces” (Smart-Diff → triggers)

### Problem

CVE feeds typically say “package X version range Y is vulnerable” but rarely say *which methods*. If you only do package-level reachability, noise is huge.

### Solution

For each CVE+package, compute a **vulnerability surface**:

* **Candidate sinks** = methods changed between vulnerable and fixed versions (diff at IL level)
* **Trigger methods** = *public/exported* methods in the vulnerable version that can reach those changed methods internally

Then your service scan becomes:

> “Can any entrypoint reach any trigger method?”

This is both faster and more precise.

---

## 2) Data model (Authority / Postgres)

You already had call graph tables; here’s a concrete schema that supports:

* graph snapshots
* incremental updates
* vuln surfaces
* a reachability cache
* DSSE witnesses

### 2.1 Graph tables

```sql
CREATE TABLE cg_snapshots (
  snapshot_id   BIGSERIAL PRIMARY KEY,
  service       TEXT NOT NULL,
  build_id      TEXT NOT NULL,
  graph_digest  TEXT NOT NULL,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(service, build_id)
);

CREATE TABLE cg_nodes (
  node_id      BIGSERIAL PRIMARY KEY,
  snapshot_id  BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  method_key   TEXT NOT NULL,            -- stable key (see below)
  asm_name     TEXT,
  type_name    TEXT,
  method_name  TEXT,
  file_path    TEXT,
  line_start   INT,
  il_hash      TEXT,                     -- normalized IL hash for diffing
  flags        INT NOT NULL DEFAULT 0,   -- bitflags: has_reflection, compiler_generated, etc.
  UNIQUE(snapshot_id, method_key)
);

CREATE TABLE cg_edges (
  snapshot_id  BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  src_node_id  BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  dst_node_id  BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind         SMALLINT NOT NULL,        -- 0=call,1=newobj,2=dispatch,3=delegate,4=reflection_guess,...
  PRIMARY KEY(snapshot_id, src_node_id, dst_node_id, kind)
);

CREATE TABLE cg_entrypoints (
  snapshot_id  BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  node_id      BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind         TEXT NOT NULL,            -- http, grpc, cli, job, etc.
  name         TEXT NOT NULL,            -- GET /foo, "Main", etc.
  PRIMARY KEY(snapshot_id, node_id, kind, name)
);
```

### 2.2 Vuln surface tables (Smart‑Diff artifacts)

```sql
CREATE TABLE vuln_surfaces (
  surface_id     BIGSERIAL PRIMARY KEY,
  ecosystem      TEXT NOT NULL,          -- nuget
  package        TEXT NOT NULL,
  cve_id         TEXT NOT NULL,
  vuln_version   TEXT NOT NULL,          -- a representative vulnerable version
  fixed_version  TEXT NOT NULL,
  surface_digest TEXT NOT NULL,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(ecosystem, package, cve_id, vuln_version, fixed_version)
);

CREATE TABLE vuln_surface_sinks (
  surface_id       BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  sink_method_key  TEXT NOT NULL,
  reason           TEXT NOT NULL,        -- changed|added|removed|heuristic
  PRIMARY KEY(surface_id, sink_method_key)
);

CREATE TABLE vuln_surface_triggers (
  surface_id          BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  trigger_method_key  TEXT NOT NULL,
  sink_method_key     TEXT NOT NULL,
  internal_path       JSONB,             -- optional: library-internal witness path
  PRIMARY KEY(surface_id, trigger_method_key, sink_method_key)
);
```

### 2.3 Reachability cache & witnesses

```sql
CREATE TABLE reach_findings (
  finding_id             BIGSERIAL PRIMARY KEY,
  snapshot_id            BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  cve_id                 TEXT NOT NULL,
  ecosystem              TEXT NOT NULL,
  package                TEXT NOT NULL,
  package_version        TEXT NOT NULL,
  reachable              BOOLEAN NOT NULL,
  reachable_entrypoints  INT NOT NULL DEFAULT 0,
  updated_at             TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(snapshot_id, cve_id, package, package_version)
);

CREATE TABLE reach_witnesses (
  witness_id     BIGSERIAL PRIMARY KEY,
  finding_id     BIGINT REFERENCES reach_findings(finding_id) ON DELETE CASCADE,
  entry_node_id  BIGINT REFERENCES cg_nodes(node_id),
  dsse_envelope  JSONB NOT NULL,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

---

## 3) Stable identity: MethodKey + IL hash

### 3.1 MethodKey (must be stable across builds)

Use a normalized string like:

```
{AssemblyName}|{DeclaringTypeFullName}|{MethodName}`{GenericArity}({ParamType1},{ParamType2},...)
```

Examples:

* `MyApp|BillingController|Pay(System.String)`
* `LibXYZ|LibXYZ.Parser|Parse(System.ReadOnlySpan<System.Byte>)`

A key builder for this format is sketched below.
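A minimal builder with Mono.Cecil (a sketch; this `MethodKey` helper is what the later snippets call, and the scope-name fallback is an assumption):

```csharp
using System.Linq;
using Mono.Cecil;

static class MethodKey
{
    // Build-stable key; generic methods use the definition (erased) form
    // so every instantiation maps to one key.
    public static string From(MethodReference m)
    {
        var def = m.GetElementMethod();                     // strip generic instantiation
        var arity = def.HasGenericParameters ? "`" + def.GenericParameters.Count : "";
        var ps = string.Join(",", def.Parameters.Select(p => p.ParameterType.FullName));
        var asm = def.DeclaringType.Scope?.Name ?? "?";     // assembly (or module) scope name
        return $"{asm}|{def.DeclaringType.FullName}|{def.Name}{arity}({ps})";
    }
}
```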
### 3.2 Normalized IL hash (for smart-diff + incremental graph updates)

Raw IL bytes aren’t stable (metadata tokens change). Normalize:

* opcode names
* branch targets by *instruction index*, not offset
* method operands by **resolved MethodKey**
* string operands by literal or hashed literal
* type operands by full name

Then hash `SHA256(normalized_bytes)`.
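A sketch of that normalization with Mono.Cecil (this is the `ILFingerprint` helper the extraction snippets assume; the operand handling is illustrative, not exhaustive):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using Mono.Cecil;
using Mono.Cecil.Cil;

static string ILFingerprint(MethodDefinition method)
{
    var body = method.Body.Instructions;
    // Branch targets are encoded by instruction index, not byte offset.
    var index = new Dictionary<Instruction, int>();
    for (int i = 0; i < body.Count; i++) index[body[i]] = i;

    var sb = new StringBuilder();
    foreach (var ins in body)
    {
        sb.Append(ins.OpCode.Name);
        switch (ins.Operand)
        {
            case Instruction t: sb.Append(' ').Append(index[t]); break;                              // branch
            case Instruction[] ts: sb.Append(' ').AppendJoin(',', ts.Select(t => index[t])); break;  // switch
            case MethodReference mr: sb.Append(' ').Append(MethodKey.From(mr)); break;
            case FieldReference fr: sb.Append(' ').Append(fr.FullName); break;
            case TypeReference tr: sb.Append(' ').Append(tr.FullName); break;
            case string s: sb.Append(" str:").Append(s); break;
            case null: break;
            default: sb.Append(' ').Append(ins.Operand); break;
        }
        sb.Append('\n');
    }
    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString())));
}
```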
---

## 4) Call graph extraction for .NET (concrete, doable)

### Tooling choice

Start with **Mono.Cecil** (MIT license, easy IL traversal). You can later swap to `System.Reflection.Metadata` for speed.

### 4.1 Build process (Scanner.Worker)

1. `dotnet restore` (use your locked restore)
2. `dotnet build -c Release /p:DebugType=portable /p:DebugSymbols=true`
3. Collect:

   * app assemblies: `bin/Release/**/publish/*.dll` or the build output
   * `.pdb` files for sequence points (file/line for witnesses)

### 4.2 Cecil loader

```csharp
var rp = new ReaderParameters {
    ReadSymbols = true,
    SymbolReaderProvider = new PortablePdbReaderProvider()
};

var asm = AssemblyDefinition.ReadAssembly(dllPath, rp);
```

### 4.3 Node extraction (methods)

Walk all types, including nested:

```csharp
IEnumerable<TypeDefinition> AllTypes(ModuleDefinition m)
{
    var stack = new Stack<TypeDefinition>(m.Types);
    while (stack.Count > 0)
    {
        var t = stack.Pop();
        yield return t;
        foreach (var nt in t.NestedTypes) stack.Push(nt);
    }
}

foreach (var type in AllTypes(asm.MainModule))
foreach (var method in type.Methods)
{
    var key = MethodKey.From(method);                 // your normalizer
    var (file, line) = PdbFirstSequencePoint(method);
    var ilHash = method.HasBody ? ILFingerprint(method) : null;

    // store node (method_key, file, line, il_hash, flags...)
}
```

### 4.4 Edge extraction (direct calls)

```csharp
foreach (var method in type.Methods.Where(m => m.HasBody))
{
    var srcKey = MethodKey.From(method);
    foreach (var ins in method.Body.Instructions)
    {
        if (ins.Operand is MethodReference mr)
        {
            if (ins.OpCode.Code is Code.Call or Code.Callvirt or Code.Newobj)
            {
                var dstKey = MethodKey.From(mr);      // important: stable even if not resolved
                edges.Add(new Edge(srcKey, dstKey, kind: CallKind.Direct));
            }
            if (ins.OpCode.Code is Code.Ldftn or Code.Ldvirtftn)
            {
                // delegate capture (handled later)
            }
        }
    }
}
```

---

## 5) Advanced precision: dynamic dispatch + DI + async/await

If you stop at direct edges only, you’ll miss many real paths.

### 5.1 Async/await mapping (critical for readable witnesses)

Async methods compile into a state machine’s `MoveNext()`. You want edges attributed back to the original method.

In Cecil:

* Check for `AsyncStateMachineAttribute` on a method
* It references a state machine type
* Find that type’s `MoveNext` method
* Map `MoveNextKey -> OriginalMethodKey`

Then, while extracting edges:

```csharp
srcKey = MoveNextToOriginal.TryGetValue(srcKey, out var original) ? original : srcKey;
```

Do the same for iterator state machines.
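A sketch of building that map with Cecil (covers both async and iterator state machines; `AllTypes`/`MethodKey` are the helpers defined above):

```csharp
using System.Collections.Generic;
using System.Linq;
using Mono.Cecil;

var moveNextToOriginal = new Dictionary<string, string>();
foreach (var type in AllTypes(asm.MainModule))
foreach (var method in type.Methods)
{
    // The compiler stamps the original method with a state-machine attribute
    // whose single ctor argument is the generated state machine type.
    var attr = method.CustomAttributes.FirstOrDefault(a =>
        a.AttributeType.FullName is "System.Runtime.CompilerServices.AsyncStateMachineAttribute"
                                  or "System.Runtime.CompilerServices.IteratorStateMachineAttribute");
    if (attr?.ConstructorArguments[0].Value is not TypeReference smType) continue;

    var moveNext = smType.Resolve()?.Methods.FirstOrDefault(m => m.Name == "MoveNext");
    if (moveNext != null)
        moveNextToOriginal[MethodKey.From(moveNext)] = MethodKey.From(method);
}
```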
### 5.2 Virtual/interface dispatch (CHA/RTA)

You need two maps:

1. a **type hierarchy / interface implementation map**
2. an **override map** from “declared method” → “implementation method(s)”

**Build the override map**

```csharp
// For each method, Cecil exposes method.Overrides for explicit implementations.
overrideMap[MethodKey.From(overrideRef)] = MethodKey.From(methodDef);
```

**CHA**: for a `callvirt` to virtual method `T.M`, add edges to overrides in derived classes.
**RTA**: restrict to derived classes that are actually instantiated.

How to get instantiated types:

* look for `newobj` instructions and add the created type to `InstantiatedTypes`
* plus DI registrations (below)

### 5.3 DI hints (Microsoft.Extensions.DependencyInjection)

You will see calls like:

* `ServiceCollectionServiceExtensions.AddTransient<TService, TImpl>(...)`

In IL these are generic method calls. Detect them and record `TService -> TImpl` as “instantiated” (see the sketch below). This massively improves RTA for modern .NET apps.
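A minimal detector (a sketch; `instantiatedTypes` and `serviceToImpl` are assumed collections feeding the RTA step, and the registration names are the real Microsoft.Extensions.DependencyInjection extension methods):

```csharp
using System.Collections.Generic;
using Mono.Cecil;

var registrations = new HashSet<string> { "AddTransient", "AddScoped", "AddSingleton" };

foreach (var ins in method.Body.Instructions)
{
    // Generic registration calls carry both type arguments in the IL.
    if (ins.Operand is GenericInstanceMethod gim
        && registrations.Contains(gim.Name)
        && gim.GenericArguments.Count == 2)
    {
        var service = gim.GenericArguments[0];
        var impl = gim.GenericArguments[1];
        instantiatedTypes.Add(impl.FullName);            // feeds RTA
        serviceToImpl[service.FullName] = impl.FullName; // dispatch hint: service -> impl
    }
}
```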
### 5.4 Delegates/lambdas (good-enough approach)

Implement intraprocedural tracking:

* when you see `ldftn SomeMethod`, then `newobj Action::.ctor`, then `stloc.s X`
* store `delegateTargets[local X] += SomeMethod`
* when you see `ldloc.s X` and later `callvirt Invoke`, add edges to the targets

This makes Minimal API entrypoint discovery work too.

### 5.5 Reflection (best-effort)

Implement only high-signal heuristics:

* `typeof(T).GetMethod("Foo")` with constant `"Foo"`
* `GetType().GetMethod("Foo")` with constant `"Foo"` (type unknown → mark uncertain)

If resolved, add an edge with `kind=reflection_guess`.
If not, set the node flag `has_reflection = true` and show “may be incomplete” in results.
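A sketch of the first heuristic (the fixed walk-back window and the `edges`/`nodeFlags`/`CallKind` collections are illustrative assumptions, not a complete dataflow analysis):

```csharp
using System.Linq;
using Mono.Cecil;
using Mono.Cecil.Cil;

// Pattern: ldtoken T; call Type.GetTypeFromHandle; ldstr "Foo"; callvirt Type::GetMethod.
var il = method.Body.Instructions;
for (int i = 0; i < il.Count; i++)
{
    if (il[i].Operand is not MethodReference mr
        || mr.Name != "GetMethod"
        || mr.DeclaringType.FullName != "System.Type") continue;

    string? name = null; TypeReference? target = null;
    for (int j = i - 1; j >= 0 && j >= i - 6; j--)   // small walk-back window
    {
        if (name == null && il[j].OpCode.Code == Code.Ldstr) name = (string)il[j].Operand;
        if (target == null && il[j].OpCode.Code == Code.Ldtoken) target = il[j].Operand as TypeReference;
    }

    var dst = (name != null) ? target?.Resolve()?.Methods.FirstOrDefault(m => m.Name == name) : null;
    if (dst != null)
        edges.Add(new Edge(srcKey, MethodKey.From(dst), kind: CallKind.ReflectionGuess));
    else
        nodeFlags[srcKey] |= NodeFlags.HasReflection; // unresolved: mark uncertain
}
```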
---

## 6) Entrypoint detection (concrete detectors)

### 6.1 MVC controllers

Detect:

* types deriving from `Microsoft.AspNetCore.Mvc.ControllerBase`
* methods that are:

  * public
  * not `[NonAction]`
  * annotated with `[HttpGet]`, `[HttpPost]`, `[Route]`, etc.

Extract the route template from the attributes’ ctor arguments.

Store in `cg_entrypoints`:

* kind = `http`
* name = `GET /billing/pay` (compose verb + template)

### 6.2 Minimal APIs

Scan `Program.Main` IL:

* find calls to `MapGet`, `MapPost`, ...
* extract the route string from the preceding `ldstr`
* resolve the handler method via delegate tracking (`ldftn`)

Entry:

* kind = `http`
* name = `GET /foo`

### 6.3 CLI

Find the assembly entry point method (`asm.EntryPoint`) or `static Main`.
Entry:

* kind = `cli`
* name = `Main`

Start here. Add gRPC/jobs later.

---

## 7) Smart-Diff SurfaceBuilder (the “advanced” part)

This is what makes your reachability actually meaningful for CVEs.

### 7.1 SurfaceBuilder inputs

From your vuln ingestion pipeline:

* ecosystem = nuget
* package = `LibXYZ`
* affected range = `<= 1.2.3`
* fixed version = `1.2.4`
* CVE id

### 7.2 Choose a vulnerable version to diff

Pick the **highest affected version below the fix**.

* fixed = 1.2.4
* vulnerable representative = 1.2.3

(If multiple fixed versions exist, build multiple surfaces.)

### 7.3 Download both packages

Use NuGet.Protocol to download the `.nupkg`, unzip, and pick the TFMs you care about (often `netstandard2.0` is safest). Compute fingerprints for each assembly.

### 7.4 Compute method fingerprints

For each method:

* MethodKey
* Normalized IL hash

### 7.5 Diff

```
ChangedMethods = { k | hashVuln[k] != hashFixed[k] } ∪ added ∪ removed
```

Store these as `vuln_surface_sinks` with reason.
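In C#, that set computation is a few lines (a sketch; `hashVuln`/`hashFixed` are the MethodKey → IL-hash maps from 7.4):

```csharp
using System.Collections.Generic;
using System.Linq;

// MethodKey -> reason, ready to insert into vuln_surface_sinks.
var sinks = new Dictionary<string, string>();
foreach (var (key, hash) in hashVuln)
{
    if (!hashFixed.TryGetValue(key, out var fixedHash)) sinks[key] = "removed";
    else if (fixedHash != hash) sinks[key] = "changed";
}
foreach (var key in hashFixed.Keys.Where(k => !hashVuln.ContainsKey(k)))
    sinks[key] = "added";
```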
### 7.6 Build the internal library call graph

Same Cecil extraction, but only over the package assemblies.
Now compute triggers:

**Reverse BFS from sinks**:

* Start from all sink method keys
* Walk predecessors
* When you encounter a **public/exported method**, record it as a trigger

Also store one internal path for each trigger → sink (for witnesses).

### 7.7 Add interface/base declarations as triggers

Important: your app might call a library via an interface method signature, not the concrete implementation.

For each trigger implementation method:

* for each `method.Overrides` entry, add the overridden method key as an additional trigger

This reduces dependence on perfect dispatch expansion during app scanning.

### 7.8 Persist the surface

Store:

* the sinks set
* the triggers set
* internal witness paths (optional but highly valuable)

Now you’ve converted a “version range” CVE into “these specific library APIs are dangerous”.

---

## 8) Reachability engine (fast, witness-producing)

### 8.1 In-memory graph format (CSR)

Don’t BFS off dictionaries; you’ll die on perf.

Build integer indices:

* `method_key -> nodeIndex (0..N-1)`
* store arrays:

  * `predOffsets[N+1]`
  * `preds[edgeCount]`

Construction (sketched in code below):

1. count predecessors per node
2. prefix sum to offsets
3. fill preds
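Those three steps in C# (a sketch; `edges` is assumed to be a list of `(src, dst)` node-index pairs):

```csharp
// Build a predecessor CSR: preds of node v live in preds[predOffsets[v] .. predOffsets[v+1]).
int n = nodeCount;
var predOffsets = new int[n + 1];
foreach (var (_, dst) in edges) predOffsets[dst + 1]++;            // 1) count preds per node
for (int i = 0; i < n; i++) predOffsets[i + 1] += predOffsets[i];  // 2) prefix sum to offsets
var preds = new int[edges.Count];
var cursor = (int[])predOffsets.Clone();
foreach (var (src, dst) in edges) preds[cursor[dst]++] = src;      // 3) fill preds
```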
### 8.2 Reverse BFS from sinks

This computes:

* `visited[node]` = can reach a sink
* `parent[node]` = next node toward a sink (for path reconstruction)

```csharp
public sealed class ReachabilityEngine
{
    public ReachabilityResult Compute(
        Graph g,
        ReadOnlySpan<int> entrypoints,
        ReadOnlySpan<int> sinks)
    {
        var visitedMark = g.VisitMark; // int[] length N (reused across runs)
        var parent = g.Parent;         // int[] length N (reused)
        g.RunId++;

        var q = new IntQueue(capacity: g.NodeCount);
        var sinkSet = new BitSet(g.NodeCount);
        foreach (var s in sinks)
        {
            sinkSet.Set(s);
            visitedMark[s] = g.RunId;
            parent[s] = s;
            q.Enqueue(s);
        }

        while (q.TryDequeue(out var v))
        {
            var start = g.PredOffsets[v];
            var end = g.PredOffsets[v + 1];
            for (int i = start; i < end; i++)
            {
                var p = g.Preds[i];
                if (visitedMark[p] == g.RunId) continue;
                visitedMark[p] = g.RunId;
                parent[p] = v;
                q.Enqueue(p);
            }
        }

        // Collect reachable entrypoints and paths
        var results = new List<EntryWitness>();
        foreach (var e in entrypoints)
        {
            if (visitedMark[e] != g.RunId) continue;
            var path = ReconstructPath(e, parent, sinkSet);
            results.Add(new EntryWitness(e, path));
        }

        return new ReachabilityResult(results);
    }

    private static int[] ReconstructPath(int entry, int[] parent, BitSet sinks)
    {
        var path = new List<int>(32);
        int cur = entry;
        path.Add(cur);

        // follow parent pointers until a sink
        for (int guard = 0; guard < 10_000; guard++)
        {
            if (sinks.Get(cur)) break;
            var nxt = parent[cur];
            if (nxt == cur || nxt < 0) break; // safety
            cur = nxt;
            path.Add(cur);
        }
        return path.ToArray();
    }
}
```
### 8.3 Producing the witness

For each node index in the path:

* method_key
* file_path / line_start (if known)
* optional flags (reflection_guess edge, dispatch edge)

Then attach:

* vuln id, package, version
* entrypoint kind/name
* graph digest + config digest
* surface digest
* timestamp

Send the JSON to Attestor for DSSE signing, and store the envelope in Authority.

---

## 9) Scaling: don’t do BFS 500 times if you can avoid it

### 9.1 First-line scaling (usually enough)

* Group vulnerabilities by package/version → surfaces reused
* Only run reachability for vulns where:

  * the dependency is present, AND
  * a surface exists OR fallback mode applies
* Limit witnesses per vuln (top 3)

In practice, with N~50k nodes and E~200k edges, a reverse BFS is fast in C# if done with arrays.

### 9.2 Incremental Smart-Diff × Reachability (your “low noise” killer feature)

#### Step A: compute the graph delta between snapshots

Use `il_hash` per method to detect changed nodes:

* added / removed / changed nodes
* edges updated only for changed nodes

#### Step B: decide which vulnerabilities need recompute

Store a cached reverse-reachable set per vuln surface if you want (bitset), OR just use a cheaper heuristic.

Recompute for a vulnerability if:

* the sink set changed (new surface or version changed), OR
* any changed node is on any previously stored witness path, OR
* entrypoints changed, OR
* impacted nodes touch any trigger node’s predecessors (use a small localized search)

A practical approach:

* store all node IDs that appear in any witness path for that vuln
* if the delta touches any of those nodes/edges, recompute
* otherwise reuse the cached result

This yields a massive win on PR scans where most code is unchanged.

#### Step C: “Impact frontier” recompute (optional)

If you want something more advanced:

* compute `ImpactSet = ΔNodes ∪ endpoints(ΔEdges)`
* run reverse BFS **starting from ImpactSet ∩ ReverseReachSet** and update visited marks

This is trickier to implement correctly (dynamic graph), so I’d ship the heuristic first.

---

## 10) Practical fallback modes (don’t block shipping)

You won’t have surfaces for every CVE on day 1. Handle this gracefully:

### Mode 1: Surface-based reachability (best)

* sink = trigger methods from the surface
* result: “reachable” with a path

### Mode 2: Package API usage (good fallback)

* sink = *any* method in that package that is called by the app
* result: “package reachable” (lower confidence), still provide a path to the callsite

### Mode 3: Dependency present only (SBOM level)

* no call graph needed
* result: “present” only

Your UI can show confidence tiers:

* **Confirmed reachable (surface)**
* **Likely reachable (package API)**
* **Present only (SBOM)**

---

## 11) Integration points inside Stella Ops

### Scanner.Worker (per build)

1. Build/collect assemblies + pdb
2. `CallGraphBuilder` → nodes/edges/entrypoints + graph_digest
3. Load the SBOM vulnerabilities list
4. For each vuln:

   * resolve surface triggers; if missing → enqueue a SurfaceBuilder job + fallback mode
   * run the reachability BFS
   * for each reachable entrypoint: emit a DSSE witness
5. Persist findings/witnesses

### SurfaceBuilder (async worker)

* triggered by “surface missing” events or a nightly preload of top packages
* computes each surface once, stores it forever

### Authority

* stores graphs, surfaces, findings, witnesses
* provides retrieval APIs for UI/CI

---

## 12) What to implement first (in the order that produces value fastest)

### Week 1–2 scope (realistic, shippable)

1. Cecil call graph extraction (direct calls)
2. MVC + Minimal API entrypoints
3. Reverse BFS reachability with path witnesses
4. DSSE witness signing + storage
5. SurfaceBuilder v1:

   * IL hash per method
   * changed methods as sinks
   * triggers via internal reverse BFS
6. UI: “Show Witness” + “Verify Signature”

### Next increment (precision upgrades)

7. async/await mapping to original methods
8. RTA + DI registration hints
9. delegate tracking for Minimal API handlers (if not already)
10. interface override triggers in the surface builder

### Later (if you want “attackability”, not just “reachability”)

11. taint/dataflow for top sink classes (deserialization, path traversal, SQL, command exec)
12. sanitizer modeling & parameter constraints

---

## 13) Common failure modes and how to harden

### MethodKey mismatches (surface vs app call)

* Ensure both are generated from the same normalization rules
* For generic methods, prefer **definition** keys (strip instantiation)
* Store both “exact” and “erased generic” variants if needed

### Multi-target frameworks

* SurfaceBuilder: compute triggers for each TFM, union them
* App scan: choose the TFM closest to the build RID, but allow fallback to the union

### Huge graphs

* Drop `System.*` nodes/edges unless:

  * the vuln is in System.* (rare, but handle separately)
* Deduplicate nodes by MethodKey across assemblies where safe
* Use CSR arrays + pooled queues

### Reflection-heavy projects

* Mark analysis confidence lower
* Include “unknown edges present” in the finding metadata
* Still produce a witness path up to the reflective callsite

---

If you want, I can also paste a **complete Cecil-based CallGraphBuilder class** (nodes+edges+PDB lines), plus the **SurfaceBuilder** that downloads NuGet packages and generates `vuln_surface_triggers` end-to-end.
---

Here’s a compact, practical blueprint for bringing **EPSS** into your stack without chaos: a **3‑layer ingestion model** that keeps raw data, produces clean probabilities, and emits “signal‑ready” events your risk engine can use immediately.

---

# Why this matters (super short)

* **EPSS** = predicted probability a vuln will be exploited soon.
* Mixing the “raw EPSS feed” directly into decisions makes audits, rollbacks, and model upgrades painful.
* A **layered model** lets you **version probability evolution**, compare vendors, and train **meta‑predictors on deltas** (how risk changes over time), not just on snapshots.

---

# The three layers (and how they map to Stella Ops)

1. **Raw feed layer (immutable)**

   * **Goal:** Store exactly what the provider sent (EPSS v4 CSV/JSON, schema drift and all).
   * **Stella modules:** `Concelier` (preserve‑prune source) writes; `Authority` handles signatures/hashes.
   * **Storage:** `postgres.epss_raw` (partitioned by day); a blob column for the untouched payload; SHA‑256 of the source file.
   * **Why:** Full provenance + deterministic replay.

2. **Normalized probabilistic layer**

   * **Goal:** Clean, typed tables keyed by `cve_id`, with **probability, percentile, model_version, asof_ts**.
   * **Stella modules:** `Excititor` (transform); `Policy Engine` reads.
   * **Storage:** `postgres.epss_prob` with a **unique key** `(cve_id, model_version, asof_ts)` and computed **delta fields** vs the previous `asof_ts`.
   * **Extras:** Keep optional vendor columns (e.g., FIRST, custom regressors) to compare models side‑by‑side.

3. **Signal‑ready layer (risk engine contracts)**

   * **Goal:** Pre‑chewed “events” your **Signals/Router** can route instantly.
   * **What’s inside:** Only the fields needed for gating and UI: `cve_id`, `prob_now`, `prob_delta`, `percentile`, `risk_band`, `explain_hash`.
   * **Emit:** `first_signal`, `risk_increase`, `risk_decrease`, `quieted` with **idempotent event keys**.
   * **Stella modules:** `Signals` publishes, `Router` fans out, `Timeline` records; `Notify` handles subscriptions.

---

# Minimal Postgres schema (ready to paste)

```sql
-- 1) Raw (immutable)
create table epss_raw (
  id bigserial primary key,
  source_uri text not null,
  ingestion_ts timestamptz not null default now(),
  asof_date date not null,
  payload jsonb not null,
  payload_sha256 bytea not null
);
create index on epss_raw (asof_date);

-- 2) Normalized
create table epss_prob (
  id bigserial primary key,
  cve_id text not null,
  model_version text not null,      -- e.g., 'EPSS-4.0-Falcon-2025-12'
  asof_ts timestamptz not null,
  probability double precision not null,
  percentile double precision,
  features jsonb,                   -- optional: normalized features used
  unique (cve_id, model_version, asof_ts)
);
-- delta against the prior point (materialized view or nightly job)
create materialized view epss_prob_delta as
select p.*,
       p.probability - lag(p.probability) over (partition by cve_id, model_version order by asof_ts) as prob_delta
from epss_prob p;

-- 3) Signal-ready
create table epss_signal (
  signal_id bigserial primary key,
  cve_id text not null,
  asof_ts timestamptz not null,
  probability double precision not null,
  prob_delta double precision,
  risk_band text not null,          -- e.g., 'LOW/MED/HIGH/CRITICAL'
  model_version text not null,
  explain_hash bytea not null,      -- hash of inputs -> deterministic
  unique (cve_id, model_version, asof_ts)
);
```

---

# C# ingestion skeleton (StellaOps.Scanner.Worker.DotNet style)

```csharp
// 1) Fetch & store raw (Concelier)
public async Task IngestRawAsync(Uri src, DateOnly asOfDate) {
    var bytes = await http.GetByteArrayAsync(src);
    var sha = SHA256.HashData(bytes);
    await pg.ExecuteAsync(
        "insert into epss_raw(source_uri, asof_date, payload, payload_sha256) values (@u,@d,@p::jsonb,@s)",
        new { u = src.ToString(), d = asOfDate, p = Encoding.UTF8.GetString(bytes), s = sha });
}

// 2) Normalize (Excititor)
public async Task NormalizeAsync(DateOnly asOfDate, string modelVersion) {
    var raws = await pg.QueryAsync<string>("select payload from epss_raw where asof_date=@d", new { d = asOfDate });
    foreach (var payload in raws) {
        foreach (var row in ParseCsvOrJson(payload)) {
            await pg.ExecuteAsync(
                @"insert into epss_prob(cve_id, model_version, asof_ts, probability, percentile, features)
                  values (@cve,@mv,@ts,@prob,@pct,@feat)
                  on conflict do nothing",
                new { cve = row.Cve, mv = modelVersion, ts = row.AsOf, prob = row.Prob, pct = row.Pctl, feat = row.Features });
        }
    }
}

// 3) Emit signal-ready (Signals)
public async Task EmitSignalsAsync(string modelVersion, double deltaThreshold) {
    var rows = await pg.QueryAsync(@"select cve_id, asof_ts, probability,
        probability - lag(probability) over (partition by cve_id, model_version order by asof_ts) as prob_delta
        from epss_prob where model_version=@mv", new { mv = modelVersion });

    foreach (var r in rows) {
        var band = Band(r.probability); // map to LOW/MED/HIGH/CRITICAL
        if (Math.Abs(r.prob_delta ?? 0) >= deltaThreshold) {
            var explainHash = DeterministicExplainHash(r);
            await pg.ExecuteAsync(@"insert into epss_signal
                (cve_id, asof_ts, probability, prob_delta, risk_band, model_version, explain_hash)
                values (@c,@t,@p,@d,@b,@mv,@h)
                on conflict do nothing",
                new { c = r.cve_id, t = r.asof_ts, p = r.probability, d = r.prob_delta, b = band, mv = modelVersion, h = explainHash });

            await bus.PublishAsync("risk.epss.delta", new {
                cve = r.cve_id, ts = r.asof_ts, prob = r.probability, delta = r.prob_delta, band, model = modelVersion, explain = Convert.ToHexString(explainHash)
            });
        }
    }
}
```

---

# Versioning & experiments (the secret sauce)

* **Model namespace:** `EPSS‑4.0‑<regressor‑name>‑<date>` so you can run multiple variants in parallel.
* **Delta‑training:** Train a small meta‑predictor on **Δprobability** to forecast **“risk jumps in the next N days.”**
* **A/B in production:** Route `model_version=x` to 50% of projects; compare **MTTA to patch** and **false‑alarm rate**.

---

# Policy & UI wiring (quick contracts)

**Policy gates** (OPA/Rego or internal rules):

* Block if `risk_band ∈ {HIGH, CRITICAL}` **AND** `prob_delta >= 0.1` in the last 72h.
* Soften if the asset is not reachable or is mitigated by VEX.

**UI (Evidence pane):**

* Show a **sparkline of EPSS over time**, highlighting the last Δ.
* A “Why now?” button reveals **explain_hash** → the deterministic evidence payload.

---

# Ops & reliability

* Daily ingestion with **idempotent** runs (raw SHA guard).
* Backfills: re‑normalize from `epss_raw` for any new model without re‑downloading.
* **Deterministic replay:** export `(raw, transform code hash, model_version)` alongside results.

---

If you want, I can drop this as a ready‑to‑run **.sql + .csproj** seed with a tiny CLI (`ingest`, `normalize`, `emit`) tailored to your `Postgres + Valkey` profile.
---

Below is a “do this, then this” implementation guide for a **layered EPSS pipeline** inside **Stella Ops**, with concrete schemas, job boundaries, idempotency rules, and the tricky edge cases (model-version shifts, noise control, backfills).

I’ll assume:

* **Postgres** is your system of record and **Valkey** is available for caching,
* you run **.NET workers** (like `StellaOps.Scanner.Worker.DotNet`),
* the Stella modules you referenced map roughly like this:

  * **Concelier** = ingest + preserve/prune raw sources
  * **Authority** = provenance (hashes, immutability, signature-like guarantees)
  * **Excititor** = transform/normalize
  * **Signals / Router / Timeline / Notify** = event pipeline + audit trail + subscriptions

I’ll anchor the EPSS feed details to FIRST’s docs:

* The data feed fields are `cve`, `epss`, `percentile` and are refreshed daily. ([FIRST][1])
* Historical daily `.csv.gz` files exist at `https://epss.empiricalsecurity.com/epss_scores-YYYY-mm-dd.csv.gz`. ([FIRST][1])
* The API base is `https://api.first.org/data/v1/epss` and supports per-CVE and time-series queries. ([FIRST][2])
* FIRST notes model-version shifts (v2/v3/v4) and that the daily files include a leading `#` comment indicating model version/publish date (important for delta correctness). ([FIRST][1])
* FIRST’s guidance: use **probability** as the primary score and **show percentile alongside it**; the raw feeds provide both as decimals 0–1. ([FIRST][3])

---

## 0) Target architecture and data contracts

### The 3 layers and what must be true in each

1. **Raw layer (immutable)**

   * You can replay exactly what you ingested, byte-for-byte.
   * Contains: file bytes or an object-store pointer, headers (ETag, Last-Modified), SHA-256, the parsed “header comment” (the `# …` line), ingestion status.

2. **Normalized probability layer (typed, queryable, historical)**

   * One row per `(model_name, asof_date, cve_id)`.
   * Contains: `epss` probability (0–1), `percentile` (0–1), `model_version` (from the file header comment if available).
   * Built for joins against the vulnerability inventory and for time series.

3. **Signal-ready layer (risk engine contract)**

   * Contains only actionable changes (crossing thresholds, jumps, newly scored, etc.), ideally scoped to **observed CVEs** in your environment to avoid noise.
   * Events are idempotent, audit-friendly, and versioned.

---

## 1) Data source choice and acquisition strategy

### Prefer the daily bulk `.csv.gz` over paging the API for full refresh

* FIRST explicitly documents the “ALL CVEs for a date” bulk file URL pattern. ([FIRST][2])
* The API is great for:

  * “give me EPSS for this CVE list”
  * “give me the last 30 days’ time series for CVE X” ([FIRST][2])

**Recommendation**

* A daily job pulls the bulk file for the “latest available date”.
* A separate on-demand endpoint uses the API time series for UI convenience (optional).

### Robust “latest available date” probing

Because the “current day” file may not be published when your cron fires:

Algorithm (see the sketch below):

1. Let `d0 = UtcToday`.
2. For `d in [d0, d0-1, d0-2, d0-3]`:

   * Try `GET https://epss.empiricalsecurity.com/epss_scores-{d:yyyy-MM-dd}.csv.gz`
   * If HTTP 200: ingest that as `asof_date = d` and stop.
3. If none succeed: fail the job with a clear message + an alert.

This avoids timezone and publishing-time ambiguity.
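A minimal sketch of that probe (the URL pattern comes from the FIRST docs cited above; everything else is illustrative):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static async Task<(DateOnly AsOf, byte[] Gz)?> ProbeLatestAsync(HttpClient http)
{
    var d0 = DateOnly.FromDateTime(DateTime.UtcNow);
    for (int back = 0; back <= 3; back++)
    {
        var d = d0.AddDays(-back);
        var uri = $"https://epss.empiricalsecurity.com/epss_scores-{d:yyyy-MM-dd}.csv.gz";
        using var resp = await http.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead);
        if (resp.IsSuccessStatusCode)
            return (d, await resp.Content.ReadAsByteArrayAsync()); // ingest as asof_date = d
    }
    return null; // caller fails the job with a clear message + alert
}
```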
---

## 2) Layer 1: Raw feed (Concelier + Authority)

### 2.1 Schema for raw + lineage

Use a dedicated schema `epss` so the pipeline is easy to reason about.

```sql
create schema if not exists epss;

-- Immutable file-level record
create table if not exists epss.raw_file (
  raw_id bigserial primary key,
  source_uri text not null,
  asof_date date not null,
  fetched_at timestamptz not null default now(),

  http_etag text,
  http_last_modified timestamptz,
  content_len bigint,

  content_sha256 bytea not null,

  -- first non-empty comment lines like "# model=... date=..."
  header_comment text,
  model_version text,
  model_published_on date,

  -- storage: either inline bytea OR an object storage pointer
  storage_kind text not null default 'pg_bytea',  -- 'pg_bytea' | 's3' | 'fs'
  storage_ref text,
  content_gz bytea,                               -- nullable if stored externally

  parse_status text not null default 'pending',   -- pending|parsed|failed
  parse_error text,

  unique (source_uri, asof_date, content_sha256)
);

create index if not exists ix_epss_raw_file_asof on epss.raw_file(asof_date);
create index if not exists ix_epss_raw_file_status on epss.raw_file(parse_status);
```

**Why store `model_version` here?**
FIRST warns that model updates cause “major shifts” and that the daily files include a `#` comment with the model version/publish date. If you ignore this, your delta logic will misfire on model-change days. ([FIRST][1])

### 2.2 Raw ingestion idempotency rules

A run is “already ingested” if:

* a row exists for `(source_uri, asof_date)` with the same `content_sha256`, OR
* you implement “single truth per day” and treat any new sha for the same date as “replace” (rare, but it can happen).

Recommended:

* **Treat as replace only if** you’re confident the source can republish the same date. If not, keep both but mark the superseded one.

### 2.3 Raw ingestion implementation details (.NET)

**Key constraints**

* Download as a stream (`ResponseHeadersRead`)
* Compute SHA-256 while streaming
* Store the bytes or stream them into object storage
* Capture the ETag/Last-Modified headers if present

Pseudo-implementation structure:

* `EpssFetchJob`

  * `ProbeLatestDateAsync()`
  * `DownloadAsync(uri)`
  * `ExtractHeaderCommentAsync(gzipStream)` (read the first few lines after decompression)
  * `InsertRawFileRecord(...)` (Concelier + Authority)

**Header comment extraction**
FIRST indicates files may start with `# ... model version ... publish date ...`. ([FIRST][1])
So do:

* Decompress
* Read lines until you find the first non-empty non-`#` line (that’s likely the CSV header / first row)
* Save the concatenated `#` lines as `header_comment`
* Regex best-effort parse:

  * `model_version`: something like `v2025.03.14`
  * `model_published_on`: `YYYY-MM-DD`

If parsing fails, still store `header_comment` (sketch below).
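A sketch of that extraction (the regexes are best-effort assumptions about the comment format, matching the examples above):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text.RegularExpressions;

static (string Header, string? ModelVersion, DateOnly? PublishedOn) ReadHeader(Stream gzBytes)
{
    using var gz = new GZipStream(gzBytes, CompressionMode.Decompress, leaveOpen: true);
    using var reader = new StreamReader(gz);

    var comments = new List<string>();
    string? line;
    while ((line = reader.ReadLine()) != null && line.StartsWith('#'))
        comments.Add(line);                                   // stop at the CSV header / first row

    var header = string.Join("\n", comments);
    var mv = Regex.Match(header, @"v\d{4}\.\d{2}\.\d{2}");    // e.g. v2025.03.14 (assumed shape)
    var dt = Regex.Match(header, @"\d{4}-\d{2}-\d{2}");       // publish date
    return (header,
            mv.Success ? mv.Value : null,
            dt.Success ? DateOnly.Parse(dt.Value) : null);
}
```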
### 2.4 Pruning raw (Concelier “preserve-prune”)

Define a retention policy:

* Keep **raw bytes** 90–180 days (cheap enough; each `.csv.gz` is usually a few to tens of MB)
* Keep **metadata** forever (tiny, essential for audits)

Nightly cleanup job:

* delete `content_gz` or the external object for `raw_file` rows older than retention
* keep the row but set `storage_kind='pruned'`, `content_gz=null`, `storage_ref=null`

---

## 3) Layer 2: Normalized probability tables (Excititor)

### 3.1 Core normalized table design

Requirements:

* Efficient time series per CVE
* Efficient “latest score per CVE”
* Efficient joins to “observed vulnerabilities” tables

#### Daily score table (partitioned)

```sql
create table if not exists epss.daily_score (
  model_name text not null,       -- 'FIRST_EPSS'
  asof_date date not null,
  cve_id text not null,
  epss double precision not null,
  percentile double precision,
  model_version text,             -- from the raw header if available
  raw_id bigint references epss.raw_file(raw_id),
  loaded_at timestamptz not null default now(),

  -- Guards
  constraint ck_epss_range check (epss >= 0.0 and epss <= 1.0),
  constraint ck_percentile_range check (percentile is null or (percentile >= 0.0 and percentile <= 1.0)),

  primary key (model_name, asof_date, cve_id)
) partition by range (asof_date);

-- Example monthly partitions (create via a migration script generator)
create table if not exists epss.daily_score_2025_12
  partition of epss.daily_score for values from ('2025-12-01') to ('2026-01-01');

create index if not exists ix_epss_daily_score_cve on epss.daily_score (model_name, cve_id, asof_date desc);
create index if not exists ix_epss_daily_score_epss on epss.daily_score (model_name, asof_date, epss desc);
create index if not exists ix_epss_daily_score_pct on epss.daily_score (model_name, asof_date, percentile desc);
```

**Field semantics**

* `epss` is the probability of exploitation in the next 30 days, 0–1. ([FIRST][1])
* `percentile` is the relative rank among all scored vulnerabilities. ([FIRST][1])

### 3.2 Maintain a “latest” table for fast joins

Don’t compute “latest” via window functions in hot paths (policy evaluation / scoring). Materialize it.

```sql
create table if not exists epss.latest_score (
  model_name text not null,
  cve_id text not null,
  asof_date date not null,
  epss double precision not null,
  percentile double precision,
  model_version text,
  updated_at timestamptz not null default now(),
  primary key (model_name, cve_id)
);

create index if not exists ix_epss_latest_epss on epss.latest_score(model_name, epss desc);
create index if not exists ix_epss_latest_pct on epss.latest_score(model_name, percentile desc);
```

Update logic (after loading a day):

* Upsert each CVE (or do a set-based upsert):

  * `asof_date` should only move forward
  * if a backfill loads an older day, do not overwrite latest

### 3.3 Delta table for change detection

Store deltas per day (this powers signals and “sparkline deltas”).

```sql
create table if not exists epss.daily_delta (
  model_name text not null,
  asof_date date not null,
  cve_id text not null,

  epss double precision not null,
  prev_asof_date date,
  prev_epss double precision,
  epss_delta double precision,

  percentile double precision,
  prev_percentile double precision,
  percentile_delta double precision,

  model_version text,
  prev_model_version text,
  is_model_change boolean not null default false,

  created_at timestamptz not null default now(),
  primary key (model_name, asof_date, cve_id)
);

create index if not exists ix_epss_daily_delta_cve on epss.daily_delta(model_name, cve_id, asof_date desc);
create index if not exists ix_epss_daily_delta_delta on epss.daily_delta(model_name, asof_date, epss_delta desc);
```

**Model update handling**

* On a model-version change day (v3→v4 etc.), many deltas will jump.
* FIRST explicitly warns about model shifts. ([FIRST][1])

So:

* detect if today’s `model_version != previous_day.model_version`
* set `is_model_change = true`
* optionally **suppress delta-based signals** that day (or emit a separate “MODEL_UPDATED” event)

### 3.4 Normalization job mechanics

Implement `EpssNormalizeJob`:

1. Select `raw_file` rows where `parse_status='pending'`.
2. Decompress `content_gz` or fetch from the object store.
3. Parse the CSV:

   * skip `#` comment lines
   * expect columns `cve,epss,percentile` (FIRST documents these fields). ([FIRST][1])
4. Validate:

   * CVE format: `^CVE-\d{4}-\d{4,}$`
   * numeric parse for epss/percentile
   * range checks 0–1
5. Load into Postgres fast:

   * Use `COPY` (binary import) into a **staging table** `epss.stage_score`
   * Then do a set-based insert into `epss.daily_score`
6. Update `epss.raw_file.parse_status='parsed'` or `'failed'`.

#### Staging table pattern

```sql
create unlogged table if not exists epss.stage_score (
  model_name text not null,
  asof_date date not null,
  cve_id text not null,
  epss double precision not null,
  percentile double precision,
  model_version text,
  raw_id bigint not null
);
```

In the job:

* `truncate epss.stage_score;`
* `COPY epss.stage_score FROM STDIN (FORMAT BINARY)`
* Then (transactionally):

  * `delete from epss.daily_score where model_name=@m and asof_date=@d;` *(idempotency for reruns)*
  * `insert into epss.daily_score (...) select ... from epss.stage_score;`

This avoids `ON CONFLICT` overhead and guarantees deterministic reruns.
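The binary COPY step with Npgsql might look like this (a sketch; `rows` is the validated parse output from step 4, and the column order must match the staging table):

```csharp
using Npgsql;
using NpgsqlTypes;

await using var importer = await conn.BeginBinaryImportAsync(
    "COPY epss.stage_score (model_name, asof_date, cve_id, epss, percentile, model_version, raw_id) " +
    "FROM STDIN (FORMAT BINARY)");

foreach (var row in rows)
{
    await importer.StartRowAsync();
    await importer.WriteAsync("FIRST_EPSS", NpgsqlDbType.Text);
    await importer.WriteAsync(row.AsOfDate, NpgsqlDbType.Date);
    await importer.WriteAsync(row.CveId, NpgsqlDbType.Text);
    await importer.WriteAsync(row.Epss, NpgsqlDbType.Double);
    if (row.Percentile is double pct) await importer.WriteAsync(pct, NpgsqlDbType.Double);
    else await importer.WriteNullAsync();
    await importer.WriteAsync(row.ModelVersion, NpgsqlDbType.Text);
    await importer.WriteAsync(row.RawId, NpgsqlDbType.Bigint);
}

await importer.CompleteAsync(); // without Complete, the COPY is rolled back
```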
|
||||
|
||||
### 3.5 Delta + latest materialization job
|
||||
|
||||
Implement `EpssMaterializeJob` after successful daily_score insert.
|
||||
|
||||
**Compute previous available date**
|
||||
|
||||
```sql
|
||||
-- previous date available for that model_name
|
||||
select max(asof_date)
|
||||
from epss.daily_score
|
||||
where model_name = @model
|
||||
and asof_date < @asof_date;
|
||||
```
|
||||
|
||||
**Populate delta (set-based)**
|
||||
|
||||
```sql
|
||||
insert into epss.daily_delta (
|
||||
model_name, asof_date, cve_id,
|
||||
epss, prev_asof_date, prev_epss, epss_delta,
|
||||
percentile, prev_percentile, percentile_delta,
|
||||
model_version, prev_model_version, is_model_change
|
||||
)
|
||||
select
|
||||
cur.model_name,
|
||||
cur.asof_date,
|
||||
cur.cve_id,
|
||||
cur.epss,
|
||||
prev.asof_date as prev_asof_date,
|
||||
prev.epss as prev_epss,
|
||||
cur.epss - prev.epss as epss_delta,
|
||||
cur.percentile,
|
||||
prev.percentile as prev_percentile,
|
||||
(cur.percentile - prev.percentile) as percentile_delta,
|
||||
cur.model_version,
|
||||
prev.model_version,
|
||||
(cur.model_version is not null and prev.model_version is not null and cur.model_version <> prev.model_version) as is_model_change
|
||||
from epss.daily_score cur
|
||||
left join epss.daily_score prev
|
||||
on prev.model_name = cur.model_name
|
||||
and prev.asof_date = @prev_asof_date
|
||||
and prev.cve_id = cur.cve_id
|
||||
where cur.model_name = @model
|
||||
and cur.asof_date = @asof_date;
|
||||
```
|
||||
|
||||
**Update latest_score (set-based upsert)**
|
||||
|
||||
```sql
|
||||
insert into epss.latest_score(model_name, cve_id, asof_date, epss, percentile, model_version)
|
||||
select model_name, cve_id, asof_date, epss, percentile, model_version
|
||||
from epss.daily_score
|
||||
where model_name=@model and asof_date=@asof_date
|
||||
on conflict (model_name, cve_id) do update
|
||||
set asof_date = excluded.asof_date,
|
||||
epss = excluded.epss,
|
||||
percentile = excluded.percentile,
|
||||
model_version = excluded.model_version,
|
||||
updated_at = now()
|
||||
where epss.latest_score.asof_date < excluded.asof_date;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4) Layer 3: Signal-ready output (Signals + Router + Timeline + Notify)

### 4.1 Decide what “signal” means in Stella Ops

You do **not** want to emit 300k events daily.

You want “actionable” events, ideally:

* only for CVEs that are **observed** in your tenant’s environment, and
* only when something meaningful happens.

Examples:

* Risk band changes (based on percentile or probability)
* ΔEPSS crosses a threshold (e.g., jump ≥ 0.05)
* Newly scored CVEs that are present in the environment
* Model version change day → one summary event instead of 300k deltas

### 4.2 Risk band mapping (internal heuristic)

FIRST explicitly does **not** “officially bin” EPSS scores; binning is subjective. ([FIRST][3])
But operationally you’ll want bands. Use config-driven thresholds.

Default band function based on percentile:

* `CRITICAL` if `percentile >= 0.995`
* `HIGH` if `percentile >= 0.99`
* `MEDIUM` if `percentile >= 0.90`
* else `LOW`

Store these in config per tenant/policy pack (a sketch of the mapping follows).
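In C#, the band function is a one-liner over those thresholds (the `BandThresholds` record and names here are illustrative, not an existing Stella Ops type):

```csharp
// Config-driven band mapping; defaults mirror the thresholds above.
public sealed record BandThresholds(double Critical = 0.995, double High = 0.99, double Medium = 0.90);

public static string Band(double percentile, BandThresholds t) =>
    percentile >= t.Critical ? "CRITICAL" :
    percentile >= t.High     ? "HIGH" :
    percentile >= t.Medium   ? "MEDIUM" : "LOW";
```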

### 4.3 Signal table for idempotency + audit

```sql
create table if not exists epss.signal (
  signal_id        bigserial primary key,
  tenant_id        uuid not null,
  model_name       text not null,
  asof_date        date not null,
  cve_id           text not null,

  event_type       text not null,  -- 'RISK_BAND_UP' | 'RISK_SPIKE' | 'MODEL_UPDATED' | ...
  risk_band        text,
  epss             double precision,
  epss_delta       double precision,
  percentile       double precision,
  percentile_delta double precision,

  is_model_change  boolean not null default false,

  -- deterministic idempotency key
  dedupe_key       text not null,
  payload          jsonb not null,

  created_at       timestamptz not null default now(),

  unique (tenant_id, dedupe_key)
);

create index if not exists ix_epss_signal_tenant_date on epss.signal(tenant_id, asof_date desc);
create index if not exists ix_epss_signal_cve on epss.signal(tenant_id, cve_id, asof_date desc);
```

**Dedupe key pattern**
Make it deterministic:

```
dedupe_key = $"{model_name}:{asof_date:yyyy-MM-dd}:{cve_id}:{event_type}:{band_before}->{band_after}"
```

### 4.4 Signal generation job

Implement `EpssSignalJob(tenant)`:

1. Get the tenant’s **observed CVEs** from your vuln inventory (whatever your table is; call it `vuln.instance`):

   * only open/unremediated vulns
   * optionally only “reachable” or “internet exposed” assets

2. Join against today’s `epss.daily_delta` (or `epss.daily_score` if you skipped delta):

Pseudo-SQL:

```sql
select d.*
from epss.daily_delta d
join vuln.observed_cve oc
  on oc.tenant_id = @tenant
 and oc.cve_id = d.cve_id
where d.model_name=@model
  and d.asof_date=@asof_date;
```

3. Suppress noise (see the evaluation sketch after this list):

   * if `is_model_change=true`, skip “delta spike” events and instead emit one `MODEL_UPDATED` summary event per tenant (and maybe per policy domain).
   * else evaluate:

     * `abs(epss_delta) >= delta_threshold`
     * band change
     * percentile crosses a cutoff

4. Insert into `epss.signal` with the dedupe key, then publish to the Signals bus:

   * topic: `signals.epss`
   * payload includes `tenant_id`, `cve_id`, `asof_date`, `epss`, `percentile`, deltas, band, and an `evidence` block.

5. Timeline + Notify:

   * Timeline: record the event (what changed, when, data source sha)
   * Notify: notify subscribed channels (Slack/email/etc.) based on tenant policy
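A sketch of the step-3 evaluation; the `DeltaRow`/`SignalPolicy` shapes are assumptions over `epss.daily_delta` plus tenant config, and `Band` is the helper sketched in 4.2:

```csharp
public sealed record DeltaRow(
    string CveId, double Epss, double EpssDelta,
    double Percentile, double? PrevPercentile, bool IsModelChange);
public sealed record SignalPolicy(double DeltaThreshold, BandThresholds Bands);

public static IEnumerable<string> Evaluate(DeltaRow d, SignalPolicy p)
{
    if (d.IsModelChange) yield break;              // rolled up into one MODEL_UPDATED summary instead

    if (Math.Abs(d.EpssDelta) >= p.DeltaThreshold)
        yield return "RISK_SPIKE";

    if (d.PrevPercentile is double prev &&
        Band(prev, p.Bands) != Band(d.Percentile, p.Bands))
        yield return d.Percentile > prev ? "RISK_BAND_UP" : "RISK_BAND_DOWN";
}
```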

### 4.5 Evidence payload structure

Keep evidence deterministic + replayable:

```json
{
  "source": {
    "provider": "FIRST",
    "feed": "epss_scores-YYYY-MM-DD.csv.gz",
    "asof_date": "2025-12-17",
    "raw_sha256": "…",
    "model_version": "v2025.03.14",
    "header_comment": "# ... "
  },
  "metrics": {
    "epss": 0.153,
    "percentile": 0.92,
    "epss_delta": 0.051,
    "percentile_delta": 0.03
  },
  "decision": {
    "event_type": "RISK_SPIKE",
    "thresholds": {
      "delta_threshold": 0.05,
      "critical_percentile": 0.995
    }
  }
}
```

This aligns with FIRST’s recommendation to present probability with percentile when possible. ([FIRST][3])
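One way to keep that evidence replayable is to hash its canonical serialization. A sketch, assuming the payload is a POCO whose property order is fixed by declaration (which System.Text.Json preserves, so the bytes are stable across runs):

```csharp
using System.Security.Cryptography;
using System.Text.Json;

// Deterministic digest over the evidence payload; store it next to the signal row.
public static string EvidenceDigest<T>(T evidence)
{
    byte[] bytes = JsonSerializer.SerializeToUtf8Bytes(evidence); // compact, no indentation
    return "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
}
```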
---

## 5) Integration points inside Stella Ops

### 5.1 Policy Engine usage

Policy Engine should **only** read from Layer 2 (normalized) and Layer 3 (signals), never raw.

Patterns:

* For gating decisions: query `epss.latest_score` for each CVE in a build/image/SBOM scan result.
* For “why was this blocked?”: show evidence that references `raw_sha256` and `model_version`.

### 5.2 Vuln scoring pipeline

When you compute the “Stella Risk Score” for a vuln instance:

* Join `vuln_instance.cve_id` → `epss.latest_score`
* Combine with CVSS, KEV, exploit maturity, asset exposure, etc.
* EPSS alone is **threat likelihood**, not impact; FIRST explicitly says it’s not a complete picture of risk. ([FIRST][4])

### 5.3 UI display

Recommended UI string (per FIRST guidance):

* Show **probability** as a percent + show percentile:

  * `15.3% (92nd percentile)` ([FIRST][3])

For sparklines:

* Use the `epss.daily_score` time series for the last N days
* Annotate model-version change days (vertical marker)

---

## 6) Operational hardening

### 6.1 Scheduling

* Run daily at a fixed time in UTC.
* Probe up to 3 days back for the latest file.

### 6.2 Exactly-once semantics

Use three safeguards:

1. `epss.raw_file` uniqueness on `(source_uri, asof_date, sha256)`
2. Transactional load:

   * delete existing `daily_score` for that `(model_name, asof_date)`
   * insert freshly parsed rows

3. Advisory lock per `(model_name, asof_date)` to prevent concurrent loads (usage from .NET shown below):

   * `pg_advisory_xact_lock(hashtext(model_name), (asof_date - date '1970-01-01'))` (the date subtraction yields an `int` day number; Postgres has no direct `date::int` cast)
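Taking that lock from .NET could look like this (a sketch assuming Npgsql 6+, where `DateOnly` maps to `date`; `conn` and `tx` come from the load transaction):

```csharp
// Must run inside the load transaction: the lock releases at COMMIT/ROLLBACK.
await using var cmd = new NpgsqlCommand(
    "select pg_advisory_xact_lock(hashtext(@model), (@asof - date '1970-01-01'))", conn, tx);
cmd.Parameters.AddWithValue("model", modelName);
cmd.Parameters.AddWithValue("asof", asOfDate); // DateOnly -> date
await cmd.ExecuteNonQueryAsync(ct);
```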

### 6.3 Monitoring (must-have metrics)

Emit metrics per job stage:

* download success/failure
* bytes downloaded
* sha256 computed
* rows parsed
* parse error count
* rows inserted into `daily_score`
* delta rows created
* signal events emitted
* “model version changed” boolean

Alert conditions:

* no new asof_date ingested for > 48 hours
* parse failure
* row count drops by > X% from the previous day (data anomaly)

### 6.4 Backfills

Implement `epss backfill --from 2021-04-14 --to 2025-12-17`:

* Fetch raw files for each day
* Normalize into `daily_score`
* Materialize latest and delta
* **Disable signals** during bulk backfill (or route to a “silent” topic) to avoid spamming.

FIRST notes historical data begins 2021-04-14. ([FIRST][1])
---

## 7) Reference .NET job skeletons

### Job boundaries

* `EpssFetchJob` → writes `epss.raw_file`
* `EpssNormalizeJob` → fills `epss.daily_score`
* `EpssMaterializeJob` → updates `epss.daily_delta` and `epss.latest_score`
* `EpssSignalJob` → per-tenant emission into `epss.signal` + bus publish

### Performance notes

* Use `GZipStream` + `StreamReader` line-by-line, never the full file in memory (sketch below)
* Use `NpgsqlBinaryImporter` for `COPY` into staging
* Use set-based SQL for delta/latest
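A streaming-parse sketch of that first note; the `EpssRow` record is illustrative, and the `#`-comment and `cve,epss,percentile` column conventions follow the published CSV layout:

```csharp
using System.Globalization;
using System.IO.Compression;

public sealed record EpssRow(string CveId, double Epss, double Percentile);

static async IAsyncEnumerable<EpssRow> ParseFeedAsync(string csvGzPath)
{
    await using var file = File.OpenRead(csvGzPath);
    await using var gzip = new GZipStream(file, CompressionMode.Decompress);
    using var reader = new StreamReader(gzip);

    string? line;
    while ((line = await reader.ReadLineAsync()) is not null)
    {
        if (line.StartsWith('#')) continue;                // header comment: model version + score date
        var cols = line.Split(',');
        if (cols.Length < 3 || cols[0] == "cve") continue; // skip the column-header row
        yield return new EpssRow(
            cols[0],
            double.Parse(cols[1], CultureInfo.InvariantCulture),
            double.Parse(cols[2], CultureInfo.InvariantCulture));
    }
}
```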

---

## 8) The “gotchas” that make or break EPSS pipelines

1. **Model version changes create false deltas**
   Store `model_version` and mark `is_model_change`. FIRST explicitly warns about score shifts on model updates and notes v4 began publishing on 2025‑03‑17. ([FIRST][1])

2. **Percentile is relative; probability is primary**
   Probability should remain your canonical numeric score; percentile provides context for humans. ([FIRST][3])

3. **Don’t emit global events**
   Restrict signals to observed CVEs per tenant/environment.

4. **Keep raw provenance**
   Your audit story depends on storing:

   * exact source URI, as-of date, sha256, header comment

---

## 9) Minimal “definition of done” checklist

### Data correctness

* [ ] For a known CVE, `epss.latest_score` matches the daily file for the latest asof_date
* [ ] `epss` and `percentile` ranges enforced (0–1)
* [ ] Model version extracted when present; otherwise stored as null but header_comment preserved
* [ ] Delta rows created and `is_model_change` flips on version changes

### Operational

* [ ] Daily job retries on transient HTTP failures
* [ ] Alert if no new asof_date in 48h
* [ ] Raw bytes retention + metadata retention

### Product

* [ ] UI displays `probability% (percentile)` per FIRST recommendation ([FIRST][3])
* [ ] Signal events link to evidence (raw sha, model version, asof date)
* [ ] Policy Engine consumes `latest_score` only (never raw)

---

If you want, I can also provide:

* a **migration script generator** that auto-creates monthly partitions for `epss.daily_score`,
* an example **Valkey caching strategy** (`epss:latest:{cve}` with a 48h TTL, warmed only for observed CVEs),
* and a concrete **“observed CVE” join contract** (what columns to expose from your vuln inventory so EPSS signals stay noise-free).

[1]: https://www.first.org/epss/data_stats "Exploit Prediction Scoring System (EPSS)"
[2]: https://www.first.org/epss/api "Exploit Prediction Scoring System (EPSS)"
[3]: https://www.first.org/epss/articles/prob_percentile_bins "Exploit Prediction Scoring System (EPSS)"
[4]: https://www.first.org/epss/faq "EPSS Frequently Asked Questions"
@@ -0,0 +1,444 @@

# ARCHIVED ADVISORY

> **Status:** Archived
> **Archived Date:** 2025-12-18
> **Implementation Sprints:**
> - `SPRINT_3700_0001_0001_witness_foundation.md` - BLAKE3 + Witness Schema
> - `SPRINT_3700_0002_0001_vuln_surfaces_core.md` - Vuln Surface Builder
> - `SPRINT_3700_0003_0001_trigger_extraction.md` - Trigger Method Extraction
> - `SPRINT_3700_0004_0001_reachability_integration.md` - Reachability Integration
> - `SPRINT_3700_0005_0001_witness_ui_cli.md` - Witness UI/CLI
> - `SPRINT_3700_0006_0001_incremental_cache.md` - Incremental Cache
>
> **Gap Analysis:** See `C:\Users\vlindos\.claude\plans\lexical-knitting-map.md`

---
# 2) Smart-Diff x Reachability (incremental, low-noise updates)
**What it is:**
On **SBOM/VEX/dependency** deltas, don't rescan everything. Update only **affected regions** of the call graph and recompute reachability **just for changed nodes/edges**.
**Why it matters:**

* **Order-of-magnitude faster** incremental scans.
* Fewer flaky diffs; triage stays focused on **meaningful risk change**.
* Perfect for PR gating: "what changed" -> "what became reachable/unreachable."

**Core idea (graph reachability):**

* Maintain a per-service **call graph** `G = (V, E)` with **entrypoint set** `S`.
* On diff: compute changed nodes/edges ΔV/ΔE.
* Run **incremental BFS/DFS** from impacted nodes to sinks (forward or backward), reusing memoized results.
* Recompute only **frontiers** touched by Δ.

**Minimal tables (Postgres):**

```sql
-- Nodes (functions/methods)
CREATE TABLE cg_nodes(
  id BIGSERIAL PRIMARY KEY,
  service TEXT, symbol TEXT, file TEXT, line INT,
  hash TEXT, UNIQUE(service, hash)
);
-- Edges (calls)
CREATE TABLE cg_edges(
  src BIGINT REFERENCES cg_nodes(id),
  dst BIGINT REFERENCES cg_nodes(id),
  kind TEXT, PRIMARY KEY(src, dst)
);
-- Entrypoints & Sinks
CREATE TABLE cg_entrypoints(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY);
CREATE TABLE cg_sinks(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY, sink_type TEXT);

-- Memoized reachability cache
CREATE TABLE cg_reach_cache(
  entry_id BIGINT, sink_id BIGINT,
  path JSONB, reachable BOOLEAN,
  updated_at TIMESTAMPTZ,
  PRIMARY KEY(entry_id, sink_id)
);
```

**Incremental algorithm (pseudocode):**

```text
Input: ΔSBOM, ΔDeps, ΔCode -> ΔNodes, ΔEdges
1) Apply Δ to cg_nodes/cg_edges
2) ImpactSet = neighbors(ΔNodes ∪ endpoints(ΔEdges))
3) For each e in Entrypoints ∩ ancestors(ImpactSet):
     Recompute forward search to affected sinks, stop early on unchanged subgraphs
     Update cg_reach_cache; if state flips, emit new/updated DSSE witness
```

**.NET 10 reachability sketch (fast & local):**

```csharp
// Recompute reachability only for entrypoints whose downstream subgraph was touched.
HashSet<int> impactSet = ComputeImpact(deltaNodes, deltaEdges);
foreach (var e in Intersect(Entrypoints, Ancestors(impactSet)))
{
    // Bounded search from entrypoint e toward the affected sinks, reusing the memoized cache.
    var res = BoundedReach(e, affectedSinks, graph, cache);
    foreach (var r in res.Changed)
    {
        cache.Upsert(e, r.Sink, r.Path, r.Reachable);
        if (r.Reachable) EmitDsseWitness(e, r.Sink, r.Path); // state flip: refresh the signed witness
    }
}
```

**CI/PR flow:**

1. Build -> SBOM diff -> Dependency diff -> Call-graph delta.
2. Run incremental reachability.
3. If any `unreachable->reachable` transitions: **fail gate**, attach DSSE witnesses.
4. If `reachable->unreachable`: auto-close prior findings (and archive the prior witness).

---

# UX hooks (quick wins)

* In the findings list, add a **"Show Witness"** button -> modal renders the signed path (entrypoint->...->sink) + a **"Verify Signature"** one-click.
* In PR checks, summarize only **state flips** with tiny links: "+2 reachable (view witness)" / "-1 (now unreachable)".

---

# Minimal tasks to get this live

* **Scanner.Worker**: build call-graph extraction (per language), add incremental graph store, reachability cache.
* **Attestor**: DSSE signing endpoint + key management (Ed25519 by default; PQC mode later).
* **Authority**: tables above + witness storage + retrieval API.
* **Router/CI plugin**: PR annotation with **state flips** and links to witnesses.
* **UI**: witness modal + signature verify.

If you want, I can draft the exact Postgres migrations, the C# repositories, and a tiny verifier CLI that checks DSSE signatures and prints the call path.

Below is a concrete, buildable blueprint for an **advanced reachability analysis engine** inside Stella Ops. I'm going to assume your "Stella Ops" components are roughly:

* **Scanner.Worker**: runs analyses in CI / on artifacts
* **Authority**: stores graphs/findings/witnesses
* **Attestor**: signs DSSE envelopes (Ed25519)
* (optional) **SurfaceBuilder**: background worker that computes "vuln surfaces" for packages

The key advance is: **don't treat a CVE as "a package"**. Treat it as a **set of trigger methods** (public API) that can reach the vulnerable code inside the dependency, computed by "Smart-Diff" once and reused everywhere.

---

## 0) Define the contract (precision/soundness) up front

If you don't write this down, you'll fight false positives/negatives forever.

### What Stella Ops will guarantee (first release)

* **Whole-program static call graph** (app + selected dependency assemblies)
* **Context-insensitive** (fast), **path witness** extracted (shortest path)
* **Dynamic dispatch handled** with CHA/RTA (+ DI hints), with explicit uncertainty flags
* **Reflection handled best-effort** (constant-string resolution), otherwise "unknown edge"

### What it will NOT guarantee (first release)

* Perfect handling of reflection / `dynamic` / runtime codegen
* Perfect delegate/event resolution across complex flows
* Full taint/dataflow reachability (you can add later)

This is fine. The major value is: "**we can show you the call path**" and "**we can prove the vuln is triggered by calling these library APIs**".

---

## 1) The big idea: "Vuln surfaces" (Smart-Diff -> triggers)

### Problem

CVE feeds typically say "package X version range Y is vulnerable" but rarely say *which methods*. If you only do package-level reachability, noise is huge.

### Solution

For each CVE+package, compute a **vulnerability surface**:

* **Candidate sinks** = methods changed between vulnerable and fixed versions (diff at IL level)
* **Trigger methods** = *public/exported* methods in the vulnerable version that can reach those changed methods internally

Then your service scan becomes:

> "Can any entrypoint reach any trigger method?"

This is both faster and more precise.
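Computing triggers then reduces to a reverse BFS from the changed methods; a sketch over an in-memory callers map (all inputs here are assumed shapes, not existing tables):

```csharp
// callersOf: callee -> direct callers inside the library;
// isPublicApi marks exported methods (the candidate triggers).
static HashSet<string> ComputeTriggers(
    IReadOnlyDictionary<string, List<string>> callersOf,
    IEnumerable<string> changedSinks,
    Func<string, bool> isPublicApi)
{
    var triggers = new HashSet<string>();
    var seen = new HashSet<string>(changedSinks);
    var queue = new Queue<string>(seen);
    while (queue.Count > 0)
    {
        var m = queue.Dequeue();
        if (isPublicApi(m)) triggers.Add(m);   // reachable-from-outside: record as a trigger
        if (!callersOf.TryGetValue(m, out var callers)) continue;
        foreach (var c in callers)
            if (seen.Add(c)) queue.Enqueue(c); // walk upward toward the public surface
    }
    return triggers;
}
```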

---

## 2) Data model (Authority / Postgres)

You already had call-graph tables; here's a concrete schema that supports:

* graph snapshots
* incremental updates
* vuln surfaces
* reachability cache
* DSSE witnesses

### 2.1 Graph tables

```sql
CREATE TABLE cg_snapshots (
  snapshot_id   BIGSERIAL PRIMARY KEY,
  service       TEXT NOT NULL,
  build_id      TEXT NOT NULL,
  graph_digest  TEXT NOT NULL,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(service, build_id)
);

CREATE TABLE cg_nodes (
  node_id      BIGSERIAL PRIMARY KEY,
  snapshot_id  BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  method_key   TEXT NOT NULL,          -- stable key (see below)
  asm_name     TEXT,
  type_name    TEXT,
  method_name  TEXT,
  file_path    TEXT,
  line_start   INT,
  il_hash      TEXT,                   -- normalized IL hash for diffing
  flags        INT NOT NULL DEFAULT 0, -- bitflags: has_reflection, compiler_generated, etc.
  UNIQUE(snapshot_id, method_key)
);

CREATE TABLE cg_edges (
  snapshot_id  BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  src_node_id  BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  dst_node_id  BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind         SMALLINT NOT NULL,      -- 0=call,1=newobj,2=dispatch,3=delegate,4=reflection_guess,...
  PRIMARY KEY(snapshot_id, src_node_id, dst_node_id, kind)
);

CREATE TABLE cg_entrypoints (
  snapshot_id  BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  node_id      BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind         TEXT NOT NULL,          -- http, grpc, cli, job, etc.
  name         TEXT NOT NULL,          -- GET /foo, "Main", etc.
  PRIMARY KEY(snapshot_id, node_id, kind, name)
);
```

### 2.2 Vuln surface tables (Smart-Diff artifacts)

```sql
CREATE TABLE vuln_surfaces (
  surface_id     BIGSERIAL PRIMARY KEY,
  ecosystem      TEXT NOT NULL,   -- nuget
  package        TEXT NOT NULL,
  cve_id         TEXT NOT NULL,
  vuln_version   TEXT NOT NULL,   -- a representative vulnerable version
  fixed_version  TEXT NOT NULL,
  surface_digest TEXT NOT NULL,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(ecosystem, package, cve_id, vuln_version, fixed_version)
);

CREATE TABLE vuln_surface_sinks (
  surface_id      BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  sink_method_key TEXT NOT NULL,
  reason          TEXT NOT NULL,   -- changed|added|removed|heuristic
  PRIMARY KEY(surface_id, sink_method_key)
);

CREATE TABLE vuln_surface_triggers (
  surface_id         BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  trigger_method_key TEXT NOT NULL,
  sink_method_key    TEXT NOT NULL,
  internal_path      JSONB,          -- optional: library-internal witness path
  PRIMARY KEY(surface_id, trigger_method_key, sink_method_key)
);
```

### 2.3 Reachability cache & witnesses

```sql
CREATE TABLE reach_findings (
  finding_id            BIGSERIAL PRIMARY KEY,
  snapshot_id           BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  cve_id                TEXT NOT NULL,
  ecosystem             TEXT NOT NULL,
  package               TEXT NOT NULL,
  package_version       TEXT NOT NULL,
  reachable             BOOLEAN NOT NULL,
  reachable_entrypoints INT NOT NULL DEFAULT 0,
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(snapshot_id, cve_id, package, package_version)
);

CREATE TABLE reach_witnesses (
  witness_id    BIGSERIAL PRIMARY KEY,
  finding_id    BIGINT REFERENCES reach_findings(finding_id) ON DELETE CASCADE,
  entry_node_id BIGINT REFERENCES cg_nodes(node_id),
  dsse_envelope JSONB NOT NULL,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

---

## 3) Stable identity: MethodKey + IL hash

### 3.1 MethodKey (must be stable across builds)

Use a normalized string like:

```
{AssemblyName}|{DeclaringTypeFullName}|{MethodName}`{GenericArity}({ParamType1},{ParamType2},...)
```

Examples:

* `MyApp|BillingController|Pay(System.String)`
* `LibXYZ|LibXYZ.Parser|Parse(System.ReadOnlySpan<System.Byte>)`

### 3.2 Normalized IL hash (for smart-diff + incremental graph updates)

Raw IL bytes aren't stable (metadata tokens change). Normalize:

* opcode names
* branch targets by *instruction index*, not offset
* method operands by **resolved MethodKey**
* string operands by literal or hashed literal
* type operands by full name

Then hash `SHA256(normalized_bytes)`.
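A minimal normalization sketch with Mono.Cecil; the `methodKey` delegate is assumed to implement the 3.1 format, and switch-table plus a few other operand kinds are collapsed via `ToString()` for brevity:

```csharp
using System.Security.Cryptography;
using System.Text;
using Mono.Cecil;
using Mono.Cecil.Cil;

static string NormalizedIlHash(MethodDefinition method, Func<MethodReference, string> methodKey)
{
    if (!method.HasBody) return "";

    var instructions = method.Body.Instructions;
    var indexOf = new Dictionary<Instruction, int>();
    for (int i = 0; i < instructions.Count; i++) indexOf[instructions[i]] = i;

    var sb = new StringBuilder();
    foreach (var ins in instructions)
    {
        sb.Append(ins.OpCode.Name).Append(' ');
        sb.Append(ins.Operand switch
        {
            Instruction target => $"@{indexOf[target]}",   // branch target by instruction index, not offset
            MethodReference m  => methodKey(m),            // resolved MethodKey (3.1 format)
            TypeReference t    => t.FullName,
            string s           => $"\"{s}\"",
            null               => "",
            var other          => other.ToString()         // switch tables, fields, etc. (simplified)
        });
        sb.Append('\n');
    }
    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString())));
}
```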

---

*[Remainder of advisory truncated for brevity - see original file for full content]*

---

## 12) What to implement first (in the order that produces value fastest)

### Week 1-2 scope (realistic, shippable)

1. Cecil call graph extraction (direct calls)
2. MVC + Minimal API entrypoints
3. Reverse BFS reachability with path witnesses
4. DSSE witness signing + storage
5. SurfaceBuilder v1:

   * IL hash per method
   * changed methods as sinks
   * triggers via internal reverse BFS

6. UI: "Show Witness" + "Verify Signature"

### Next increment (precision upgrades)

7. async/await mapping to original methods
8. RTA + DI registration hints
9. delegate tracking for Minimal API handlers (if not already)
10. interface override triggers in surface builder

### Later (if you want "attackability", not just "reachability")

11. taint/dataflow for top sink classes (deserialization, path traversal, SQL, command exec)
12. sanitizer modeling & parameter constraints

---

## 13) Common failure modes and how to harden

### MethodKey mismatches (surface vs app call)

* Ensure both are generated from the same normalization rules
* For generic methods, prefer **definition** keys (strip instantiation)
* Store both "exact" and "erased generic" variants if needed

### Multi-target frameworks

* SurfaceBuilder: compute triggers for each TFM, union them
* App scan: choose the TFM closest to the build RID, but allow fallback to the union

### Huge graphs

* Drop `System.*` nodes/edges unless the vuln is in System.* itself (rare, but handle separately)
* Deduplicate nodes by MethodKey across assemblies where safe
* Use CSR arrays + pooled queues

### Reflection-heavy projects

* Mark analysis confidence lower
* Include "unknown edges present" in finding metadata
* Still produce a witness path up to the reflective callsite

---

If you want, I can also paste a **complete Cecil-based CallGraphBuilder class** (nodes+edges+PDB lines), plus the **SurfaceBuilder** that downloads NuGet packages and generates `vuln_surface_triggers` end-to-end.

@@ -0,0 +1,197 @@

# ARCHIVED ADVISORY

> **Archived**: 2025-12-18
> **Status**: IMPLEMENTED
> **Analysis**: Plan file `C:\Users\vlindos\.claude\plans\quizzical-hugging-hearth.md`
>
> ## Implementation Summary
>
> This advisory was analyzed and merged into the existing EPSS implementation plan:
>
> - **Master Plan**: `IMPL_3410_epss_v4_integration_master_plan.md` updated with raw + signal layer schemas
> - **Sprint**: `SPRINT_3413_0001_0001_epss_live_enrichment.md` created with 30 tasks (original 14 + 16 from advisory)
> - **Migrations Created**:
>   - `011_epss_raw_layer.sql` - Full JSONB payload storage (~5GB/year)
>   - `012_epss_signal_layer.sql` - Tenant-scoped signals with dedupe_key and explain_hash
>
> ## Gap Analysis Result
>
> | Advisory Proposal | Decision | Rationale |
> |-------------------|----------|-----------|
> | Raw feed layer (Layer 1) | IMPLEMENTED | Full JSONB storage for deterministic replay |
> | Normalized layer (Layer 2) | ALIGNED | Already existed in IMPL_3410 |
> | Signal-ready layer (Layer 3) | IMPLEMENTED | Tenant-scoped signals, model change detection |
> | Multi-model support | DEFERRED | No customer demand |
> | Meta-predictor training | SKIPPED | Out of scope (ML complexity) |
> | A/B testing | SKIPPED | Infrastructure overhead |
>
> ## Key Enhancements Implemented
>
> 1. **Raw Feed Layer** (`epss_raw` table) - Stores full CSV payload as JSONB for replay
> 2. **Signal-Ready Layer** (`epss_signal` table) - Tenant-scoped actionable events
> 3. **Model Version Change Detection** - Suppresses noisy deltas on model updates
> 4. **Explain Hash** - Deterministic SHA-256 for audit trail
> 5. **Risk Band Mapping** - CRITICAL/HIGH/MEDIUM/LOW based on percentile

---

# Original Advisory Content

Here's a compact, practical blueprint for bringing **EPSS** into your stack without chaos: a **3-layer ingestion model** that keeps raw data, produces clean probabilities, and emits "signal-ready" events your risk engine can use immediately.

---

# Why this matters (super short)

* **EPSS** = predicted probability a vuln will be exploited soon.
* Mixing the "raw EPSS feed" directly into decisions makes audits, rollbacks, and model upgrades painful.
* A **layered model** lets you **version probability evolution**, compare vendors, and train **meta-predictors on deltas** (how risk changes over time), not just on snapshots.

---

# The three layers (and how they map to Stella Ops)

1. **Raw feed layer (immutable)**

   * **Goal:** Store exactly what the provider sent (EPSS v4 CSV/JSON, schema drift and all).
   * **Stella modules:** `Concelier` (preserve-prune source) writes; `Authority` handles signatures/hashes.
   * **Storage:** `postgres.epss_raw` (partitioned by day); blob column for the untouched payload; SHA-256 of the source file.
   * **Why:** Full provenance + deterministic replay.

2. **Normalized probabilistic layer**

   * **Goal:** Clean, typed tables keyed by `cve_id`, with **probability, percentile, model_version, asof_ts**.
   * **Stella modules:** `Excititor` (transform); `Policy Engine` reads.
   * **Storage:** `postgres.epss_prob` with a **surrogate key** `(cve_id, model_version, asof_ts)` and computed **delta fields** vs the previous `asof_ts`.
   * **Extras:** Keep optional vendor columns (e.g., FIRST, custom regressors) to compare models side-by-side.

3. **Signal-ready layer (risk engine contracts)**

   * **Goal:** Pre-chewed "events" your **Signals/Router** can route instantly.
   * **What's inside:** Only the fields needed for gating and UI: `cve_id`, `prob_now`, `prob_delta`, `percentile`, `risk_band`, `explain_hash`.
   * **Emit:** `first_signal`, `risk_increase`, `risk_decrease`, `quieted` with **idempotent event keys**.
   * **Stella modules:** `Signals` publishes, `Router` fans out, `Timeline` records; `Notify` handles subscriptions.

---

# Minimal Postgres schema (ready to paste)

```sql
-- 1) Raw (immutable)
create table epss_raw (
  id bigserial primary key,
  source_uri text not null,
  ingestion_ts timestamptz not null default now(),
  asof_date date not null,
  payload jsonb not null,
  payload_sha256 bytea not null
);
create index on epss_raw (asof_date);

-- 2) Normalized
create table epss_prob (
  id bigserial primary key,
  cve_id text not null,
  model_version text not null,
  asof_ts timestamptz not null,
  probability double precision not null,
  percentile double precision,
  features jsonb,
  unique (cve_id, model_version, asof_ts)
);

-- 3) Signal-ready
create table epss_signal (
  signal_id bigserial primary key,
  cve_id text not null,
  asof_ts timestamptz not null,
  probability double precision not null,
  prob_delta double precision,
  risk_band text not null,
  model_version text not null,
  explain_hash bytea not null,
  unique (cve_id, model_version, asof_ts)
);
```

---

# C# ingestion skeleton (StellaOps.Scanner.Worker.DotNet style)

```csharp
// Skeletons assume injected Dapper access (`pg`), an HttpClient (`http`), and a message bus (`bus`).

// 1) Fetch & store raw (Concelier)
public async Task IngestRawAsync(Uri src, DateOnly asOfDate) {
    var bytes = await http.GetByteArrayAsync(src);
    var sha = SHA256.HashData(bytes);
    // Note: the ::jsonb cast requires the payload to already be JSON
    // (e.g., CSV rows pre-encoded as a JSON array).
    await pg.ExecuteAsync(
        "insert into epss_raw(source_uri, asof_date, payload, payload_sha256) values (@u,@d,@p::jsonb,@s)",
        new { u = src.ToString(), d = asOfDate, p = Encoding.UTF8.GetString(bytes), s = sha });
}

// 2) Normalize (Excititor)
public async Task NormalizeAsync(DateOnly asOfDate, string modelVersion) {
    var raws = await pg.QueryAsync<string>("select payload from epss_raw where asof_date=@d", new { d = asOfDate });
    foreach (var payload in raws) {
        foreach (var row in ParseCsvOrJson(payload)) {
            await pg.ExecuteAsync(
                @"insert into epss_prob(cve_id, model_version, asof_ts, probability, percentile, features)
                  values (@cve,@mv,@ts,@prob,@pct,@feat)
                  on conflict do nothing",
                new { cve = row.Cve, mv = modelVersion, ts = row.AsOf, prob = row.Prob, pct = row.Pctl, feat = row.Features });
        }
    }
}

// 3) Emit signal-ready (Signals)
public async Task EmitSignalsAsync(string modelVersion, double deltaThreshold) {
    var rows = await pg.QueryAsync(@"select cve_id, asof_ts, probability,
        probability - lag(probability) over (partition by cve_id, model_version order by asof_ts) as prob_delta
        from epss_prob where model_version=@mv", new { mv = modelVersion });

    foreach (var r in rows) {
        var band = Band(r.probability);
        if (Math.Abs(r.prob_delta ?? 0) >= deltaThreshold) {
            var explainHash = DeterministicExplainHash(r);
            await pg.ExecuteAsync(@"insert into epss_signal
                (cve_id, asof_ts, probability, prob_delta, risk_band, model_version, explain_hash)
                values (@c,@t,@p,@d,@b,@mv,@h)
                on conflict do nothing",
                new { c = r.cve_id, t = r.asof_ts, p = r.probability, d = r.prob_delta, b = band, mv = modelVersion, h = explainHash });

            await bus.PublishAsync("risk.epss.delta", new {
                cve = r.cve_id, ts = r.asof_ts, prob = r.probability, delta = r.prob_delta, band, model = modelVersion, explain = Convert.ToHexString(explainHash)
            });
        }
    }
}
```

---

# Versioning & experiments (the secret sauce)

* **Model namespace:** `EPSS-4.0-<regressor-name>-<date>` so you can run multiple variants in parallel.
* **Delta-training:** Train a small meta-predictor on **delta-probability** to forecast **"risk jumps in the next N days."**
* **A/B in production:** Route `model_version=x` to 50% of projects; compare **MTTA to patch** and **false-alarm rate**.

---

# Policy & UI wiring (quick contracts)

**Policy gates** (OPA/Rego or internal rules; a C# predicate sketch follows):

* Block if `risk_band in {HIGH, CRITICAL}` **AND** `prob_delta >= 0.1` in the last 72h.
* Soften if the asset is not reachable or is mitigated by VEX.
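A sketch of that gate as a C# predicate (the `EpssSignalRow` shape is illustrative, mirroring the `epss_signal` columns):

```csharp
public sealed record EpssSignalRow(string RiskBand, double ProbDelta, DateTimeOffset AsOfTs);

// Evaluates one signal row against the gate above; reachability/VEX softening is applied separately.
public static bool ShouldBlock(EpssSignalRow s, DateTimeOffset now) =>
    (s.RiskBand is "HIGH" or "CRITICAL")
    && s.ProbDelta >= 0.1
    && now - s.AsOfTs <= TimeSpan.FromHours(72);
```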
**UI (Evidence pane):**

* Show a **sparkline of EPSS over time**, highlight the last delta.
* A "Why now?" button reveals the **explain_hash** -> deterministic evidence payload.

---

# Ops & reliability

* Daily ingestion with **idempotent** runs (raw SHA guard).
* Backfills: re-normalize from `epss_raw` for any new model without re-downloading.
* **Deterministic replay:** export `(raw, transform code hash, model_version)` alongside results.