save progress

StellaOps Bot
2025-12-18 09:10:36 +02:00
parent b4235c134c
commit 28823a8960
169 changed files with 11995 additions and 449 deletions

Here's a compact, practical way to add two high-leverage capabilities to your scanner, **DSSE-signed path witnesses** and **Smart-Diff × Reachability**: what they are, why they matter, and exactly how to implement them in StellaOps without ceremony.
---
# 1) DSSE-signed path witnesses (entrypoint → calls → sink)
**What it is (in plain terms):**
When you flag a CVE as “reachable,” also emit a tiny, human-readable proof: the **exact path** from a real entrypoint (e.g., HTTP route, CLI verb, cron) through functions/methods to the **vulnerable sink**. Wrap that proof in a **DSSE** envelope and sign it. Anyone can verify the witness later, offline, without rerunning the analysis.
**Why it matters:**
* Turns red flags into **auditable evidence** (quiet-by-design).
* Lets CI/CD, auditors, and customers **verify** findings independently.
* Enables **deterministic replay** and provenance chains (ties nicely to in-toto/SLSA).
**Minimal JSON witness (stable, vendor-neutral):**
```json
{
  "witness_schema": "stellaops.witness.v1",
  "artifact": { "sbom_digest": "sha256:...", "component_purl": "pkg:nuget/Example@1.2.3" },
  "vuln": { "id": "CVE-2024-XXXX", "source": "NVD", "range": "≤1.2.3" },
  "entrypoint": { "kind": "http", "name": "GET /billing/pay" },
  "path": [
    {"symbol": "BillingController.Pay()", "file": "BillingController.cs", "line": 42},
    {"symbol": "PaymentsService.Authorize()", "file": "PaymentsService.cs", "line": 88},
    {"symbol": "LibXYZ.Parser.Parse()", "file": "Parser.cs", "line": 17}
  ],
  "sink": { "symbol": "LibXYZ.Parser.Parse()", "type": "deserialization" },
  "evidence": {
    "callgraph_digest": "sha256:...",
    "build_id": "dotnet:RID:linux-x64:sha256:...",
    "analysis_config_digest": "sha256:..."
  },
  "observed_at": "2025-12-18T00:00:00Z"
}
```
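**Wrap in DSSE** (`payloadType` and `payload` are required; each `sig` covers the DSSE pre-authentication encoding `PAE(payloadType, payload bytes)`, not the bare JSON):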
```json
{
  "payloadType": "application/vnd.stellaops.witness+json",
  "payload": "base64(JSON_above)",
  "signatures": [{ "keyid": "attestor-stellaops-ed25519", "sig": "base64(...)" }]
}
```
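**.NET 10 signing (Ed25519)**
```csharp
using System;
using System.Text;
using System.Text.Json;
using NSec.Cryptography; // assumption: NSec.Cryptography package; System.Security.Cryptography has no Ed25519 class

const string PayloadType = "application/vnd.stellaops.witness+json";
var payloadBytes = JsonSerializer.SerializeToUtf8Bytes(witnessJsonObj);
// DSSE signatures cover PAE(payloadType, payload), never the raw payload bytes
var sig = Sign(Pae(PayloadType, payloadBytes), privateKey);
var dsse = new {
    payloadType = PayloadType,
    payload = Convert.ToBase64String(payloadBytes),
    signatures = new[] { new { keyid = keyId, sig = Convert.ToBase64String(sig) } }
};

// "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body (lengths are ASCII decimal byte counts)
static byte[] Pae(string type, byte[] body)
{
    var header = Encoding.UTF8.GetBytes($"DSSEv1 {Encoding.UTF8.GetByteCount(type)} {type} {body.Length} ");
    return [.. header, .. body];
}
static byte[] Sign(byte[] data, byte[] rawPrivateKey)
{
    var algorithm = SignatureAlgorithm.Ed25519;
    using var key = Key.Import(algorithm, rawPrivateKey, KeyBlobFormat.RawPrivateKey);
    return algorithm.Sign(key, data);
}
```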
**Where to emit:**
* **Scanner.Worker**: after reachability confirms `reachable=true`, emit witness → **Attestor** signs → **Authority** stores (Postgres) → optional Rekor-style mirror.
* Expose `/witness/{findingId}` for download & independent verification.
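Independent verification needs only the envelope and the public key. Below is a minimal sketch of that check, assuming the envelope shape above. `VerifyWitness` and its `verifySignature` callback are hypothetical names. In practice the callback would look up the Ed25519 public key for `keyid` and call your Ed25519 library's verify function. Because the signature covers `PAE(payloadType, payload)`, the verifier must rebuild those exact bytes rather than check the JSON directly.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical verifier: returns the witness path symbols if at least one signature verifies, else null.
// verifySignature(keyid, signedBytes, signature) is an assumed hook around your Ed25519 library.
static string[]? VerifyWitness(string envelopeJson, Func<string, byte[], byte[], bool> verifySignature)
{
    using var doc = JsonDocument.Parse(envelopeJson);
    var root = doc.RootElement;
    var payloadType = root.GetProperty("payloadType").GetString()!;
    if (payloadType != "application/vnd.stellaops.witness+json") return null;

    var payload = Convert.FromBase64String(root.GetProperty("payload").GetString()!);
    var signed = Pae(payloadType, payload); // the bytes the Attestor actually signed

    bool ok = root.GetProperty("signatures").EnumerateArray().Any(s =>
        verifySignature(s.GetProperty("keyid").GetString()!, signed,
                        Convert.FromBase64String(s.GetProperty("sig").GetString()!)));
    if (!ok) return null;

    // Parse the witness body only after its signature has been checked.
    using var witness = JsonDocument.Parse(payload);
    return witness.RootElement.GetProperty("path").EnumerateArray()
        .Select(step => step.GetProperty("symbol").GetString()!)
        .ToArray();
}

// DSSE pre-authentication encoding: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
static byte[] Pae(string type, byte[] body)
{
    var header = Encoding.UTF8.GetBytes($"DSSEv1 {Encoding.UTF8.GetByteCount(type)} {type} {body.Length} ");
    return [.. header, .. body];
}
```

A small verifier CLI can print `string.Join(" → ", path)` for a passing envelope and exit non-zero when `VerifyWitness` returns null.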
---
# 2) Smart-Diff × Reachability (incremental, low-noise updates)
**What it is:**
On **SBOM/VEX/dependency** deltas, don't rescan everything. Update only **affected regions** of the call graph and recompute reachability **just for changed nodes/edges**.
**Why it matters:**
* Potentially **order-of-magnitude faster** incremental scans when a change touches only a small part of the graph.
* Fewer flaky diffs; triage stays focused on **meaningful risk change**.
* Perfect for PR gating: “what changed” → “what became reachable/unreachable.”
**Core idea (graph reachability):**
* Maintain a per-service **call graph** `G = (V, E)` with **entrypoint set** `S`.
* On diff: compute changed nodes/edges ΔV/ΔE.
* Run **incremental BFS/DFS** from impacted nodes to sinks (forward or backward), reusing memoized results.
* Recompute only **frontiers** touched by Δ.
**Minimal tables (Postgres):**
```sql
-- Nodes (functions/methods)
CREATE TABLE cg_nodes(
  id BIGSERIAL PRIMARY KEY,
  service TEXT, symbol TEXT, file TEXT, line INT,
  hash TEXT, UNIQUE(service, hash)
);
-- Edges (calls)
CREATE TABLE cg_edges(
  src BIGINT REFERENCES cg_nodes(id),
  dst BIGINT REFERENCES cg_nodes(id),
  kind TEXT, PRIMARY KEY(src, dst)
);
-- Entrypoints & Sinks
CREATE TABLE cg_entrypoints(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY);
CREATE TABLE cg_sinks(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY, sink_type TEXT);
-- Memoized reachability cache
CREATE TABLE cg_reach_cache(
  entry_id BIGINT, sink_id BIGINT,
  path JSONB, reachable BOOLEAN,
  updated_at TIMESTAMPTZ,
  PRIMARY KEY(entry_id, sink_id)
);
```
**Incremental algorithm (pseudocode):**
```text
Input: ΔSBOM, ΔDeps, ΔCode → ΔNodes, ΔEdges
1) Apply Δ to cg_nodes/cg_edges
2) ImpactSet = neighbors(ΔNodes ∪ endpoints(ΔEdges))
3) For each e ∈ Entrypoints ∩ ancestors(ImpactSet):
     Recompute forward search to affected sinks, stop early on unchanged subgraphs
     Update cg_reach_cache; if state flips, emit new/updated DSSE witness
```
**.NET 10 reachability sketch (fast & local):**
```csharp
HashSet<int> ImpactSet = ComputeImpact(deltaNodes, deltaEdges);
foreach (var e in Intersect(Entrypoints, Ancestors(ImpactSet)))
{
    var res = BoundedReach(e, affectedSinks, graph, cache);
    foreach (var r in res.Changed)
    {
        cache.Upsert(e, r.Sink, r.Path, r.Reachable);
        if (r.Reachable) EmitDsseWitness(e, r.Sink, r.Path);
    }
}
```
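The sketch leaves `ComputeImpact` and `Ancestors` undefined. One minimal way to realize them assumes the graph is held in memory as a predecessor map keyed by node id, an illustrative shape rather than the Postgres tables. Take the impact set as ΔNodes ∪ endpoints(ΔEdges), collect everything that can reach it with a reverse BFS, and re-run only the entrypoints that land in that set. The pseudocode's extra `neighbors(...)` widening would be a small addition to `ComputeImpact`.

```csharp
using System.Collections.Generic;
using System.Linq;

// ImpactSet = ΔNodes ∪ endpoints(ΔEdges)
static HashSet<int> ComputeImpact(IEnumerable<int> deltaNodes, IEnumerable<(int Src, int Dst)> deltaEdges)
{
    var impact = new HashSet<int>(deltaNodes);
    foreach (var (src, dst) in deltaEdges) { impact.Add(src); impact.Add(dst); }
    return impact;
}

// Every node that can reach some node in `targets` (targets included), via reverse BFS over predecessors.
static HashSet<int> Ancestors(IReadOnlyDictionary<int, List<int>> preds, IEnumerable<int> targets)
{
    var seen = new HashSet<int>(targets);
    var queue = new Queue<int>(seen);
    while (queue.Count > 0)
    {
        var v = queue.Dequeue();
        if (!preds.TryGetValue(v, out var ps)) continue;
        foreach (var p in ps)
            if (seen.Add(p)) queue.Enqueue(p);
    }
    return seen;
}

// Entrypoints whose cached reachability may have changed and must be recomputed.
static List<int> EntrypointsToRecompute(IEnumerable<int> entrypoints, IReadOnlyDictionary<int, List<int>> preds,
                                        IEnumerable<int> deltaNodes, IEnumerable<(int, int)> deltaEdges)
{
    var ancestors = Ancestors(preds, ComputeImpact(deltaNodes, deltaEdges));
    return entrypoints.Where(ancestors.Contains).ToList();
}
```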
**CI/PR flow:**
1. Build → SBOM diff → Dependency diff → Call-graph delta.
2. Run incremental reachability.
3. If any `unreachable→reachable` transitions: **fail gate**, attach DSSE witnesses.
4. If `reachable→unreachable`: auto-close prior findings (and archive prior witness).
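Steps 3 and 4 reduce to classifying each finding by its before/after state. Here is a sketch that assumes findings are keyed by a string such as `CVE|package|version` and treats a finding missing from a scan as unreachable. Both are assumptions, not part of the schema above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keys that flipped in each direction between two scans; a key missing from a scan counts as unreachable.
static (List<string> NewlyReachable, List<string> NowUnreachable) StateFlips(
    IReadOnlyDictionary<string, bool> before, IReadOnlyDictionary<string, bool> after)
{
    bool Was(string k) => before.TryGetValue(k, out var r) && r;
    bool Is(string k) => after.TryGetValue(k, out var r) && r;

    var keys = before.Keys.Union(after.Keys).OrderBy(k => k, StringComparer.Ordinal).ToList();
    return (keys.Where(k => !Was(k) && Is(k)).ToList(),
            keys.Where(k => Was(k) && !Is(k)).ToList());
}
```

The gate fails when `NewlyReachable` is non-empty. The PR summary is just the two counts.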
---
# UX hooks (quick wins)
* In findings list, add a **“Show Witness”** button → modal renders the signed path (entrypoint→…→sink) + **“Verify Signature”** one-click.
* In PR checks, summarize only **state flips** with tiny links: “+2 reachable (view witness)” / “−1 (now unreachable)”.
---
# Minimal tasks to get this live
* **Scanner.Worker**: build call-graph extraction (per language), add incremental graph store, reachability cache.
* **Attestor**: DSSE signing endpoint + key management (Ed25519 by default; PQC mode later).
* **Authority**: tables above + witness storage + retrieval API.
* **Router/CI plugin**: PR annotation with **state flips** and links to witnesses.
* **UI**: witness modal + signature verify.
If you want, I can draft the exact Postgres migrations, the C# repositories, and a tiny verifier CLI that checks DSSE signatures and prints the call path.
Below is a concrete, buildable blueprint for an **advanced reachability analysis engine** inside Stella Ops. I'm going to assume your “Stella Ops” components are roughly:
* **Scanner.Worker**: runs analyses in CI / on artifacts
* **Authority**: stores graphs/findings/witnesses
* **Attestor**: signs DSSE envelopes (Ed25519)
* (optional) **SurfaceBuilder**: background worker that computes “vuln surfaces” for packages
The key advance is: **don't treat a CVE as “a package”**. Treat it as a **set of trigger methods** (public API) that can reach the vulnerable code inside the dependency, computed by “Smart-Diff” once and reused everywhere.
---
## 0) Define the contract (precision/soundness) up front
If you don't write this down, you'll fight false positives/negatives forever.
### What Stella Ops will guarantee (first release)
* **Whole-program static call graph** (app + selected dependency assemblies)
* **Context-insensitive** (fast), **path witness** extracted (shortest path)
* **Dynamic dispatch handled** with CHA/RTA (+ DI hints), with explicit uncertainty flags
* **Reflection handled best-effort** (constant-string resolution), otherwise “unknown edge”
### What it will NOT guarantee (first release)
* Perfect handling of reflection / `dynamic` / runtime codegen
* Perfect delegate/event resolution across complex flows
* Full taint/dataflow reachability (you can add later)
This is fine. The major value is: “**we can show you the call path**” and “**we can prove the vuln is triggered by calling these library APIs**”.
---
## 1) The big idea: “Vuln surfaces” (Smart-Diff → triggers)
### Problem
CVE feeds typically say “package X version range Y is vulnerable” but rarely say *which methods*. If you only do package-level reachability, noise is huge.
### Solution
For each CVE+package, compute a **vulnerability surface**:
* **Candidate sinks** = methods changed between vulnerable and fixed versions (diff at IL level)
* **Trigger methods** = *public/exported* methods in the vulnerable version that can reach those changed methods internally
Then your service scan becomes:
> “Can any entrypoint reach any trigger method?”
This is both faster and more precise.
---
## 2) Data model (Authority / Postgres)
You already had call graph tables; here's a concrete schema that supports:
* graph snapshots
* incremental updates
* vuln surfaces
* reachability cache
* DSSE witnesses
### 2.1 Graph tables
```sql
CREATE TABLE cg_snapshots (
  snapshot_id BIGSERIAL PRIMARY KEY,
  service TEXT NOT NULL,
  build_id TEXT NOT NULL,
  graph_digest TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(service, build_id)
);
CREATE TABLE cg_nodes (
  node_id BIGSERIAL PRIMARY KEY,
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  method_key TEXT NOT NULL, -- stable key (see below)
  asm_name TEXT,
  type_name TEXT,
  method_name TEXT,
  file_path TEXT,
  line_start INT,
  il_hash TEXT, -- normalized IL hash for diffing
  flags INT NOT NULL DEFAULT 0, -- bitflags: has_reflection, compiler_generated, etc.
  UNIQUE(snapshot_id, method_key)
);
CREATE TABLE cg_edges (
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  src_node_id BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  dst_node_id BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind SMALLINT NOT NULL, -- 0=call,1=newobj,2=dispatch,3=delegate,4=reflection_guess,...
  PRIMARY KEY(snapshot_id, src_node_id, dst_node_id, kind)
);
CREATE TABLE cg_entrypoints (
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  node_id BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind TEXT NOT NULL, -- http, grpc, cli, job, etc.
  name TEXT NOT NULL, -- GET /foo, "Main", etc.
  PRIMARY KEY(snapshot_id, node_id, kind, name)
);
```
### 2.2 Vuln surface tables (Smart-Diff artifacts)
```sql
CREATE TABLE vuln_surfaces (
  surface_id BIGSERIAL PRIMARY KEY,
  ecosystem TEXT NOT NULL, -- nuget
  package TEXT NOT NULL,
  cve_id TEXT NOT NULL,
  vuln_version TEXT NOT NULL, -- a representative vulnerable version
  fixed_version TEXT NOT NULL,
  surface_digest TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(ecosystem, package, cve_id, vuln_version, fixed_version)
);
CREATE TABLE vuln_surface_sinks (
  surface_id BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  sink_method_key TEXT NOT NULL,
  reason TEXT NOT NULL, -- changed|added|removed|heuristic
  PRIMARY KEY(surface_id, sink_method_key)
);
CREATE TABLE vuln_surface_triggers (
  surface_id BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  trigger_method_key TEXT NOT NULL,
  sink_method_key TEXT NOT NULL,
  internal_path JSONB, -- optional: library internal witness path
  PRIMARY KEY(surface_id, trigger_method_key, sink_method_key)
);
```
### 2.3 Reachability cache & witnesses
```sql
CREATE TABLE reach_findings (
  finding_id BIGSERIAL PRIMARY KEY,
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  cve_id TEXT NOT NULL,
  ecosystem TEXT NOT NULL,
  package TEXT NOT NULL,
  package_version TEXT NOT NULL,
  reachable BOOLEAN NOT NULL,
  reachable_entrypoints INT NOT NULL DEFAULT 0,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(snapshot_id, cve_id, package, package_version)
);
CREATE TABLE reach_witnesses (
  witness_id BIGSERIAL PRIMARY KEY,
  finding_id BIGINT REFERENCES reach_findings(finding_id) ON DELETE CASCADE,
  entry_node_id BIGINT REFERENCES cg_nodes(node_id),
  dsse_envelope JSONB NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
---
## 3) Stable identity: MethodKey + IL hash
### 3.1 MethodKey (must be stable across builds)
Use a normalized string like:
```
{AssemblyName}|{DeclaringTypeFullName}|{MethodName}`{GenericArity}({ParamType1},{ParamType2},...)
```
Examples:
* `MyApp|BillingController|Pay(System.String)`
* `LibXYZ|LibXYZ.Parser|Parse(System.ReadOnlySpan<System.Byte>)`
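Here is a sketch of the key builder. It assumes the caller has already resolved the assembly name, declaring type, and parameter type names (e.g. from Cecil). `MethodKeyOf` is a hypothetical helper. The backtick arity suffix is emitted only for generic methods, so non-generic keys match the examples above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical normalizer over already-resolved names:
// {Assembly}|{DeclaringType}|{Method}[`{Arity}]({Param1},{Param2},...)
static string MethodKeyOf(string assembly, string declaringType, string method,
                          int genericArity, IEnumerable<string> parameterTypes)
{
    var name = genericArity > 0 ? $"{method}`{genericArity}" : method;
    return $"{assembly}|{declaringType}|{name}({string.Join(",", parameterTypes)})";
}

Console.WriteLine(MethodKeyOf("LibXYZ", "LibXYZ.Parser", "Parse", 0, ["System.ReadOnlySpan<System.Byte>"]));
// LibXYZ|LibXYZ.Parser|Parse(System.ReadOnlySpan<System.Byte>)
```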
### 3.2 Normalized IL hash (for smart-diff + incremental graph updates)
Raw IL bytes aren't stable (metadata tokens change). Normalize:
* opcode names
* branch targets by *instruction index*, not offset
* method operands by **resolved MethodKey**
* string operands by literal or hashed literal
* type operands by full name
Then hash `SHA256(normalized_bytes)`.
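Below is a minimal sketch of the hash. It assumes instructions have already been decoded into (offset, opcode, operand, branch target) tuples, with method and type operands resolved to MethodKeys and full type names. `NormalizedIlHash` and the tuple shape are illustrative, not a Cecil API. Two bodies that differ only in byte offsets produce the same hash; a changed string literal does not.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical pre-decoded instruction: method/type operands already resolved to MethodKey /
// full type names by the caller; string literals arrive as their literal text; branch targets
// still arrive as raw byte offsets.
static string NormalizedIlHash(IReadOnlyList<(int Offset, string OpCode, string? Operand, int? BranchTarget)> body)
{
    // Branch targets are rewritten as instruction indices, never as byte offsets.
    var indexOf = new Dictionary<int, int>();
    for (int i = 0; i < body.Count; i++) indexOf[body[i].Offset] = i;

    var sb = new StringBuilder();
    foreach (var ins in body)
    {
        sb.Append(ins.OpCode);
        if (ins.BranchTarget is int target) sb.Append(" @").Append(indexOf[target]);
        else if (ins.Operand is not null) sb.Append(' ').Append(ins.Operand);
        sb.Append('\n');
    }
    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString())));
}
```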
---
## 4) Call graph extraction for .NET (concrete, doable)
### Tooling choice
Start with **Mono.Cecil** (MIT license, easy IL traversal). You can later swap to `System.Reflection.Metadata` for speed.
### 4.1 Build process (Scanner.Worker)
1. `dotnet restore` (use your locked restore)
2. `dotnet build -c Release /p:DebugType=portable /p:DebugSymbols=true`
3. Collect:
   * app assemblies: `bin/Release/**/publish/*.dll` or build output
   * `.pdb` files for sequence points (file/line for witnesses)
### 4.2 Cecil loader
```csharp
using Mono.Cecil;
using Mono.Cecil.Cil; // PortablePdbReaderProvider

var rp = new ReaderParameters {
    ReadSymbols = true,
    SymbolReaderProvider = new PortablePdbReaderProvider()
};
var asm = AssemblyDefinition.ReadAssembly(dllPath, rp);
```
### 4.3 Node extraction (methods)
Walk all types, including nested:
```csharp
IEnumerable<TypeDefinition> AllTypes(ModuleDefinition m)
{
    var stack = new Stack<TypeDefinition>(m.Types);
    while (stack.Count > 0)
    {
        var t = stack.Pop();
        yield return t;
        foreach (var nt in t.NestedTypes) stack.Push(nt);
    }
}

foreach (var type in AllTypes(asm.MainModule))
foreach (var method in type.Methods)
{
    var key = MethodKey.From(method); // your normalizer
    var (file, line) = PdbFirstSequencePoint(method);
    var ilHash = method.HasBody ? ILFingerprint(method) : null;
    // store node (method_key, file, line, il_hash, flags...)
}
```
### 4.4 Edge extraction (direct calls)
```csharp
foreach (var method in type.Methods.Where(m => m.HasBody))
{
    var srcKey = MethodKey.From(method);
    foreach (var ins in method.Body.Instructions)
    {
        if (ins.Operand is MethodReference mr)
        {
            if (ins.OpCode.Code is Code.Call or Code.Callvirt or Code.Newobj)
            {
                var dstKey = MethodKey.From(mr); // important: stable even if not resolved
                // keep newobj distinct so it maps to cg_edges.kind = 1 (CallKind.NewObj is your own enum value)
                var kind = ins.OpCode.Code == Code.Newobj ? CallKind.NewObj : CallKind.Direct;
                edges.Add(new Edge(srcKey, dstKey, kind: kind));
            }
            if (ins.OpCode.Code is Code.Ldftn or Code.Ldvirtftn)
            {
                // delegate capture (handle later)
            }
        }
    }
}
```
---
## 5) Advanced precision: dynamic dispatch + DI + async/await
If you stop at direct edges only, you'll miss many real paths.
### 5.1 Async/await mapping (critical for readable witnesses)
Async methods compile into a state machine `MoveNext()`. You want edges attributed back to the original method.
In Cecil:
* Check `AsyncStateMachineAttribute` on a method
* It references a state machine type
* Find that type's `MoveNext` method
* Map `MoveNextKey -> OriginalMethodKey`
Then, while extracting edges:
```csharp
srcKey = MoveNextToOriginal.TryGetValue(srcKey, out var original) ? original : srcKey;
```
Do the same for iterator state machines.
### 5.2 Virtual/interface dispatch (CHA/RTA)
You need 2 maps:
1. **type hierarchy / interface impl map**
2. **override map** from “declared method” → “implementation method(s)”
**Build override map**
```csharp
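// Cecil's methodDef.Overrides lists explicit overrides/interface impls only (implicit ones need signature matching).
// declared -> implementation(s) is one-to-many, so collect pairs instead of overwriting a single value.
foreach (var overrideRef in methodDef.Overrides)
    overrideMap.Add((MethodKey.From(overrideRef), MethodKey.From(methodDef))); // overrideMap: HashSet<(string Declared, string Impl)>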
```
**CHA**: for callvirt to virtual method `T.M`, add edges to overrides in derived classes
**RTA**: restrict to derived classes that are actually instantiated.
How to get instantiated types:
* look for `newobj` instructions and add the created type to `InstantiatedTypes`
* plus DI registrations (below)
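Here is a sketch of CHA vs. RTA target resolution over a toy type model. All names are hypothetical; real code would build these maps from Cecil's `TypeDefinition.BaseType`, `Interfaces`, and method signatures. Each candidate receiver type is resolved the way the runtime would resolve it, by walking up its base-class chain to the nearest declared body. RTA simply restricts the candidate receivers to instantiated types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model (names are illustrative):
//   baseClass[t]     = t's base class;  interfaces[t] = interfaces t implements directly
//   declares[(t, s)] = MethodKey of the body that type t itself declares for virtual slot s
static bool IsSubtype(string t, string target, IReadOnlyDictionary<string, string> baseClass,
                      IReadOnlyDictionary<string, List<string>> interfaces)
{
    if (t == target) return true;
    if (baseClass.TryGetValue(t, out var b) && IsSubtype(b, target, baseClass, interfaces)) return true;
    return interfaces.TryGetValue(t, out var ifs) && ifs.Any(i => IsSubtype(i, target, baseClass, interfaces));
}

// Targets of `callvirt declaredType::slot`. instantiated == null gives CHA (every subtype is a
// possible receiver); a non-null set gives RTA (only types seen in newobj sites / DI registrations).
static SortedSet<string> DispatchTargets(string declaredType, string slot, IEnumerable<string> allTypes,
    ISet<string>? instantiated, IReadOnlyDictionary<string, string> baseClass,
    IReadOnlyDictionary<string, List<string>> interfaces, IReadOnlyDictionary<(string, string), string> declares)
{
    var targets = new SortedSet<string>(StringComparer.Ordinal);
    foreach (var t in allTypes)
    {
        if (instantiated is not null && !instantiated.Contains(t)) continue;
        if (!IsSubtype(t, declaredType, baseClass, interfaces)) continue;
        // Runtime-style lookup: walk up the base-class chain to the nearest declared body.
        for (string? c = t; c is not null; c = baseClass.GetValueOrDefault(c))
        {
            if (declares.TryGetValue((c, slot), out var key)) { targets.Add(key); break; }
        }
    }
    return targets;
}
```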
### 5.3 DI hints (Microsoft.Extensions.DependencyInjection)
You will see calls like:
* `ServiceCollectionServiceExtensions.AddTransient<TService, TImpl>(...)`
In IL these are generic method calls. Detect and record `TService -> TImpl` as “instantiated”. This massively improves RTA for modern .NET apps.
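Here is a sketch of turning DI registrations into RTA facts. It assumes generic call sites have already been decoded into (declaring type, method name, generic arguments) triples; in Cecil these come from `GenericInstanceMethod` operands. Only the generic `AddTransient`/`AddScoped`/`AddSingleton` overloads are handled. The `typeof(...)`-based and factory overloads are not.

```csharp
using System.Collections.Generic;

// Hypothetical input: generic call sites as (declaring type, method name, generic arguments).
static void ApplyDiHints(IEnumerable<(string Type, string Method, string[] GenericArgs)> genericCalls,
                         ISet<string> instantiated, IDictionary<string, string> serviceToImpl)
{
    const string Sce = "Microsoft.Extensions.DependencyInjection.ServiceCollectionServiceExtensions";
    foreach (var (type, method, args) in genericCalls)
    {
        if (type != Sce || method is not ("AddTransient" or "AddScoped" or "AddSingleton")) continue;
        // AddX<TService, TImpl>() registers TImpl; AddX<TService>() registers TService as its own implementation.
        var impl = args.Length switch { 2 => args[1], 1 => args[0], _ => null };
        if (impl is null) continue;
        serviceToImpl[args[0]] = impl;
        instantiated.Add(impl);
    }
}
```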
### 5.4 Delegates/lambdas (good enough approach)
Implement intraprocedural tracking:
* when you see `ldftn SomeMethod` then `newobj Action::.ctor` then `stloc.s X`
* store `delegateTargets[local X] += SomeMethod`
* when you see `ldloc.s X` and later `callvirt Invoke`, add edges to targets
This makes Minimal API entrypoint discovery work too.
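The tracking above can be sketched over a simplified instruction stream of opcode/operand string pairs, a hypothetical shape rather than Cecil's `Instruction`. It follows only the `ldftn` → `newobj` → `stloc` → `ldloc` → `callvirt Invoke` pattern inside one method, which is the intraprocedural approximation described. Delegates passed through fields or parameters are not tracked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified IL stream: (opcode, operand), locals named by slot, method operands as strings.
// Only delegate values are tracked, and only within one method body.
static List<string> DelegateInvokeTargets(IEnumerable<(string Op, string Arg)> body)
{
    var localTargets = new Dictionary<string, HashSet<string>>(); // local slot -> methods it may hold
    HashSet<string>? pending = null;   // targets of the delegate most recently created or loaded
    var edges = new List<string>();
    foreach (var (op, arg) in body)
    {
        switch (op)
        {
            case "ldftn":                                   // function pointer for the target method
                pending = new HashSet<string> { arg };
                break;
            case "stloc" when pending is not null:          // newobj Action::.ctor wrapped it; now stored
                if (!localTargets.TryGetValue(arg, out var set)) localTargets[arg] = set = new HashSet<string>();
                set.UnionWith(pending);
                pending = null;
                break;
            case "ldloc" when localTargets.TryGetValue(arg, out var held):
                pending = new HashSet<string>(held);        // later argument loads don't clear it
                break;
            case "callvirt" when arg.EndsWith("::Invoke", StringComparison.Ordinal) && pending is not null:
                edges.AddRange(pending);                    // caller -> every possible delegate target
                pending = null;
                break;
        }
    }
    return edges;
}
```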
### 5.5 Reflection (best-effort)
Implement only high-signal heuristics:
* `typeof(T).GetMethod("Foo")` with constant "Foo"
* `GetType().GetMethod("Foo")` with constant "Foo" (type unknown → mark uncertain)
If resolved, add edge with `kind=reflection_guess`.
If not, set node flag `has_reflection = true` and in results show “may be incomplete”.
---
## 6) Entrypoint detection (concrete detectors)
### 6.1 MVC controllers
Detect:
* types deriving from `Microsoft.AspNetCore.Mvc.ControllerBase`
* methods:
  * public
  * not `[NonAction]`
  * has `[HttpGet]`, `[HttpPost]`, `[Route]` etc.
Extract the route template from the attribute's constructor arguments.
Store in `cg_entrypoints`:
* kind = `http`
* name = `GET /billing/pay` (compose verb+template)
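Here is a sketch of composing the `name`. It assumes the verb and the controller/action template strings have already been read from the attribute constructor arguments. `ComposeRoute` is hypothetical and covers only the common attribute-routing cases: the `[controller]` token, and action templates starting with `/` or `~/`, which replace the controller prefix instead of appending to it.

```csharp
using System;
using System.Linq;

// Hypothetical helper: combines controller- and action-level attribute route templates
// for the common cases only ([controller] token, "/" or "~/" action templates).
static string ComposeRoute(string verb, string? controllerTemplate, string? actionTemplate, string controllerTypeName)
{
    const string suffix = "Controller";
    var controller = controllerTypeName.EndsWith(suffix, StringComparison.Ordinal)
        ? controllerTypeName[..^suffix.Length]
        : controllerTypeName;

    bool absolute = actionTemplate is not null &&
        (actionTemplate.StartsWith('/') || actionTemplate.StartsWith("~/", StringComparison.Ordinal));
    string path = absolute
        ? actionTemplate!.TrimStart('~', '/')
        : string.Join('/', new[] { controllerTemplate, actionTemplate }
              .Where(s => !string.IsNullOrEmpty(s))
              .Select(s => s!.Trim('/')));

    return $"{verb} /{path.Replace("[controller]", controller)}";
}
```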
### 6.2 Minimal APIs
Scan `Program.Main` IL:
* find calls to `MapGet`, `MapPost`, ...
* extract route string from preceding `ldstr`
* resolve handler method via delegate tracking (ldftn)
Entry:
* kind = `http`
* name = `GET /foo`
### 6.3 CLI
Find assembly entry point method (`asm.EntryPoint`) or `static Main`.
Entry:
* kind = `cli`
* name = `Main`
Start here. Add gRPC/jobs later.
---
## 7) Smart-Diff SurfaceBuilder (the “advanced” part)
This is what makes your reachability actually meaningful for CVEs.
### 7.1 SurfaceBuilder inputs
From your vuln ingestion pipeline:
* ecosystem = nuget
* package = `LibXYZ`
* affected range = `<= 1.2.3`
* fixed version = `1.2.4`
* CVE id
### 7.2 Choose a vulnerable version to diff
Pick the **highest affected version below fixed**.
* fixed = 1.2.4
* vulnerable representative = 1.2.3
(If multiple fixed versions exist, build multiple surfaces.)
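Here is a sketch of that selection. It assumes plain numeric versions that `System.Version` can parse, plus an `isAffected` predicate derived from the advisory range. Both are assumptions: real NuGet versions with prerelease labels need a SemVer-aware parser such as NuGet.Versioning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Highest published version that is affected and strictly below the fixed version (null if none).
static string? RepresentativeVulnerable(IEnumerable<string> published, string fixedVersion,
                                        Func<Version, bool> isAffected)
{
    var fixedV = Version.Parse(fixedVersion);
    return published.Select(s => Version.Parse(s))
        .Where(v => v < fixedV && isAffected(v))   // affected and strictly below the fix
        .OrderByDescending(v => v)
        .Select(v => v.ToString())
        .FirstOrDefault();
}
```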
### 7.3 Download both packages
Use NuGet.Protocol to download `.nupkg`, unzip, pick TFMs you care about (often `netstandard2.0` is safest). Compute fingerprints for each assembly.
### 7.4 Compute method fingerprints
For each method:
* MethodKey
* Normalized IL hash
### 7.5 Diff
```
ChangedMethods = { k | hashVuln[k] != hashFixed[k] } ∪ added ∪ removed
```
Store these as `vuln_surface_sinks` with reason.
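The diff itself compares two `MethodKey → IL hash` maps. Here is a sketch (`DiffSinks` is a hypothetical name) that returns each sink with its `reason`, matching the `changed|added|removed` values in `vuln_surface_sinks`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Inputs map MethodKey -> normalized IL hash (null for methods without a body).
static SortedDictionary<string, string> DiffSinks(IReadOnlyDictionary<string, string?> vulnHashes,
                                                  IReadOnlyDictionary<string, string?> fixedHashes)
{
    var sinks = new SortedDictionary<string, string>(StringComparer.Ordinal);
    foreach (var (key, hash) in vulnHashes)
    {
        if (!fixedHashes.TryGetValue(key, out var fixedHash)) sinks[key] = "removed";
        else if (hash != fixedHash) sinks[key] = "changed";
    }
    foreach (var key in fixedHashes.Keys.Where(k => !vulnHashes.ContainsKey(k)))
        sinks[key] = "added";
    return sinks;
}
```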
### 7.6 Build internal library call graph
Same Cecil extraction, but only for package assemblies.
Now compute triggers:
**Reverse BFS from sinks**:
* Start from all sink method keys
* Walk predecessors
* When you encounter a **public/exported method**, record it as a trigger
Also store one internal path for each trigger → sink (for witnesses).
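Here is a sketch of the trigger search. It assumes the library's internal call graph is available as a predecessor map over MethodKeys and that an `isPublic` predicate reflects the exported API (both hypothetical inputs). BFS visits nodes in order of distance from the sinks, so following each node's recorded next hop yields a shortest internal path for the `internal_path` column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// preds[m] = library methods that call m; isPublic marks exported API (hypothetical predicate).
// Returns trigger -> one shortest internal path from that trigger to a sink.
static Dictionary<string, List<string>> FindTriggers(IEnumerable<string> sinks,
    IReadOnlyDictionary<string, List<string>> preds, Func<string, bool> isPublic)
{
    var next = new Dictionary<string, string?>();   // node -> next hop toward a sink (null at a sink)
    var queue = new Queue<string>();
    foreach (var s in sinks)
        if (next.TryAdd(s, null)) queue.Enqueue(s);

    while (queue.Count > 0)                          // reverse BFS: walk callers of visited nodes
    {
        var v = queue.Dequeue();
        if (!preds.TryGetValue(v, out var callers)) continue;
        foreach (var p in callers)
            if (next.TryAdd(p, v)) queue.Enqueue(p);
    }

    var triggers = new Dictionary<string, List<string>>();
    foreach (var m in next.Keys.Where(isPublic))
    {
        var path = new List<string>();
        for (string? cur = m; cur is not null; cur = next[cur]) path.Add(cur);
        triggers[m] = path;
    }
    return triggers;
}
```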
### 7.7 Add interface/base declarations as triggers
Important: your app might call a library via an interface method signature, not the concrete implementation.
For each trigger implementation method:
* for each `method.Overrides` entry, add the overridden method key as an additional trigger
This reduces dependence on perfect dispatch expansion during app scanning.
### 7.8 Persist the surface
Store:
* sinks set
* triggers set
* internal witness paths (optional but highly valuable)
Now you've converted a “version range” CVE into “these specific library APIs are dangerous”.
---
## 8) Reachability engine (fast, witness-producing)
### 8.1 In-memory graph format (CSR)
Don't BFS off dictionaries; you'll die on perf.
Build integer indices:
* `method_key -> nodeIndex (0..N-1)`
* store arrays:
  * `predOffsets[N+1]`
  * `preds[edgeCount]`
Construction:
1. count predecessors per node
2. prefix sum to offsets
3. fill preds
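The three construction steps map directly onto code. Here is a sketch that assumes node indices `0..N-1` have already been assigned and edges are `(src, dst)` pairs (`BuildPredCsr` is a hypothetical name):

```csharp
using System.Collections.Generic;

// preds[predOffsets[v] .. predOffsets[v + 1]) holds every u with an edge u -> v.
static (int[] PredOffsets, int[] Preds) BuildPredCsr(int n, IReadOnlyList<(int Src, int Dst)> edges)
{
    var offsets = new int[n + 1];
    foreach (var (_, dst) in edges) offsets[dst + 1]++;            // 1) count predecessors per node
    for (int v = 0; v < n; v++) offsets[v + 1] += offsets[v];      // 2) prefix sum to offsets

    var preds = new int[edges.Count];
    var cursor = (int[])offsets.Clone();                            // next free slot per node
    foreach (var (src, dst) in edges) preds[cursor[dst]++] = src;   // 3) fill preds
    return (offsets, preds);
}
```

Storing predecessors lets the reverse BFS in 8.2 scan a node's callers as one contiguous slice of `preds`.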
### 8.2 Reverse BFS from sinks
This computes:
* `visited[node]` = can reach a sink
* `parent[node]` = next node toward a sink (for path reconstruction)
```csharp
public sealed class ReachabilityEngine
{
    public ReachabilityResult Compute(
        Graph g,
        ReadOnlySpan<int> entrypoints,
        ReadOnlySpan<int> sinks)
    {
        var visitedMark = g.VisitMark; // int[] length N (reused across runs)
        var parent = g.Parent;         // int[] length N (reused)
        g.RunId++;
        var q = new IntQueue(capacity: g.NodeCount);
        var sinkSet = new BitSet(g.NodeCount);
        foreach (var s in sinks)
        {
            sinkSet.Set(s);
            visitedMark[s] = g.RunId;
            parent[s] = s;
            q.Enqueue(s);
        }
        while (q.TryDequeue(out var v))
        {
            var start = g.PredOffsets[v];
            var end = g.PredOffsets[v + 1];
            for (int i = start; i < end; i++)
            {
                var p = g.Preds[i];
                if (visitedMark[p] == g.RunId) continue;
                visitedMark[p] = g.RunId;
                parent[p] = v;
                q.Enqueue(p);
            }
        }
        // Collect reachable entrypoints and paths
        var results = new List<EntryWitness>();
        foreach (var e in entrypoints)
        {
            if (visitedMark[e] != g.RunId) continue;
            var path = ReconstructPath(e, parent, sinkSet);
            results.Add(new EntryWitness(e, path));
        }
        return new ReachabilityResult(results);
    }

    private static int[] ReconstructPath(int entry, int[] parent, BitSet sinks)
    {
        var path = new List<int>(32);
        int cur = entry;
        path.Add(cur);
        // follow parent pointers until a sink
        for (int guard = 0; guard < 10_000; guard++)
        {
            if (sinks.Get(cur)) break;
            var nxt = parent[cur];
            if (nxt == cur || nxt < 0) break; // safety
            cur = nxt;
            path.Add(cur);
        }
        return path.ToArray();
    }
}
```
### 8.3 Producing the witness
For each node index in the path:
* method_key
* file_path / line_start (if known)
* optional flags (reflection_guess edge, dispatch edge)
Then attach:
* vuln id, package, version
* entrypoint kind/name
* graph digest + config digest
* surface digest
* timestamp
Send JSON to Attestor for DSSE signing, store envelope in Authority.
---
## 9) Scaling: don't do BFS 500 times if you can avoid it
### 9.1 First-line scaling (usually enough)
* Group vulnerabilities by package/version → surfaces reused
* Only run reachability for vulns where:
  * dependency present AND
  * surface exists OR fallback mode
* Limit witnesses per vuln (top 3)
In practice, with N ≈ 50k nodes and E ≈ 200k edges, a reverse BFS over arrays touches each node and edge at most once (O(N + E)), so it stays cheap in C#.
### 9.2 Incremental Smart-Diff × Reachability (your “low noise” killer feature)
#### Step A: compute graph delta between snapshots
Use `il_hash` per method to detect changed nodes:
* added / removed / changed nodes
* edges updated only for changed nodes
#### Step B: decide which vulnerabilities need recompute
Store a cached reverse-reachable set per vuln surface if you want (bitset), OR just do a cheaper heuristic:
Recompute for vulnerability if:
* sink set changed (new surface or version changed), OR
* any changed node is on any previously stored witness path, OR
* entrypoints changed, OR
* impacted nodes touch any trigger nodes' predecessors (use a small localized search)
A practical approach:
* store all node IDs that appear in any witness path for that vuln
* if delta touches any of those nodes/edges, recompute
* otherwise reuse cached result
This avoids most recomputation on PR scans, where most code is unchanged.
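Here is a sketch of the per-vulnerability check. It assumes each cached result stores its surface digest and the set of node ids on its witness paths (`NeedsRecompute` and its parameters are hypothetical). It covers only the surface, entrypoint, and witness-path conditions. A vulnerability that was previously unreachable has no witness nodes, so catching a new path to it still needs the localized trigger-predecessor search from the list above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Returns true when the cached reachability result for one vulnerability may be stale.
static bool NeedsRecompute(string cachedSurfaceDigest, string currentSurfaceDigest, bool entrypointsChanged,
                           IReadOnlySet<long> witnessNodes,
                           IEnumerable<long> changedNodes, IEnumerable<(long Src, long Dst)> changedEdges)
{
    if (cachedSurfaceDigest != currentSurfaceDigest || entrypointsChanged) return true;
    // Any delta touching a node on a stored witness path invalidates the cached result.
    return changedNodes.Any(witnessNodes.Contains)
        || changedEdges.Any(e => witnessNodes.Contains(e.Src) || witnessNodes.Contains(e.Dst));
}
```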
#### Step C: “Impact frontier” recompute (optional)
If you want more advanced:
* compute `ImpactSet = ΔNodes ∪ endpoints(ΔEdges)`
* run reverse BFS **starting from ImpactSet ∩ ReverseReachSet** and update visited marks
This is trickier to implement correctly (dynamic graph), so I'd ship the heuristic first.
---
## 10) Practical fallback modes (don't block shipping)
You won't have surfaces for every CVE on day 1. Handle this gracefully:
### Mode 1: Surface-based reachability (best)
* sink = trigger methods from surface
* result: “reachable” with path
### Mode 2: Package API usage (good fallback)
* sink = *any* method in that package that is called by app
* result: “package reachable” (lower confidence), still provide path to callsite
### Mode 3: Dependency present only (SBOM level)
* no call graph needed
* result: “present” only
Your UI can show confidence tiers:
* **Confirmed reachable (surface)**
* **Likely reachable (package API)**
* **Present only (SBOM)**
---
## 11) Integration points inside Stella Ops
### Scanner.Worker (per build)
1. Build/collect assemblies + pdb
2. `CallGraphBuilder` → nodes/edges/entrypoints + graph_digest
3. Load SBOM vulnerabilities list
4. For each vuln:
   * resolve surface triggers; if missing → enqueue SurfaceBuilder job + fallback mode
   * run reachability BFS
   * for each reachable entrypoint: emit DSSE witness
5. Persist findings/witnesses
### SurfaceBuilder (async worker)
* triggered by “surface missing” events or nightly preload of top packages
* computes surface once, stores forever
### Authority
* stores graphs, surfaces, findings, witnesses
* provides retrieval APIs for UI/CI
---
## 12) What to implement first (in the order that produces value fastest)
### Weeks 1–2 scope (realistic, shippable)
1. Cecil call graph extraction (direct calls)
2. MVC + Minimal API entrypoints
3. Reverse BFS reachability with path witnesses
4. DSSE witness signing + storage
5. SurfaceBuilder v1:
   * IL hash per method
   * changed methods as sinks
   * triggers via internal reverse BFS
6. UI: “Show Witness” + “Verify Signature”
### Next increment (precision upgrades)
7. async/await mapping to original methods
8. RTA + DI registration hints
9. delegate tracking for Minimal API handlers (if not already)
10. interface override triggers in surface builder
### Later (if you want “attackability”, not just “reachability”)
11. taint/dataflow for top sink classes (deserialization, path traversal, SQL, command exec)
12. sanitizer modeling & parameter constraints
---
## 13) Common failure modes and how to harden
### MethodKey mismatches (surface vs app call)
* Ensure both are generated from the same normalization rules
* For generic methods, prefer **definition** keys (strip instantiation)
* Store both “exact” and “erased generic” variants if needed
### Multi-target frameworks
* SurfaceBuilder: compute triggers for each TFM, union them
* App scan: choose TFM closest to build RID, but allow fallback to union
### Huge graphs
* Drop `System.*` nodes/edges unless:
  * the vuln is in System.* (rare, but handle separately)
* Deduplicate nodes by MethodKey across assemblies where safe
* Use CSR arrays + pooled queues
### Reflection heavy projects
* Mark analysis confidence lower
* Include “unknown edges present” in finding metadata
* Still produce a witness path up to the reflective callsite
---
If you want, I can also paste a **complete Cecil-based CallGraphBuilder class** (nodes+edges+PDB lines), plus the **SurfaceBuilder** that downloads NuGet packages and generates `vuln_surface_triggers` end-to-end.