StellaOps Bot 28823a8960 save progress

2025-12-18 09:10:36 +02:00

Here’s a compact, practical way to add two high‑leverage capabilities to your scanner: DSSE‑signed path witnesses and Smart‑Diff × Reachability—what they are, why they matter, and exactly how to implement them in Stella Ops without ceremony.

1) DSSE‑signed path witnesses (entrypoint → calls → sink)

What it is (in plain terms): When you flag a CVE as “reachable,” also emit a tiny, human‑readable proof: the exact path from a real entrypoint (e.g., HTTP route, CLI verb, cron) through functions/methods to the vulnerable sink. Wrap that proof in a DSSE envelope and sign it. Anyone can verify the witness later—offline—without rerunning analysis.

Why it matters:

Turns red flags into auditable evidence (quiet‑by‑design).
Lets CI/CD, auditors, and customers verify findings independently.
Enables deterministic replay and provenance chains (ties nicely to in‑toto/SLSA).

Minimal JSON witness (stable, vendor‑neutral):

{
  "witness_schema": "stellaops.witness.v1",
  "artifact": { "sbom_digest": "sha256:...", "component_purl": "pkg:nuget/Example@1.2.3" },
  "vuln": { "id": "CVE-2024-XXXX", "source": "NVD", "range": "≤1.2.3" },
  "entrypoint": { "kind": "http", "name": "GET /billing/pay" },
  "path": [
    {"symbol": "BillingController.Pay()", "file": "BillingController.cs", "line": 42},
    {"symbol": "PaymentsService.Authorize()", "file": "PaymentsService.cs", "line": 88},
    {"symbol": "LibXYZ.Parser.Parse()", "file": "Parser.cs", "line": 17}
  ],
  "sink": { "symbol": "LibXYZ.Parser.Parse()", "type": "deserialization" },
  "evidence": {
    "callgraph_digest": "sha256:...",
    "build_id": "dotnet:RID:linux-x64:sha256:...",
    "analysis_config_digest": "sha256:..."
  },
  "observed_at": "2025-12-18T00:00:00Z"
}

Wrap in DSSE (payloadType & payload are required)

{
  "payloadType": "application/vnd.stellaops.witness+json",
  "payload": "base64(JSON_above)",
  "signatures": [{ "keyid": "attestor-stellaops-ed25519", "sig": "base64(...)" }]
}

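.NET 10 signing/verifying (Ed25519)

using System.Text;
using System.Text.Json;

const string PayloadType = "application/vnd.stellaops.witness+json";
byte[] payloadBytes = JsonSerializer.SerializeToUtf8Bytes(witnessJsonObj);

var dsse = new {
  payloadType = PayloadType,
  payload = Convert.ToBase64String(payloadBytes),
  // DSSE signatures cover PAE(payloadType, payload), not the raw payload bytes.
  signatures = new [] { new { keyid = keyId, sig = Convert.ToBase64String(Sign(Pae(PayloadType, payloadBytes), privateKey)) } }
};

// Pre-authentication encoding: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body (LEN = ASCII decimal byte count).
byte[] Pae(string type, byte[] body)
{
    byte[] header = Encoding.UTF8.GetBytes($"DSSEv1 {Encoding.UTF8.GetByteCount(type)} {type} {body.Length} ");
    return [.. header, .. body];
}

// As far as I know the BCL ships no Ed25519 class; Ed25519Helper stands in for your own wrapper
// (for example around BouncyCastle or NSec). Verifiers recompute Pae(...) and check the signature over it.
byte[] Sign(byte[] data, byte[] privateKey) => Ed25519Helper.Sign(data, privateKey);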
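
The verifying side can be sketched as follows. This is a minimal sketch, not the Stella Ops verifier: `DsseVerifier` and the `verifySignature` callback are hypothetical names, and the callback is assumed to wrap whatever Ed25519 library you use. The one part taken from the DSSE specification is that signatures are checked over the pre-authentication encoding (PAE) of the payload type and payload, not over the raw payload.

```csharp
#nullable enable
using System;
using System.IO;
using System.Text;
using System.Text.Json;

public static class DsseVerifier
{
    // PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    // where LEN is the byte length written as ASCII decimal.
    public static byte[] Pae(string payloadType, byte[] payload)
    {
        byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);
        using var ms = new MemoryStream();
        ms.Write(Encoding.ASCII.GetBytes($"DSSEv1 {typeBytes.Length} "));
        ms.Write(typeBytes);
        ms.Write(Encoding.ASCII.GetBytes($" {payload.Length} "));
        ms.Write(payload);
        return ms.ToArray();
    }

    // verifySignature(message, signature) is a hypothetical callback around your Ed25519 library.
    // Returns true if at least one signature in the envelope verifies; payload receives the decoded witness JSON.
    public static bool Verify(string envelopeJson, Func<byte[], byte[], bool> verifySignature, out byte[] payload)
    {
        using var doc = JsonDocument.Parse(envelopeJson);
        JsonElement root = doc.RootElement;
        string payloadType = root.GetProperty("payloadType").GetString()!;
        payload = Convert.FromBase64String(root.GetProperty("payload").GetString()!);
        byte[] message = Pae(payloadType, payload);

        foreach (JsonElement sig in root.GetProperty("signatures").EnumerateArray())
        {
            byte[] sigBytes = Convert.FromBase64String(sig.GetProperty("sig").GetString()!);
            if (verifySignature(message, sigBytes)) return true;
        }
        return false;
    }
}
```

A verifier CLI would call `Verify`, then deserialize `payload` and print the `path` array. Selecting the public key by `keyid` is left to the callback.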

Where to emit:

Scanner.Worker: after reachability confirms reachable=true, emit witness → Attestor signs → Authority stores (Postgres) → optional Rekor‑style mirror.
Expose /witness/{findingId} for download & independent verification.

2) Smart‑Diff × Reachability (incremental, low‑noise updates)

What it is: On SBOM/VEX/dependency deltas, don’t rescan everything. Update only affected regions of the call graph and recompute reachability just for changed nodes/edges.

Why it matters:

Order‑of‑magnitude faster incremental scans.
Fewer flaky diffs; triage stays focused on meaningful risk change.
Perfect for PR gating: “what changed” → “what became reachable/unreachable.”

Core idea (graph‑reachability):

Maintain a per‑service call graph G = (V, E) with entrypoint set S.
On diff: compute changed nodes/edges ΔV/ΔE.
Run incremental BFS/DFS from impacted nodes to sinks (forward or backward), reusing memoized results.
Recompute only frontiers touched by Δ.

Minimal tables (Postgres):

-- Nodes (functions/methods)
CREATE TABLE cg_nodes(
  id BIGSERIAL PRIMARY KEY,
  service TEXT, symbol TEXT, file TEXT, line INT,
  hash TEXT, UNIQUE(service, hash)
);
-- Edges (calls)
CREATE TABLE cg_edges(
  src BIGINT REFERENCES cg_nodes(id),
  dst BIGINT REFERENCES cg_nodes(id),
  kind TEXT, PRIMARY KEY(src, dst)
);
-- Entrypoints & Sinks
CREATE TABLE cg_entrypoints(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY);
CREATE TABLE cg_sinks(node_id BIGINT REFERENCES cg_nodes(id) PRIMARY KEY, sink_type TEXT);

-- Memoized reachability cache
CREATE TABLE cg_reach_cache(
  entry_id BIGINT, sink_id BIGINT,
  path JSONB, reachable BOOLEAN,
  updated_at TIMESTAMPTZ,
  PRIMARY KEY(entry_id, sink_id)
);

Incremental algorithm (pseudocode):

Input: ΔSBOM, ΔDeps, ΔCode → ΔNodes, ΔEdges
1) Apply Δ to cg_nodes/cg_edges
2) ImpactSet = neighbors(ΔNodes ∪ endpoints(ΔEdges))
3) For each e∈Entrypoints intersect ancestors(ImpactSet):
     Recompute forward search to affected sinks, stop early on unchanged subgraphs
     Update cg_reach_cache; if state flips, emit new/updated DSSE witness

.NET 10 reachability sketch (fast & local):

HashSet<int> ImpactSet = ComputeImpact(deltaNodes, deltaEdges);
foreach (var e in Intersect(Entrypoints, Ancestors(ImpactSet)))
{
    var res = BoundedReach(e, affectedSinks, graph, cache);
    foreach (var r in res.Changed)
    {
        cache.Upsert(e, r.Sink, r.Path, r.Reachable);
        if (r.Reachable) EmitDsseWitness(e, r.Sink, r.Path);
    }
}

CI/PR flow:

Build → SBOM diff → Dependency diff → Call‑graph delta.
Run incremental reachability.
If any unreachable→reachable transitions: fail gate, attach DSSE witnesses.
If reachable→unreachable: auto‑close prior findings (and archive prior witness).
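
The gate decision in this flow boils down to comparing two reachability maps. Here is a minimal sketch under some assumptions: findings are keyed by a hypothetical string id (for example CVE id plus package), and `GateEvaluator` and `FlipKind` are illustrative names, not existing Stella Ops types.

```csharp
using System.Collections.Generic;

public enum FlipKind { BecameReachable, BecameUnreachable }

public static class GateEvaluator
{
    // before/after map a finding id to its reachable flag for the base and head snapshots.
    public static List<(string Finding, FlipKind Kind)> Diff(
        IReadOnlyDictionary<string, bool> before,
        IReadOnlyDictionary<string, bool> after)
    {
        var flips = new List<(string Finding, FlipKind Kind)>();
        foreach (var (finding, nowReachable) in after)
        {
            bool wasReachable = before.TryGetValue(finding, out var prev) && prev;
            if (nowReachable && !wasReachable) flips.Add((finding, FlipKind.BecameReachable));
            if (!nowReachable && wasReachable) flips.Add((finding, FlipKind.BecameUnreachable));
        }
        // A finding that disappeared entirely (dependency removed) is treated as now unreachable.
        foreach (var (finding, wasReachable) in before)
            if (wasReachable && !after.ContainsKey(finding))
                flips.Add((finding, FlipKind.BecameUnreachable));
        return flips;
    }

    // Fail the gate only on unreachable -> reachable transitions.
    public static bool ShouldFailGate(IEnumerable<(string Finding, FlipKind Kind)> flips)
    {
        foreach (var flip in flips)
            if (flip.Kind == FlipKind.BecameReachable) return true;
        return false;
    }
}
```

Findings that stay reachable or stay unreachable produce no flip, which is what keeps the PR summary limited to state changes.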

UX hooks (quick wins)

In findings list, add a “Show Witness” button → modal renders the signed path (entrypoint→…→sink) + “Verify Signature” one‑click.
In PR checks, summarize only state flips with tiny links: “+2 reachable (view witness)” / “−1 (now unreachable)”.

Minimal tasks to get this live

Scanner.Worker: build call‑graph extraction (per language), add incremental graph store, reachability cache.
Attestor: DSSE signing endpoint + key management (Ed25519 by default; PQC mode later).
Authority: tables above + witness storage + retrieval API.
Router/CI plugin: PR annotation with state flips and links to witnesses.
UI: witness modal + signature verify.

If you want, I can draft the exact Postgres migrations, the C# repositories, and a tiny verifier CLI that checks DSSE signatures and prints the call path.

Below is a concrete, buildable blueprint for an advanced reachability analysis engine inside Stella Ops. I’m going to assume your “Stella Ops” components are roughly:

Scanner.Worker: runs analyses in CI / on artifacts
Authority: stores graphs/findings/witnesses
Attestor: signs DSSE envelopes (Ed25519)
(optional) SurfaceBuilder: background worker that computes “vuln surfaces” for packages

The key advance is: don’t treat a CVE as “a package”. Treat it as a set of trigger methods (public API) that can reach the vulnerable code inside the dependency—computed by “Smart‑Diff” once, reused everywhere.

0) Define the contract (precision/soundness) up front

If you don’t write this down, you’ll fight false positives/negatives forever.

What Stella Ops will guarantee (first release)

Whole-program static call graph (app + selected dependency assemblies)
Context-insensitive (fast), path witness extracted (shortest path)
Dynamic dispatch handled with CHA/RTA (+ DI hints), with explicit uncertainty flags
Reflection handled best-effort (constant-string resolution), otherwise “unknown edge”

What it will NOT guarantee (first release)

Perfect handling of reflection / dynamic / runtime codegen
Perfect delegate/event resolution across complex flows
Full taint/dataflow reachability (you can add later)

This is fine. The major value is: “we can show you the call path” and “we can prove the vuln is triggered by calling these library APIs”.

1) The big idea: “Vuln surfaces” (Smart-Diff → triggers)

Problem

CVE feeds typically say “package X version range Y is vulnerable” but rarely say which methods. If you only do package-level reachability, noise is huge.

Solution

For each CVE+package, compute a vulnerability surface:

Candidate sinks = methods changed between vulnerable and fixed versions (diff at IL level)
Trigger methods = public/exported methods in the vulnerable version that can reach those changed methods internally

Then your service scan becomes:

“Can any entrypoint reach any trigger method?”

This is both faster and more precise.

2) Data model (Authority / Postgres)

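You already had call graph tables; here’s a concrete schema that replaces them (the cg_* table names overlap with the minimal version, so apply this as a migration rather than alongside it) and supports: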

graph snapshots
incremental updates
vuln surfaces
reachability cache
DSSE witnesses

2.1 Graph tables

CREATE TABLE cg_snapshots (
  snapshot_id BIGSERIAL PRIMARY KEY,
  service TEXT NOT NULL,
  build_id TEXT NOT NULL,
  graph_digest TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(service, build_id)
);

CREATE TABLE cg_nodes (
  node_id BIGSERIAL PRIMARY KEY,
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  method_key TEXT NOT NULL,              -- stable key (see below)
  asm_name TEXT,
  type_name TEXT,
  method_name TEXT,
  file_path TEXT,
  line_start INT,
  il_hash TEXT,                          -- normalized IL hash for diffing
  flags INT NOT NULL DEFAULT 0,          -- bitflags: has_reflection, compiler_generated, etc.
  UNIQUE(snapshot_id, method_key)
);

CREATE TABLE cg_edges (
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  src_node_id BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  dst_node_id BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind SMALLINT NOT NULL,                -- 0=call,1=newobj,2=dispatch,3=delegate,4=reflection_guess,...
  PRIMARY KEY(snapshot_id, src_node_id, dst_node_id, kind)
);

CREATE TABLE cg_entrypoints (
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  node_id BIGINT REFERENCES cg_nodes(node_id) ON DELETE CASCADE,
  kind TEXT NOT NULL,                    -- http, grpc, cli, job, etc.
  name TEXT NOT NULL,                    -- GET /foo, "Main", etc.
  PRIMARY KEY(snapshot_id, node_id, kind, name)
);

2.2 Vuln surface tables (Smart‑Diff artifacts)

CREATE TABLE vuln_surfaces (
  surface_id BIGSERIAL PRIMARY KEY,
  ecosystem TEXT NOT NULL,               -- nuget
  package TEXT NOT NULL,
  cve_id TEXT NOT NULL,
  vuln_version TEXT NOT NULL,            -- a representative vulnerable version
  fixed_version TEXT NOT NULL,
  surface_digest TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(ecosystem, package, cve_id, vuln_version, fixed_version)
);

CREATE TABLE vuln_surface_sinks (
  surface_id BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  sink_method_key TEXT NOT NULL,
  reason TEXT NOT NULL,                  -- changed|added|removed|heuristic
  PRIMARY KEY(surface_id, sink_method_key)
);

CREATE TABLE vuln_surface_triggers (
  surface_id BIGINT REFERENCES vuln_surfaces(surface_id) ON DELETE CASCADE,
  trigger_method_key TEXT NOT NULL,
  sink_method_key TEXT NOT NULL,
  internal_path JSONB,                   -- optional: library internal witness path
  PRIMARY KEY(surface_id, trigger_method_key, sink_method_key)
);

2.3 Reachability cache & witnesses

CREATE TABLE reach_findings (
  finding_id BIGSERIAL PRIMARY KEY,
  snapshot_id BIGINT REFERENCES cg_snapshots(snapshot_id) ON DELETE CASCADE,
  cve_id TEXT NOT NULL,
  ecosystem TEXT NOT NULL,
  package TEXT NOT NULL,
  package_version TEXT NOT NULL,
  reachable BOOLEAN NOT NULL,
  reachable_entrypoints INT NOT NULL DEFAULT 0,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(snapshot_id, cve_id, ecosystem, package, package_version)
);

CREATE TABLE reach_witnesses (
  witness_id BIGSERIAL PRIMARY KEY,
  finding_id BIGINT REFERENCES reach_findings(finding_id) ON DELETE CASCADE,
  entry_node_id BIGINT REFERENCES cg_nodes(node_id),
  dsse_envelope JSONB NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

3) Stable identity: MethodKey + IL hash

3.1 MethodKey (must be stable across builds)

Use a normalized string like:

{AssemblyName}|{DeclaringTypeFullName}|{MethodName}`{GenericArity}({ParamType1},{ParamType2},...)

Examples:

MyApp|BillingController|Pay(System.String)
LibXYZ|LibXYZ.Parser|Parse(System.ReadOnlySpan<System.Byte>)
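
A formatter for this key can be sketched independently of the IL reader. The sketch below assumes the caller has already resolved full type names; `MethodKeyFormat.Build` is a hypothetical helper, and omitting the arity suffix for non-generic methods is an assumption inferred from the two examples above.

```csharp
using System.Collections.Generic;

public static class MethodKeyFormat
{
    // Builds {Assembly}|{DeclaringType}|{Method}`{Arity}({Param1},{Param2},...).
    // Assumption: the `N suffix is emitted only when the method has generic parameters.
    public static string Build(
        string assemblyName,
        string declaringTypeFullName,
        string methodName,
        int genericArity,
        IEnumerable<string> parameterTypeFullNames)
    {
        string arity = genericArity > 0 ? $"`{genericArity}" : "";
        string parameters = string.Join(",", parameterTypeFullNames);
        return $"{assemblyName}|{declaringTypeFullName}|{methodName}{arity}({parameters})";
    }
}
```

Whatever the exact rules are, the same function has to produce keys for both the app scan and the surface builder, otherwise trigger lookups silently miss.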

3.2 Normalized IL hash (for smart-diff + incremental graph updates)

Raw IL bytes aren’t stable (metadata tokens change). Normalize:

opcode names
branch targets by instruction index, not offset
method operands by resolved MethodKey
string operands by literal or hashed literal
type operands by full name

Then hash SHA256(normalized_bytes).
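
To make the normalization concrete, here is a minimal sketch that works on a simplified instruction model instead of Cecil types. `IlInstruction` and `IlFingerprint` are hypothetical names; operands are assumed to be resolved already (MethodKey, type full name, or string literal), and the `sha256:` prefix is only a formatting choice.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Simplified stand-in for a decoded IL instruction; a real extractor would fill it from the IL reader.
public sealed record IlInstruction(int Offset, string OpCode, object? Operand = null, int? BranchTargetOffset = null);

public static class IlFingerprint
{
    public static string Compute(IReadOnlyList<IlInstruction> body)
    {
        // Map byte offsets to instruction indices so branch targets do not depend on encoding sizes.
        var indexByOffset = new Dictionary<int, int>();
        for (int i = 0; i < body.Count; i++) indexByOffset[body[i].Offset] = i;

        var sb = new StringBuilder();
        foreach (IlInstruction ins in body)
        {
            sb.Append(ins.OpCode);
            if (ins.BranchTargetOffset is int target)
                sb.Append(" @").Append(indexByOffset[target]);   // branch target by instruction index
            else if (ins.Operand is not null)
                sb.Append(' ').Append(ins.Operand);               // resolved key, type name, or literal
            sb.Append('\n');
        }

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Two bodies that differ only in byte offsets hash identically, while a changed callee key or literal changes the hash.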

4) Call graph extraction for .NET (concrete, doable)

Tooling choice

Start with Mono.Cecil (MIT license, easy IL traversal). You can later swap to System.Reflection.Metadata for speed.

4.1 Build process (Scanner.Worker)

dotnet restore (use your locked restore)
dotnet build -c Release /p:DebugType=portable /p:DebugSymbols=true
Collect:
- app assemblies: bin/Release/**/publish/*.dll or build output
- .pdb files for sequence points (file/line for witnesses)

4.2 Cecil loader

using Mono.Cecil;
using Mono.Cecil.Cil;   // PortablePdbReaderProvider

var rp = new ReaderParameters {
    ReadSymbols = true,
    SymbolReaderProvider = new PortablePdbReaderProvider()
};

var asm = AssemblyDefinition.ReadAssembly(dllPath, rp);

4.3 Node extraction (methods)

Walk all types, including nested:

IEnumerable<TypeDefinition> AllTypes(ModuleDefinition m)
{
    var stack = new Stack<TypeDefinition>(m.Types);
    while (stack.Count > 0)
    {
        var t = stack.Pop();
        yield return t;
        foreach (var nt in t.NestedTypes) stack.Push(nt);
    }
}

foreach (var type in AllTypes(asm.MainModule))
foreach (var method in type.Methods)
{
    var key = MethodKey.From(method);           // your normalizer
    var (file, line) = PdbFirstSequencePoint(method);
    var ilHash = method.HasBody ? ILFingerprint(method) : null;

    // store node (method_key, file, line, il_hash, flags...)
}

4.4 Edge extraction (direct calls)

foreach (var method in type.Methods.Where(m => m.HasBody))
{
    var srcKey = MethodKey.From(method);
    foreach (var ins in method.Body.Instructions)
    {
        if (ins.Operand is MethodReference mr)
        {
            if (ins.OpCode.Code is Code.Call or Code.Callvirt or Code.Newobj)
            {
                var dstKey = MethodKey.From(mr); // important: stable even if not resolved
                edges.Add(new Edge(srcKey, dstKey, kind: CallKind.Direct));
            }
            if (ins.OpCode.Code is Code.Ldftn or Code.Ldvirtftn)
            {
                // delegate capture (handle later)
            }
        }
    }
}

5) Advanced precision: dynamic dispatch + DI + async/await

If you stop at direct edges only, you’ll miss many real paths.

5.1 Async/await mapping (critical for readable witnesses)

Async methods compile into a state machine MoveNext(). You want edges attributed back to the original method.

In Cecil:

Check AsyncStateMachineAttribute on a method
It references a state machine type
Find that type’s MoveNext method
Map MoveNextKey -> OriginalMethodKey

Then, while extracting edges:

srcKey = MoveNextToOriginal.TryGetValue(srcKey, out var original) ? original : srcKey;

Do the same for iterator state machines.

5.2 Virtual/interface dispatch (CHA/RTA)

You need 2 maps:

type hierarchy / interface impl map
override map from “declared method” → “implementation method(s)”

Build override map

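// method.Overrides lists only explicit overrides (e.g. explicit interface implementations). Implicit virtual
// overrides and implicit interface implementations must be matched by name + signature up the type chain.
foreach (var overrideRef in methodDef.Overrides)
    overrideMap[MethodKey.From(overrideRef)] = MethodKey.From(methodDef);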

CHA: for callvirt to virtual method T.M, add edges to overrides in derived classes
RTA: restrict to derived classes that are actually instantiated.

How to get instantiated types:

look for newobj instructions and add the created type to InstantiatedTypes
plus DI registrations (below)
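
The difference between CHA and RTA can be sketched over plain maps. The input shapes here are assumptions: `implementations` maps a declared method key to its known implementations together with the implementing type, and `instantiatedTypes` is the set collected from newobj instructions and DI registrations. `DispatchResolver` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DispatchResolver
{
    // CHA: every known override or implementation of the declared method is a possible target.
    public static List<string> Cha(
        string declaredMethod,
        IReadOnlyDictionary<string, List<(string Type, string Method)>> implementations)
        => implementations.TryGetValue(declaredMethod, out var impls)
            ? impls.Select(i => i.Method).ToList()
            : new List<string>();

    // RTA: keep only targets whose implementing type is instantiated somewhere in the program.
    public static List<string> Rta(
        string declaredMethod,
        IReadOnlyDictionary<string, List<(string Type, string Method)>> implementations,
        ISet<string> instantiatedTypes)
        => implementations.TryGetValue(declaredMethod, out var impls)
            ? impls.Where(i => instantiatedTypes.Contains(i.Type)).Select(i => i.Method).ToList()
            : new List<string>();
}
```

Each returned key becomes the destination of a dispatch edge from the callvirt site. When RTA returns nothing for a call site that CHA resolves, one option is to fall back to the CHA targets and flag the edge as uncertain.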

5.3 DI hints (Microsoft.Extensions.DependencyInjection)

You will see calls like:

ServiceCollectionServiceExtensions.AddTransient<TService, TImpl>(...)

In IL these are generic method calls. Detect and record TService -> TImpl as “instantiated”. This massively improves RTA for modern .NET apps.

5.4 Delegates/lambdas (good enough approach)

Implement intraprocedural tracking:

when you see ldftn SomeMethod then newobj Action::.ctor then stloc.s X
store delegateTargets[local X] += SomeMethod
when you see ldloc.s X and later callvirt Invoke, add edges to targets

This makes Minimal API entrypoint discovery work too.

5.5 Reflection (best-effort)

Implement only high-signal heuristics:

typeof(T).GetMethod("Foo") with constant "Foo"
GetType().GetMethod("Foo") with constant "Foo" (type unknown → mark uncertain)

If resolved, add edge with kind=reflection_guess. If not, set node flag has_reflection = true and in results show “may be incomplete”.

6) Entrypoint detection (concrete detectors)

6.1 MVC controllers

Detect:

types deriving from Microsoft.AspNetCore.Mvc.ControllerBase
methods:
- public
- not [NonAction]
- has [HttpGet], [HttpPost], [Route] etc.

Extract route template from attributes’ ctor arguments.

Store in cg_entrypoints:

kind = http
name = GET /billing/pay (compose verb+template)

6.2 Minimal APIs

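Scan the entry point’s IL (Program.Main, or the compiler-generated <Main>$ method when the app uses top-level statements):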

find calls to MapGet, MapPost, ...
extract route string from preceding ldstr
resolve handler method via delegate tracking (ldftn)

Entry:

kind = http
name = GET /foo

6.3 CLI

Find assembly entry point method (asm.EntryPoint) or static Main. Entry:

kind = cli
name = Main

Start here. Add gRPC/jobs later.

7) Smart-Diff SurfaceBuilder (the “advanced” part)

This is what makes your reachability actually meaningful for CVEs.

7.1 SurfaceBuilder inputs

From your vuln ingestion pipeline:

ecosystem = nuget
package = LibXYZ
affected range = <= 1.2.3
fixed version = 1.2.4
CVE id

7.2 Choose a vulnerable version to diff

Pick the highest affected version below fixed.

fixed = 1.2.4
vulnerable representative = 1.2.3

(If multiple fixed versions exist, build multiple surfaces.)

7.3 Download both packages

Use NuGet.Protocol to download .nupkg, unzip, pick TFMs you care about (often netstandard2.0 is safest). Compute fingerprints for each assembly.

7.4 Compute method fingerprints

For each method:

MethodKey
Normalized IL hash

7.5 Diff

ChangedMethods = { k | hashVuln[k] != hashFixed[k] } ∪ added ∪ removed

Store these as vuln_surface_sinks with reason.
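
The diff itself is a comparison of two MethodKey → IL hash maps. A minimal sketch, assuming both maps were produced with the same normalization; `SurfaceDiff` is a hypothetical name and the reason strings mirror the `reason` column above.

```csharp
using System.Collections.Generic;

public static class SurfaceDiff
{
    // Returns method key -> reason ("changed", "removed", "added").
    public static Dictionary<string, string> ChangedMethods(
        IReadOnlyDictionary<string, string> hashVuln,
        IReadOnlyDictionary<string, string> hashFixed)
    {
        var sinks = new Dictionary<string, string>();
        foreach (var (key, hash) in hashVuln)
        {
            if (!hashFixed.TryGetValue(key, out var fixedHash)) sinks[key] = "removed";
            else if (fixedHash != hash) sinks[key] = "changed";
        }
        foreach (var key in hashFixed.Keys)
            if (!hashVuln.ContainsKey(key)) sinks[key] = "added";
        return sinks;
    }
}
```

One caveat worth keeping in mind: methods with reason "added" exist only in the fixed version, so they cannot appear in the vulnerable version's call graph. They are probably most useful as a hint about what the fix introduced (a new validator, for example) rather than as reachable sinks.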

7.6 Build internal library call graph

Same Cecil extraction, but only for package assemblies. Now compute triggers:

Reverse BFS from sinks:

Start from all sink method keys
Walk predecessors
When you encounter a public/exported method, record it as a trigger

Also store one internal path for each trigger → sink (for witnesses).
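
The trigger computation can be sketched as a multi-source reverse BFS that remembers the next hop toward a sink. This is a dictionary-based sketch for clarity, not the array-based form recommended later for app-sized graphs; `TriggerFinder`, `preds` (callee → callers inside the package), and `publicMethods` are assumed names and inputs.

```csharp
#nullable enable
using System.Collections.Generic;

public static class TriggerFinder
{
    // Returns trigger method key -> one internal path (trigger ... sink).
    public static Dictionary<string, List<string>> FindTriggers(
        IReadOnlyDictionary<string, List<string>> preds,
        IEnumerable<string> sinks,
        ISet<string> publicMethods)
    {
        var next = new Dictionary<string, string?>();   // node -> next hop toward a sink (null at a sink)
        var queue = new Queue<string>();
        foreach (string s in sinks)
            if (next.TryAdd(s, null)) queue.Enqueue(s);

        while (queue.Count > 0)
        {
            string v = queue.Dequeue();
            if (!preds.TryGetValue(v, out var callers)) continue;
            foreach (string p in callers)
                if (next.TryAdd(p, v)) queue.Enqueue(p); // first visit wins, so paths are shortest
        }

        var triggers = new Dictionary<string, List<string>>();
        foreach (string node in next.Keys)
        {
            if (!publicMethods.Contains(node)) continue;
            var path = new List<string>();
            for (string? cur = node; cur is not null; cur = next[cur]) path.Add(cur);
            triggers[node] = path;
        }
        return triggers;
    }
}
```

Note that a sink that is itself public shows up as a trigger with a one-element path, which matches the case where the app calls the changed method directly.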

7.7 Add interface/base declarations as triggers

Important: your app might call a library via an interface method signature, not the concrete implementation.

For each trigger implementation method:

for each method.Overrides entry, add the overridden method key as an additional trigger

This reduces dependence on perfect dispatch expansion during app scanning.

7.8 Persist the surface

Store:

sinks set
triggers set
internal witness paths (optional but highly valuable)

Now you’ve converted a “version range” CVE into “these specific library APIs are dangerous”.

8) Reachability engine (fast, witness-producing)

8.1 In-memory graph format (CSR)

Don’t BFS off dictionaries; you’ll die on perf.

Build integer indices:

method_key -> nodeIndex (0..N-1)
store arrays:
- predOffsets[N+1]
- preds[edgeCount]

Construction:

count predecessors per node
prefix sum to offsets
fill preds
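
The three construction steps map directly onto a small builder. A minimal sketch, assuming nodes are already numbered 0..N-1 and edges are (caller, callee) index pairs; `CsrGraph` is a hypothetical type and carries only the predecessor arrays.

```csharp
using System.Collections.Generic;

public sealed class CsrGraph
{
    public int NodeCount { get; }
    public int[] PredOffsets { get; }   // length N+1; preds of v live in Preds[PredOffsets[v] .. PredOffsets[v+1])
    public int[] Preds { get; }         // length = edge count

    private CsrGraph(int nodeCount, int[] predOffsets, int[] preds)
    {
        NodeCount = nodeCount;
        PredOffsets = predOffsets;
        Preds = preds;
    }

    public static CsrGraph FromEdges(int nodeCount, IReadOnlyList<(int Src, int Dst)> edges)
    {
        var offsets = new int[nodeCount + 1];
        foreach (var (_, dst) in edges) offsets[dst + 1]++;                // 1) count predecessors per node
        for (int i = 0; i < nodeCount; i++) offsets[i + 1] += offsets[i];  // 2) prefix sum -> offsets

        var preds = new int[edges.Count];
        var cursor = (int[])offsets.Clone();                               // next free slot per node
        foreach (var (src, dst) in edges) preds[cursor[dst]++] = src;      // 3) fill preds
        return new CsrGraph(nodeCount, offsets, preds);
    }
}
```

For edges 0→1, 1→2, 0→2 this yields PredOffsets = [0, 0, 1, 3] and Preds = [0, 1, 0]: node 1 has predecessor 0, node 2 has predecessors 1 and 0.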

8.2 Reverse BFS from sinks

This computes:

visited[node] = can reach a sink
parent[node] = next node toward a sink (for path reconstruction)

public sealed class ReachabilityEngine
{
    public ReachabilityResult Compute(
        Graph g,
        ReadOnlySpan<int> entrypoints,
        ReadOnlySpan<int> sinks)
    {
        var visitedMark = g.VisitMark;      // int[] length N (reused across runs)
        var parent = g.Parent;              // int[] length N (reused)
        g.RunId++;

        var q = new IntQueue(capacity: g.NodeCount);
        var sinkSet = new BitSet(g.NodeCount);
        foreach (var s in sinks)
        {
            sinkSet.Set(s);
            visitedMark[s] = g.RunId;
            parent[s] = s;
            q.Enqueue(s);
        }

        while (q.TryDequeue(out var v))
        {
            var start = g.PredOffsets[v];
            var end = g.PredOffsets[v + 1];
            for (int i = start; i < end; i++)
            {
                var p = g.Preds[i];
                if (visitedMark[p] == g.RunId) continue;
                visitedMark[p] = g.RunId;
                parent[p] = v;
                q.Enqueue(p);
            }
        }

        // Collect reachable entrypoints and paths
        var results = new List<EntryWitness>();
        foreach (var e in entrypoints)
        {
            if (visitedMark[e] != g.RunId) continue;
            var path = ReconstructPath(e, parent, sinkSet);
            results.Add(new EntryWitness(e, path));
        }

        return new ReachabilityResult(results);
    }

    private static int[] ReconstructPath(int entry, int[] parent, BitSet sinks)
    {
        var path = new List<int>(32);
        int cur = entry;
        path.Add(cur);

        // follow parent pointers until a sink
        for (int guard = 0; guard < 10_000; guard++)
        {
            if (sinks.Get(cur)) break;
            var nxt = parent[cur];
            if (nxt == cur || nxt < 0) break; // safety
            cur = nxt;
            path.Add(cur);
        }
        return path.ToArray();
    }
}

8.3 Producing the witness

For each node index in the path:

method_key
file_path / line_start (if known)
optional flags (reflection_guess edge, dispatch edge)

Then attach:

vuln id, package, version
entrypoint kind/name
graph digest + config digest
surface digest
timestamp

Send JSON to Attestor for DSSE signing, store envelope in Authority.

9) Scaling: don’t do BFS 500 times if you can avoid it

9.1 First-line scaling (usually enough)

Group vulnerabilities by package/version → surfaces reused
Only run reachability for vulns where:
- dependency present AND
- surface exists OR fallback mode
Limit witnesses per vuln (top 3)

In practice, with N ≈ 50k nodes and E ≈ 200k edges, a reverse BFS is fast in C# if done with arrays.

9.2 Incremental Smart-Diff × Reachability (your “low noise” killer feature)

Step A: compute graph delta between snapshots

Use il_hash per method to detect changed nodes:

added / removed / changed nodes
edges updated only for changed nodes

Step B: decide which vulnerabilities need recompute

Store a cached reverse-reachable set per vuln surface if you want (bitset), OR just do a cheaper heuristic:

Recompute for vulnerability if:

sink set changed (new surface or version changed), OR
any changed node is on any previously stored witness path, OR
entrypoints changed, OR
impacted nodes touch any trigger node’s predecessors (use a small localized search)

A practical approach:

store all node IDs that appear in any witness path for that vuln
if delta touches any of those nodes/edges, recompute
otherwise reuse cached result
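
The recompute decision can be sketched as a single predicate. `RecomputePolicy` and its parameters are hypothetical. One part is an assumption of mine rather than something stated above: a vuln that was previously unreachable has no witness path to compare against, so this sketch conservatively recomputes it on any graph change, as a stand-in for the localized search around trigger predecessors.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RecomputePolicy
{
    public static bool NeedsRecompute(
        bool sinkSetChanged,
        bool entrypointsChanged,
        bool previouslyReachable,
        ISet<string> witnessPathNodes,            // node keys seen in stored witness paths for this vuln
        IReadOnlyCollection<string> deltaNodes)   // changed nodes plus endpoints of changed edges
    {
        if (sinkSetChanged || entrypointsChanged) return true;
        if (deltaNodes.Count == 0) return false;

        // Assumption: no prior witness means nothing to intersect with, so recompute on any change.
        if (!previouslyReachable) return true;

        // Reachable before: recompute only if the delta touches a stored witness path.
        return deltaNodes.Any(n => witnessPathNodes.Contains(n));
    }
}
```

A reachable finding whose witness nodes are untouched keeps its cached result, since the stored path still exists in the new graph. The path may no longer be the shortest one, though, which matters only if you promise shortest-path witnesses.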

This yields a massive win on PR scans where most code is unchanged.

Step C: “Impact frontier” recompute (optional)

If you want more advanced:

compute ImpactSet = ΔNodes ∪ endpoints(ΔEdges)
run reverse BFS starting from ImpactSet ∩ ReverseReachSet and update visited marks

This is trickier to implement correctly (dynamic graph), so I’d ship the heuristic first.

10) Practical fallback modes (don’t block shipping)

You won’t have surfaces for every CVE on day 1. Handle this gracefully:

Mode 1: Surface-based reachability (best)

sink = trigger methods from surface
result: “reachable” with path

Mode 2: Package API usage (good fallback)

sink = any method in that package that is called by app
result: “package reachable” (lower confidence), still provide path to callsite

Mode 3: Dependency present only (SBOM level)

no call graph needed
result: “present” only

Your UI can show confidence tiers:

Confirmed reachable (surface)
Likely reachable (package API)
Present only (SBOM)
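
Mapping the three modes onto the tiers can be sketched as one function. The enum and parameter names are hypothetical. Treating "surface exists but no trigger is reachable" as present-only is an assumption; you may prefer a separate "not reachable" state for that case.

```csharp
public enum Confidence { PresentOnly, LikelyReachable, ConfirmedReachable }

public static class ConfidenceTiers
{
    // Returns null when the dependency is not in the SBOM at all (no finding).
    public static Confidence? Classify(
        bool dependencyPresent,
        bool hasSurface,
        bool triggerReachable,     // Mode 1: an entrypoint reaches a surface trigger
        bool packageApiCalled)     // Mode 2: the app calls any method of the package
    {
        if (!dependencyPresent) return null;
        if (hasSurface) return triggerReachable ? Confidence.ConfirmedReachable : Confidence.PresentOnly;
        if (packageApiCalled) return Confidence.LikelyReachable;
        return Confidence.PresentOnly;   // Mode 3: SBOM-level presence only
    }
}
```

Keeping the tier in the finding record (and in the witness payload) lets the UI and the PR gate apply different policies per tier, for example failing only on confirmed reachability.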

11) Integration points inside Stella Ops

Scanner.Worker (per build)

Build/collect assemblies + pdb
CallGraphBuilder → nodes/edges/entrypoints + graph_digest
Load SBOM vulnerabilities list
For each vuln:
- resolve surface triggers; if missing → enqueue SurfaceBuilder job + fallback mode
- run reachability BFS
- for each reachable entrypoint: emit DSSE witness
Persist findings/witnesses

SurfaceBuilder (async worker)

triggered by “surface missing” events or nightly preload of top packages
computes surface once, stores forever

Authority

stores graphs, surfaces, findings, witnesses
provides retrieval APIs for UI/CI

12) What to implement first (in the order that produces value fastest)

Week 1–2 scope (realistic, shippable)

Cecil call graph extraction (direct calls)
MVC + Minimal API entrypoints
Reverse BFS reachability with path witnesses
DSSE witness signing + storage
SurfaceBuilder v1:
- IL hash per method
- changed methods as sinks
- triggers via internal reverse BFS
UI: “Show Witness” + “Verify Signature”

Next increment (precision upgrades)

async/await mapping to original methods
RTA + DI registration hints
delegate tracking for Minimal API handlers (if not already)
interface override triggers in surface builder

Later (if you want “attackability”, not just “reachability”)

taint/dataflow for top sink classes (deserialization, path traversal, SQL, command exec)
sanitizer modeling & parameter constraints

13) Common failure modes and how to harden

MethodKey mismatches (surface vs app call)

Ensure both are generated from the same normalization rules
For generic methods, prefer definition keys (strip instantiation)
Store both “exact” and “erased generic” variants if needed

Multi-target frameworks

SurfaceBuilder: compute triggers for each TFM, union them
App scan: choose TFM closest to build RID, but allow fallback to union

Huge graphs

Drop System.* nodes/edges unless:
- the vuln is in System.* (rare, but handle separately)
Deduplicate nodes by MethodKey across assemblies where safe
Use CSR arrays + pooled queues

Reflection heavy projects

Mark analysis confidence lower
Include “unknown edges present” in finding metadata
Still produce a witness path up to the reflective callsite

If you want, I can also paste a complete Cecil-based CallGraphBuilder class (nodes+edges+PDB lines), plus the SurfaceBuilder that downloads NuGet packages and generates vuln_surface_triggers end-to-end.
