2025-11-18 07:52:15 +02:00

Here’s a compact, plain‑English plan to make your scanner faster, quieter, and auditor‑friendly through (1) diff‑aware rescans and (2) unified binary+source reachability. Both are designed as drop‑ins for Stella Ops.

Deterministic, diff‑aware rescans (clean SBOM/VEX diffs)

Goal: Only recompute what changed; emit stable, minimal diffs reviewers can trust.

Core ideas

Per‑layer SBOM artifacts (cacheable): For each image layer L#, persist:
- sbom-L#.cdx.json (CycloneDX), hash(L#), toolchain-hash, feeds-hash.
- Symbol‑fingerprints for each discovered file: algo|path|size|mtime|xxh3|funcIDs[].
Slice recomputation: On new image I', match layers via hashes; for changed layers or files, recompute only their call‑graph slices and vuln joins.
Deterministic manifests: Every scan writes a scan.lock.json (inputs, feed versions, rules, lattice policy hash, tool versions, clocks) so results are replayable.
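A per‑layer cache only stays safe if its key changes whenever the layer, the toolchain, or the feeds change. The sketch below shows one way to fold the three hashes listed above into a single key. `LayerCacheKey` and its length‑prefixed encoding are assumptions made for illustration, not existing Stella Ops code.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class LayerCacheKey
{
    // Hypothetical helper: derives one cache key from the three hashes
    // stored next to each per-layer SBOM.
    public static string Compute(string layerHash, string toolchainHash, string feedsHash)
    {
        var sb = new StringBuilder();
        foreach (var part in new[] { layerHash, toolchainHash, feedsHash })
        {
            // Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
            sb.Append(part.Length).Append(':').Append(part).Append('\n');
        }

        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

A cached `sbom-L#.cdx.json` would then be reused only when the key computed for the new scan matches the stored one, so a feed update alone is enough to invalidate the vulnerability joins for that layer.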

Minimal data model (Mongo)

scan_runs(_id, imageDigest, inputsHash, policyHash, feedsHash, startedAt, finishedAt, parentRunId?)
layer_sboms(scanRunId, layerDigest, sbomCid, symbolIndexCid, layerHash)
file_symbols(scanRunId, path, fileHash, funcIDs[], lang, size, mtime)
diffs(fromRunId, toRunId, kind: 'sbom'|'vex'|'reachability', stats, patch) (store JSON Patch)

Algorithm sketch

Resolve base image ancestry → map old layer digest ↔ new layer digest.
For unchanged layers: reuse layer_sboms + file_symbols.
For changed/added files: re‑symbolize + re‑analyze; restrict call‑graph build to impacted SCCs.
Re‑join OSV/GHSA/vendor vulns → compute reachability deltas → emit stable JSON Patch.
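The first two steps above can be sketched as a simple partition of the new image's layers. This assumes layers are matched by exact digest equality against the cache; resolving base image ancestry may need more than that. `RescanPlanner` and `LayerPlan` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type: which layers can reuse cached artifacts
// and which must be re-symbolized and re-analyzed.
public sealed record LayerPlan(IReadOnlyList<string> Reuse, IReadOnlyList<string> Recompute);

public static class RescanPlanner
{
    public static LayerPlan Plan(
        IEnumerable<string> cachedLayerDigests,
        IEnumerable<string> newImageLayerDigests)
    {
        var cached = new HashSet<string>(cachedLayerDigests, StringComparer.Ordinal);
        var reuse = new List<string>();
        var recompute = new List<string>();

        // Keep the new image's layer order so later output stays deterministic.
        foreach (var digest in newImageLayerDigests)
        {
            if (cached.Contains(digest))
            {
                reuse.Add(digest);
            }
            else
            {
                recompute.Add(digest);
            }
        }

        return new LayerPlan(reuse, recompute);
    }
}
```

Only the digests in `Recompute` would go on to re‑symbolization and call‑graph slicing; the rest load their `layer_sboms` and `file_symbols` rows from the previous run.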

CLI impact

stella scan --deterministic --cache-dir ~/.stella/cache --emit-diff <previousRunId>
stella diff --from <runA> --to <runB> --format jsonpatch|md

Unified binary + source reachability (function‑level)

Goal: Decide “is the vulnerable function reachable/used here?” across native and managed code.

Extraction

Binary symbolizers:
- ELF: parse .symtab/.dynsym, DWARF (if present).
- Mach‑O/PE: export tables + DWARF/PDB (if present).
- Build Canonical Symbol ID (CSID): lang:pkg@ver!binary#file:function(signature); normalize C++/Rust mangling.
Source symbolizers:
- .NET (Roslyn+IL), JVM (bytecode), Go (SSA), Node/TS (TS AST), Python (AST), Rust (HIR/MIR if available).
Bindings join: Map FFI edges (P/Invoke, cgo, JNI/JNA, N-API) → cross‑ecosystem call edges:
- .NET P/Invoke → DLL export CSID.
- Java JNI → Java_com_pkg_Class_Method ↔ native export.
- Node N-API → addon exports ↔ JS require() site.
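To make the CSID layout and the JNI join concrete, here is a minimal sketch. The `Csid` field names are illustrative; only the output layout `lang:pkg@ver!binary#file:function(signature)` comes from the text above. The JNI helper follows the standard JNI short‑name escaping (`_1`, `_2`, `_3`, `_0xxxx`) and deliberately ignores the `__signature` suffix used for overloaded native methods.

```csharp
using System.Text;

// Field names are assumptions; the ToString layout mirrors the CSID format.
public sealed record Csid(
    string Lang, string Package, string Version,
    string Binary, string File, string Function, string Signature)
{
    public override string ToString() =>
        $"{Lang}:{Package}@{Version}!{Binary}#{File}:{Function}({Signature})";
}

public static class JniNames
{
    // Expected native export name for a Java native method (short form only).
    public static string ExportName(string className, string methodName) =>
        "Java_" + Mangle(className) + "_" + Mangle(methodName);

    private static string Mangle(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c == '.' || c == '/') sb.Append('_');          // package separator
            else if (c == '_') sb.Append("_1");                 // literal underscore
            else if (c == ';') sb.Append("_2");
            else if (c == '[') sb.Append("_3");
            else if (c < 128 && char.IsLetterOrDigit(c)) sb.Append(c);
            else sb.Append("_0").Append(((int)c).ToString("x4")); // e.g. '$' -> _00024
        }
        return sb.ToString();
    }
}
```

With this, a Java declaration `native void Method()` in `com.pkg.Class` is expected to bind to the export `Java_com_pkg_Class_Method`, which can then be looked up among the binary symbolizer's CSIDs to create the cross‑ecosystem edge.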

Reachability pipeline

Build per‑language call graphs (CG) with framework models (ASP.NET, Spring, Express, etc.).
Add FFI edges; merge into a polyglot call graph.
Mark entrypoints (container CMD/ENTRYPOINT, web handlers, cron, CLI verbs).
For each CVE, map it to {pkg, version, affected symbols[]}, then check whether any affected CSID lies on a path from an entrypoint.
Output evidence:
- reachable: true|false|unknown
- shortest path (symbols list)
- probes (optional): runtime samples (EventPipe/JFR/uprobes) hitting CSIDs
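The path check and the "shortest path" evidence can both come from one breadth‑first search over the merged call graph. The sketch below assumes the graph is an adjacency map keyed by CSID string; `Reachability.ShortestPath` is a hypothetical name, and a `null` result only means no path exists in the static graph, which may still warrant `unknown` rather than `false` if the graph is incomplete.

```csharp
using System;
using System.Collections.Generic;

public static class Reachability
{
    // Breadth-first search from all entrypoints at once; the first affected
    // CSID dequeued is reached by a shortest path (fewest call edges).
    public static IReadOnlyList<string>? ShortestPath(
        IReadOnlyDictionary<string, IReadOnlyList<string>> callGraph,
        IEnumerable<string> entrypoints,
        ISet<string> affected)
    {
        // parent[x] records how x was first reached; null marks an entrypoint.
        var parent = new Dictionary<string, string?>(StringComparer.Ordinal);
        var queue = new Queue<string>();

        foreach (var entry in entrypoints)
        {
            if (parent.TryAdd(entry, null)) queue.Enqueue(entry);
        }

        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (affected.Contains(node))
            {
                // Walk parents back to the entrypoint, then reverse.
                var path = new List<string>();
                for (string? cur = node; cur != null; cur = parent[cur]) path.Add(cur);
                path.Reverse();
                return path;
            }

            if (!callGraph.TryGetValue(node, out var callees)) continue;
            foreach (var callee in callees)
            {
                if (parent.TryAdd(callee, node)) queue.Enqueue(callee);
            }
        }

        return null; // no static path found
    }
}
```

The returned list is what would be written as the "shortest path (symbols list)" evidence next to `reachable: true`.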

Artifacts emitted

symbols.csi.jsonl (all CSIDs)
polyglot.cg.slices.json (only impacted SCCs for diffs)
reach.vex.json (OpenVEX/CSAF with function‑level notes + confidence)

What to build next (low‑risk, high‑impact)

[Week 1–2] Per‑layer caches + scan.lock.json; file symbol‑fingerprints (xxh3 + top‑K funcIDs).
[Week 3–4] ELF/PE/Mach‑O symbolizer lib with CSIDs; .NET IL + P/Invoke mapper.
[Week 5–6] Polyglot CG merge + entrypoint discovery from Docker metadata; JSON Patch diffs.
[Week 7+] Runtime probes (opt‑in) to boost confidence and suppress false positives.

Tiny code seeds (C# hints)

Symbol fingerprint (per file)

record SymbolFingerprint(
  string Algo, string Path, long Size, long MTimeUnix,
  string ContentHash, string[] FuncIds);
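For diffs to stay stable, the fingerprint has to serialize the same way on every run. Below is a minimal sketch that renders the record in the `algo|path|size|mtime|xxh3|funcIDs[]` order given earlier. Sorting the function IDs and joining them with commas are assumptions, as is the `FingerprintFormat` helper; the content hash is taken as already computed.

```csharp
using System;
using System.Globalization;
using System.Linq;

public sealed record SymbolFingerprint(
    string Algo, string Path, long Size, long MTimeUnix,
    string ContentHash, string[] FuncIds);

public static class FingerprintFormat
{
    // Renders one fingerprint as a single pipe-separated line.
    public static string ToLine(SymbolFingerprint fp)
    {
        // Ordinal sort so the line does not depend on discovery order or culture.
        var funcIds = string.Join(",", fp.FuncIds.OrderBy(id => id, StringComparer.Ordinal));

        return string.Join("|",
            fp.Algo,
            fp.Path,
            fp.Size.ToString(CultureInfo.InvariantCulture),
            fp.MTimeUnix.ToString(CultureInfo.InvariantCulture),
            fp.ContentHash,
            funcIds);
    }
}
```

Two scans that discover the same functions in a different order then produce byte‑identical lines, which keeps the symbol index cacheable.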

Deterministic scan lock

record ScanLock(
  string FeedsHash, string RulesHash, string PolicyHash, string Toolchain,
  string ImageDigest, string[] LayerDigests, DateTimeOffset Clock,
  IDictionary<string,string> EnvPins);

JSON Patch diff emit

var patch = JsonDiffPatch.Diff(oldVexJson, newVexJson); // sort object keys in both inputs beforehand so the patch is stable
File.WriteAllText("vex.diff.json", patch);
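The key‑sorting step mentioned in the comment could look like the sketch below, which uses only `System.Text.Json.Nodes`. `CanonicalJson.Normalize` is a hypothetical helper: it sorts object keys recursively and leaves array order alone, on the assumption that array order in a VEX document is meaningful or already stabilized upstream.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

public static class CanonicalJson
{
    // Returns compact JSON with every object's keys in ordinal order.
    public static string Normalize(string json) =>
        Canonicalize(JsonNode.Parse(json))?.ToJsonString() ?? "null";

    private static JsonNode? Canonicalize(JsonNode? node) => node switch
    {
        // Rebuild objects with keys sorted; values are rebuilt recursively.
        JsonObject obj => new JsonObject(
            obj.OrderBy(kv => kv.Key, StringComparer.Ordinal)
               .Select(kv => KeyValuePair.Create(kv.Key, Canonicalize(kv.Value)))),
        // Arrays keep their order; only their elements are canonicalized.
        JsonArray arr => new JsonArray(arr.Select(item => Canonicalize(item)).ToArray()),
        null => null,
        // Scalars are re-parsed so the copy has no parent node attached.
        _ => JsonNode.Parse(node.ToJsonString()),
    };
}
```

Running both the old and the new VEX document through `Normalize` before diffing means a pure key reordering yields an empty patch instead of noise.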

If you want, I can turn this into:

a .proto for the cache/index objects,
a Mongo schema + indexes (including compound keys for fast layer reuse),
and a .NET 10 service skeleton (StellaOps.Scanner.WebService) with endpoints: /scan, /diff/{from}/{to}, /reach/{runId}.
