Here’s a compact, plain‑English plan to make your scanner faster, quieter, and auditor‑friendly through (1) diff‑aware rescans and (2) unified binary+source reachability. Both are drop‑in for Stella Ops.
Deterministic, diff‑aware rescans (clean SBOM/VEX diffs)
Goal: Only recompute what changed; emit stable, minimal diffs reviewers can trust.
Core ideas
- Per‑layer SBOM artifacts (cacheable): For each image layer L#, persist: sbom-L#.cdx.json (CycloneDX), hash(L#), toolchain-hash, feeds-hash.
- Symbol‑fingerprints for each discovered file: algo|path|size|mtime|xxh3|funcIDs[].
- Slice recomputation: On new image I', match layers via hashes; for changed layers or files, recompute only their call‑graph slices and vuln joins.
- Deterministic manifests: Every scan writes a scan.lock.json (inputs, feed versions, rules, lattice policy hash, tool versions, clocks) so results are replayable.
Minimal data model (Mongo)
- scan_runs(_id, imageDigest, inputsHash, policyHash, feedsHash, startedAt, finishedAt, parentRunId?)
- layer_sboms(scanRunId, layerDigest, sbomCid, symbolIndexCid, layerHash)
- file_symbols(scanRunId, path, fileHash, funcIDs[], lang, size, mtime)
- diffs(fromRunId, toRunId, kind: 'sbom'|'vex'|'reachability', stats, patch) (store JSON Patch)
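As a rough illustration, the four collections above could be mirrored by C# document types like the ones below. This is only a sketch: the type names, the DiffKind enum, and the choice of string IDs and a string-encoded patch are assumptions, not part of the plan.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical document shapes mirroring the collections listed above.
public enum DiffKind { Sbom, Vex, Reachability }

public sealed record ScanRun(
    string Id, string ImageDigest, string InputsHash, string PolicyHash,
    string FeedsHash, DateTimeOffset StartedAt, DateTimeOffset? FinishedAt,
    string? ParentRunId);

public sealed record LayerSbom(
    string ScanRunId, string LayerDigest, string SbomCid,
    string SymbolIndexCid, string LayerHash);

public sealed record FileSymbols(
    string ScanRunId, string Path, string FileHash,
    IReadOnlyList<string> FuncIds, string Lang, long Size, long MTimeUnix);

// PatchJson holds the stored JSON Patch document as text (an assumption).
public sealed record DiffDoc(
    string FromRunId, string ToRunId, DiffKind Kind,
    IReadOnlyDictionary<string, long> Stats, string PatchJson);
```

Fast layer reuse would most likely be served by a lookup on LayerDigest, which is why that field is kept as a plain string here.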
Algorithm sketch
- Resolve base image ancestry → map old layer digest ↔ new layer digest.
- For unchanged layers: reuse layer_sboms + file_symbols.
- For changed/added files: re‑symbolize + re‑analyze; restrict call‑graph build to impacted SCCs.
- Re‑join OSV/GHSA/vendor vulns → compute reachability deltas → emit stable JSON Patch.
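The first steps of this sketch boil down to two set comparisons: which layers can be reused, and which files inside a changed layer actually differ. A minimal version, assuming layers are identified by digest strings and files by a path-to-content-hash map (the LayerReuse class and its method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LayerReuse
{
    // cachedLayers: layer digest -> cached SBOM reference from an earlier run (hypothetical shape).
    // Layers found in the cache are reused; all others need recomputation.
    public static (List<string> Reused, List<string> Recompute) Partition(
        IReadOnlyList<string> newLayerDigests,
        IReadOnlyDictionary<string, string> cachedLayers)
    {
        var reused = new List<string>();
        var recompute = new List<string>();
        foreach (var digest in newLayerDigests)
        {
            if (cachedLayers.ContainsKey(digest)) reused.Add(digest);
            else recompute.Add(digest);
        }
        return (reused, recompute);
    }

    // Paths that are new or whose content hash changed, in ordinal order
    // so the output does not depend on dictionary enumeration order.
    public static List<string> ChangedFiles(
        IReadOnlyDictionary<string, string> oldFileHashes,
        IReadOnlyDictionary<string, string> newFileHashes)
    {
        return newFileHashes
            .Where(kv => !oldFileHashes.TryGetValue(kv.Key, out var old) || old != kv.Value)
            .Select(kv => kv.Key)
            .OrderBy(p => p, StringComparer.Ordinal)
            .ToList();
    }
}
```

Only the paths returned by ChangedFiles would then be re-symbolized; narrowing the call graph to the impacted SCCs is a separate step not shown here.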
CLI impact
stella scan --deterministic --cache-dir ~/.stella/cache --emit-diff previousRunId
stella diff --from <runA> --to <runB> --format jsonpatch|md
Unified binary + source reachability (function‑level)
Goal: Decide “is the vulnerable function reachable/used here?” across native and managed code.
Extraction
- Binary symbolizers:
  - ELF: parse .symtab/.dynsym, DWARF (if present).
  - Mach‑O/PE: export tables + DWARF/PDB (if present).
  - Build Canonical Symbol ID (CSID): lang:pkg@ver!binary#file:function(signature); normalize C++/Rust mangling.
- Source symbolizers:
- .NET (Roslyn+IL), JVM (bytecode), Go (SSA), Node/TS (TS AST), Python (AST), Rust (HIR/MIR if available).
- Bindings join: Map FFI edges (P/Invoke, cgo, JNI/JNA, N-API) → cross‑ecosystem call edges:
  - .NET P/Invoke → DLL export CSID.
  - Java JNI → Java_com_pkg_Class_Method ↔ native export.
  - Node N-API → addon exports ↔ JS require() site.
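To make the CSID layout and the JNI join concrete, here is a small sketch. Build formats the CSID pattern quoted above, and JniExportName derives the native export name a JNI method resolves to, using the standard JNI short-name escaping. The Csid class, its method names, and the example language tag are hypothetical. C++/Rust demangling and overloaded JNI methods (which append an argument signature) are not covered.

```csharp
using System;
using System.Text;

public static class Csid
{
    // Formats the CSID layout: lang:pkg@ver!binary#file:function(signature)
    public static string Build(string lang, string pkg, string ver,
        string binary, string file, string function, string signature)
        => $"{lang}:{pkg}@{ver}!{binary}#{file}:{function}({signature})";

    // JNI short name for a native method: "Java_" + escaped class + "_" + escaped method.
    public static string JniExportName(string fullyQualifiedClass, string method)
        => "Java_" + Mangle(fullyQualifiedClass) + "_" + Mangle(method);

    // JNI escaping: package separators become '_', '_' becomes "_1", ';' becomes "_2",
    // '[' becomes "_3", and any other non-alphanumeric character becomes _0xxxx (lowercase hex).
    private static string Mangle(string name)
    {
        var sb = new StringBuilder();
        foreach (char c in name)
        {
            if (c == '.' || c == '/') sb.Append('_');
            else if (c == '_') sb.Append("_1");
            else if (c == ';') sb.Append("_2");
            else if (c == '[') sb.Append("_3");
            else if (c < 128 && char.IsLetterOrDigit(c)) sb.Append(c);
            else sb.Append("_0").Append(((int)c).ToString("x4"));
        }
        return sb.ToString();
    }
}
```

For example, Csid.JniExportName("com.pkg.Class", "Method") yields Java_com_pkg_Class_Method, which can then be matched against the export table of the native library to create the cross-language edge.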
Reachability pipeline
- Build per‑language call graphs (CG) with framework models (ASP.NET, Spring, Express, etc.).
- Add FFI edges; merge into a polyglot call graph.
- Mark entrypoints (container CMD/ENTRYPOINT, web handlers, cron, CLI verbs).
- For each CVE → {pkg, version, affected symbols[]} map → is any affected CSID on a path from an entrypoint?
- Output evidence:
  - reachable: true|false|unknown
  - shortest path (symbols list)
  - probes (optional): runtime samples (EventPipe/JFR/uprobes) hitting CSIDs
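The reachability question in the pipeline above can be answered with a breadth-first search over the merged call graph, which also yields the shortest path for the evidence output. The sketch below assumes the polyglot graph is already available as an adjacency map keyed by CSID; the Reachability class and its signature are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class Reachability
{
    // edges: caller CSID -> callee CSIDs (merged polyglot graph, including FFI edges).
    // Returns the shortest entrypoint-to-affected-symbol path, or null when no path exists.
    public static List<string>? ShortestPath(
        IReadOnlyDictionary<string, List<string>> edges,
        IEnumerable<string> entrypoints,
        ISet<string> affected)
    {
        // parent doubles as the visited set; entrypoints have a null parent.
        var parent = new Dictionary<string, string?>();
        var queue = new Queue<string>();
        foreach (var e in entrypoints)
        {
            if (parent.ContainsKey(e)) continue;
            parent[e] = null;
            queue.Enqueue(e);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (affected.Contains(node))
            {
                // Walk parent links back to the entrypoint, then reverse.
                var path = new List<string>();
                for (string? cur = node; cur != null; cur = parent[cur]) path.Add(cur);
                path.Reverse();
                return path;
            }
            if (!edges.TryGetValue(node, out var callees)) continue;
            foreach (var callee in callees)
            {
                if (parent.ContainsKey(callee)) continue;
                parent[callee] = node;
                queue.Enqueue(callee);
            }
        }
        return null;
    }
}
```

A null result only means no static path was found. Whether that maps to reachable: false or reachable: unknown (for example when reflection or missing symbols make the graph incomplete) is a policy decision outside this sketch.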
Artifacts emitted
- symbols.csi.jsonl (all CSIDs)
- polyglot.cg.slices.json (only impacted SCCs for diffs)
- reach.vex.json (OpenVEX/CSAF with function‑level notes + confidence)
What to build next (low‑risk, high‑impact)
- [Week 1–2] Per‑layer caches + scan.lock.json; file symbol‑fingerprints (xxh3 + top‑K funcIDs).
- [Week 3–4] ELF/PE/Mach‑O symbolizer lib with CSIDs; .NET IL + P/Invoke mapper.
- [Week 5–6] Polyglot CG merge + entrypoint discovery from Docker metadata; JSON Patch diffs.
- [Week 7+] Runtime probes (opt‑in) to boost confidence and suppress false positives.
Tiny code seeds (C# hints)
Symbol fingerprint (per file)
record SymbolFingerprint(
string Algo, string Path, long Size, long MTimeUnix,
string ContentHash, string[] FuncIds);
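One way the pipe-delimited fingerprint string could be produced from a file's bytes is sketched below. Note the substitutions: SHA-256 from the base class library stands in for xxh3 so the example needs no extra package, and the comma used to join function IDs is an assumption. The Fingerprints class is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;

public static class Fingerprints
{
    // Renders algo|path|size|mtime|hash|funcIDs.
    // SHA-256 is a stand-in here; the plan calls for xxh3.
    public static string Compute(string path, byte[] content, long mtimeUnix, string[] funcIds)
    {
        string hash = Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();
        // Sort function IDs ordinally so discovery order does not change the fingerprint.
        string funcs = string.Join(",", funcIds.OrderBy(f => f, StringComparer.Ordinal));
        string size = content.Length.ToString(CultureInfo.InvariantCulture);
        string mtime = mtimeUnix.ToString(CultureInfo.InvariantCulture);
        return $"sha256|{path}|{size}|{mtime}|{hash}|{funcs}";
    }
}
```

Because mtime is part of the string, two byte-identical files with different timestamps produce different fingerprints; a cache that should survive rebuilds may want to compare on the content hash and function IDs only.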
Deterministic scan lock
record ScanLock(
string FeedsHash, string RulesHash, string PolicyHash, string Toolchain,
string ImageDigest, string[] LayerDigests, DateTimeOffset Clock,
IDictionary<string,string> EnvPins);
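For the lock to support replay, it helps to reduce its inputs to a single hash that two runs can compare. The sketch below serializes the fields in a fixed key order and hashes the result with SHA-256. The ScanLockHasher class, the JSON key names, and the decision to leave the clock out of the hash are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class ScanLockHasher
{
    // Hashes the inputs that define a scan. Keys are emitted in ordinal order
    // (SortedDictionary) so the JSON text, and therefore the hash, is stable.
    public static string InputsHash(
        string feedsHash, string rulesHash, string policyHash, string toolchain,
        string imageDigest, IEnumerable<string> layerDigests,
        IDictionary<string, string> envPins)
    {
        var canonical = new SortedDictionary<string, object>(StringComparer.Ordinal)
        {
            ["envPins"] = new SortedDictionary<string, string>(envPins, StringComparer.Ordinal),
            ["feedsHash"] = feedsHash,
            ["imageDigest"] = imageDigest,
            // Layer order is meaningful, so it is preserved rather than sorted.
            ["layerDigests"] = layerDigests.ToArray(),
            ["policyHash"] = policyHash,
            ["rulesHash"] = rulesHash,
            ["toolchain"] = toolchain,
        };
        string json = JsonSerializer.Serialize(canonical);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json))).ToLowerInvariant();
    }
}
```

Two runs with the same hash were given the same feeds, rules, policy, toolchain, and image, so any difference in their output points at nondeterminism in the scanner itself.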
JSON Patch diff emit
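// JsonDiffPatch.Diff stands in for whichever JSON diff library is chosen;
// it is assumed here to return the patch serialized as a string.
// Sort object keys in both documents beforehand so the patch is stable.
var patch = JsonDiffPatch.Diff(oldVexJson, newVexJson);
File.WriteAllText("vex.diff.json", patch);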
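The "stable sort keys beforehand" step can be done with System.Text.Json.Nodes alone. The sketch below rebuilds a JSON document with object keys in ordinal order, so two semantically equal documents serialize to identical text before diffing. The CanonicalJson class is hypothetical. Array order is left untouched, which is an assumption: arrays without a meaningful order (for example lists of statements) would need sorting by a stable key as well.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.Json.Nodes;

public static class CanonicalJson
{
    // Returns compact JSON text with all object keys sorted ordinally.
    public static string Canonicalize(string json)
        => Sort(JsonNode.Parse(json))?.ToJsonString() ?? "null";

    private static JsonNode? Sort(JsonNode? node)
    {
        switch (node)
        {
            case JsonObject obj:
                var sorted = new JsonObject();
                foreach (var kv in obj.OrderBy(p => p.Key, StringComparer.Ordinal))
                    sorted[kv.Key] = Sort(kv.Value);
                return sorted;
            case JsonArray arr:
                var copy = new JsonArray();
                foreach (var item in arr) copy.Add(Sort(item));
                return copy;
            case null:
                return null;
            default:
                // Leaf value: re-parse to get a detached copy that can join the new tree.
                return JsonNode.Parse(node.ToJsonString());
        }
    }
}
```

Running both oldVexJson and newVexJson through Canonicalize before diffing keeps key reordering from showing up as spurious changes.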
If you want, I can turn this into:
- a .proto for the cache/index objects,
- a Mongo schema + indexes (including compound keys for fast layer reuse),
- and a .NET 10 service skeleton (StellaOps.Scanner.WebService) with endpoints: /scan, /diff/{from}/{to}, /reach/{runId}.