git.stella-ops.org/docs/product-advisories/archived/15-Nov-2026 - scanner roadmap with deterministic diff-aware rescans.md
2025-11-18 07:52:15 +02:00


Here's a compact, plain-English plan to make your scanner faster, quieter, and auditor-friendly through (1) diff-aware rescans and (2) unified binary + source reachability, both drop-in for StellaOps.

Deterministic, diff-aware rescans (clean SBOM/VEX diffs)

Goal: Only recompute what changed; emit stable, minimal diffs reviewers can trust.

Core ideas

  • Per-layer SBOM artifacts (cacheable): For each image layer L#, persist:
    • sbom-L#.cdx.json (CycloneDX), hash(L#), toolchain-hash, feeds-hash.
    • Symbol fingerprints for each discovered file: algo|path|size|mtime|xxh3|funcIDs[].
  • Slice recomputation: On new image I', match layers via hashes; for changed layers or files, recompute only their call-graph slices and vuln joins.
  • Deterministic manifests: Every scan writes a scan.lock.json (inputs, feed versions, rules, lattice policy hash, tool versions, clocks) so results are replayable.
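
To make the "deterministic manifests" idea concrete, here is a minimal sketch of how an inputsHash could be derived from the pinned inputs. The class name InputsHash, the key=value line encoding, and the use of SHA-256 are assumptions for illustration, not something the plan prescribes; the sketch also assumes keys and values contain no '=' or newline characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not part of StellaOps): hashes pinned scan inputs
// so that the same inputs always yield the same identifier.
public static class InputsHash
{
    public static string Compute(IReadOnlyDictionary<string, string> pins)
    {
        var sb = new StringBuilder();
        // Ordinal key ordering removes any dependence on insertion order or culture.
        foreach (var kv in pins.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            sb.Append(kv.Key).Append('=').Append(kv.Value).Append('\n');
        }
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Two runs whose pins differ only in ordering get the same hash, which is what lets a later run be matched against a cached one.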

Minimal data model (Mongo)

  • scan_runs(_id, imageDigest, inputsHash, policyHash, feedsHash, startedAt, finishedAt, parentRunId?)
  • layer_sboms(scanRunId, layerDigest, sbomCid, symbolIndexCid, layerHash)
  • file_symbols(scanRunId, path, fileHash, funcIDs[], lang, size, mtime)
  • diffs(fromRunId, toRunId, kind: 'sbom'|'vex'|'reachability', stats, patch) (store JSON Patch)

Algorithm sketch

  1. Resolve base image ancestry → map old layer digest ↔ new layer digest.
  2. For unchanged layers: reuse layer_sboms + file_symbols.
  3. For changed/added files: re-symbolize + re-analyze; restrict call-graph build to impacted SCCs.
  4. Re-join OSV/GHSA/vendor vulns → compute reachability deltas → emit stable JSON Patch.
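
Steps 1 and 2 above boil down to partitioning the new image's layers into "reuse" and "recompute". A rough sketch, assuming layers are matched purely by digest; LayerReusePlanner and Plan are hypothetical names, and real ancestry resolution would be more involved:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of StellaOps): decides which layers can
// reuse cached layer_sboms/file_symbols and which must be rescanned.
public static class LayerReusePlanner
{
    public sealed record Plan(IReadOnlyList<string> Reused, IReadOnlyList<string> Recompute);

    // cachedLayerDigests: digests that already have cached artifacts from an earlier run.
    // newImageLayers: ordered layer digests of the image being scanned now.
    public static Plan Build(IEnumerable<string> cachedLayerDigests, IEnumerable<string> newImageLayers)
    {
        var cached = new HashSet<string>(cachedLayerDigests, StringComparer.Ordinal);
        var reused = new List<string>();
        var recompute = new List<string>();
        foreach (var digest in newImageLayers)
        {
            if (cached.Contains(digest))
            {
                reused.Add(digest);
            }
            else
            {
                recompute.Add(digest);
            }
        }
        return new Plan(reused, recompute);
    }
}
```

Layer order from the new image is preserved in both lists, so the downstream output stays deterministic.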

CLI impact

  • stella scan --deterministic --cache-dir ~/.stella/cache --emit-diff <previousRunId>
  • stella diff --from <runA> --to <runB> --format jsonpatch|md

Unified binary + source reachability (function-level)

Goal: Decide “is the vulnerable function reachable/used here?” across native and managed code.

Extraction

  • Binary symbolizers:
    • ELF: parse .symtab/.dynsym, DWARF (if present).
    • Mach-O/PE: export tables + DWARF/PDB (if present).
    • Build Canonical Symbol ID (CSID): lang:pkg@ver!binary#file:function(signature); normalize C++/Rust mangling.
  • Source symbolizers:
    • .NET (Roslyn+IL), JVM (bytecode), Go (SSA), Node/TS (TS AST), Python (AST), Rust (HIR/MIR if available).
  • Bindings join: Map FFI edges (P/Invoke, cgo, JNI/JNA, N-API) → cross-ecosystem call edges:
    • .NET P/Invoke → DLL export CSID.
    • Java JNI → Java_com_pkg_Class_Method ↔ native export.
    • Node N-API → addon exports ↔ JS require() site.
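
As an illustration of the CSID format and the JNI join above, here is a small sketch. Csid.Format simply fills in the lang:pkg@ver!binary#file:function(signature) template, and JniExportName applies the JNI short-name escaping rules (_1, _2, _3, _0xxxx). Both helper names are hypothetical, and overloaded native methods (which append an argument signature after "__") are not handled.

```csharp
using System;
using System.Text;

// Hypothetical helpers (not part of StellaOps).
public static class Csid
{
    // Fills in the CSID template: lang:pkg@ver!binary#file:function(signature)
    public static string Format(string lang, string pkg, string version,
        string binary, string file, string function, string signature) =>
        $"{lang}:{pkg}@{version}!{binary}#{file}:{function}({signature})";

    // Short JNI export name for a native method, e.g. Java_com_pkg_Class_Method.
    public static string JniExportName(string fullyQualifiedClass, string method) =>
        "Java_" + Mangle(fullyQualifiedClass) + "_" + Mangle(method);

    private static string Mangle(string name)
    {
        var sb = new StringBuilder();
        foreach (char c in name)
        {
            switch (c)
            {
                case '.':
                case '/': sb.Append('_'); break;   // package separator
                case '_': sb.Append("_1"); break;  // literal underscore
                case ';': sb.Append("_2"); break;
                case '[': sb.Append("_3"); break;
                default:
                    if (c < 128 && char.IsLetterOrDigit(c))
                    {
                        sb.Append(c);
                    }
                    else
                    {
                        // Any other character becomes _0 plus four lowercase hex digits.
                        sb.Append("_0").Append(((int)c).ToString("x4"));
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

The join then becomes a lookup: compute the expected export name for each native method declared on the Java side and match it against the native library's export table.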

Reachability pipeline

  1. Build per-language call graphs (CG) with framework models (ASP.NET, Spring, Express, etc.).
  2. Add FFI edges; merge into a polyglot call graph.
  3. Mark entrypoints (container CMD/ENTRYPOINT, web handlers, cron, CLI verbs).
  4. For each CVE → {pkg, version, affected symbols[]} map → is any affected CSID on a path from an entrypoint?
  5. Output evidence:
    • reachable: true|false|unknown
    • shortest path (symbols list)
    • probes (optional): runtime samples (EventPipe/JFR/uprobes) hitting CSIDs
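
Step 4 and the "shortest path" evidence can be sketched as a breadth-first search over the merged graph. This is a minimal illustration, assuming the polyglot call graph is already available as a caller → callees map keyed by CSID; Reachability.ShortestPath is a hypothetical name. A null result only means no path exists in the static graph; whether that maps to false or unknown depends on how complete the graph is.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of StellaOps).
public static class Reachability
{
    // edges: caller CSID -> callee CSIDs (FFI edges already merged in).
    // Returns the shortest entrypoint-to-affected-symbol path, or null if none is found.
    public static IReadOnlyList<string>? ShortestPath(
        IReadOnlyDictionary<string, IReadOnlyList<string>> edges,
        IEnumerable<string> entrypoints,
        ISet<string> affected)
    {
        // parent[x] is the node x was first reached from (null for entrypoints).
        var parent = new Dictionary<string, string?>(StringComparer.Ordinal);
        var queue = new Queue<string>();
        foreach (var entry in entrypoints)
        {
            if (parent.TryAdd(entry, null)) queue.Enqueue(entry);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (affected.Contains(node))
            {
                // Walk parents back to the entrypoint, then reverse.
                var path = new List<string>();
                for (string? cur = node; cur != null; cur = parent[cur]) path.Add(cur);
                path.Reverse();
                return path;
            }
            if (!edges.TryGetValue(node, out var callees)) continue;
            foreach (var callee in callees)
            {
                if (parent.TryAdd(callee, node)) queue.Enqueue(callee);
            }
        }
        return null;
    }
}
```

BFS visits nodes in order of distance from the entrypoints, so the first affected CSID dequeued yields a shortest path, which keeps the evidence compact for reviewers.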

Artifacts emitted

  • symbols.csi.jsonl (all CSIDs)
  • polyglot.cg.slices.json (only impacted SCCs for diffs)
  • reach.vex.json (OpenVEX/CSAF with function-level notes + confidence)

What to build next (low-risk, high-impact)

  • [Week 1-2] Per-layer caches + scan.lock.json; file symbol fingerprints (xxh3 + top-K funcIDs).
  • [Week 3-4] ELF/PE/Mach-O symbolizer lib with CSIDs; .NET IL + P/Invoke mapper.
  • [Week 5-6] Polyglot CG merge + entrypoint discovery from Docker metadata; JSON Patch diffs.
  • [Week 7+] Runtime probes (opt-in) to boost confidence and suppress false positives.

Tiny code seeds (C# hints)

Symbol fingerprint (per file)

// One entry per discovered file; mirrors algo|path|size|mtime|xxh3|funcIDs[].
public sealed record SymbolFingerprint(
  string Algo, string Path, long Size, long MTimeUnix,
  string ContentHash, string[] FuncIds);

Deterministic scan lock

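using System;
using System.Collections.Generic;

// Serialized as scan.lock.json; pins every input needed to replay a scan.
public sealed record ScanLock(
  string FeedsHash, string RulesHash, string PolicyHash, string Toolchain,
  string ImageDigest, string[] LayerDigests, DateTimeOffset Clock,
  IDictionary<string,string> EnvPins);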

JSON Patch diff emit

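// JsonDiffPatch.Diff is a placeholder for whichever diff library is chosen; assumed here to return the patch as a JSON string.
string patch = JsonDiffPatch.Diff(oldVexJson, newVexJson); // sort object keys in both inputs beforehand for a stable patch
File.WriteAllText("vex.diff.json", patch); // needs using System.IO;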
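
Stable key sorting (before diffing)

The "stable sort keys beforehand" step could look roughly like this, using System.Text.Json.Nodes. CanonicalJson is a hypothetical helper name; the sketch sorts object keys ordinally at every depth and leaves array order untouched, on the assumption that array order is meaningful in the VEX documents.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper (not part of StellaOps).
public static class CanonicalJson
{
    // Returns a detached copy of the node with object keys in ordinal order.
    public static JsonNode? SortKeys(JsonNode? node)
    {
        switch (node)
        {
            case JsonObject obj:
                var sorted = new JsonObject();
                foreach (var kv in obj.OrderBy(p => p.Key, StringComparer.Ordinal))
                {
                    sorted[kv.Key] = SortKeys(kv.Value);
                }
                return sorted;
            case JsonArray arr:
                var copy = new JsonArray();
                foreach (var item in arr)
                {
                    copy.Add(SortKeys(item)); // array order is preserved
                }
                return copy;
            case null:
                return null; // JSON null
            default:
                // Leaf value: re-parse its text to get a copy with no parent.
                return JsonNode.Parse(node.ToJsonString());
        }
    }

    public static string Canonicalize(string json) =>
        SortKeys(JsonNode.Parse(json))?.ToJsonString() ?? "null";
}
```

Running both the old and new documents through Canonicalize before diffing means a pure key reordering produces an empty patch instead of noise.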

If you want, I can turn this into:

  • a .proto for the cache/index objects,
  • a Mongo schema + indexes (including compound keys for fast layer reuse),
  • and a .NET 10 service skeleton (StellaOps.Scanner.WebService) with endpoints: /scan, /diff/{from}/{to}, /reach/{runId}.