- Introduced a detailed specification for encoding binary reachability that integrates call graphs with SBOMs.
- Defined a minimal data model including nodes, edges, and SBOM components.
- Outlined a step-by-step guide for building the reachability graph in a C#-centric manner.
- Established core domain models, including enumerations for binary formats and symbol kinds.
- Created a public API for the binary reachability service, including methods for graph building and serialization.
- Specified SBOM component resolution and binary parsing abstractions for PE, ELF, and Mach-O formats.
- Enhanced symbol normalization and digesting processes to ensure deterministic signatures.
- Included error handling, logging, and a high-level test plan to ensure robustness and correctness.
- Added non-functional requirements to guide performance, memory usage, and thread safety.
Here’s a crisp idea that could give Stella Ops a real moat: binary‑level reachability—linking CVEs directly to the exact functions and offsets inside compiled artifacts (ELF/PE/Mach‑O), not just to packages.
Why this matters (quick background)
- Package-level flags are noisy. Most scanners say “vuln in libX v1.2,” but that library might be present and never executed.
- Language-level call graphs help (when you have source or rich metadata), but containers often ship only stripped binaries.
- Binary reachability answers: Is the vulnerable function actually in this image? Is its code path reachable from the entrypoints we observed or can construct?
The missing layer: Symbolization
Build a symbolization layer that normalizes debug and symbol info across platforms:
- Inputs: DWARF (ELF/Mach-O), PDB (PE/Windows), symtabs, exported symbols, .eh_frame, and (when stripped) heuristic signatures (e.g., function byte-hashes, CFG fingerprints).
- Outputs: a source-agnostic map: {binary → sections → functions → (addresses, ranges, hashes, demangled names, inlined frames)}.
- Normalization: Put everything into a common schema (e.g., Stella.Symbolix.v1) so higher layers don’t care if it came from DWARF or PDB.
End‑to‑end reachability (binary‑first, source‑agnostic)
- Acquire & parse
  - Detect format (ELF/PE/Mach-O), parse headers, sections, symbol tables.
  - If debug info present: parse DWARF/PDB; else fall back to disassembly + function boundary recovery.
- Function catalog
  - Assign stable IDs per function: (imageHash, textSectionHash, startVA, size, fnHashXX).
  - Record x-refs (calls/jumps), imports/exports, PLT/IAT edges.
- Entrypoint discovery
  - Docker entry, process launch args, service scripts; infer likely mains (Go main.main, .NET hostfxr path, JVM launcher, etc.).
- Call-graph build (binary CFG)
  - Build inter/intra-procedural graph (direct + resolved indirect via IAT/PLT). Keep “unknown-target” edges for conservative safety.
- CVE→function linking
  - Maintain a signature bank per CVE advisory: vulnerable function names, file paths, and, crucially, byte-sequence or basic-block fingerprints for patched vs vulnerable versions (works even when stripped).
- Reachability analysis
  - Is the vulnerable function present? Is there a path from any entrypoint to it (under conservative assumptions)? Tag as Present+Reachable, Present+Uncertain, or Absent.
- Runtime confirmation (optional, when users allow)
  - Lightweight probes (eBPF on Linux, ETW on Windows, perf/JFR/EventPipe) capture function hits; cross-check with the static result to upgrade confidence.
Minimal component plan (drop into Stella Ops)
- Scanner.Symbolizer
  Parsers: ELF/DWARF (libdw or pure-managed reader), PE/PDB (DIA SDK or LLVM PDB), Mach-O/dSYM.
  Output: Symbolix.v1 blobs stored in OCI layer cache.
- Scanner.CFG
  Lifts functions to a normalized IR (Capstone/iced-x86 for decode), then builds CFG & call graph.
- Advisory.FingerprintBank
  Ingests CSAF/OpenVEX plus curated fingerprints (fn names, block hashes, patch diff markers). Versioned, signed, air-gap-syncable.
- Reachability.Engine
  Joins (Symbolix + CFG + FingerprintBank) and emits ReachabilityEvidence with lattice states for VEX.
- VEXer.Adapter
  Emits OpenVEX statements with status: affected/not_affected and justification: function_not_present | function_not_reachable | mitigated_at_runtime, attaching Evidence URIs. These justification names are internal labels; in the OpenVEX vocabulary they correspond to vulnerable_code_not_present, vulnerable_code_not_in_execute_path, and inline_mitigations_already_exist.
- Console UX
  “Why not affected?” panel showing entrypoint→…→function path (or absence), with byte-hash proof.
Data model sketch (concise)
ImageFunction { id, name?, startVA, size, fnHash, sectionHash, demangled?, provenance:{DWARF|PDB|Heuristic} }
Edge { srcFnId, dstFnId, kind:{direct|plt|iat|indirect?} }
CveSignature { cveId, fnName?, libHints[], blockFingerprints[], versionRanges }
Evidence { cveId, imageId, functionMatches[], reachable: bool?, confidence:[low|med|high], method:[static|runtime|hybrid] }
Practical phases (8–10 weeks of focused work)
- P0: ELF/DWARF symbolizer + basic function catalog; link a handful of CVEs via name-only; emit OpenVEX function_not_present.
- P1: CFG builder (direct calls) + PLT/IAT resolution; simple reachability; first fingerprints for top 50 CVEs in glibc, openssl, curl, zlib.
- P2: Stripped‑binary heuristics (block hashing) + Go/Rust name demangling; Windows PDB ingestion for PE.
- P3: Runtime probes (opt‑in) + confidence upgrade logic; Console path explorer; evidence signing (DSSE).
KPIs to prove the moat
- Noise cut: % reduction in “affected” flags after reachability (target 40–70% on typical containers).
- Precision: Ground‑truth validation vs PoC images (TP/FP/FN on presence & reachability).
- Coverage: % images where we can make a determination without source (goal: >80%).
- Latency: Added scan time per image (<15s typical with caches).
Risks & how to handle them
- Stripped binaries → mitigate with block‑hash fingerprints & library‑version heuristics.
- Obfuscated/packed code → mark Uncertain; allow user-supplied hints; prefer runtime confirmation.
- Advisory inconsistency → keep our own curated CVE→function fingerprint bank; sign & version it.
- Platform spread → start Linux/ELF, then Windows/PDB, then Mach‑O.
Why competitors struggle
Most tools stop at packages because binary CFG + fingerprint curation is hard and expensive. Shipping a source‑agnostic reachability engine tied to signed evidence in VEX would set Stella Ops apart—especially in offline/air‑gapped and sovereign contexts you already target.
If you want, I can draft:
- the Symbolix.v1 protobuf,
- a tiny PoC (ELF→functions→match CVE with a block fingerprint),
- and the OpenVEX emission snippet your VEXer can produce.
Below is a detailed architecture plan for implementing reachability and call-graph analysis in Stella Ops, covering JavaScript, Python, PHP, and binaries, and integrating with your existing Scanner / Concelier / VEXer stack.
I will assume:
- .NET 10 for core services.
- Scanner is the place where all “trust algebra / lattice” runs (per your standing rule).
- Concelier and VEXer remain “preserve/prune” layers and do not run lattice logic.
- Output must be JSON-centric with PURLs and OpenVEX.
1. Scope & Objectives
1.1 Primary goals
- From an OCI image, build:
  - A library-level usage graph (which libraries are used by which entrypoints).
  - A function-level call graph for JS / Python / PHP / binaries.
- Map CVEs (from Concelier) to:
  - Concrete components (PURLs) in the SBOM.
  - Concrete functions / entrypoints / code regions inside those components.
- Perform reachability analysis to classify each vulnerability as:
  - present + reachable
  - present + not_reachable
  - function_not_present (no vulnerable symbol)
  - uncertain (dynamic features, unresolved calls)
- Emit:
  - Structured JSON with PURLs and call-graph nodes/edges (“reachability evidence”).
  - OpenVEX documents with appropriate status/justification.
1.2 Non-goals (for now)
- Full dynamic analysis of the running container (eBPF, ptrace, etc.) – leave as Phase 3+ optional add-on.
- Perfect call graph precision for dynamic languages (aim for safe, conservative approximations).
- Automatic “fix recommendations” (handled by other Stella Ops agents later).
2. High-Level Architecture
2.1 Major components
Within Stella Ops:
- Scanner.WebService
  - User-facing API.
  - Orchestrates full scan (SBOM, CVEs, reachability).
  - Hosts the Lattice/Policy engine that merges evidence and produces decisions.
- Scanner.Worker
  - Runs per-image analysis jobs.
  - Invokes analyzers (JS, Python, PHP, Binary) inside its own container context.
- Scanner.Reachability Core Library
  - Unified IR for call graphs and reachability evidence.
  - Interfaces for language and binary analyzers.
  - Graph algorithms (BFS/DFS, lattice evaluation, entrypoint expansion).
- Language Analyzers
  - Scanner.Analyzers.JavaScript
  - Scanner.Analyzers.Python
  - Scanner.Analyzers.Php
  - Scanner.Analyzers.Binary
- Symbolization & CFG (for binaries)
  - Scanner.Symbolization (ELF, PE, Mach-O parsers, DWARF/PDB)
  - Scanner.Cfg (CFG + call graph for binaries)
- Vulnerability Signature Bank
  - Concelier.Signatures (curated CVE→function/library fingerprints).
  - Exposed to Scanner as offline bundle.
- VEXer
  - Vexer.Adapter.Reachability – transforms reachability evidence into OpenVEX.
2.2 Data flow (logical)
flowchart LR
A[OCI Image / Tar] --> B[Scanner.Worker: Extract FS]
B --> C["SBOM Engine (CycloneDX/SPDX)"]
C --> D[Vuln Match (Concelier feeds)]
B --> E1[JS Analyzer]
B --> E2[Python Analyzer]
B --> E3[PHP Analyzer]
B --> E4[Binary Analyzer + Symbolizer/CFG]
D --> F[Reachability Orchestrator]
E1 --> F
E2 --> F
E3 --> F
E4 --> F
F --> G["Lattice/Policy Engine (Scanner.WebService)"]
G --> H[Reachability Evidence JSON]
G --> I[VEXer: OpenVEX]
G --> J["Graph/Cartographer (optional)"]
3. Data Model & JSON Contracts
3.1 Core IR types (Scanner.Reachability)
Define in a central assembly, e.g. StellaOps.Scanner.Reachability:
public record ComponentRef(
string Purl,
string? BomRef,
string? Name,
string? Version);
public enum SymbolKind { Function, Method, Constructor, Lambda, Import, Export }
public record SymbolId(
string Language, // "js", "python", "php", "binary"
string ComponentPurl, // SBOM component PURL or "" for app code
string LogicalName, // e.g., "server.js:handleLogin"
string? FilePath,
int? Line);
public record CallGraphNode(
string Id, // stable id, e.g., hash(SymbolId)
SymbolId Symbol,
SymbolKind Kind,
bool IsEntrypoint);
public enum CallEdgeKind { Direct, Indirect, Dynamic, External, Ffi }
public record CallGraphEdge(
string FromNodeId,
string ToNodeId,
CallEdgeKind Kind);
public record CallGraph(
string GraphId,
IReadOnlyList<CallGraphNode> Nodes,
IReadOnlyList<CallGraphEdge> Edges);
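The `Id` comment above suggests `hash(SymbolId)` without fixing a scheme. A minimal sketch of one option, assuming SHA-256 over a newline-joined canonical string; the `NodeIds` helper, the `node:` prefix, and the choice to leave `Line` out of the hash are all illustrative assumptions, and `SymbolId` is re-declared only so the block compiles on its own.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public record SymbolId(
    string Language,
    string ComponentPurl,
    string LogicalName,
    string? FilePath,
    int? Line);

public static class NodeIds
{
    // Hypothetical helper: derives a deterministic node id from a SymbolId.
    public static string Compute(SymbolId s)
    {
        // Line is deliberately excluded, on the assumption that ids should
        // survive edits that only shift line numbers.
        string canonical = string.Join('\n',
            s.Language, s.ComponentPurl, s.LogicalName, s.FilePath ?? "");
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "node:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Because the id depends only on the symbol's identity fields, two scans of the same layer should produce the same node ids, which is what the caching and manifest sections later rely on.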
3.2 Vulnerability mapping
public record VulnerabilitySignature(
string Source, // "csaf", "nvd", "vendor"
string Id, // "CVE-2023-12345"
IReadOnlyList<string> Purls,
IReadOnlyList<string> TargetSymbolPatterns, // glob-like or regex
IReadOnlyList<string>? FilePathPatterns,
IReadOnlyList<string>? BlockFingerprints // for binaries, optional
);
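`TargetSymbolPatterns` is described as "glob-like or regex". As a sketch of the glob case only, the hypothetical `SymbolPatterns` helper below translates `*` and `?` into an anchored regular expression and tests a node's logical name against a pattern list; the exact glob dialect is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SymbolPatterns
{
    // Hypothetical helper: "*" matches any run of characters, "?" exactly one.
    public static Regex FromGlob(string glob)
    {
        // Escape everything first, then re-enable the two glob wildcards.
        string body = Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".");
        return new Regex("^" + body + "$", RegexOptions.CultureInvariant);
    }

    public static bool MatchesAny(string logicalName, IEnumerable<string> globs) =>
        globs.Any(g => FromGlob(g).IsMatch(logicalName));
}
```

A production matcher would likely compile each pattern once per signature bundle instead of on every call.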
3.3 Reachability evidence
public enum ReachabilityStatus
{
PresentReachable,
PresentNotReachable,
FunctionNotPresent,
Unknown
}
public record ReachabilityEvidence
(
string ImageRef,
string VulnId, // CVE or advisory id
ComponentRef Component,
ReachabilityStatus Status,
double Confidence, // 0..1
string Method, // "static-callgraph", "binary-fingerprint", etc.
IReadOnlyList<string> EntrypointNodeIds,
IReadOnlyList<IReadOnlyList<string>>? ExamplePaths // optional list of node-paths
);
3.4 JSON structure (external)
Minimal external JSON (what you store / expose):
{
"image": "registry.example.com/app:1.2.3",
"components": [
{
"purl": "pkg:npm/express@4.18.0",
"bomRef": "component-1"
}
],
"callGraphs": [
{
"graphId": "js-main",
"language": "js",
"nodes": [ /* CallGraphNode */ ],
"edges": [ /* CallGraphEdge */ ]
}
],
"reachability": [
{
"vulnId": "CVE-2023-12345",
"componentPurl": "pkg:npm/express@4.18.0",
"status": "PresentReachable",
"confidence": 0.92,
"entrypoints": [ "node:..." ],
"paths": [
["node:entry", "node:routeHandler", "node:vulnFn"]
]
}
]
}
4. Scanner-Side Architecture
4.1 Project layout (suggested)
src/
Scanner/
StellaOps.Scanner.WebService/
StellaOps.Scanner.Worker/
StellaOps.Scanner.Core/ # shared scan domain
StellaOps.Scanner.Reachability/
StellaOps.Scanner.Symbolization/
StellaOps.Scanner.Cfg/
StellaOps.Scanner.Analyzers.JavaScript/
StellaOps.Scanner.Analyzers.Python/
StellaOps.Scanner.Analyzers.Php/
StellaOps.Scanner.Analyzers.Binary/
4.2 API surface (Scanner.WebService)
- POST /api/scan/image
  - Request: { "imageRef": "...", "profile": { "reachability": true, ... } }
  - Returns: scan id.
- GET /api/scan/{id}/reachability
  - Returns: ReachabilityEvidence[], plus call graph summary (optional).
- GET /api/scan/{id}/vex
  - Returns: OpenVEX with statuses based on reachability lattice.
4.3 Worker orchestration
StellaOps.Scanner.Worker:
- Receives scan job with imageRef.
- Extracts filesystem (layered rootfs) under /mnt/scans/{scanId}/rootfs.
- Invokes SBOM generator (CycloneDX/SPDX).
- Invokes Concelier via offline feeds to get:
  - Component vulnerabilities (CVE list per PURL).
  - Vulnerability signatures (fingerprints).
- Builds a ReachabilityPlan:
  public record ReachabilityPlan(
  IReadOnlyList<ComponentRef> Components,
  IReadOnlyList<VulnerabilitySignature> Vulns,
  IReadOnlyList<AnalyzerTarget> AnalyzerTargets // files/dirs grouped by language
  );
- For each language target, dispatch analyzer:
  - JavaScript: IReachabilityAnalyzer implementation for JS.
  - Python: likewise.
  - PHP: likewise.
  - Binary: symbolizer + CFG.
- Collects call graphs from each analyzer and merges them into a single IR (or separate per-language graphs with shared IDs).
- Sends merged graphs + vuln list to Reachability Engine (Scanner.Reachability).
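The plan groups files by language before dispatch, but `AnalyzerTarget` is not defined in this document. As a rough sketch of the grouping step only, the hypothetical `TargetGrouping` helper below buckets paths by file extension; the extension table is an assumption, and files without a known extension are simply skipped here on the expectation that the binary analyzer identifies them by content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TargetGrouping
{
    // Assumed extension table; binaries need content sniffing instead.
    private static readonly Dictionary<string, string> LanguageByExtension =
        new(StringComparer.OrdinalIgnoreCase)
        {
            [".js"] = "js", [".mjs"] = "js", [".cjs"] = "js",
            [".py"] = "python",
            [".php"] = "php",
        };

    public static Dictionary<string, List<string>> GroupByLanguage(IEnumerable<string> paths)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var path in paths)
        {
            // Unknown or missing extension: not a script target, skip it.
            if (!LanguageByExtension.TryGetValue(Path.GetExtension(path), out var lang))
                continue;
            if (!groups.TryGetValue(lang, out var files))
                groups[lang] = files = new List<string>();
            files.Add(path);
        }
        return groups;
    }
}
```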
5. Language Analyzers (JS / Python / PHP)
All analyzers implement a common interface:
public interface IReachabilityAnalyzer
{
string Language { get; } // "js", "python", "php"
Task<CallGraph> AnalyzeAsync(AnalyzerContext context, CancellationToken ct);
}
public record AnalyzerContext(
string RootFsPath,
IReadOnlyList<ComponentRef> Components,
IReadOnlyList<VulnerabilitySignature> Vulnerabilities,
IReadOnlyDictionary<string, string> Env, // container env, entrypoint, etc.
string? EntrypointCommand // container CMD/ENTRYPOINT
);
5.1 JavaScript (Node.js focus)
Inputs:
- /app tree inside container (or discovered via SBOM).
- package.json files.
- Container entrypoint (e.g., ["node", "server.js"]).
Core steps:
- Identify app root:
  - Heuristics: directory containing package.json that owns the entry script.
- Parse:
  - All .js, .mjs, .cjs in app and node_modules for vulnerable PURLs.
  - Use a parsing frontend (e.g., Tree-sitter via .NET binding, or Node+AST-as-JSON).
- Build module graph:
  - require, import, export.
- Function-level graph:
  - For each function/method, create CallGraphNode.
  - For each callExpression, create CallGraphEdge (try to resolve callee).
- Entrypoints:
  - Main script in CMD/ENTRYPOINT.
  - HTTP route handlers (for express/koa) detected by patterns (e.g., app.get("/...")).
- Map vulnerable symbols:
  - From VulnerabilitySignature.TargetSymbolPatterns (e.g., express/lib/router/layer.js:handle_request).
  - Identify nodes whose SymbolId matches patterns.
Output:
- CallGraph for JS with:
  - IsEntrypoint = true for main and detected handlers.
  - Node attributes include file path, line, component PURL.
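The "detected by patterns" step for HTTP handlers could start as a plain text heuristic before an AST-based detector exists. The sketch below is a regex-only approximation, not a parser: the `ExpressRoutes` name, the receiver names `app` and `router`, and the verb list are assumptions, and it will miss routes registered through other variable names or computed paths.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExpressRoutes
{
    // Matches calls such as app.get("/login", ...) or router.post('/x', ...).
    private static readonly Regex RouteCall = new(
        @"\b(?<obj>app|router)\.(?<verb>get|post|put|delete|patch|all)\(\s*[""'](?<path>[^""']+)[""']",
        RegexOptions.Compiled);

    public static IReadOnlyList<(string Verb, string Path)> Find(string source)
    {
        var routes = new List<(string Verb, string Path)>();
        foreach (Match m in RouteCall.Matches(source))
            routes.Add((m.Groups["verb"].Value.ToUpperInvariant(), m.Groups["path"].Value));
        return routes;
    }
}
```

Each hit would become a `CallGraphNode` with `IsEntrypoint = true`; because the match is heuristic, it is a natural candidate for the lowered confidence that the engine section assigns to framework entrypoints.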
5.2 Python
Inputs:
- Site-packages paths from SBOM.
- Entrypoint script (CMD/ENTRYPOINT).
- Framework heuristics (Django, Flask) from environment variables or common entrypoints.
Core steps:
- Discover Python interpreter chain: not needed for pure static, but useful for heuristics.
- Parse .py files of:
  - App code.
  - Vulnerable packages (per PURL).
- Build module import graph (import, from x import y).
- Function-level graph:
  - Nodes for functions, methods, class constructors.
  - Edges for call expressions; conservative for dynamic calls.
- Entrypoints:
  - Main script.
  - WSGI callable (e.g., application in wsgi.py).
  - Django URLconf -> view functions.
- Map vulnerable symbols using TargetSymbolPatterns like django.middleware.security.SecurityMiddleware.__call__.
5.3 PHP
Inputs:
- Web root (from container image or conventional paths /var/www/html, /app/public, etc.).
- Composer metadata (composer.json, vendor/).
- Web server config if present (optional).
Core steps:
- Discover front controllers (e.g., index.php, public/index.php).
- Parse PHP files (again, via Tree-sitter or any suitable parser).
- Resolve include/require chains to build file-level inclusion graph.
- Build function/method graph:
  - Functions, methods, class constructors.
  - Calls with best-effort resolution for namespaced functions.
- Entrypoints:
  - Front controllers and router entrypoints (e.g., Symfony, Laravel detection).
- Map vulnerable symbols (e.g., functions in certain vendor packages, particular methods).
6. Binary Analyzer & Symbolizer
Project: StellaOps.Scanner.Analyzers.Binary + Symbolization + Cfg.
6.1 Inputs
- All binaries and shared libraries in:
  - /usr/lib, /lib, /app/bin, etc.
- SBOM link: each binary mapped to its component PURL when possible.
- Vulnerability signatures for native libs: function names, symbol names, fingerprints.
6.2 Symbolization
Module: StellaOps.Scanner.Symbolization
- Detect format: ELF, PE, Mach-O.
- For ELF/Mach-O:
  - Parse symbol tables (.symtab, .dynsym).
  - Parse DWARF (if present) to map functions to source files/lines.
- For PE:
  - Parse PDB (if present) or export table.
- For stripped binaries:
  - Run function boundary recovery (linear sweep + heuristic).
  - Compute block/fn-level hashes for fingerprinting.
Output:
public record ImageFunction(
string ImageId, // e.g., SHA256 of file
ulong StartVa,
uint Size,
string? SymbolName, // demangled if possible
string FnHash, // stable hash of bytes / CFG
string? SourceFile,
int? SourceLine);
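The "Detect format" step usually starts from the file's leading magic bytes. The sketch below shows that first check only; the `BinaryFormat` enum and `FormatSniffer` class are illustrative names rather than part of the plan, and a real detector would go on to validate headers (for PE, following `e_lfanew` to the `PE\0\0` signature; for `0xCAFEBABE`, telling a fat Mach-O apart from a Java class file).

```csharp
using System;
using System.Buffers.Binary;

public enum BinaryFormat { Unknown, Elf, Pe, MachO }

public static class FormatSniffer
{
    public static BinaryFormat Detect(ReadOnlySpan<byte> header)
    {
        if (header.Length < 4) return BinaryFormat.Unknown;

        // ELF: 0x7F 'E' 'L' 'F'
        if (header[0] == 0x7F && header[1] == (byte)'E' &&
            header[2] == (byte)'L' && header[3] == (byte)'F')
            return BinaryFormat.Elf;

        // PE images begin with the DOS stub signature "MZ".
        if (header[0] == (byte)'M' && header[1] == (byte)'Z')
            return BinaryFormat.Pe;

        uint magic = BinaryPrimitives.ReadUInt32BigEndian(header);
        return magic switch
        {
            // Thin Mach-O, 32/64-bit, in either byte order on disk.
            0xFEEDFACE or 0xFEEDFACF or 0xCEFAEDFE or 0xCFFAEDFE => BinaryFormat.MachO,
            // Fat (universal) Mach-O; the same magic is used by Java class files.
            0xCAFEBABE => BinaryFormat.MachO,
            _ => BinaryFormat.Unknown,
        };
    }
}
```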
6.3 CFG + Call graph
Module: StellaOps.Scanner.Cfg
- Disassemble .text using Capstone/Iced.x86.
- Build basic blocks and CFG.
- Identify:
  - Direct calls (resolved).
  - PLT/IAT indirections to shared libraries.
- Build CallGraph for binary functions:
  - Entrypoints: main, exported functions, Go main.main, etc.
  - Map application functions to library functions via PLT/IAT edges.
6.4 Linking vulnerabilities
- For each vulnerability affecting a native library (e.g., OpenSSL):
  - Map to candidate binaries via SBOM + PURL.
  - Within library image, find ImageFunctions matching:
    - SymbolName patterns.
    - FnHash / BlockFingerprints (for precise detection).
- Determine reachability:
  - Starting from application entrypoints, traverse call graph to see if calls to vulnerable library function occur.
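The traversal in "Determine reachability" can be sketched as a breadth-first search that also records parents, so one example path can be reported as evidence. This is a minimal illustration over plain string node ids; the `Reachability.FindPath` name and the tuple-based edge shape are assumptions, not the engine's actual API.

```csharp
using System.Collections.Generic;

public static class Reachability
{
    // Returns one shortest entrypoint-to-target path, or null if none exists.
    public static IReadOnlyList<string>? FindPath(
        IEnumerable<(string From, string To)> edges,
        IEnumerable<string> entrypoints,
        string target)
    {
        // Build adjacency lists from the edge list.
        var adjacency = new Dictionary<string, List<string>>();
        foreach (var (src, dst) in edges)
        {
            if (!adjacency.TryGetValue(src, out var list))
                adjacency[src] = list = new List<string>();
            list.Add(dst);
        }

        // parent doubles as the visited set; entrypoints have a null parent.
        var parent = new Dictionary<string, string?>();
        var queue = new Queue<string>();
        foreach (var entry in entrypoints)
            if (parent.TryAdd(entry, null)) queue.Enqueue(entry);

        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (node == target)
            {
                // Walk parents back to the entrypoint, then reverse.
                var path = new List<string>();
                for (string? cur = node; cur != null; cur = parent[cur]) path.Add(cur);
                path.Reverse();
                return path;
            }
            if (!adjacency.TryGetValue(node, out var successors)) continue;
            foreach (var succ in successors)
                if (parent.TryAdd(succ, node)) queue.Enqueue(succ);
        }
        return null;
    }
}
```

For a binary graph, PLT/IAT edges from application functions into a shared library are ordinary entries in `edges`, so the same search crosses the app-to-library boundary without special handling.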
7. Reachability Engine & Lattice (Scanner.WebService)
Project: StellaOps.Scanner.Reachability
7.1 Inputs to engine
- Combined CallGraph[] (per language + binary).
- Vulnerability list (CVE, GHSA, etc.) with affected PURLs.
- Vulnerability signatures.
- Entrypoint hints:
  - Container CMD/ENTRYPOINT.
  - Detected HTTP handlers, WSGI/PSGI entrypoints, etc.
7.2 Algorithm steps
- Entrypoint expansion
  - Identify all CallGraphNode with IsEntrypoint=true.
  - Add language-specific “framework entrypoints” (e.g., Express route dispatch, Django URL dispatch) when detected.
- Graph traversal
  - For each entrypoint node:
    - BFS/DFS through edges.
    - Maintain reachable bit on each node.
  - For dynamic edges:
    - Conservative: if target cannot be resolved, mark affected path as partially unknown and downgrade confidence.
- Vuln symbol resolution
  - For each vulnerability:
    - For each vulnerable component PURL found in SBOM:
      - Find candidate nodes whose SymbolId matches TargetSymbolPatterns / binary fingerprints.
  - If none found:
    - FunctionNotPresent (if component version range indicates vulnerable but we cannot find symbol – low confidence).
  - If found:
    - Check reachable bit:
      - If reachable by at least one entrypoint, PresentReachable.
      - Else, PresentNotReachable.
- Confidence computation
  - Start from:
    - 1.0 for direct match with explicit function name & static call.
    - Lower for:
      - Heuristic framework entrypoints.
      - Dynamic calls.
      - Fingerprint-only matches on stripped binaries.
  - Example rule-of-thumb:
    - direct static path only: 0.95–1.0.
    - dynamic edges but symbol found: 0.7–0.9.
    - symbol not found but version says vulnerable: 0.4–0.6.
- Lattice merge
  - Represent each CVE+component pair as a lattice element with states: {affected, not_affected, unknown}.
  - Reachability engine produces a local state:
    - PresentReachable → candidate affected.
    - PresentNotReachable or FunctionNotPresent → candidate not_affected.
    - Unknown → unknown.
  - Merge with:
    - Upstream vendor VEX (from Concelier).
    - Policy overrides (e.g., “treat certain CVEs as affected unless vendor says otherwise”).
  - Final state computed here (Scanner.WebService), not in Concelier or VEXer.
- Evidence output
  - For each vulnerability:
    - Emit ReachabilityEvidence with:
      - Status.
      - Confidence.
      - Method.
      - Example entrypoint paths (for UX and audit).
  - Persist this evidence alongside regular scan results.
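The classification and lattice-merge steps above can be sketched as three small functions. Two parts are assumptions that the text does not specify: treating "no path found, but unresolved dynamic edges were seen" as `Unknown`, and the merge rule itself, shown here as a conservative join in which `Affected` dominates and `NotAffected` survives only when both sources agree. The `LatticeSketch` and `VexState` names are hypothetical.

```csharp
public enum ReachabilityStatus { PresentReachable, PresentNotReachable, FunctionNotPresent, Unknown }
public enum VexState { Affected, NotAffected, Unknown }

public static class LatticeSketch
{
    public static ReachabilityStatus Classify(bool symbolFound, bool reachable, bool unresolvedEdgesSeen)
    {
        if (!symbolFound) return ReachabilityStatus.FunctionNotPresent;
        if (reachable) return ReachabilityStatus.PresentReachable;
        // Assumption: unresolved dynamic edges make "not reachable" unprovable.
        return unresolvedEdgesSeen
            ? ReachabilityStatus.Unknown
            : ReachabilityStatus.PresentNotReachable;
    }

    // Local state mapping as listed under "Lattice merge".
    public static VexState ToLocalState(ReachabilityStatus s) => s switch
    {
        ReachabilityStatus.PresentReachable => VexState.Affected,
        ReachabilityStatus.PresentNotReachable => VexState.NotAffected,
        ReachabilityStatus.FunctionNotPresent => VexState.NotAffected,
        _ => VexState.Unknown,
    };

    // Assumed conservative join; a missing vendor statement defers to local.
    public static VexState Merge(VexState local, VexState? vendor)
    {
        if (vendor is null) return local;
        if (local == VexState.Affected || vendor == VexState.Affected) return VexState.Affected;
        if (local == VexState.NotAffected && vendor == VexState.NotAffected) return VexState.NotAffected;
        return VexState.Unknown;
    }
}
```

Policy overrides would sit on top of `Merge`; they are left out because their precedence is not defined in this plan.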
8. Integration with SBOM & VEX
8.1 SBOM annotation
- Extend SBOM documents (CycloneDX / SPDX) with extra properties:
  - CycloneDX:
    - component.properties:
      - stellaops:reachability:status=present_reachable|present_not_reachable|function_not_present|unknown
      - stellaops:reachability:confidence=0.0-1.0
  - SPDX:
    - Annotation or ExternalRef with similar metadata.
8.2 OpenVEX generation
Module: StellaOps.Vexer.Adapter.Reachability
- For each (vuln, component) pair:
  - Map to VEX statement:
    - If PresentReachable:
      - status: affected
      - action_statement describing the remediation (OpenVEX defines justification only for not_affected statements).
    - If PresentNotReachable:
      - status: not_affected
      - justification: vulnerable_code_not_in_execute_path
    - If FunctionNotPresent:
      - status: not_affected
      - justification: vulnerable_code_not_present (component_not_present applies only when the component itself is absent from the image)
    - If Unknown:
      - status: under_investigation (configurable).
  - Attach evidence via:
    - impact_statement or status_notes fields in OpenVEX, or analysis.detail in CycloneDX VEX (link to internal evidence JSON or audit link).
- VEXer does not recalculate reachability; it uses the already computed decision + evidence.
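The status mapping can be sketched as a single switch. The block uses the justification labels defined by the OpenVEX specification rather than internal shorthand such as function_not_present; the `VexDecision` record and `VexMapping` class are hypothetical names, and the default for `Unknown` is shown as `under_investigation` although the text marks it as configurable.

```csharp
public enum ReachabilityStatus { PresentReachable, PresentNotReachable, FunctionNotPresent, Unknown }

// Hypothetical carrier for the two fields the adapter has to decide.
public record VexDecision(string Status, string? Justification);

public static class VexMapping
{
    public static VexDecision From(ReachabilityStatus status) => status switch
    {
        // "affected" carries no justification; it needs an action statement instead.
        ReachabilityStatus.PresentReachable =>
            new VexDecision("affected", null),
        ReachabilityStatus.PresentNotReachable =>
            new VexDecision("not_affected", "vulnerable_code_not_in_execute_path"),
        ReachabilityStatus.FunctionNotPresent =>
            new VexDecision("not_affected", "vulnerable_code_not_present"),
        _ =>
            new VexDecision("under_investigation", null),
    };
}
```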
9. Executable Containers & Offline Operation
9.1 Executable containers
- Analyzers run inside a dedicated Scanner worker container that has:
  - .NET 10 runtime.
  - Language runtimes if needed for parsing (Node, Python, PHP), or Tree-sitter-based parsing.
- Target image filesystem is mounted read-only under /mnt/rootfs.
- No network access (offline/air-gap).
- This satisfies “we will use executable containers” while keeping separation between:
  - Target image (mount only).
  - Analyzer container (StellaOps code).
9.2 Offline signature bundles
- Concelier periodically exports:
  - Vulnerability database (CSAF/NVD).
  - Vulnerability Signature Bank.
- Bundles are:
  - DSSE-signed.
  - Versioned (e.g., signatures-2025-11-01.tar.zst).
- Scanner uses:
  - The bundle digest as part of the Scan Manifest for deterministic replay.
10. Determinism & Caching
10.1 Layer-level caching
- Key: layerDigest + analyzerVersion + signatureBundleVersion.
- Cache artifacts:
  - CallGraph(s) per layer (for JS/Python/PHP code present in that layer).
  - Symbolization results per binary file hash.
- For images sharing layers:
  - Merge cached graphs instead of re-analyzing.
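One way to turn the three-part key into a single cache identifier is to hash the parts with a separator, so that any change to the analyzer or the signature bundle invalidates the entry. This is a sketch under that assumption; the `LayerCacheKey` name, the newline separator, and the `sha256:` prefix are illustrative choices.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class LayerCacheKey
{
    public static string Build(string layerDigest, string analyzerVersion, string signatureBundleVersion)
    {
        // A separator prevents ("ab", "c") and ("a", "bc") from colliding.
        string material = string.Join('\n', layerDigest, analyzerVersion, signatureBundleVersion);
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(material));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Keying on all three inputs means two images that share a layer reuse its cached call graph only when they were scanned with the same analyzer and signature bundle, which keeps replay deterministic.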
10.2 Deterministic scan manifest
For each scan, produce:
{
"imageRef": "registry/app:1.2.3",
"imageDigest": "sha256:...",
"scannerVersion": "1.4.0",
"analyzerVersions": {
"js": "1.0.0",
"python": "1.0.0",
"php": "1.0.0",
"binary": "1.0.0"
},
"signatureBundleDigest": "sha256:...",
"callGraphDigest": "sha256:...", // canonical JSON hash
"reachabilityEvidenceDigest": "sha256:..."
}
This manifest can be signed (Authority module) and used for audits and replay.
11. Implementation Roadmap (Phased)
Phase 0 – Infrastructure & Binary presence
Duration: 1 sprint
- Set up Scanner.Reachability core types and interfaces.
- Implement:
  - Basic Symbolizer for ELF + DWARF.
  - Binary function catalog without CFG.
- Link a small set of CVEs to binary function presence via SymbolName.
- Expose minimal evidence:
  - PresentReachable / FunctionNotPresent based only on presence (no call graph).
- Integrate with VEXer to emit function_not_present justifications.
Success criteria:
- For selected demo images with known vulnerable/patched OpenSSL, scanner can:
  - Distinguish images where vulnerable function is present vs. absent.
  - Emit OpenVEX with correct not_affected when patched.
Phase 1 – JS/Python/PHP call graphs & basic reachability
Duration: 1–2 sprints
- Implement:
  - Scanner.Analyzers.JavaScript with module + function call graph.
  - Scanner.Analyzers.Python and Scanner.Analyzers.Php with basic graphs.
- Entrypoint detection:
  - JS: main script from CMD, basic HTTP handlers.
  - Python: main script + Django/Flask heuristics.
  - PHP: front controllers.
- Implement core reachability algorithm (BFS/DFS).
- Implement simple VulnerabilitySignature that uses function names and file paths.
- Hook lattice engine in Scanner.WebService and integrate with:
  - Concelier vulnerability feeds.
  - VEXer.
Success criteria:
- For demo apps (Node, Django, Laravel):
  - Identify vulnerable functions and mark them reachable/unreachable.
  - Demonstrate noise reduction (some CVEs flagged as not_affected).
Phase 2 – Binary CFG & Fingerprinting, Improved Confidence
Duration: 1–2 sprints
- Extend Symbolizer & CFG for:
  - Stripped binaries (function hashing).
  - Shared libraries (PLT/IAT resolution).
- Implement VulnerabilitySignature.BlockFingerprints to distinguish patched vs vulnerable binary functions.
- Refine confidence scoring:
  - Use fingerprint match quality.
  - Consider presence/absence of debug info.
- Expand coverage:
  - glibc, curl, zlib, OpenSSL, libxml2, etc.
Success criteria:
- For curated images:
  - Confirm ability to differentiate patched vs vulnerable versions even when binaries are stripped.
  - Reachability reflects true call paths across app→lib boundaries.
Phase 3 – Runtime hooks (optional), UX, and Hardening
Duration: 2+ sprints
- Add opt-in runtime confirmation:
  - eBPF probes for function hits (Linux).
  - Map runtime addresses back to ImageFunction via symbolization.
- Enhance console UX:
  - Path explorer UI: show entrypoint → … → vulnerable function path.
  - Evidence view with hash-based proofs.
- Hardening:
  - Performance optimization for large images (parallel analysis, caching).
  - Conservative fallbacks for dynamic language features.
Success criteria:
- For selected environments where runtime is allowed:
  - Static reachability is confirmed by runtime traces in majority of cases.
  - No significant performance regression on typical images.
12. How this satisfies your initial bullets
From your initial requirements:
- JavaScript, Python, PHP, binary → Dedicated analyzers per language + binary symbolization/CFG, unified in Scanner.Reachability.
- Executable containers → Analyzers run inside Scanner’s worker container, mounting the target image rootfs; no network access.
- Libraries usage call graph → Call graphs map from entrypoints → app code → library functions; SBOM + PURLs tie functions to libraries.
- Reachability analysis → BFS/DFS from entrypoints over per-language and binary graphs, with lattice-based merging in Scanner.WebService.
- JSON + PURLs → All evidence is JSON with PURL-tagged components; SBOM is annotated, and VEX statements reference those PURLs.
If you like, next step can be: I draft concrete C# interface definitions (including some initial Tree-sitter integration stubs for JS/Python/PHP) and a skeleton of the ReachabilityPlan and ReachabilityEngine classes that you can drop into the monorepo.