feat: Add new provenance and crypto registry documentation
- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
927
docs/product-advisories/18-Nov-2026 - 1 copy 4.md
Normal file
@@ -0,0 +1,927 @@

Here’s a crisp idea that could give Stella Ops a real moat: **binary-level reachability**, i.e. linking CVEs directly to the exact functions and offsets inside compiled artifacts (ELF/PE/Mach-O), not just to packages.

---

### Why this matters (quick background)

* **Package-level flags are noisy.** Most scanners say “vuln in `libX v1.2`,” but that library might be present and never executed.
* **Language-level call graphs help** (when you have source or rich metadata), but containers often ship only **stripped binaries**.
* **Binary reachability** answers: *Is the vulnerable function actually in this image? Is its code path reachable from the entrypoints we observed or can construct?*

---

### The missing layer: Symbolization

Build a **symbolization layer** that normalizes debug and symbol info across platforms:

* **Inputs**: DWARF (ELF/Mach-O), PDB (PE/Windows), symtabs, exported symbols, `.eh_frame`, and (when stripped) heuristic signatures (e.g., function byte-hashes, CFG fingerprints).
* **Outputs**: a source-agnostic map: `{binary → sections → functions → (addresses, ranges, hashes, demangled names, inlined frames)}`.
* **Normalization**: Put everything into a common schema (e.g., `Stella.Symbolix.v1`) so higher layers don’t care if it came from DWARF or PDB.

---

### End-to-end reachability (binary-first, source-agnostic)

1. **Acquire & parse**

   * Detect format (ELF/PE/Mach-O), parse headers, sections, symbol tables.
   * If debug info present: parse DWARF/PDB; else fall back to disassembly + function boundary recovery.
2. **Function catalog**

   * Assign stable IDs per function: `(imageHash, textSectionHash, startVA, size, fnHashXX)`.
   * Record x-refs (calls/jumps), imports/exports, PLT/IAT edges.
3. **Entrypoint discovery**

   * Docker entry, process launch args, service scripts; infer likely mains (Go `main.main`, .NET hostfxr path, JVM launcher, etc.).
4. **Call-graph build (binary CFG)**

   * Build inter/intra-procedural graph (direct + resolved indirect via IAT/PLT). Keep “unknown-target” edges for conservative safety.
5. **CVE→function linking**

   * Maintain a **signature bank** per CVE advisory: vulnerable function names, file paths, and, crucially, **byte-sequence or basic-block fingerprints** for patched vs vulnerable versions (works even when stripped).
6. **Reachability analysis**

   * Is the vulnerable function present? Is there a path from any entrypoint to it (under conservative assumptions)? Tag as `Present+Reachable`, `Present+Uncertain`, or `Absent`.
7. **Runtime confirmation (optional, when users allow)**

   * Lightweight probes (eBPF on Linux, ETW on Windows, perf/JFR/EventPipe) capture function hits; cross-check with the static result to upgrade confidence.

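
The stable function ID from step 2 could be derived roughly as follows. This is only a sketch: the `FunctionIdSketch` class, the `|` separator, the hex field widths, and the choice of SHA-256 are assumptions for illustration, not something the plan above prescribes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FunctionIdSketch
{
    // Hypothetical helper: derives one ID from the tuple
    // (imageHash, textSectionHash, startVA, size, fnHash).
    public static string Compute(
        string imageHash, string textSectionHash,
        ulong startVa, uint size, string fnHash)
    {
        // Fixed field order, fixed-width hex and a separator keep the input unambiguous.
        string canonical = string.Join("|",
            imageHash, textSectionHash,
            startVa.ToString("x16"), size.ToString("x8"), fnHash);

        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```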
---

### Minimal component plan (drop into Stella Ops)

* **Scanner.Symbolizer**
  Parsers: ELF/DWARF (libdw or pure-managed reader), PE/PDB (Dia/LLVM PDB), Mach-O/dSYM.
  Output: `Symbolix.v1` blobs stored in OCI layer cache.
* **Scanner.CFG**
  Lifts functions to a normalized IR (capstone/iced-x86 for decode) → builds CFG & call graph.
* **Advisory.FingerprintBank**
  Ingests CSAF/OpenVEX plus curated fingerprints (fn names, block hashes, patch diff markers). Versioned, signed, air-gap-syncable.
* **Reachability.Engine**
  Joins (`Symbolix` + `CFG` + `FingerprintBank`) → emits `ReachabilityEvidence` with lattice states for VEX.
* **VEXer.Adapter**
  Emits **OpenVEX** statements with `status: affected/not_affected`. For `not_affected`, the internal reasons (function not present, function not reachable, mitigated at runtime) map to the OpenVEX justification labels `vulnerable_code_not_present`, `vulnerable_code_not_in_execute_path`, and `inline_mitigations_already_exist`. Evidence URIs are attached.
* **Console UX**
  “Why not affected?” panel showing entrypoint→…→function path (or absence), with byte-hash proof.

---

### Data model sketch (concise)

* `ImageFunction { id, name?, startVA, size, fnHash, sectionHash, demangled?, provenance:{DWARF|PDB|Heuristic} }`
* `Edge { srcFnId, dstFnId, kind:{direct|plt|iat|indirect?} }`
* `CveSignature { cveId, fnName?, libHints[], blockFingerprints[], versionRanges }`
* `Evidence { cveId, imageId, functionMatches[], reachable: bool?, confidence:[low|med|high], method:[static|runtime|hybrid] }`

---

### Practical phases (8–10 weeks of focused work)

1. **P0**: ELF/DWARF symbolizer + basic function catalog; link a handful of CVEs via name-only; emit OpenVEX `not_affected` for the function-not-present case (justification `vulnerable_code_not_present`).
2. **P1**: CFG builder (direct calls) + PLT/IAT resolution; simple reachability; first fingerprints for top 50 CVEs in glibc, openssl, curl, zlib.
3. **P2**: Stripped-binary heuristics (block hashing) + Go/Rust name demangling; Windows PDB ingestion for PE.
4. **P3**: Runtime probes (opt-in) + confidence upgrade logic; Console path explorer; evidence signing (DSSE).

---

### KPIs to prove the moat

* **Noise cut**: % reduction in “affected” flags after reachability (target 40–70% on typical containers).
* **Precision**: Ground-truth validation vs PoC images (TP/FP/FN on presence & reachability).
* **Coverage**: % images where we can make a determination without source (goal: >80%).
* **Latency**: Added scan time per image (<15s typical with caches).

---

### Risks & how to handle them

* **Stripped binaries** → mitigate with block-hash fingerprints & library-version heuristics.
* **Obfuscated/packed code** → mark `Uncertain`; allow user-supplied hints; prefer runtime confirmation.
* **Advisory inconsistency** → keep our own curated CVE→function fingerprint bank; sign & version it.
* **Platform spread** → start Linux/ELF, then Windows/PDB, then Mach-O.

---

### Why competitors struggle

Most tools stop at packages because binary CFG + fingerprint curation is hard and expensive. Shipping a **source-agnostic reachability engine** tied to signed evidence in VEX would set Stella Ops apart, especially in offline/air-gapped and sovereign contexts you already target.

If you want, I can draft:

* the `Symbolix.v1` protobuf,
* a tiny PoC (ELF→functions→match CVE with a block fingerprint),
* and the OpenVEX emission snippet your VEXer can produce.

Below is a detailed architecture plan for implementing reachability and call-graph analysis in Stella Ops, covering JavaScript, Python, PHP, and binaries, and integrating with your existing Scanner / Concelier / VEXer stack.

I will assume:

* .NET 10 for core services.
* Scanner is the place where all “trust algebra / lattice” runs (per your standing rule).
* Concelier and VEXer remain “preserve/prune” layers and do not run lattice logic.
* Output must be JSON-centric with PURLs and OpenVEX.

---

## 1. Scope & Objectives

### 1.1 Primary goals

1. From an OCI image, build:

   * A **library-level usage graph** (which libraries are used by which entrypoints).
   * A **function-level call graph** for JS / Python / PHP / binaries.
2. Map CVEs (from Concelier) to:

   * Concrete **components** (PURLs) in the SBOM.
   * Concrete **functions / entrypoints / code regions** inside those components.
3. Perform **reachability analysis** to classify each vulnerability as:

   * `present + reachable`
   * `present + not_reachable`
   * `function_not_present` (no vulnerable symbol)
   * `uncertain` (dynamic features, unresolved calls)
4. Emit:

   * **Structured JSON** with PURLs and call-graph nodes/edges (“reachability evidence”).
   * **OpenVEX** documents with appropriate `status`/`justification`.

### 1.2 Non-goals (for now)

* Full dynamic analysis of the running container (eBPF, ptrace, etc.) – leave as Phase 3+ optional add-on.
* Perfect call graph precision for dynamic languages (aim for safe, conservative approximations).
* Automatic “fix recommendations” (handled by other Stella Ops agents later).

---

## 2. High-Level Architecture

### 2.1 Major components

Within Stella Ops:

* **Scanner.WebService**

  * User-facing API.
  * Orchestrates full scan (SBOM, CVEs, reachability).
  * Hosts the **Lattice/Policy engine** that merges evidence and produces decisions.
* **Scanner.Worker**

  * Runs per-image analysis jobs.
  * Invokes analyzers (JS, Python, PHP, Binary) inside its own container context.
* **Scanner.Reachability Core Library**

  * Unified IR for call graphs and reachability evidence.
  * Interfaces for language and binary analyzers.
  * Graph algorithms (BFS/DFS, lattice evaluation, entrypoint expansion).
* **Language Analyzers**

  * `Scanner.Analyzers.JavaScript`
  * `Scanner.Analyzers.Python`
  * `Scanner.Analyzers.Php`
  * `Scanner.Analyzers.Binary`
* **Symbolization & CFG (for binaries)**

  * `Scanner.Symbolization` (ELF, PE, Mach-O parsers, DWARF/PDB)
  * `Scanner.Cfg` (CFG + call graph for binaries)
* **Vulnerability Signature Bank**

  * `Concelier.Signatures` (curated CVE→function/library fingerprints).
  * Exposed to Scanner as **offline bundle**.
* **VEXer**

  * `Vexer.Adapter.Reachability` – transforms reachability evidence into OpenVEX.

### 2.2 Data flow (logical)

```mermaid
flowchart LR
    A[OCI Image / Tar] --> B[Scanner.Worker: Extract FS]
    B --> C["SBOM Engine (CycloneDX/SPDX)"]
    C --> D["Vuln Match (Concelier feeds)"]
    B --> E1[JS Analyzer]
    B --> E2[Python Analyzer]
    B --> E3[PHP Analyzer]
    B --> E4[Binary Analyzer + Symbolizer/CFG]

    D --> F[Reachability Orchestrator]
    E1 --> F
    E2 --> F
    E3 --> F
    E4 --> F
    F --> G["Lattice/Policy Engine (Scanner.WebService)"]
    G --> H[Reachability Evidence JSON]
    G --> I[VEXer: OpenVEX]
    G --> J["Graph/Cartographer (optional)"]
```

---

## 3. Data Model & JSON Contracts

### 3.1 Core IR types (Scanner.Reachability)

Define in a central assembly, e.g. `StellaOps.Scanner.Reachability`:

```csharp
public record ComponentRef(
    string Purl,
    string? BomRef,
    string? Name,
    string? Version);

public enum SymbolKind { Function, Method, Constructor, Lambda, Import, Export }

public record SymbolId(
    string Language,       // "js", "python", "php", "binary"
    string ComponentPurl,  // SBOM component PURL or "" for app code
    string LogicalName,    // e.g., "server.js:handleLogin"
    string? FilePath,
    int? Line);

public record CallGraphNode(
    string Id,             // stable id, e.g., hash(SymbolId)
    SymbolId Symbol,
    SymbolKind Kind,
    bool IsEntrypoint);

public enum CallEdgeKind { Direct, Indirect, Dynamic, External, Ffi }

public record CallGraphEdge(
    string FromNodeId,
    string ToNodeId,
    CallEdgeKind Kind);

public record CallGraph(
    string GraphId,
    IReadOnlyList<CallGraphNode> Nodes,
    IReadOnlyList<CallGraphEdge> Edges);
```

### 3.2 Vulnerability mapping

```csharp
public record VulnerabilitySignature(
    string Source,                               // "csaf", "nvd", "vendor"
    string Id,                                   // "CVE-2023-12345"
    IReadOnlyList<string> Purls,
    IReadOnlyList<string> TargetSymbolPatterns,  // glob-like or regex
    IReadOnlyList<string>? FilePathPatterns,
    IReadOnlyList<string>? BlockFingerprints     // for binaries, optional
);
```

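
To make the “glob-like” option for `TargetSymbolPatterns` concrete, here is a minimal sketch. The glob dialect is an assumption (`*` matches any run of characters, `?` matches exactly one, everything else is literal), and `SymbolPatternMatcher` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SymbolPatternMatcher
{
    // Translates the assumed glob dialect into an anchored regular expression.
    public static Regex FromGlob(string glob)
    {
        // Regex.Escape turns '*' into "\*" and '?' into "\?"; swap those back into wildcards.
        string pattern = "^" + Regex.Escape(glob)
            .Replace("\\*", ".*")
            .Replace("\\?", ".") + "$";
        return new Regex(pattern, RegexOptions.CultureInvariant);
    }

    // True if the symbol's logical name matches at least one pattern.
    public static bool MatchesAny(string logicalName, IEnumerable<string> targetSymbolPatterns) =>
        targetSymbolPatterns.Any(p => FromGlob(p).IsMatch(logicalName));
}
```

Anchoring the pattern with `^` and `$` avoids accidental substring matches, which matters when one symbol name is a prefix of another.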
### 3.3 Reachability evidence

```csharp
public enum ReachabilityStatus
{
    PresentReachable,
    PresentNotReachable,
    FunctionNotPresent,
    Unknown
}

public record ReachabilityEvidence(
    string ImageRef,
    string VulnId,               // CVE or advisory id
    ComponentRef Component,
    ReachabilityStatus Status,
    double Confidence,           // 0..1
    string Method,               // "static-callgraph", "binary-fingerprint", etc.
    IReadOnlyList<string> EntrypointNodeIds,
    IReadOnlyList<IReadOnlyList<string>>? ExamplePaths  // optional list of node-paths
);
```

### 3.4 JSON structure (external)

Minimal external JSON (what you store / expose). The `nodes` and `edges` arrays hold serialized `CallGraphNode` and `CallGraphEdge` objects and are left empty here:

```json
{
  "image": "registry.example.com/app:1.2.3",
  "components": [
    {
      "purl": "pkg:npm/express@4.18.0",
      "bomRef": "component-1"
    }
  ],
  "callGraphs": [
    {
      "graphId": "js-main",
      "language": "js",
      "nodes": [],
      "edges": []
    }
  ],
  "reachability": [
    {
      "vulnId": "CVE-2023-12345",
      "componentPurl": "pkg:npm/express@4.18.0",
      "status": "PresentReachable",
      "confidence": 0.92,
      "entrypoints": [ "node:..." ],
      "paths": [
        ["node:entry", "node:routeHandler", "node:vulnFn"]
      ]
    }
  ]
}
```

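
One way to get from the C# records to this JSON shape is `System.Text.Json` with a camel-case naming policy and string-valued enums. The sketch below assumes a flattened, hypothetical `EvidenceDto` for a single `reachability` entry; it is not the full contract.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public enum EvidenceStatus { PresentReachable, PresentNotReachable, FunctionNotPresent, Unknown }

// Hypothetical flat DTO mirroring part of one entry in the "reachability" array.
public record EvidenceDto(string VulnId, string ComponentPurl, EvidenceStatus Status, double Confidence);

public static class EvidenceJsonSketch
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,  // VulnId -> "vulnId"
        Converters = { new JsonStringEnumConverter() }      // enum written by name, not number
    };

    public static string Serialize(EvidenceDto dto) => JsonSerializer.Serialize(dto, Options);
}
```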
---

## 4. Scanner-Side Architecture

### 4.1 Project layout (suggested)

```text
src/
  Scanner/
    StellaOps.Scanner.WebService/
    StellaOps.Scanner.Worker/
    StellaOps.Scanner.Core/            # shared scan domain
    StellaOps.Scanner.Reachability/
    StellaOps.Scanner.Symbolization/
    StellaOps.Scanner.Cfg/
    StellaOps.Scanner.Analyzers.JavaScript/
    StellaOps.Scanner.Analyzers.Python/
    StellaOps.Scanner.Analyzers.Php/
    StellaOps.Scanner.Analyzers.Binary/
```

### 4.2 API surface (Scanner.WebService)

* `POST /api/scan/image`

  * Request: `{ "imageRef": "...", "profile": { "reachability": true, ... } }`
  * Returns: scan id.
* `GET /api/scan/{id}/reachability`

  * Returns: `ReachabilityEvidence[]`, plus call graph summary (optional).
* `GET /api/scan/{id}/vex`

  * Returns: OpenVEX with statuses based on reachability lattice.

### 4.3 Worker orchestration

`StellaOps.Scanner.Worker`:

1. Receives scan job with `imageRef`.

2. Extracts filesystem (layered rootfs) under `/mnt/scans/{scanId}/rootfs`.

3. Invokes SBOM generator (CycloneDX/SPDX).

4. Invokes Concelier via offline feeds to get:

   * Component vulnerabilities (CVE list per PURL).
   * Vulnerability signatures (fingerprints).

5. Builds a `ReachabilityPlan`:

   ```csharp
   public record ReachabilityPlan(
       IReadOnlyList<ComponentRef> Components,
       IReadOnlyList<VulnerabilitySignature> Vulns,
       IReadOnlyList<AnalyzerTarget> AnalyzerTargets // files/dirs grouped by language
   );
   ```

6. For each language target, dispatch analyzer:

   * JavaScript: `IReachabilityAnalyzer` implementation for JS.
   * Python: likewise.
   * PHP: likewise.
   * Binary: symbolizer + CFG.

7. Collects call graphs from each analyzer and merges them into a single IR (or separate per-language graphs with shared IDs).

8. Sends merged graphs + vuln list to **Reachability Engine** (Scanner.Reachability).

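
Steps 6 and 7 could look roughly like the sketch below, which runs only the analyzers whose language appears in the plan and collects their results in order. `IAnalyzerSketch` is a simplified stand-in for `IReachabilityAnalyzer` (it returns a graph ID string rather than a `CallGraph`), and `FakeAnalyzer` exists only to keep the example self-contained; both names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-in for IReachabilityAnalyzer.
public interface IAnalyzerSketch
{
    string Language { get; }
    Task<string> AnalyzeAsync(string rootFsPath, CancellationToken ct);
}

// Test double used only for illustration.
public sealed class FakeAnalyzer : IAnalyzerSketch
{
    public FakeAnalyzer(string language) => Language = language;
    public string Language { get; }
    public Task<string> AnalyzeAsync(string rootFsPath, CancellationToken ct) =>
        Task.FromResult($"{Language}-graph");
}

public static class AnalyzerDispatchSketch
{
    // Runs every analyzer whose language is targeted and awaits all of them together.
    public static async Task<IReadOnlyList<string>> RunAsync(
        IEnumerable<IAnalyzerSketch> analyzers,
        IReadOnlySet<string> targetLanguages,
        string rootFsPath,
        CancellationToken ct)
    {
        var tasks = analyzers
            .Where(a => targetLanguages.Contains(a.Language))
            .Select(a => a.AnalyzeAsync(rootFsPath, ct));
        return await Task.WhenAll(tasks); // results keep the order of the task sequence
    }
}
```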
---

## 5. Language Analyzers (JS / Python / PHP)

All analyzers implement a common interface:

```csharp
public interface IReachabilityAnalyzer
{
    string Language { get; } // "js", "python", "php"

    Task<CallGraph> AnalyzeAsync(AnalyzerContext context, CancellationToken ct);
}

public record AnalyzerContext(
    string RootFsPath,
    IReadOnlyList<ComponentRef> Components,
    IReadOnlyList<VulnerabilitySignature> Vulnerabilities,
    IReadOnlyDictionary<string, string> Env, // container env, entrypoint, etc.
    string? EntrypointCommand                // container CMD/ENTRYPOINT
);
```

### 5.1 JavaScript (Node.js focus)

**Inputs:**

* `/app` tree inside container (or discovered via SBOM).
* `package.json` files.
* Container entrypoint (e.g., `["node", "server.js"]`).

**Core steps:**

1. Identify **app root**:

   * Heuristics: directory containing `package.json` that owns the entry script.
2. Parse:

   * All `.js`, `.mjs`, `.cjs` in app and `node_modules` for vulnerable PURLs.
   * Use a parsing frontend (e.g., Tree-sitter via .NET binding, or Node+AST-as-JSON).
3. Build module graph:

   * `require`, `import`, `export`.
4. Function-level graph:

   * For each function/method, create `CallGraphNode`.
   * For each `callExpression`, create `CallGraphEdge` (try to resolve callee).
5. Entrypoints:

   * Main script in CMD/ENTRYPOINT.
   * HTTP route handlers (for express/koa) detected by patterns (e.g., `app.get("/...")`).
6. Map vulnerable symbols:

   * From `VulnerabilitySignature.TargetSymbolPatterns` (e.g., `express/lib/router/layer.js:handle_request`).
   * Identify nodes whose `SymbolId` matches patterns.

**Output:**

* `CallGraph` for JS with:

  * `IsEntrypoint = true` for main and detected handlers.
  * Node attributes include file path, line, component PURL.

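
As a rough illustration of the pattern-based handler detection in step 5, the sketch below finds Express-style route registrations with a regular expression. This is a deliberate simplification: the plan calls for an AST-based frontend, and a regex will miss aliased receivers and be fooled by comments or strings. `ExpressRouteSketch`, the receiver names `app`/`router`, and the verb list are assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExpressRouteSketch
{
    // Heuristic only: matches calls such as app.get("/users", ...) or router.post('/login', ...).
    private static readonly Regex RouteCall = new Regex(
        @"\b(?<recv>app|router)\.(?<verb>get|post|put|delete|patch|all|use)\(\s*['""](?<path>[^'""]+)['""]",
        RegexOptions.Compiled);

    // Yields one (verb, path) pair per detected route registration, in source order.
    public static IEnumerable<(string Verb, string Path)> FindRoutes(string source)
    {
        foreach (Match m in RouteCall.Matches(source))
            yield return (m.Groups["verb"].Value, m.Groups["path"].Value);
    }
}
```

In a real analyzer, each detected registration would mark the handler function passed to the call as `IsEntrypoint = true`.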
### 5.2 Python

**Inputs:**

* Site-packages paths from SBOM.
* Entrypoint script (CMD/ENTRYPOINT).
* Framework heuristics (Django, Flask) from environment variables or common entrypoints.

**Core steps:**

1. Discover the Python interpreter chain (optional for purely static analysis, but useful for heuristics).
2. Parse `.py` files of:

   * App code.
   * Vulnerable packages (per PURL).
3. Build module import graph (`import`, `from x import y`).
4. Function-level graph:

   * Nodes for functions, methods, class constructors.
   * Edges for call expressions; conservative for dynamic calls.
5. Entrypoints:

   * Main script.
   * WSGI callable (e.g., `application` in `wsgi.py`).
   * Django URLconf -> view functions.
6. Map vulnerable symbols using `TargetSymbolPatterns` like `django.middleware.security.SecurityMiddleware.__call__`.

### 5.3 PHP

**Inputs:**

* Web root (from container image or conventional paths `/var/www/html`, `/app/public`, etc.).
* Composer metadata (`composer.json`, `vendor/`).
* Web server config if present (optional).

**Core steps:**

1. Discover front controllers (e.g., `index.php`, `public/index.php`).
2. Parse PHP files (again, via Tree-sitter or any suitable parser).
3. Resolve include/require chains to build file-level inclusion graph.
4. Build function/method graph:

   * Functions, methods, class constructors.
   * Calls with best-effort resolution for namespaced functions.
5. Entrypoints:

   * Front controllers and router entrypoints (e.g., Symfony, Laravel detection).
6. Map vulnerable symbols (e.g., functions in certain vendor packages, particular methods).

---

## 6. Binary Analyzer & Symbolizer

Project: `StellaOps.Scanner.Analyzers.Binary` + `Symbolization` + `Cfg`.

### 6.1 Inputs

* All binaries and shared libraries in:

  * `/usr/lib`, `/lib`, `/app/bin`, etc.
* SBOM link: each binary mapped to its component PURL when possible.
* Vulnerability signatures for native libs: function names, symbol names, fingerprints.

### 6.2 Symbolization

Module: `StellaOps.Scanner.Symbolization`

* Detect format: ELF, PE, Mach-O.
* For ELF/Mach-O:

  * Parse symbol tables (`.symtab`, `.dynsym` for ELF; the `LC_SYMTAB`/`LC_DYSYMTAB` tables for Mach-O).
  * Parse DWARF (if present) to map functions to source files/lines.
* For PE:

  * Parse PDB (if present) or export table.
* For stripped binaries:

  * Run function boundary recovery (linear sweep + heuristic).
  * Compute block/fn-level hashes for fingerprinting.

Output:

```csharp
public record ImageFunction(
    string ImageId,       // e.g., SHA256 of file
    ulong StartVa,
    uint Size,
    string? SymbolName,   // demangled if possible
    string FnHash,        // stable hash of bytes / CFG
    string? SourceFile,
    int? SourceLine);
```

### 6.3 CFG + Call graph

Module: `StellaOps.Scanner.Cfg`

* Disassemble `.text` using Capstone/Iced.x86.
* Build basic blocks and CFG.
* Identify:

  * Direct calls (resolved).
  * PLT/IAT indirections to shared libraries.
* Build `CallGraph` for binary functions:

  * Entrypoints: `main`, exported functions, Go `main.main`, etc.
  * Map application functions to library functions via PLT/IAT edges.

### 6.4 Linking vulnerabilities

* For each vulnerability affecting a native library (e.g., OpenSSL):

  * Map to candidate binaries via SBOM + PURL.
  * Within library image, find `ImageFunction`s matching:

    * `SymbolName` patterns.
    * `FnHash` / `BlockFingerprints` (for precise detection).
* Determine reachability:

  * Starting from application entrypoints, traverse call graph to see if calls to vulnerable library function occur.

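
The two matching modes in 6.4 (by symbol name, or by fingerprint when the binary is stripped) can be sketched as below. `BinFunction` is a trimmed, hypothetical stand-in for `ImageFunction`, and exact-name matching is used instead of patterns to keep the example short.

```csharp
using System.Collections.Generic;
using System.Linq;

// Trimmed stand-in for ImageFunction: the name is null for stripped binaries.
public record BinFunction(string? SymbolName, string FnHash);

public static class BinarySignatureMatcher
{
    // Returns functions that match a vulnerable symbol name or a vulnerable fingerprint.
    public static IReadOnlyList<BinFunction> Match(
        IEnumerable<BinFunction> functions,
        IReadOnlySet<string> vulnerableNames,
        IReadOnlySet<string> vulnerableFingerprints)
    {
        return functions
            .Where(f => (f.SymbolName is not null && vulnerableNames.Contains(f.SymbolName))
                        || vulnerableFingerprints.Contains(f.FnHash))
            .ToList();
    }
}
```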
---

## 7. Reachability Engine & Lattice (Scanner.WebService)

Project: `StellaOps.Scanner.Reachability`

### 7.1 Inputs to engine

* Combined `CallGraph[]` (per language + binary).
* Vulnerability list (CVE, GHSA, etc.) with affected PURLs.
* Vulnerability signatures.
* Entrypoint hints:

  * Container CMD/ENTRYPOINT.
  * Detected HTTP handlers, WSGI/PSGI entrypoints, etc.

### 7.2 Algorithm steps

1. **Entrypoint expansion**

   * Identify all `CallGraphNode` with `IsEntrypoint=true`.
   * Add language-specific “framework entrypoints” (e.g., Express route dispatch, Django URL dispatch) when detected.

2. **Graph traversal**

   * For each entrypoint node:

     * BFS/DFS through edges.
     * Maintain `reachable` bit on each node.
   * For dynamic edges:

     * Conservative: if target cannot be resolved, mark affected path as partially unknown and downgrade confidence.

3. **Vuln symbol resolution**

   * For each vulnerability:

     * For each vulnerable component PURL found in SBOM:

       * Find candidate nodes whose `SymbolId` matches `TargetSymbolPatterns` / binary fingerprints.
       * If none found:

         * `FunctionNotPresent` (if component version range indicates vulnerable but we cannot find symbol – low confidence).
       * If found:

         * Check `reachable` bit:

           * If reachable by at least one entrypoint, `PresentReachable`.
           * Else, `PresentNotReachable`.

4. **Confidence computation**

   * Start from:

     * `1.0` for direct match with explicit function name & static call.
   * Lower for:

     * Heuristic framework entrypoints.
     * Dynamic calls.
     * Fingerprint-only matches on stripped binaries.
   * Example rule-of-thumb:

     * direct static path only: 0.95–1.0.
     * dynamic edges but symbol found: 0.7–0.9.
     * symbol not found but version says vulnerable: 0.4–0.6.

5. **Lattice merge**

   * Represent each CVE+component pair as a lattice element with states: `{affected, not_affected, unknown}`.
   * Reachability engine produces a **local state**:

     * `PresentReachable` → candidate `affected`.
     * `PresentNotReachable` or `FunctionNotPresent` → candidate `not_affected`.
     * `Unknown` → `unknown`.
   * Merge with:

     * Upstream vendor VEX (from Concelier).
     * Policy overrides (e.g., “treat certain CVEs as affected unless vendor says otherwise”).
   * Final state computed here (Scanner.WebService), not in Concelier or VEXer.

6. **Evidence output**

   * For each vulnerability:

     * Emit `ReachabilityEvidence` with:

       * Status.
       * Confidence.
       * Method.
       * Example entrypoint paths (for UX and audit).
   * Persist this evidence alongside regular scan results.

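
Steps 2 and 3 reduce to a breadth-first search followed by a membership check. The sketch below works on plain node IDs and an edge list; `ReachabilitySketch` and `SketchStatus` are hypothetical names, and dynamic edges, confidence scoring, and the lattice merge are intentionally left out.

```csharp
using System.Collections.Generic;

public enum SketchStatus { PresentReachable, PresentNotReachable, FunctionNotPresent }

public static class ReachabilitySketch
{
    // Breadth-first search from all entrypoints over a directed edge list.
    public static HashSet<string> Reachable(
        IEnumerable<string> entrypoints,
        IEnumerable<(string Src, string Dst)> edges)
    {
        var adjacency = new Dictionary<string, List<string>>();
        foreach (var (src, dst) in edges)
        {
            if (!adjacency.TryGetValue(src, out var targets))
                adjacency[src] = targets = new List<string>();
            targets.Add(dst);
        }

        var visited = new HashSet<string>(entrypoints);
        var queue = new Queue<string>(visited);
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (!adjacency.TryGetValue(node, out var next)) continue;
            foreach (var n in next)
                if (visited.Add(n)) queue.Enqueue(n); // Add returns false if already seen
        }
        return visited;
    }

    // No candidate nodes: not present. Any candidate reachable: reachable. Otherwise: not reachable.
    public static SketchStatus Classify(
        IReadOnlyCollection<string> vulnerableNodeIds, HashSet<string> reachable)
    {
        if (vulnerableNodeIds.Count == 0) return SketchStatus.FunctionNotPresent;
        foreach (var id in vulnerableNodeIds)
            if (reachable.Contains(id)) return SketchStatus.PresentReachable;
        return SketchStatus.PresentNotReachable;
    }
}
```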
---

## 8. Integration with SBOM & VEX

### 8.1 SBOM annotation

* Extend SBOM documents (CycloneDX / SPDX) with extra properties:

  * CycloneDX:

    * `component.properties`:

      * `stellaops:reachability:status` = `present_reachable|present_not_reachable|function_not_present|unknown`
      * `stellaops:reachability:confidence` = `0.0-1.0`
  * SPDX:

    * `Annotation` or `ExternalRef` with similar metadata.

### 8.2 OpenVEX generation

Module: `StellaOps.Vexer.Adapter.Reachability`

* For each `(vuln, component)` pair:

  * Map to VEX statement:

    * If `PresentReachable`:

      * `status: affected`
      * No `justification` (OpenVEX defines justifications only for `not_affected`); provide an `action_statement` instead.
    * If `PresentNotReachable`:

      * `status: not_affected`
      * `justification: vulnerable_code_not_in_execute_path` (the function-not-reachable case).
    * If `FunctionNotPresent`:

      * `status: not_affected`
      * `justification: vulnerable_code_not_present` (or `component_not_present` when the whole component is absent).
    * If `Unknown`:

      * `status: under_investigation` (configurable).

  * Attach evidence via:

    * `impact_statement` / `status_notes` fields (link to internal evidence JSON or audit link).

* VEXer does not recalculate reachability; it uses the already computed decision + evidence.

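
The status mapping can be written as a single switch expression. The sketch assumes the OpenVEX specification's justification labels (`vulnerable_code_not_in_execute_path` for the not-reachable case, `vulnerable_code_not_present` for the not-present case); `VexDecision` and `VexMappingSketch` are hypothetical names.

```csharp
public enum ReachabilityStatus { PresentReachable, PresentNotReachable, FunctionNotPresent, Unknown }

// Hypothetical pair of OpenVEX fields; Justification stays null unless status is not_affected.
public record VexDecision(string Status, string? Justification);

public static class VexMappingSketch
{
    public static VexDecision ToVex(ReachabilityStatus status) => status switch
    {
        ReachabilityStatus.PresentReachable =>
            new VexDecision("affected", null),
        ReachabilityStatus.PresentNotReachable =>
            new VexDecision("not_affected", "vulnerable_code_not_in_execute_path"),
        ReachabilityStatus.FunctionNotPresent =>
            new VexDecision("not_affected", "vulnerable_code_not_present"),
        _ => new VexDecision("under_investigation", null), // Unknown; the plan makes this configurable
    };
}
```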
---

## 9. Executable Containers & Offline Operation

### 9.1 Executable containers

* Analyzers run inside a dedicated Scanner worker container that has:

  * .NET 10 runtime.
  * Language runtimes if needed for parsing (Node, Python, PHP), or Tree-sitter-based parsing.
* Target image filesystem is mounted read-only under `/mnt/rootfs`.
* No network access (offline/air-gap).
* This satisfies “we will use executable containers” while keeping separation between:

  * Target image (mount only).
  * Analyzer container (StellaOps code).

### 9.2 Offline signature bundles

* Concelier periodically exports:

  * Vulnerability database (CSAF/NVD).
  * Vulnerability Signature Bank.
* Bundles are:

  * DSSE-signed.
  * Versioned (e.g., `signatures-2025-11-01.tar.zst`).
* Scanner uses:

  * The bundle digest as part of the **Scan Manifest** for deterministic replay.

---

## 10. Determinism & Caching

### 10.1 Layer-level caching

* Key: `layerDigest + analyzerVersion + signatureBundleVersion`.
* Cache artifacts:

  * CallGraph(s) per layer (for JS/Python/PHP code present in that layer).
  * Symbolization results per binary file hash.
* For images sharing layers:

  * Merge cached graphs instead of re-analyzing.

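
A minimal in-memory sketch of this keying scheme is shown below, assuming a value-type key with structural equality. `LayerCacheKey` and `LayerGraphCache` are hypothetical names, and a real cache would presumably persist artifacts rather than hold them in memory.

```csharp
using System;
using System.Collections.Concurrent;

// Record structs compare by value, so equal field values identify the same cache entry.
public readonly record struct LayerCacheKey(
    string LayerDigest, string AnalyzerVersion, string SignatureBundleVersion);

public sealed class LayerGraphCache<TGraph>
{
    private readonly ConcurrentDictionary<LayerCacheKey, TGraph> _entries = new();

    // Returns the cached artifact, or runs the analysis and stores its result.
    // Note: under contention GetOrAdd may invoke the factory more than once.
    public TGraph GetOrAnalyze(LayerCacheKey key, Func<LayerCacheKey, TGraph> analyze) =>
        _entries.GetOrAdd(key, analyze);
}
```

Because the analyzer and bundle versions are part of the key, upgrading either one naturally invalidates stale entries without an explicit purge.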
### 10.2 Deterministic scan manifest

For each scan, produce the manifest below. `callGraphDigest` is the hash of the canonical JSON form of the call graphs:

```json
{
  "imageRef": "registry/app:1.2.3",
  "imageDigest": "sha256:...",
  "scannerVersion": "1.4.0",
  "analyzerVersions": {
    "js": "1.0.0",
    "python": "1.0.0",
    "php": "1.0.0",
    "binary": "1.0.0"
  },
  "signatureBundleDigest": "sha256:...",
  "callGraphDigest": "sha256:...",
  "reachabilityEvidenceDigest": "sha256:..."
}
```
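
A digest over “canonical JSON” only works if serialization is byte-for-byte stable. The sketch below shows the idea for a flat string map by sorting keys ordinally before hashing; it is an assumption-laden simplification and not a full canonicalization scheme such as RFC 8785 (JCS). `ManifestDigestSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class ManifestDigestSketch
{
    // Sorts keys ordinally, serializes compactly, then hashes the UTF-8 bytes.
    public static string Digest(IDictionary<string, string> fields)
    {
        var sorted = new SortedDictionary<string, string>(fields, StringComparer.Ordinal);
        string json = JsonSerializer.Serialize(sorted); // compact output by default
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

With this approach, two manifests that contain the same fields in a different insertion order produce the same digest.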

This manifest can be signed (Authority module) and used for audits and replay.

---

## 11. Implementation Roadmap (Phased)

### Phase 0 – Infrastructure & Binary presence

**Duration:** 1 sprint

* Set up `Scanner.Reachability` core types and interfaces.
* Implement:

  * Basic Symbolizer for ELF + DWARF.
  * Binary function catalog without CFG.
* Link a small set of CVEs to binary function presence via `SymbolName`.
* Expose minimal evidence:

  * Presence only (no call graph yet): `FunctionNotPresent` when the symbol is absent; otherwise report the function conservatively as `PresentReachable`.
* Integrate with VEXer to emit `not_affected` statements for the function-not-present case (OpenVEX justification `vulnerable_code_not_present`).

**Success criteria:**

* For selected demo images with known vulnerable/patched OpenSSL, scanner can:

  * Distinguish images where vulnerable function is present vs. absent.
  * Emit OpenVEX with correct `not_affected` when patched.

---

### Phase 1 – JS/Python/PHP call graphs & basic reachability

**Duration:** 1–2 sprints

* Implement:

  * `Scanner.Analyzers.JavaScript` with module + function call graph.
  * `Scanner.Analyzers.Python` and `Scanner.Analyzers.Php` with basic graphs.
* Entrypoint detection:

  * JS: main script from CMD, basic HTTP handlers.
  * Python: main script + Django/Flask heuristics.
  * PHP: front controllers.
* Implement core reachability algorithm (BFS/DFS).
* Implement simple `VulnerabilitySignature` that uses function names and file paths.
* Hook lattice engine in Scanner.WebService and integrate with:

  * Concelier vulnerability feeds.
  * VEXer.

**Success criteria:**

* For demo apps (Node, Django, Laravel):

  * Identify vulnerable functions and mark them reachable/unreachable.
  * Demonstrate noise reduction (some CVEs flagged as `not_affected`).

---

### Phase 2 – Binary CFG & Fingerprinting, Improved Confidence

**Duration:** 1–2 sprints

* Extend Symbolizer & CFG for:

  * Stripped binaries (function hashing).
  * Shared libraries (PLT/IAT resolution).
* Implement `VulnerabilitySignature.BlockFingerprints` to distinguish patched vs vulnerable binary functions.
* Refine confidence scoring:

  * Use fingerprint match quality.
  * Consider presence/absence of debug info.
* Expand coverage:

  * glibc, curl, zlib, OpenSSL, libxml2, etc.

**Success criteria:**

* For curated images:

  * Confirm ability to differentiate patched vs vulnerable versions even when binaries are stripped.
  * Reachability reflects true call paths across app→lib boundaries.

---

### Phase 3 – Runtime hooks (optional), UX, and Hardening

**Duration:** 2+ sprints

* Add opt-in runtime confirmation:

  * eBPF probes for function hits (Linux).
  * Map runtime addresses back to `ImageFunction` via symbolization.
* Enhance console UX:

  * Path explorer UI: show entrypoint → … → vulnerable function path.
  * Evidence view with hash-based proofs.
* Hardening:

  * Performance optimization for large images (parallel analysis, caching).
  * Conservative fallbacks for dynamic language features.

**Success criteria:**

* For selected environments where runtime is allowed:

  * Static reachability is confirmed by runtime traces in majority of cases.
  * No significant performance regression on typical images.

---

## 12. How this satisfies your initial bullets

From your initial requirements:

1. **JavaScript, Python, PHP, binary**
   → Dedicated analyzers per language + binary symbolization/CFG, unified in `Scanner.Reachability`.

2. **Executable containers**
   → Analyzers run inside Scanner’s worker container, mounting the target image rootfs; no network access.

3. **Libraries usage call graph**
   → Call graphs map from entrypoints → app code → library functions; SBOM + PURLs tie functions to libraries.

4. **Reachability analysis**
   → BFS/DFS from entrypoints over per-language and binary graphs, with lattice-based merging in `Scanner.WebService`.

5. **JSON + PURLs**
   → All evidence is JSON with PURL-tagged components; SBOM is annotated, and VEX statements reference those PURLs.

---

If you like, the next step can be that I draft concrete C# interface definitions (including some initial Tree-sitter integration stubs for JS/Python/PHP) and a skeleton of the `ReachabilityPlan` and `ReachabilityEngine` classes that you can drop into the monorepo.