feat: Add new provenance and crypto registry documentation

- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
2025-11-18 23:47:13 +02:00
parent d3ecd7f8e6
commit e91da22836
44 changed files with 6793 additions and 99 deletions


@@ -0,0 +1,719 @@
Here's a crisp idea you can drop straight into StellaOps: treat “unknowns” as first-class data, not noise.
---
# Unknowns Registry — turning uncertainty into signals
**Why:** Scanners and VEX feeds miss things (ambiguous package IDs, unverifiable hashes, orphaned layers, missing SBOM edges, runtime-only artifacts). Today these get logged and forgotten. If we **structure** them, downstream agents can reason about risk and shrink blast radius proactively.
**What it is:** A small service + schema that records every uncertainty with enough context for later inference.
## Core model (v0)
```json
{
"unknown_id": "unk:sha256:…",
"observed_at": "2025-11-18T12:00:00Z",
"provenance": {
"source": "Scanner.Analyzer.DotNet|Sbomer|Signals|Vexer",
"host": "runner-42",
"scan_id": "scan:…"
},
"scope": {
"artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
"subpath": "/app/bin/Contoso.dll",
"phase": "build|scan|runtime"
},
"unknown_type": "identity_gap|version_conflict|hash_mismatch|missing_edge|runtime_shadow|policy_undecidable",
"evidence": {
"raw": "nuget id 'Serilog' but assembly name 'Serilog.Core'",
"signals": ["sym:Serilog.Core.Logger", "procopen:/app/agent"]
},
"transitive": {
"depth": 2,
"parents": ["pkg:nuget/Serilog@?"],
"children": []
},
"confidence": { "p": 0.42, "method": "bayes-merge|rule" },
"exposure_hints": {
"surface": ["logging pipeline", "startup path"],
"runtime_hits": 3
},
"status": "open|triaged|suppressed|resolved",
"labels": ["reachability:possible", "sbom:incomplete"]
}
```
## Categorize by three axes
* **Provenance** (where it came from): Scanner vs Sbomer vs Vexer vs Signals.
* **Scope** (what it touches): image/layer/file/symbol/runtime-proc/policy.
* **Transitive depth** (how far from an entry point): 0 = direct, 1..N via deps.
## How agents use it
* **Cartographer**: includes unknown edges in the graph with special weight; lets Policy/Lattice down-rank vulnerable nodes near high-impact unknowns.
* **Remedy Assistant (Zastava)**: proposes micro-probes (“add EventPipe/JFR tap for X symbol”) or build-time assertions (“pin Serilog>=3.1, regenerate SBOM”).
* **Scheduler**: prioritizes scans where unknown density × asset criticality is highest.
## Minimal API (idempotent, additive)
* `POST /unknowns/ingest` — upsert by `unknown_id` (hash of type+scope+evidence).
* `GET /unknowns?artifact=…&status=open` — list for a target.
* `POST /unknowns/:id/triage` — set status/labels, attach rationale.
* `GET /metrics` — density by artifact/namespace/unknown_type.
*All additive; no versioning required. Repeat calls with the same payload are no-ops.*
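The upsert contract above can be sketched as follows. This is a minimal illustration in Python (chosen for brevity, not because the service would be written in it). It assumes `unknown_id` is a SHA-256 over a canonical JSON encoding of type, scope, and evidence; the `UnknownsStore` class and its method names are hypothetical stand-ins for the real store.

```python
import hashlib
import json


def unknown_id(unknown_type: str, scope: dict, evidence: dict) -> str:
    """Derive a stable ID from type + scope + evidence."""
    # Canonical JSON: sorted keys and fixed separators, so key order and
    # whitespace in the incoming payload cannot change the hash.
    canonical = json.dumps(
        {"unknown_type": unknown_type, "scope": scope, "evidence": evidence},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return "unk:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UnknownsStore:
    """Hypothetical in-memory stand-in for the append-only store."""

    def __init__(self) -> None:
        self._records = {}

    def ingest(self, record: dict):
        """Upsert by unknown_id; returns (id, created). Repeats are no-ops."""
        uid = unknown_id(record["unknown_type"], record["scope"], record["evidence"])
        if uid in self._records:
            return uid, False
        self._records[uid] = {**record, "unknown_id": uid}
        return uid, True

    def __len__(self) -> int:
        return len(self._records)
```

Because the ID depends only on content, two scanners that report the same gap with differently ordered JSON keys converge on one record.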
## Scoring hook (into your lattice)
* Add a **“Unknowns Pressure”** term:
`risk = base ⊕ (α * density_depth≤1) ⊕ (β * runtime_shadow) ⊕ (γ * policy_undecidable)`
* Gate “green” only if `density_depth≤1 == 0` **or** compensating controls are active.
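The text leaves the `⊕` operator undefined. As one possible reading, the sketch below treats it as a noisy-OR on values in [0, 1], which is monotone (adding unknowns can never lower risk). The operator choice, the normalization of every term to [0, 1], and the default weights are all assumptions made for illustration.

```python
def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def oplus(a: float, b: float) -> float:
    """Assumed join: noisy-OR on [0, 1]; commutative, associative, monotone."""
    a, b = clamp01(a), clamp01(b)
    return 1.0 - (1.0 - a) * (1.0 - b)


def risk_with_unknowns(base, density_depth_le1, runtime_shadow, policy_undecidable,
                       alpha=0.3, beta=0.5, gamma=0.2):
    """risk = base (+) alpha*density (+) beta*runtime_shadow (+) gamma*policy_undecidable.

    alpha/beta/gamma are placeholder defaults, not values from the source.
    """
    risk = base
    for term in (alpha * density_depth_le1,
                 beta * runtime_shadow,
                 gamma * policy_undecidable):
        risk = oplus(risk, term)
    return risk


def gate_green(density_depth_le1: float, compensating_controls: bool) -> bool:
    """Green only with zero shallow unknowns, or with compensating controls."""
    return density_depth_le1 == 0 or compensating_controls
```

With these placeholder weights, a base risk of 0.5 and a saturated shallow-unknown density of 1.0 give 1 - 0.5 × 0.7 = 0.65.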
## Storage & plumbing
* **Store:** append-only KV (Badger/Rocks) + Graph overlay (SQLite/Neo4j, your call).
* **Emit:** DSSE-signed “Unknowns Attestation” per scan for replayable audits.
* **UI:** heatmap per artifact (unknowns by type × depth), drill-down to evidence.
## First 2-day slice
1. Define `unknown_type` enum + hashable `unknown_id`.
2. Wire Scanner/Sbomer/Vexer to emit unknowns (start with: identity_gap, missing_edge).
3. Persist + expose `/metrics` (density, by depth and type).
4. In Policy Studio, add the Unknowns Pressure term with default α/β/γ.
If you want, I'll draft the exact protobuf/JSON schema and drop in .NET 10 record types + an EF model, plus a tiny CLI to query and a Grafana panel JSON.
I will treat “it” as the whole vision behind **Pushing Binary Reachability Toward True Determinism** inside Stella Ops: function-/symbol-level reachability for binaries and higher-level languages, wired into Scanner, Cartographer, Signals, and VEX.
Below is an implementation-oriented architecture plan you can hand directly to agents.
---
## 1. Scope, goals, and non-negotiable invariants
### 1.1. Scope
Deliver a deterministic reachability pipeline for containers that:
1. Builds **call graphs** and **symbol usage maps** for:
* Native binaries (ELF, PE, Mach-O) — primary for this branch.
* Scripted/VM languages later: JS, Python, PHP (as part of the same architecture).
2. Maps symbols and functions to:
* Packages (purls).
* Vulnerabilities (CVE → symbol/function list via Concelier/VEX data).
3. Computes **deterministic reachability states** for each `(vulnerability, artifact)` pair.
4. Emits:
* Machine-readable JSON (with `purl`s).
* Graph overlays for Cartographer.
* Inputs for the lattice/trust engine and VEXer/Excitor.
### 1.2. Invariants
* **Deterministic replay**: Given the same:
* Image digest(s),
* Analyzer versions,
* Config + policy,
* Runtime trace inputs (if any),
the same reachability outputs must be produced, bit-for-bit.
* **Idempotent, additive APIs**:
* No versioning of endpoints, only additive/optional fields.
* Same request = same response, no side effects besides storing/caching.
* **Lattice logic runs in `Scanner.WebService`**:
* All “reachable/unreachable/unknown” and confidence merging lives in Scanner, not Concelier/Excitors.
* **Preserve prune source**:
* Concelier and Excitors preserve provenance and do not “massage” reachability; they only consume it.
* **Offline, air-gap friendly**:
* No mandatory external calls; dependency on local analyzers and local advisory/VEX cache.
---
## 2. High-level pipeline
From container image to reachability output:
1. **Image enumeration**
`Scanner.WebService` receives an image ref or tarball and spawns an analysis run.
2. **Binary discovery & classification**
Binary analyzers detect ELF/PE/Mach-O + main interpreters (python, node, php) and scripts.
3. **Symbolization & call graph building**
* For each binary/module, we produce:
* Symbol table (exported + imported).
* Call graph edges (function-level where possible).
* For dynamic languages, we later plug in appropriate analyzers.
4. **Symbol→package mapping**
* Match symbols to packages and `purl`s using:
* Known vendor symbol maps (from Concelier / Feedser).
* Heuristics, path patterns, build IDs.
5. **Vulnerability→symbol mapping**
* From Concelier/VEX/CSAF: map each CVE to the set of symbols/functions it affects.
6. **Reachability solving**
* For each `(CVE, artifact)`:
* Determine presence and reachability of affected symbols from known entrypoints.
* Merge static call graph and runtime signals (if available) via deterministic lattice.
7. **Output & storage**
* Reachability JSON with purls and confidence.
* Graph overlay into Cartographer.
* Signals/events for downstream scoring.
* DSSE-signed reachability attestation for replay/audit.
---
## 3. Component architecture
### 3.1. New and extended services
1. **`StellaOps.Scanner.WebService` (extended)**
* Orchestration of reachability analyses.
* Lattice/merging engine.
* Idempotent reachability APIs.
2. **`StellaOps.Scanner.Analyzers.Binary.*` (new)**
* `…Binary.Discovery`: file type detection, ELF/PE/Mach-O parsing.
* `…Binary.Symbolizer`: resolves symbols, imports/exports, relocations.
* `…Binary.CallGraph.Native`: builds call graphs where possible (via disassembly/CFG).
* `…Binary.CallGraph.DynamicStubs`: heuristics for indirect calls, PLT/GOT, vtables.
3. **`StellaOps.Scanner.Analyzers.Script.*` (future extension)**
* `…Lang.JavaScript.CallGraph`
* `…Lang.Python.CallGraph`
* `…Lang.Php.CallGraph`
* These emit the same generic call-graph IR.
4. **`StellaOps.Reachability.Engine` (within Scanner.WebService)**
* Normalizes all call graphs into a common IR.
* Merges static and dynamic evidence.
* Computes reachability states and scores.
5. **`StellaOps.Cartographer.ReachabilityOverlay` (new overlay module)**
* Stores per-artifact call graphs and reachability tags.
* Provides graph queries for UI and policy tools.
6. **`StellaOps.Signals` (extended)**
* Ingests runtime call traces (e.g., from EventPipe/JFR/eBPF in other branches).
* Feeds function-hit events into the Reachability Engine.
7. **Unknowns Registry integration (optional but recommended)**
* Stores unresolved symbol/package mappings and incomplete edges as `unknowns`.
* Used to adjust risk scores (“Unknowns Pressure”) when binary analysis is incomplete.
---
## 4. Detailed design by layer
### 4.1. Static analysis layer (binaries)
#### 4.1.1. Binary discovery
Module: `StellaOps.Scanner.Analyzers.Binary.Discovery`
* Inputs:
* Per-image file list (from existing Scanner).
* Byte slices of candidate binaries.
* Logic:
* Detect ELF/PE/Mach-O via magic bytes, not extensions.
* Classify as:
* Main executable
* Shared library
* Plugin/module
* Output:
* `binary_manifest.json` per image:
```json
{
"image_ref": "registry/app@sha256:…",
"binaries": [
{
"id": "bin:elf:/usr/local/bin/app",
"path": "/usr/local/bin/app",
"format": "elf",
"arch": "x86_64",
"role": "executable"
}
]
}
```
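Detection by magic bytes rather than extension can be sketched like this. It is an illustrative Python snippet, not the analyzer's actual code; the function name is hypothetical. Fat (universal) Mach-O files are deliberately left out because their magic `0xCAFEBABE` collides with Java class files and needs extra disambiguation.

```python
import struct
from typing import Optional

# Thin Mach-O magics as they appear on disk (both byte orders, 32- and 64-bit).
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",  # 32-bit, big-endian
    b"\xce\xfa\xed\xfe",  # 32-bit, little-endian
    b"\xfe\xed\xfa\xcf",  # 64-bit, big-endian
    b"\xcf\xfa\xed\xfe",  # 64-bit, little-endian
}


def detect_format(data: bytes) -> Optional[str]:
    """Return 'elf', 'pe', 'macho', or None, based on magic bytes only."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:4] in MACHO_MAGICS:
        return "macho"
    # PE: the DOS header starts with 'MZ'; the little-endian 32-bit value at
    # offset 0x3C (e_lfanew) points at the 'PE\0\0' signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "pe"
    return None
```

A shell script or a bare DOS stub returns `None`, so it never enters `binary_manifest.json` as a native binary.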
#### 4.1.2. Symbolization
Module: `StellaOps.Scanner.Analyzers.Binary.Symbolizer`
* Uses:
* ELF/PE/Mach-O parsers (internal or third-party), no external calls.
* Output per binary:
```json
{
"binary_id": "bin:elf:/usr/local/bin/app",
"build_id": "buildid:abcd…",
"exports": ["pkg1::ClassA::method1", "..."],
"imports": ["openssl::EVP_EncryptInit_ex", "..."],
"sections": { "text": { "va": "0x...", "size": 12345 } }
}
```
* Writes unresolved symbol sets to Unknowns Registry when:
* Imports cannot be tied to known packages or symbols.
#### 4.1.3. Call graph construction
Module: `StellaOps.Scanner.Analyzers.Binary.CallGraph.Native`
* Core tasks:
* Build control-flow graphs (CFG) for each function via:
* Disassembly.
* Basic block detection.
* Identify direct calls (`call func`) and indirect calls (function pointers, vtables).
* IR model:
```json
{
"binary_id": "bin:elf:/usr/local/bin/app",
"functions": [
{ "fid": "func:app::main", "va": "0x401000", "size": 128 },
{ "fid": "func:libssl::EVP_EncryptInit_ex", "external": true }
],
"edges": [
{ "caller": "func:app::main", "callee": "func:app::init_config", "type": "direct" },
{ "caller": "func:app::main", "callee": "func:libssl::EVP_EncryptInit_ex", "type": "import" }
]
}
```
* Edge confidence:
* `type: direct|import|indirect|heuristic`
* Used later by the lattice.
#### 4.1.4. Entry point inference
* Sources:
* ELF `e_entry` (the header's entry address; `PT_INTERP` only names the dynamic loader), PE `AddressOfEntryPoint`.
* Application-level hints (known frameworks, service main methods).
* Container metadata (CMD, ENTRYPOINT).
* Output:
```json
{
"binary_id": "bin:elf:/usr/local/bin/app",
"entrypoints": ["func:app::main"]
}
```
> Note: For JS/Python/PHP, equivalent analyzers will later define module entrypoints (`index.js`, `wsgi_app`, `public/index.php`).
---
### 4.2. Symbol-to-package and CVE-to-symbol mapping
#### 4.2.1. Symbol→package mapping
Module: `StellaOps.Reachability.Mapping.SymbolToPurl`
* Inputs:
* Binary symbolization outputs.
* Local mapping DB in Concelier (vendor symbol maps, debug info, name patterns).
* File path + container context (`/usr/lib/...`, `/site-packages/...`).
* Output:
```json
{
"symbol": "libssl::EVP_EncryptInit_ex",
"purl": "pkg:apk/alpine/openssl@3.1.5-r2",
"confidence": 0.93,
"method": "vendor_map+path_heuristic"
}
```
* Unresolved / ambiguous symbols:
* Stored as `unknowns` of type `identity_gap`.
#### 4.2.2. CVE→symbol mapping
Responsibility: Concelier + its advisory ingestion.
* For each vulnerability:
```json
{
"cve_id": "CVE-2025-12345",
"purl": "pkg:apk/alpine/openssl@3.1.5-r2",
"affected_symbols": [
"libssl::EVP_EncryptInit_ex",
"libssl::EVP_EncryptUpdate"
],
"source": "vendor_vex",
"confidence": 1.0
}
```
* Reachability Engine consumes this mapping read-only.
---
### 4.3. Reachability Engine
Module: `StellaOps.Reachability.Engine` (in Scanner.WebService)
#### 4.3.1. Core data model
Per `(artifact, cve, purl)`:
```json
{
"artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
"cve_id": "CVE-2025-12345",
"purl": "pkg:apk/alpine/openssl@3.1.5-r2",
"symbols": [
{
"symbol": "libssl::EVP_EncryptInit_ex",
"static_presence": "present|absent|unknown",
"static_reachability": "reachable|unreachable|unknown",
"runtime_hits": 3,
"runtime_reachability": "observed|not_observed|unknown"
}
],
"reachability_state": "confirmed_reachable|statically_reachable|present_not_reachable|not_present|unknown",
"confidence": {
"p": 0.87,
"evidence": ["static_callgraph", "runtime_trace", "symbol_map"],
"unknowns_pressure": 0.12
}
}
```
#### 4.3.2. Lattice / state machine
Define a deterministic lattice over states:
* `NOT_PRESENT`
* `PRESENT_NOT_REACHABLE`
* `STATICALLY_REACHABLE`
* `RUNTIME_OBSERVED`
“Unknown” flags are overlaid on these states when evidence is missing.
Merging rules (simplified):
* If `NOT_PRESENT` and no conflicting evidence → `NOT_PRESENT`.
* If at least one affected symbol is on a static path from any entrypoint → `STATICALLY_REACHABLE`.
* If symbol observed at runtime → `RUNTIME_OBSERVED` (top state).
* If symbol present in binary but not on any static path → `PRESENT_NOT_REACHABLE`, unless unknown edges exist near it (then downgrade with lower confidence).
* Unknowns Registry entries near affected symbols increase `unknowns_pressure` and may push from `NOT_PRESENT` to `UNKNOWN`.
Implementation: pure functional merge functions inside Scanner.WebService:
```csharp
ReachabilityState Merge(ReachabilityState a, ReachabilityState b);
ReachabilityState FromEvidence(StaticEvidence s, RuntimeEvidence r, UnknownsPressure u);
```
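One way to realise the two functions above is to order the four states and make `Merge` a join (maximum), with the "unknown" overlay carried as a sticky flag. The Python sketch below is illustrative only: the state ordering follows the list above, while the flag handling and the `from_evidence` parameters are assumptions rather than the engine's defined behaviour.

```python
from dataclasses import dataclass
from enum import IntEnum


class State(IntEnum):
    NOT_PRESENT = 0
    PRESENT_NOT_REACHABLE = 1
    STATICALLY_REACHABLE = 2
    RUNTIME_OBSERVED = 3  # top state


@dataclass(frozen=True)
class Reachability:
    state: State
    unknown: bool = False  # overlay flag: evidence was missing or incomplete


def merge(a: Reachability, b: Reachability) -> Reachability:
    """Join: take the higher state; the unknown flag is sticky (assumption)."""
    return Reachability(max(a.state, b.state), a.unknown or b.unknown)


def from_evidence(present: bool, on_static_path: bool, runtime_hits: int,
                  unknowns_nearby: bool) -> Reachability:
    """Map raw evidence to a lattice element, following the merging rules."""
    if runtime_hits > 0:
        # A runtime observation is treated as conclusive in this sketch.
        return Reachability(State.RUNTIME_OBSERVED)
    if present and on_static_path:
        return Reachability(State.STATICALLY_REACHABLE, unknowns_nearby)
    if present:
        return Reachability(State.PRESENT_NOT_REACHABLE, unknowns_nearby)
    return Reachability(State.NOT_PRESENT, unknowns_nearby)
```

Because `merge` is commutative, associative, and idempotent, evidence can arrive in any order and the final state is the same, which is what deterministic replay needs.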
#### 4.3.3. Deterministic inputs
To guarantee replay:
* Build **Reachability Plan Manifest** per run:
```json
{
"plan_id": "reach:sha256:…",
"scanner_version": "1.4.0",
"analyzers": {
"binary_discovery": "1.0.0",
"binary_symbolizer": "1.1.0",
"binary_callgraph": "1.2.0"
},
"inputs": {
"image_digest": "sha256:…",
"runtime_trace_files": ["signals:run:2025-11-18T12:00:00Z"],
"config": {
"assume_indirect_calls": "conservative",
"max_call_depth": 10
}
}
}
```
* DSSE-sign the plan + result.
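The `reach:sha256:…` form suggests that `plan_id` is derived from the manifest content, although the text does not say so explicitly. Under that assumption, a replay check could look like the Python sketch below; the function names are hypothetical, and a real implementation would need to fix a canonicalization scheme for the signed payload.

```python
import hashlib
import json


def canonical_plan_bytes(manifest: dict) -> bytes:
    """Canonical encoding: drop plan_id itself, sort keys, fix separators."""
    body = {k: v for k, v in manifest.items() if k != "plan_id"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def derive_plan_id(manifest: dict) -> str:
    digest = hashlib.sha256(canonical_plan_bytes(manifest)).hexdigest()
    return "reach:sha256:" + digest


def is_replay_of(manifest: dict, recorded_plan_id: str) -> bool:
    """True when the manifest describes exactly the inputs of the recorded run."""
    return derive_plan_id(manifest) == recorded_plan_id
```

Any change to an analyzer version, the image digest, or the config produces a different ID, so a mismatch flags a run that cannot be expected to replay bit-for-bit.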
---
### 4.4. Storage and graph overlay
#### 4.4.1. Reachability store
Backend: re-use existing Scanner/Cartographer storage stack (e.g., Postgres or SQLite + blob store).
Tables/collections:
* `reachability_runs`
* `plan_id`, `image_ref`, `created_at`, `scanner_version`.
* `reachability_results`
* `plan_id`, `cve_id`, `purl`, `state`, `confidence_p`, `unknowns_pressure`, `payload_json`.
* Indexes on `(image_ref, cve_id)`, `(image_ref, purl)`.
#### 4.4.2. Cartographer overlay
Edges:
* `IMAGE` → `BINARY` → `FUNCTION` → `PACKAGE` → `CVE`
* Extra property on `IMAGE -[AFFECTED_BY]-> CVE`:
* `reachability_state`
* `reachability_plan_id`
Enables queries:
* “Show me all CVEs with `STATICALLY_REACHABLE` in this namespace.”
* “Show me binaries with high density of reachable crypto CVEs.”
---
### 4.5. APIs (idempotent, additive)
#### 4.5.1. Trigger reachability
`POST /reachability/runs`
Request:
```json
{
"artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
"config": {
"include_languages": ["binary"],
"max_call_depth": 10,
"assume_indirect_calls": "conservative"
}
}
```
Response:
```json
{ "plan_id": "reach:sha256:…" }
```
* Idempotent key: `(image_ref, config_hash)`. Subsequent calls return same `plan_id`.
#### 4.5.2. Fetch results
`GET /reachability/runs/:plan_id`
```json
{
"plan": { /* reachability plan manifest */ },
"results": [
{
"cve_id": "CVE-2025-12345",
"purl": "pkg:apk/alpine/openssl@3.1.5-r2",
"reachability_state": "statically_reachable",
"confidence": { "p": 0.84, "unknowns_pressure": 0.1 }
}
]
}
```
#### 4.5.3. Per-CVE view for VEXer/Excitor
`GET /reachability/by-cve?artifact=…&cve_id=…`
* Returns filtered result for downstream VEX creation.
All APIs are **read-only** except for the side effect of storing/caching runs.
---
## 5. Interaction with other Stella Ops modules
### 5.1. Concelier
* Provides:
* CVE→purl→symbol mapping.
* Vendor VEX statements indicating affected functions.
* Consumes:
* Nothing from reachability directly; Scanner/WebService passes reachability summary to VEXer/Excitor which merges with vendor statements.
### 5.2. VEXer / Excitor
* Input:
* For each `(artifact, cve)`:
* Reachability state.
* Confidence.
* Logic:
* Translate states to VEX statements:
* `NOT_PRESENT` → `not_affected`
* `PRESENT_NOT_REACHABLE` → `not_affected` (with justification “code not reachable according to analysis”)
* `STATICALLY_REACHABLE` → `affected`
* `RUNTIME_OBSERVED` → `affected` (higher severity)
* Attach determinism proof:
* Plan ID + DSSE of reachability run.
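The translation table above is small enough to express directly. In this Python sketch the statuses and the one justification string come from the mapping above; the output field names (`proof`, `runtime_observed`) are hypothetical, and states without a defined translation are rejected rather than guessed.

```python
# State -> VEX status, as listed in the mapping above.
VEX_STATUS = {
    "NOT_PRESENT": "not_affected",
    "PRESENT_NOT_REACHABLE": "not_affected",
    "STATICALLY_REACHABLE": "affected",
    "RUNTIME_OBSERVED": "affected",
}

JUSTIFICATION = {
    "PRESENT_NOT_REACHABLE": "code not reachable according to analysis",
}


def to_vex_statement(cve_id: str, state: str, plan_id: str, dsse_ref: str) -> dict:
    """Build a VEX-like statement with the determinism proof attached."""
    if state not in VEX_STATUS:
        raise ValueError(f"no VEX translation defined for state {state!r}")
    statement = {
        "cve_id": cve_id,
        "status": VEX_STATUS[state],
        # Hypothetical field: lets an auditor replay the run behind this claim.
        "proof": {"plan_id": plan_id, "dsse": dsse_ref},
    }
    if state in JUSTIFICATION:
        statement["justification"] = JUSTIFICATION[state]
    if state == "RUNTIME_OBSERVED":
        # Hypothetical marker so consumers can raise severity.
        statement["runtime_observed"] = True
    return statement
```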
### 5.3. Signals
* Provides:
* Function hit events: `(binary_id, function_id, timestamp)` aggregated per image.
* Reachability Engine:
* Marks `runtime_hits` and state `RUNTIME_OBSERVED` for symbols with hits.
* Unknowns:
* If runtime sees hits in functions with no static edges to entrypoints (or unmapped symbols), these produce Unknowns and increase `unknowns_pressure`.
### 5.4. Unknowns Registry
* From reachability pipeline, create Unknowns when:
* Symbol→package mapping is ambiguous.
* CVE→symbol mapping exists, but symbol cannot be found in binaries.
* Call graph has indirect calls that cannot be resolved.
* The “Unknowns Pressure” term is fed into:
* Reachability confidence.
* Global risk scoring (Trust Algebra Studio).
---
## 6. Implementation phases and engineering plan
### Phase 0: Scaffolding & manifests (1 sprint)
* Create:
* `StellaOps.Reachability.Engine` skeleton.
* Reachability Plan Manifest schema.
* Reachability Run + Result persistence.
* Add `/reachability/runs` and `/reachability/runs/:plan_id` endpoints, returning mock data.
* Wire DSSE attestation generation for reachability results (even if payload is empty).
### Phase 1: Binary discovery + symbolization (1-2 sprints)
* Implement `Binary.Discovery` and `Binary.Symbolizer`.
* Feed symbol tables into Reachability Engine as “presence-only evidence”:
* States: `NOT_PRESENT` vs `PRESENT_NOT_REACHABLE` vs `UNKNOWN`.
* Integrate with Concelier's CVE→purl mapping (no symbol-level yet):
* For CVEs affecting a package present in the image, mark as `PRESENT_NOT_REACHABLE`.
* Emit Unknowns for unresolved binary roles and ambiguous package mapping.
Deliverable: package-level reachability with deterministic manifests.
### Phase 2: Binary call graphs & entrypoints (2-3 sprints)
* Implement `Binary.CallGraph.Native`:
* CFG + direct call edges.
* Implement entrypoint inference from binary + container ENTRYPOINT/CMD.
* Add static reachability algorithm:
* DFS/BFS from entrypoints through call graph.
* Mark affected symbols as reachable if found on paths.
* Extend Concelier to ingest symbol-aware vulnerability metadata (for pilots; can be partial).
Deliverable: function-level static reachability for native binaries where symbol maps exist.
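The static reachability step can be sketched as a breadth-first search over the call-graph IR from section 4.1.3. This Python version is illustrative: it assumes affected symbols have already been translated into the same `fid` form used by the edges, and it sorts entrypoints and callees so traversal order is stable across runs.

```python
from collections import deque


def reachable_functions(edges, entrypoints, max_call_depth=10):
    """BFS from entrypoints; returns {fid: depth} for every function reached."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge["caller"], []).append(edge["callee"])
    for callees in adjacency.values():
        callees.sort()  # stable traversal order, independent of input order

    depth = {}
    queue = deque()
    for ep in sorted(entrypoints):
        if ep not in depth:
            depth[ep] = 0
            queue.append(ep)
    while queue:
        fn = queue.popleft()
        if depth[fn] >= max_call_depth:
            continue  # respect the configured depth limit
        for callee in adjacency.get(fn, ()):
            if callee not in depth:
                depth[callee] = depth[fn] + 1
                queue.append(callee)
    return depth


def statically_reachable(affected, edges, entrypoints, max_call_depth=10):
    """Affected fids that lie on some call path from an entrypoint."""
    seen = reachable_functions(edges, entrypoints, max_call_depth)
    return sorted(fid for fid in affected if fid in seen)
```

Run against the example IR, `func:libssl::EVP_EncryptInit_ex` is reachable from `func:app::main` at depth 1, while an affected function with no incoming edge is not reported.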
### Phase 3: Runtime integration (2 sprints, may be in parallel workstream)
* Integrate Signals runtime evidence:
* Define schema for function hit events.
* Add ingestion path into Reachability Engine.
* Update lattice:
* Promote symbols to `RUNTIME_OBSERVED` when hits exist.
* Extend DSSE attestation to reference runtime evidence URIs (hashes of trace inputs).
Deliverable: static + runtime-confirmed reachability.
### Phase 4: Unknowns & pressure (1 sprint)
* Wire Unknowns Registry:
* Emit unknowns from Symbolizer and CallGraph (identity gaps, missing edges).
* Compute `unknowns_pressure` per `(artifact, cve)` as density of unknowns near affected symbols.
* Adjust confidence calculation in Reachability Engine.
* Expose unknowns metrics in API and Cartographer.
Deliverable: explicit modelling of uncertainty, feeding into trust/lattice.
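"Density of unknowns near affected symbols" is not defined precisely here. One possible definition, sketched in Python below, is the fraction of call-graph nodes within a fixed hop radius of the affected symbols that carry at least one unknown. The radius, the undirected neighbourhood, and the ratio itself are all assumptions for illustration.

```python
def unknowns_pressure(edges, affected, unknown_nodes, radius=1):
    """Fraction of nodes within `radius` hops of affected symbols that are unknown.

    edges: iterable of (caller, callee) pairs, treated as undirected here.
    """
    neighbours = {}
    for caller, callee in edges:
        neighbours.setdefault(caller, set()).add(callee)
        neighbours.setdefault(callee, set()).add(caller)

    region = set(affected)
    frontier = set(affected)
    for _ in range(radius):
        # Expand one hop, keeping only nodes not already in the region.
        frontier = {n for f in frontier for n in neighbours.get(f, ())} - region
        region |= frontier

    if not region:
        return 0.0
    return len(region & set(unknown_nodes)) / len(region)
```

The result depends only on set sizes, so it is independent of iteration order and stays stable across replays.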
### Phase 5: Language extensions (JS/Python/PHP) (ongoing)
* Implement per-language call-graph analyzers creating the same IR as binary.
* Extend symbol→purl mapping for these ecosystems (npm, PyPI, Packagist).
* Update reachability solver to include multi-language edges (e.g., Python calling into native modules).
---
## 7. Minimal contracts for agents
To hand off to agents, you can codify:
1. **IR schemas**
* Call graph IR.
* Reachability Result JSON.
* Reachability Plan Manifest.
2. **API contracts**
* `POST /reachability/runs`
* `GET /reachability/runs/:plan_id`
* `GET /reachability/by-cve`
3. **Module boundaries**
* `Scanner.Analyzers.Binary.*` produce IR only; NO network calls.
* `Reachability.Engine` is the only place where lattice logic lives.
* `Concelier` is read-only for reachability; no custom logic there.
4. **Determinism practices**
* All algorithmic randomness is banned; where unavoidable, seed with values derived from plan_id.
* All external inputs must be listed in the Plan Manifest.
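Where randomness cannot be avoided (sampling, tie-breaking), the seeding rule above might look like the following Python sketch. The `purpose` parameter is a hypothetical addition that gives each consumer its own independent stream; the same idea carries over to any language with a seedable generator.

```python
import hashlib
import random


def rng_for_plan(plan_id: str, purpose: str = "") -> random.Random:
    """Return a private generator seeded from plan_id (and an optional purpose)."""
    # Hash instead of Python's built-in hash(): str hashes are randomized
    # per process, which would break replay.
    digest = hashlib.sha256(f"{plan_id}\x00{purpose}".encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    # A dedicated instance avoids the shared module-level generator, whose
    # state other code could advance.
    return random.Random(seed)
```

Two runs with the same `plan_id` draw the same sequence, so a sampled subset of call paths is identical on replay.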
If you like, as a next step I can draft:
* Concrete C# record types for the IRs.
* A small pseudo-code implementation of the lattice functions and static reachability DFS.
* A proposed directory layout under `src/StellaOps.Scanner` and `src/StellaOps.Cartographer`.