2025-11-18 23:47:13 +02:00


Here’s a crisp idea you can drop straight into Stella Ops: treat “unknowns” as first‑class data, not noise.

Unknowns Registry — turning uncertainty into signals

Why: Scanners and VEX feeds miss things (ambiguous package IDs, unverifiable hashes, orphaned layers, missing SBOM edges, runtime-only artifacts). Today these get logged and forgotten. If we structure them, downstream agents can reason about risk and shrink blast radius proactively.

What it is: A small service + schema that records every uncertainty with enough context for later inference.

Core model (v0)

{
  "unknown_id": "unk:sha256:…",
  "observed_at": "2025-11-18T12:00:00Z",
  "provenance": {
    "source": "Scanner.Analyzer.DotNet|Sbomer|Signals|Vexer",
    "host": "runner-42",
    "scan_id": "scan:…"
  },
  "scope": {
    "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
    "subpath": "/app/bin/Contoso.dll",
    "phase": "build|scan|runtime"
  },
  "unknown_type": "identity_gap|version_conflict|hash_mismatch|missing_edge|runtime_shadow|policy_undecidable",
  "evidence": {
    "raw": "nuget id 'Serilog' but assembly name 'Serilog.Core'",
    "signals": ["sym:Serilog.Core.Logger", "procopen:/app/agent"]
  },
  "transitive": {
    "depth": 2,
    "parents": ["pkg:nuget/Serilog@?"],
    "children": []
  },
  "confidence": { "p": 0.42, "method": "bayes-merge|rule" },
  "exposure_hints": {
    "surface": ["logging pipeline", "startup path"],
    "runtime_hits": 3
  },
  "status": "open|triaged|suppressed|resolved",
  "labels": ["reachability:possible", "sbom:incomplete"]
}

Categorize by three axes

Provenance (where it came from): Scanner vs Sbomer vs Vexer vs Signals.
Scope (what it touches): image/layer/file/symbol/runtime‑proc/policy.
Transitive depth (how far from an entry point): 0 = direct, 1..N via deps.

How agents use it

Cartographer: includes unknown edges in the graph with special weight; lets Policy/Lattice down‑rank vulnerable nodes near high‑impact unknowns.
Remedy Assistant (Zastava): proposes micro‑probes (“add EventPipe/JFR tap for X symbol”) or build‑time assertions (“pin Serilog>=3.1, regenerate SBOM”).
Scheduler: prioritizes scans where unknown density × asset criticality is highest.
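A minimal sketch of the Scheduler's ordering follows. It assumes each artifact already carries a normalized unknown density and a numeric criticality weight; `ScanCandidate` and `scan_order` are hypothetical names. Ties are broken by artifact ref, so two schedulers given the same inputs produce the same queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScanCandidate:
    artifact_ref: str
    unknown_density: float  # hypothetical: open unknowns per analyzed component
    criticality: float      # hypothetical: asset criticality weight (1.0 = normal)


def scan_order(candidates: List[ScanCandidate]) -> List[ScanCandidate]:
    """Highest density x criticality first; ties broken by ref so the order is stable."""
    return sorted(
        candidates,
        key=lambda c: (-(c.unknown_density * c.criticality), c.artifact_ref),
    )


queue = scan_order([
    ScanCandidate("registry/billing:1.0", unknown_density=0.10, criticality=3.0),
    ScanCandidate("registry/docs:1.0", unknown_density=0.40, criticality=0.5),
    ScanCandidate("registry/auth:1.0", unknown_density=0.25, criticality=2.0),
])
print([c.artifact_ref for c in queue])
# -> ['registry/auth:1.0', 'registry/billing:1.0', 'registry/docs:1.0']
```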

Minimal API (idempotent, additive)

POST /unknowns/ingest — upsert by unknown_id (hash of type+scope+evidence).
GET /unknowns?artifact=…&status=open — list for a target.
POST /unknowns/:id/triage — set status/labels, attach rationale.
GET /metrics — density by artifact/namespace/unknown_type.

All additive; no versioning required. Repeat calls with the same payload are no‑ops.
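The sketch below shows one way to derive `unknown_id` and keep `/unknowns/ingest` idempotent. It assumes an in-memory store, and `UnknownsStore` is a hypothetical name. Sorted-key compact JSON stands in for a full canonicalization scheme such as RFC 8785 (JCS); that is enough for string and integer fields but does not normalize floats. `observed_at` is deliberately left out of the hash, so re-observing the same gap maps to the same ID.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def unknown_id(unknown_type: str, scope: Dict[str, Any], evidence: Dict[str, Any]) -> str:
    """Content-addressed ID over type + scope + evidence (observed_at excluded on purpose)."""
    canonical = json.dumps(
        {"unknown_type": unknown_type, "scope": scope, "evidence": evidence},
        sort_keys=True,          # key order must not change the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,
    )
    return "unk:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UnknownsStore:
    """In-memory stand-in for the upsert semantics of POST /unknowns/ingest."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}

    def ingest(self, record: Dict[str, Any]) -> Tuple[str, bool]:
        """Return (unknown_id, changed); repeating an identical payload is a no-op."""
        uid = unknown_id(record["unknown_type"], record["scope"], record["evidence"])
        row = {**record, "unknown_id": uid}
        if self._rows.get(uid) == row:
            return uid, False
        self._rows[uid] = row
        return uid, True
```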

Scoring hook (into your lattice)

Add a “Unknowns Pressure” term: risk = base ⊕ (α * density_depth≤1) ⊕ (β * runtime_shadow) ⊕ (γ * policy_undecidable)
Gate “green” only if density_depth≤1 == 0 or compensating controls are active.
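The document does not define ⊕. This sketch assumes it is addition saturating at 1.0; swap in `max` if your lattice join is a plain maximum. Inputs are assumed to be normalized to [0, 1], and the α/β/γ defaults are placeholders, not tuned values.

```python
def join(a: float, b: float) -> float:
    """Assumed reading of the ⊕ operator: addition saturating at 1.0."""
    return min(1.0, a + b)


def unknowns_risk(base: float, density_depth_le1: float, runtime_shadow: float,
                  policy_undecidable: float, alpha: float = 0.5, beta: float = 0.3,
                  gamma: float = 0.2) -> float:
    """risk = base ⊕ α·density_depth≤1 ⊕ β·runtime_shadow ⊕ γ·policy_undecidable."""
    risk = base
    for term in (alpha * density_depth_le1, beta * runtime_shadow, gamma * policy_undecidable):
        risk = join(risk, term)
    return risk


def is_green(density_depth_le1: float, compensating_controls_active: bool) -> bool:
    """Green only with zero near-entrypoint unknowns, or with compensating controls."""
    return density_depth_le1 == 0 or compensating_controls_active
```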

Storage & plumbing

Store: append‑only KV (Badger/Rocks) + Graph overlay (SQLite/Neo4j—your call).
Emit: DSSE‑signed “Unknowns Attestation” per scan for replayable audits.
UI: heatmap per artifact (unknowns by type × depth), drill‑down to evidence.

First 2‑day slice

Define unknown_type enum + hashable unknown_id.
Wire Scanner/Sbomer/Vexer to emit unknowns (start with: identity_gap, missing_edge).
Persist + expose /metrics (density, by depth and type).
In Policy Studio, add the Unknowns Pressure term with default α/β/γ.

If you want, I’ll draft the exact protobuf/JSON schema and drop in .NET 10 record types + an EF model, plus a tiny CLI to query and a Grafana panel JSON.

I will treat “it” as the whole vision behind “Pushing Binary Reachability Toward True Determinism” inside Stella Ops: function-/symbol-level reachability for binaries and higher-level languages, wired into Scanner, Cartographer, Signals, and VEX.

Below is an implementation-oriented architecture plan you can hand directly to agents.

1. Scope, goals, and non-negotiable invariants

1.1. Scope

Deliver a deterministic reachability pipeline for containers that:

Builds call graphs and symbol usage maps for:
- Native binaries (ELF, PE, Mach-O) — primary for this branch.
- Scripted/VM languages later: JS, Python, PHP (as part of the same architecture).
Maps symbols and functions to:
- Packages (purls).
- Vulnerabilities (CVE → symbol/function list via Concelier/VEX data).
Computes deterministic reachability states for each (vulnerability, artifact) pair.
Emits:
- Machine-readable JSON (with purls).
- Graph overlays for Cartographer.
- Inputs for the lattice/trust engine and VEXer/Excitor.

1.2. Invariants

Deterministic replay: given the same
- Image digest(s),
- Analyzer versions,
- Config + policy,
- Runtime trace inputs (if any),
the same reachability outputs must be produced, bit-for-bit.
Idempotent, additive APIs:
- No versioning of endpoints, only additive/optional fields.
- Same request = same response, no side effects besides storing/caching.
Lattice logic runs in Scanner.WebService:
- All “reachable/unreachable/unknown” and confidence merging lives in Scanner, not Concelier/Excitors.
Preserve prune source:
- Concelier and Excitors preserve provenance and do not “massage” reachability; they only consume it.
Offline, air-gap friendly:
- No mandatory external calls; the pipeline depends only on local analyzers and the local advisory/VEX cache.

2. High-level pipeline

From container image to reachability output:

Image enumeration: Scanner.WebService receives an image ref or tarball and spawns an analysis run.
Binary discovery & classification: Binary analyzers detect ELF/PE/Mach-O + main interpreters (python, node, php) and scripts.
Symbolization & call graph building
- For each binary/module, we produce:
  - Symbol table (exported + imported).
  - Call graph edges (function-level where possible).
- For dynamic languages, we later plug in appropriate analyzers.
Symbol→package mapping
- Match symbols to packages and purls using:
  - Known vendor symbol maps (from Concelier / Feedser).
  - Heuristics, path patterns, build IDs.
Vulnerability→symbol mapping
- From Concelier/VEX/CSAF: map each CVE to the set of symbols/functions it affects.
Reachability solving
- For each (CVE, artifact):
  - Determine presence and reachability of affected symbols from known entrypoints.
  - Merge static call graph and runtime signals (if available) via deterministic lattice.
Output & storage
- Reachability JSON with purls and confidence.
- Graph overlay into Cartographer.
- Signals/events for downstream scoring.
- DSSE-signed reachability attestation for replay/audit.

3. Component architecture

3.1. New and extended services

StellaOps.Scanner.WebService (extended)
- Orchestration of reachability analyses.
- Lattice/merging engine.
- Idempotent reachability APIs.
StellaOps.Scanner.Analyzers.Binary.* (new)
- …Binary.Discovery: file type detection, ELF/PE/Mach-O parsing.
- …Binary.Symbolizer: resolves symbols, imports/exports, relocations.
- …Binary.CallGraph.Native: builds call graphs where possible (via disassembly/CFG).
- …Binary.CallGraph.DynamicStubs: heuristics for indirect calls, PLT/GOT, vtables.
StellaOps.Scanner.Analyzers.Script.* (future extension)
- …Lang.JavaScript.CallGraph
- …Lang.Python.CallGraph
- …Lang.Php.CallGraph
- These emit the same generic call-graph IR.
StellaOps.Reachability.Engine (within Scanner.WebService)
- Normalizes all call graphs into a common IR.
- Merges static and dynamic evidence.
- Computes reachability states and scores.
StellaOps.Cartographer.ReachabilityOverlay (new overlay module)
- Stores per-artifact call graphs and reachability tags.
- Provides graph queries for UI and policy tools.
StellaOps.Signals (extended)
- Ingests runtime call traces (e.g., from EventPipe/JFR/ebpf in other branches).
- Feeds function-hit events into the Reachability Engine.
Unknowns Registry integration (optional but recommended)
- Stores unresolved symbol/package mappings and incomplete edges as unknowns.
- Used to adjust risk scores (“Unknowns Pressure”) when binary analysis is incomplete.

4. Detailed design by layer

4.1. Static analysis layer (binaries)

4.1.1. Binary discovery

Module: StellaOps.Scanner.Analyzers.Binary.Discovery

Inputs:
- Per-image file list (from existing Scanner).
- Byte slices of candidate binaries.
Logic:
- Detect ELF/PE/Mach-O via magic bytes, not extensions.
- Classify as:
  - Main executable
  - Shared library
  - Plugin/module

Output:

binary_manifest.json per image:

{
  "image_ref": "registry/app@sha256:…",
  "binaries": [
    {
      "id": "bin:elf:/usr/local/bin/app",
      "path": "/usr/local/bin/app",
      "format": "elf",
      "arch": "x86_64",
      "role": "executable"
    }
  ]
}
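Here is a minimal sketch of magic-byte detection for the three native formats. It assumes the header passed in is the first few KiB of the file, so the PE `e_lfanew` offset at 0x3C can usually be followed. Universal (fat) Mach-O files start with 0xCAFEBABE, which collides with Java class files, so this sketch deliberately skips them. Role classification (executable vs. shared library vs. plugin) is also left out. `manifest_entry` is a hypothetical helper that reproduces the `bin:<format>:<path>` ID shape used in the manifest.

```python
import struct
from typing import Optional

ELF_MAGIC = b"\x7fELF"
# Thin Mach-O magics as they appear on disk, in both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit, big / little endian
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit, big / little endian
}


def detect_format(header: bytes) -> Optional[str]:
    """Classify by magic bytes only; file extensions are ignored."""
    if header[:4] == ELF_MAGIC:
        return "elf"
    if header[:4] in MACHO_MAGICS:
        return "macho"
    if header[:2] == b"MZ" and len(header) >= 0x40:
        # e_lfanew at offset 0x3C points at the "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        if header[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "pe"
    return None  # not a native binary (script, data, or unsupported container)


def manifest_entry(path: str, header: bytes) -> Optional[dict]:
    """Hypothetical helper: build a partial binary_manifest entry for a detected binary."""
    fmt = detect_format(header)
    if fmt is None:
        return None
    return {"id": f"bin:{fmt}:{path}", "path": path, "format": fmt}
```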

4.1.2. Symbolization

Module: StellaOps.Scanner.Analyzers.Binary.Symbolizer

Uses:
- ELF/PE/Mach-O parsers (internal or third-party), no external calls.

Output per binary:

{
  "binary_id": "bin:elf:/usr/local/bin/app",
  "build_id": "buildid:abcd…",
  "exports": ["pkg1::ClassA::method1", "..."],
  "imports": ["openssl::EVP_EncryptInit_ex", "..."],
  "sections": { "text": { "va": "0x...", "size": 12345 } }
}

Writes unresolved symbol sets to Unknowns Registry when:
- Imports cannot be tied to known packages or symbols.

4.1.3. Call graph construction

Module: StellaOps.Scanner.Analyzers.Binary.CallGraph.Native

Core tasks:
- Build control-flow graphs (CFG) for each function via:
  - Disassembly.
  - Basic block detection.
- Identify direct calls (call func) and indirect calls (function pointers, vtables).

IR model:

{
  "binary_id": "bin:elf:/usr/local/bin/app",
  "functions": [
    { "fid": "func:app::main", "va": "0x401000", "size": 128 },
    { "fid": "func:libssl::EVP_EncryptInit_ex", "external": true }
  ],
  "edges": [
    { "caller": "func:app::main", "callee": "func:app::init_config", "type": "direct" },
    { "caller": "func:app::main", "callee": "func:libssl::EVP_EncryptInit_ex", "type": "import" }
  ]
}

Edge confidence:
- type: direct|import|indirect|heuristic
- Used later by the lattice.

4.1.4. Entry point inference

Sources:
- ELF e_entry (PT_INTERP helps distinguish dynamically linked executables from shared libraries), PE AddressOfEntryPoint, Mach-O LC_MAIN.
- Application-level hints (known frameworks, service main methods).
- Container metadata (CMD, ENTRYPOINT).

Output:

{
  "binary_id": "bin:elf:/usr/local/bin/app",
  "entrypoints": ["func:app::main"]
}

Note: For JS/Python/PHP, equivalent analyzers will later define module entrypoints (index.js, wsgi_app, public/index.php).
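The static reachability step (DFS/BFS from entrypoints, per Phase 2) can be sketched as a breadth-first search over the call-graph IR from 4.1.3, bounded by the plan's `max_call_depth`. The edge types to follow are a parameter because the document does not say how `assume_indirect_calls` maps to edge selection. The default of following all four types is an assumption. Sorting entrypoints and callees keeps traversal order stable, although the BFS depths would come out the same either way.

```python
from collections import deque
from typing import Deque, Dict, FrozenSet, Iterable, List, Mapping

ALL_EDGE_TYPES: FrozenSet[str] = frozenset({"direct", "import", "indirect", "heuristic"})


def static_reachability(
    edges: Iterable[Mapping[str, str]],
    entrypoints: Iterable[str],
    max_call_depth: int = 10,
    follow: FrozenSet[str] = ALL_EDGE_TYPES,
) -> Dict[str, int]:
    """BFS over the call-graph IR; returns {fid: shortest call depth from any entrypoint}."""
    adjacency: Dict[str, List[str]] = {}
    for e in edges:
        if e["type"] in follow:
            adjacency.setdefault(e["caller"], []).append(e["callee"])
    depth: Dict[str, int] = {}
    queue: Deque[str] = deque()
    for ep in sorted(set(entrypoints)):  # sorted -> stable traversal order
        depth[ep] = 0
        queue.append(ep)
    while queue:
        fid = queue.popleft()
        if depth[fid] >= max_call_depth:
            continue  # honour the plan's max_call_depth bound
        for callee in sorted(adjacency.get(fid, [])):
            if callee not in depth:
                depth[callee] = depth[fid] + 1
                queue.append(callee)
    return depth


edges = [
    {"caller": "func:app::main", "callee": "func:app::init_config", "type": "direct"},
    {"caller": "func:app::main", "callee": "func:libssl::EVP_EncryptInit_ex", "type": "import"},
]
reach = static_reachability(edges, ["func:app::main"])
affected = {"func:libssl::EVP_EncryptInit_ex", "func:libssl::EVP_EncryptUpdate"}
print(sorted(affected & set(reach)))  # -> ['func:libssl::EVP_EncryptInit_ex']
```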

4.2. Symbol-to-package and CVE-to-symbol mapping

4.2.1. Symbol→package mapping

Module: StellaOps.Reachability.Mapping.SymbolToPurl

Inputs:
- Binary symbolization outputs.
- Local mapping DB in Concelier (vendor symbol maps, debug info, name patterns).
- File path + container context (/usr/lib/..., /site-packages/...).

Output:

{
  "symbol": "libssl::EVP_EncryptInit_ex",
  "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
  "confidence": 0.93,
  "method": "vendor_map+path_heuristic"
}

Unresolved / ambiguous symbols:
- Stored as unknowns of type identity_gap.
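A sketch of the ambiguity rule follows. It assumes candidates arrive already scored by the mapping DB and path heuristics. The acceptance threshold and the required margin over the runner-up are hypothetical knobs. Any symbol that fails them becomes an `identity_gap` unknown instead of a guessed purl.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PurlCandidate:
    purl: str
    confidence: float
    method: str  # e.g. "vendor_map+path_heuristic"


def map_symbol(
    symbol: str,
    candidates: List[PurlCandidate],
    min_confidence: float = 0.8,  # hypothetical acceptance threshold
    min_margin: float = 0.1,      # hypothetical gap required over the runner-up
) -> Tuple[Optional[dict], Optional[dict]]:
    """Return (mapping, None) for a clear winner, else (None, identity_gap unknown)."""
    ranked = sorted(candidates, key=lambda c: (-c.confidence, c.purl))
    best = ranked[0] if ranked else None
    ambiguous = len(ranked) > 1 and ranked[0].confidence - ranked[1].confidence < min_margin
    if best is None or best.confidence < min_confidence or ambiguous:
        unknown = {
            "unknown_type": "identity_gap",
            "evidence": {"raw": f"no unambiguous purl for {symbol}",
                         "candidates": [c.purl for c in ranked]},
        }
        return None, unknown
    mapping = {"symbol": symbol, "purl": best.purl,
               "confidence": best.confidence, "method": best.method}
    return mapping, None
```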

4.2.2. CVE→symbol mapping

Responsibility: Concelier + its advisory ingestion.

For each vulnerability:

{
  "cve_id": "CVE-2025-12345",
  "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
  "affected_symbols": [
    "libssl::EVP_EncryptInit_ex",
    "libssl::EVP_EncryptUpdate"
  ],
  "source": "vendor_vex",
  "confidence": 1.0
}

Reachability Engine consumes this mapping read-only.

4.3. Reachability Engine

Module: StellaOps.Reachability.Engine (in Scanner.WebService)

4.3.1. Core data model

Per (artifact, cve, purl):

{
  "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
  "cve_id": "CVE-2025-12345",
  "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
  "symbols": [
    {
      "symbol": "libssl::EVP_EncryptInit_ex",
      "static_presence": "present|absent|unknown",
      "static_reachability": "reachable|unreachable|unknown",
      "runtime_hits": 3,
      "runtime_reachability": "observed|not_observed|unknown"
    }
  ],
  "reachability_state": "confirmed_reachable|statically_reachable|present_not_reachable|not_present|unknown",
  "confidence": {
    "p": 0.87,
    "evidence": ["static_callgraph", "runtime_trace", "symbol_map"],
    "unknowns_pressure": 0.12
  }
}

4.3.2. Lattice / state machine

Define a deterministic lattice over states:

NOT_PRESENT
PRESENT_NOT_REACHABLE
STATICALLY_REACHABLE
RUNTIME_OBSERVED

Plus “unknown” flags overlaid when evidence is missing.

Merging rules (simplified):

If NOT_PRESENT and no conflicting evidence → NOT_PRESENT.
If at least one affected symbol is on a static path from any entrypoint → STATICALLY_REACHABLE.
If symbol observed at runtime → RUNTIME_OBSERVED (top state).
If symbol present in binary but not on any static path → PRESENT_NOT_REACHABLE, unless unknown edges exist near it (then downgrade with lower confidence).
Unknowns Registry entries near affected symbols increase unknowns_pressure and may push from NOT_PRESENT to UNKNOWN.

Implementation: pure functional merge functions inside Scanner.WebService:

ReachabilityState Merge(ReachabilityState a, ReachabilityState b);
ReachabilityState FromEvidence(StaticEvidence s, RuntimeEvidence r, UnknownsPressure u);
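Here is a Python rendering of the two C# signatures above, under stated assumptions. The states form a chain ordered as listed. The “unknown” overlay is a boolean that is OR'd on merge. `None` in the evidence means “could not determine”. `PRESSURE_THRESHOLD` is a hypothetical cut-off. Because `merge` is the join on (chain × boolean), it is commutative, associative and idempotent, so the order in which evidence arrives cannot change the result.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    # Ordered bottom -> top, matching the state list above.
    NOT_PRESENT = 0
    PRESENT_NOT_REACHABLE = 1
    STATICALLY_REACHABLE = 2
    RUNTIME_OBSERVED = 3


@dataclass(frozen=True)
class ReachabilityState:
    level: Level
    unknown: bool = False  # overlay flag: evidence missing or incomplete


@dataclass(frozen=True)
class StaticEvidence:
    present: Optional[bool]    # None = presence could not be determined
    reachable: Optional[bool]  # None = call graph incomplete around the symbol


@dataclass(frozen=True)
class RuntimeEvidence:
    hits: int = 0  # function-hit count from Signals (0 if no trace)


PRESSURE_THRESHOLD = 0.1  # hypothetical cut-off for "unknowns nearby"


def merge(a: ReachabilityState, b: ReachabilityState) -> ReachabilityState:
    """Join: the higher level wins and unknown flags accumulate."""
    return ReachabilityState(max(a.level, b.level), a.unknown or b.unknown)


def from_evidence(s: StaticEvidence, r: RuntimeEvidence,
                  unknowns_pressure: float) -> ReachabilityState:
    pressured = unknowns_pressure >= PRESSURE_THRESHOLD
    if r.hits > 0:
        return ReachabilityState(Level.RUNTIME_OBSERVED)  # top state
    if s.reachable:
        return ReachabilityState(Level.STATICALLY_REACHABLE, pressured)
    if s.present:
        # No static path, but an incomplete graph or nearby unknowns keep it uncertain.
        return ReachabilityState(Level.PRESENT_NOT_REACHABLE, s.reachable is None or pressured)
    # Absent (or presence undetermined): unknowns may push this toward UNKNOWN.
    return ReachabilityState(Level.NOT_PRESENT, s.present is None or pressured)
```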

4.3.3. Deterministic inputs

To guarantee replay:

Build Reachability Plan Manifest per run:

{
  "plan_id": "reach:sha256:…",
  "scanner_version": "1.4.0",
  "analyzers": {
    "binary_discovery": "1.0.0",
    "binary_symbolizer": "1.1.0",
    "binary_callgraph": "1.2.0"
  },
  "inputs": {
    "image_digest": "sha256:…",
    "runtime_trace_files": ["signals:run:2025-11-18T12:00:00Z"],
    "config": {
      "assume_indirect_calls": "conservative",
      "max_call_depth": 10
    }
  }
}

DSSE-sign the plan + result.
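A sketch of deriving `plan_id` from the manifest and wrapping plan + results in a DSSE envelope follows. `pae` implements the DSSE v1 pre-authentication encoding, which produces the bytes that actually get signed. The payload type string is hypothetical. HMAC is only a demo stand-in signer; a real deployment would use an asymmetric key. Hashing every field except `plan_id` makes the ID a pure function of the listed inputs.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Callable, Dict, List

PAYLOAD_TYPE = "application/vnd.stellaops.reachability+json"  # hypothetical media type


def canonical_bytes(obj: Any) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def compute_plan_id(manifest: Dict[str, Any]) -> str:
    """Hash every field except plan_id itself, so the ID depends only on the inputs."""
    body = {k: v for k, v in manifest.items() if k != "plan_id"}
    return "reach:sha256:" + hashlib.sha256(canonical_bytes(body)).hexdigest()


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 PAE: "DSSEv1" SP len(type) SP type SP len(body) SP body."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def dsse_envelope(plan: Dict[str, Any], results: List[Any], keyid: str,
                  sign: Callable[[bytes], bytes]) -> Dict[str, Any]:
    payload = canonical_bytes({"plan": plan, "results": results})
    sig = sign(pae(PAYLOAD_TYPE, payload))
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def demo_sign(message: bytes) -> bytes:
    # Demo only: HMAC stands in for a real asymmetric signer.
    return hmac.new(b"not-a-real-key", message, hashlib.sha256).digest()
```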

4.4. Storage and graph overlay

4.4.1. Reachability store

Backend: re-use existing Scanner/Cartographer storage stack (e.g., Postgres or SQLite + blob store).

Tables/collections:

reachability_runs
- plan_id, image_ref, created_at, scanner_version.
reachability_results
- plan_id, cve_id, purl, state, confidence_p, unknowns_pressure, payload_json.
Indexes on (image_ref, cve_id), (image_ref, purl).
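Below is a SQLite sketch of the two tables, assuming SQLite is used for the local/offline case. One deviation is flagged in the DDL: `image_ref` is copied into `reachability_results`. The listed indexes span `image_ref` together with `cve_id`/`purl`, and those columns otherwise live in different tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS reachability_runs (
    plan_id         TEXT PRIMARY KEY,
    image_ref       TEXT NOT NULL,
    created_at      TEXT NOT NULL,
    scanner_version TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS reachability_results (
    plan_id           TEXT NOT NULL REFERENCES reachability_runs(plan_id),
    image_ref         TEXT NOT NULL,  -- assumption: denormalized so the indexes below work
    cve_id            TEXT NOT NULL,
    purl              TEXT NOT NULL,
    state             TEXT NOT NULL,
    confidence_p      REAL NOT NULL,
    unknowns_pressure REAL NOT NULL,
    payload_json      TEXT NOT NULL,
    PRIMARY KEY (plan_id, cve_id, purl)
);
CREATE INDEX IF NOT EXISTS ix_results_image_cve  ON reachability_results (image_ref, cve_id);
CREATE INDEX IF NOT EXISTS ix_results_image_purl ON reachability_results (image_ref, purl);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_result(conn: sqlite3.Connection, plan_id: str, image_ref: str, cve_id: str,
                purl: str, state: str, p: float, pressure: float, payload_json: str) -> None:
    # INSERT OR IGNORE keeps re-submission of the same run idempotent.
    conn.execute(
        "INSERT OR IGNORE INTO reachability_results VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (plan_id, image_ref, cve_id, purl, state, p, pressure, payload_json),
    )
```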

4.4.2. Cartographer overlay

Edges:

IMAGE → BINARY → FUNCTION → PACKAGE → CVE
Extra property on IMAGE -[AFFECTED_BY]-> CVE:
- reachability_state
- reachability_plan_id

Enables queries:

“Show me all CVEs with STATICALLY_REACHABLE in this namespace.”
“Show me binaries with high density of reachable crypto CVEs.”

4.5. APIs (idempotent, additive)

4.5.1. Trigger reachability

POST /reachability/runs

Request:

{
  "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
  "config": {
    "include_languages": ["binary"],
    "max_call_depth": 10,
    "assume_indirect_calls": "conservative"
  }
}

Response:

{ "plan_id": "reach:sha256:…" }

Idempotency key: (image_ref, config_hash). Subsequent calls return the same plan_id.
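A sketch of the idempotency rule follows, assuming an in-memory registry; `RunRegistry` is a hypothetical name. For brevity it derives `plan_id` from the key itself. A fuller version would hash the complete Reachability Plan Manifest, including analyzer versions.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def config_hash(config: Dict[str, Any]) -> str:
    # Sorted keys + compact separators. List order is still significant,
    # so callers should sort order-insensitive lists such as include_languages.
    return hashlib.sha256(
        json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    ).hexdigest()


class RunRegistry:
    """In-memory stand-in for the idempotency of POST /reachability/runs."""

    def __init__(self) -> None:
        self._runs: Dict[Tuple[str, str], str] = {}

    def trigger(self, image_ref: str, config: Dict[str, Any]) -> str:
        key = (image_ref, config_hash(config))
        if key not in self._runs:
            # Simplification: plan_id derived from the idempotency key only.
            digest = hashlib.sha256(f"{key[0]}\n{key[1]}".encode("utf-8")).hexdigest()
            self._runs[key] = "reach:sha256:" + digest
        return self._runs[key]
```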

4.5.2. Fetch results

GET /reachability/runs/:plan_id

{
  "plan": { /* reachability plan manifest */ },
  "results": [
    {
      "cve_id": "CVE-2025-12345",
      "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
      "reachability_state": "statically_reachable",
      "confidence": { "p": 0.84, "unknowns_pressure": 0.1 }
    }
  ]
}

4.5.3. Per-CVE view for VEXer/Excitor

GET /reachability/by-cve?artifact=…&cve_id=…

Returns filtered result for downstream VEX creation.

All APIs are read-only except for the side effect of storing/caching runs.

5. Interaction with other Stella Ops modules

5.1. Concelier

Provides:
- CVE→purl→symbol mapping.
- Vendor VEX statements indicating affected functions.
Consumes:
- Nothing from reachability directly; Scanner.WebService passes the reachability summary to VEXer/Excitor, which merges it with vendor statements.

5.2. VEXer / Excitor

Input:
- For each (artifact, cve):
  - Reachability state.
  - Confidence.
Logic:
- Translate states to VEX statements:
  - NOT_PRESENT → not_affected
  - PRESENT_NOT_REACHABLE → not_affected (with justification “code not reachable according to analysis”)
  - STATICALLY_REACHABLE → affected
  - RUNTIME_OBSERVED → affected (higher severity)
- Attach determinism proof:
  - Plan ID + DSSE of reachability run.
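A sketch of the state-to-VEX translation follows. The justification labels are borrowed from OpenVEX's justification vocabulary, which is an assumption; the document only says “code not reachable according to analysis”. The output is a loose statement dict, not a complete OpenVEX document. The document does not map UNKNOWN, so this sketch falls back to `under_investigation`. `reachability_plan_id` and `runtime_confirmed` are hypothetical field names.

```python
from typing import Any, Dict, Optional, Tuple

# State -> (VEX status, justification).
VEX_MAPPING: Dict[str, Tuple[str, Optional[str]]] = {
    "NOT_PRESENT": ("not_affected", "vulnerable_code_not_present"),
    "PRESENT_NOT_REACHABLE": ("not_affected", "vulnerable_code_not_in_execute_path"),
    "STATICALLY_REACHABLE": ("affected", None),
    "RUNTIME_OBSERVED": ("affected", None),
}


def to_vex_statement(cve_id: str, purl: str, state: str, plan_id: str) -> Dict[str, Any]:
    # Assumption: states not in the table (e.g. UNKNOWN) become under_investigation.
    status, justification = VEX_MAPPING.get(state, ("under_investigation", None))
    stmt: Dict[str, Any] = {
        "vulnerability": cve_id,
        "products": [purl],
        "status": status,
        "reachability_plan_id": plan_id,                   # hypothetical: determinism proof
        "runtime_confirmed": state == "RUNTIME_OBSERVED",  # hypothetical: severity hint
    }
    if justification:
        stmt["justification"] = justification
    return stmt
```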

5.3. Signals

Provides:
- Function hit events: (binary_id, function_id, timestamp) aggregated per image.
Reachability Engine:
- Marks runtime_hits and state RUNTIME_OBSERVED for symbols with hits.
Unknowns:
- If runtime sees hits in functions with no static edges to entrypoints (or unmapped symbols), these produce Unknowns and increase unknowns_pressure.
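A sketch of the Signals aggregation follows. It assumes events arrive as `(binary_id, function_id, timestamp)` tuples and that the statically reachable set comes from the call-graph pass. Functions that are hit at runtime but have no static path are returned as candidates for `runtime_shadow` unknowns.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

HitEvent = Tuple[str, str, str]  # (binary_id, function_id, timestamp)


def aggregate_hits(events: Iterable[HitEvent], statically_reachable: Set[str]
                   ) -> Tuple[Dict[Tuple[str, str], int], List[Tuple[str, str]]]:
    """Count hits per (binary, function); flag hit functions with no static path."""
    hits = Counter((binary_id, function_id) for binary_id, function_id, _ts in events)
    # Hit at runtime but unreachable in the static graph -> candidate runtime_shadow unknown.
    shadows = sorted(key for key in hits if key[1] not in statically_reachable)
    return dict(hits), shadows
```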

5.4. Unknowns Registry

From reachability pipeline, create Unknowns when:
- Symbol→package mapping is ambiguous.
- CVE→symbol mapping exists, but symbol cannot be found in binaries.
- Call graph has indirect calls that cannot be resolved.
The “Unknowns Pressure” term is fed into:
- Reachability confidence.
- Global risk scoring (Trust Algebra Studio).

6. Implementation phases and engineering plan

Phase 0 – Scaffolding & manifests (1 sprint)

Create:
- StellaOps.Reachability.Engine skeleton.
- Reachability Plan Manifest schema.
- Reachability Run + Result persistence.
Add /reachability/runs and /reachability/runs/:plan_id endpoints, returning mock data.
Wire DSSE attestation generation for reachability results (even if payload is empty).

Phase 1 – Binary discovery + symbolization (1–2 sprints)

Implement Binary.Discovery and Binary.Symbolizer.
Feed symbol tables into Reachability Engine as “presence-only evidence”:
- States: NOT_PRESENT vs PRESENT_NOT_REACHABLE vs UNKNOWN.
Integrate with Concelier’s CVE→purl mapping (no symbol-level yet):
- For CVEs affecting a package present in the image, mark as PRESENT_NOT_REACHABLE.
Emit Unknowns for unresolved binary roles and ambiguous package mapping.

Deliverable: package-level reachability with deterministic manifests.

Phase 2 – Binary call graphs & entrypoints (2–3 sprints)

Implement Binary.CallGraph.Native:
- CFG + direct call edges.
Implement entrypoint inference from binary + container ENTRYPOINT/CMD.
Add static reachability algorithm:
- DFS/BFS from entrypoints through call graph.
- Mark affected symbols as reachable if found on paths.
Extend Concelier to ingest symbol-aware vulnerability metadata (for pilots; can be partial).

Deliverable: function-level static reachability for native binaries where symbol maps exist.

Phase 3 – Runtime integration (2 sprints, may be in parallel workstream)

Integrate Signals runtime evidence:
- Define schema for function hit events.
- Add ingestion path into Reachability Engine.
Update lattice:
- Promote symbols to RUNTIME_OBSERVED when hits exist.
Extend DSSE attestation to reference runtime evidence URIs (hashes of trace inputs).

Deliverable: static + runtime-confirmed reachability.

Phase 4 – Unknowns & pressure (1 sprint)

Wire Unknowns Registry:
- Emit unknowns from Symbolizer and CallGraph (identity gaps, missing edges).
- Compute unknowns_pressure per (artifact, cve) as density of unknowns near affected symbols.
Adjust confidence calculation in Reachability Engine.
Expose unknowns metrics in API and Cartographer.

Deliverable: explicit modelling of uncertainty, feeding into trust/lattice.
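The sketch below gives one reading of “density of unknowns near affected symbols”. It treats “near” as within `radius` hops of an affected symbol in the call graph, with edges treated as undirected. Pressure is the fraction of those nodes that carry an unknown. Both the radius and the undirected treatment are hypothetical choices.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def unknowns_pressure(edges: Iterable[Tuple[str, str]], affected: Set[str],
                      unknown_nodes: Set[str], radius: int = 2) -> float:
    """Share of nodes within `radius` hops of any affected symbol that carry an unknown."""
    neighbours: Dict[str, Set[str]] = {}
    for a, b in edges:  # treat call edges as undirected for "nearness"
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {s: 0 for s in affected}
    queue = deque(sorted(affected))
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    if not dist:
        return 0.0
    return sum(1 for n in dist if n in unknown_nodes) / len(dist)
```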

Phase 5 – Language extensions (JS/Python/PHP) (ongoing)

Implement per-language call-graph analyzers creating the same IR as binary.
Extend symbol→purl mapping for these ecosystems (npm, PyPI, Packagist).
Update reachability solver to include multi-language edges (e.g., Python calling into native modules).

7. Minimal contracts for agents

To hand off to agents, you can codify:

IR schemas
- Call graph IR.
- Reachability Result JSON.
- Reachability Plan Manifest.
API contracts
- POST /reachability/runs
- GET /reachability/runs/:plan_id
- GET /reachability/by-cve
Module boundaries
- Scanner.Analyzers.Binary.* produce IR only; NO network calls.
- Reachability.Engine is the only place where lattice logic lives.
- Concelier is read-only for reachability; no custom logic there.
Determinism practices
- All algorithmic randomness is banned; where unavoidable, seed with values derived from plan_id.
- All external inputs must be listed in the Plan Manifest.
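A sketch of the seeding rule follows. Each purpose gets its own PRNG, seeded with a SHA-256 of `plan_id` plus a purpose label; the label scheme is hypothetical. Python only guarantees that `Random.random()` reproduces the same sequence for the same seed across versions. Other methods such as `shuffle` are not covered by that guarantee, so either pin the interpreter version or use only `random()`.

```python
import hashlib
import random


def rng_for(plan_id: str, purpose: str) -> random.Random:
    """Per-purpose PRNG seeded only from plan_id, so replays draw identical values."""
    digest = hashlib.sha256(f"{plan_id}|{purpose}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))
```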

If you like, as a next step I can draft:

Concrete C# record types for the IRs.
A small pseudo-code implementation of the lattice functions and static reachability DFS.
A proposed directory layout under src/StellaOps.Scanner and src/StellaOps.Cartographer.
