Executive directive

Build Reachability as Evidence, not as a UI feature.

Every reachability conclusion must produce a portable, signed, replayable evidence bundle that answers:

  1. What vulnerable code unit is being discussed? (symbol/method/function + version)
  2. What entrypoint is assumed? (HTTP handler, RPC method, CLI, scheduled job, etc.)
  3. What is the witness? (a call-path subgraph, not a screenshot)
  4. What assumptions/gates apply? (config flags, feature toggles, runtime wiring)
  5. Can a third party reproduce it? (same inputs → same evidence hash)

This must work for source and post-build artifacts.


Directions for Product Managers

1) Define the product contract in one page

Capability name

Proof-carrying reachability.

Contract

Given an artifact (source or built) and a vulnerability mapping, Stella Ops outputs:

  • Reachability verdict: REACHABLE | NOT_PROVEN_REACHABLE | INCONCLUSIVE
  • Witness evidence: a minimal reachability subgraph + one or more witness paths
  • Reproducibility bundle: all inputs and toolchain metadata needed to replay
  • Attestation: signed statement tied to the artifact digest

Important language choice

Avoid claiming “unreachable” unless you can prove non-reachability under a formally sound model.

  • Use NOT_PROVEN_REACHABLE for “no path found under current analysis + assumptions.”
  • Use INCONCLUSIVE when analysis cannot be performed reliably (missing symbols, obfuscation, unsupported language, dynamic dispatch uncertainty, etc.).

This is essential for credibility and audit use.
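
Pinning this vocabulary down in code keeps it from drifting; a minimal sketch of the verdict type (names taken from the contract above, with comments restating the intended semantics):

```python
from enum import Enum

class Verdict(str, Enum):
    # A witness path exists from an assumed entrypoint to the vulnerable node.
    REACHABLE = "REACHABLE"
    # No path found under the current analysis + assumptions.
    # Explicitly NOT a claim of unreachability.
    NOT_PROVEN_REACHABLE = "NOT_PROVEN_REACHABLE"
    # Analysis could not be performed reliably (missing symbols, obfuscation,
    # unsupported language, dynamic-dispatch uncertainty, etc.).
    INCONCLUSIVE = "INCONCLUSIVE"
```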


2) Anchor personas and top workflows

Primary personas

  • Security governance / AppSec: wants fewer false positives and defensible prioritization.
  • Compliance/audit: wants evidence and replayability.
  • Engineering teams: wants specific call paths and what to change.

Top workflows (must support in MVP)

  1. CI gate with signed verdict
    • “Block release if any REACHABLE high-severity finding is present OR if the INCONCLUSIVE count exceeds a threshold.” (See the sketch after this list.)
  2. Audit replay
    • “Reproduce the reachability proof for artifact digest X using snapshot Y.”
  3. Release delta
    • “Show what reachability changed between release A and B.”
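
The gate in workflow 1 reduces to a small policy function; a sketch under assumed shapes (the Finding record and the inconclusive threshold are illustrative, not a fixed API):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    verdict: str    # "REACHABLE" | "NOT_PROVEN_REACHABLE" | "INCONCLUSIVE"
    severity: str   # e.g., "HIGH", "CRITICAL"

def should_block_release(findings: list[Finding],
                         max_inconclusive: int = 5) -> bool:
    """Block if any REACHABLE high-severity finding exists, or if the
    INCONCLUSIVE count is too high to trust the overall result."""
    reachable_high = any(
        f.verdict == "REACHABLE" and f.severity in ("HIGH", "CRITICAL")
        for f in findings
    )
    inconclusive = sum(f.verdict == "INCONCLUSIVE" for f in findings)
    return reachable_high or inconclusive > max_inconclusive
```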

3) Minimum viable scope: pick targets that make “post-build” real early

To satisfy “source and post-build artifacts” without biting off ELF-level complexity first:

  • Source repository analysis for 1–2 languages with mature static IR
  • Post-build intermediate artifacts that retain symbol structure:
    • Java .jar/.class
    • .NET assemblies
    • Python wheels (bytecode)
    • Node bundles with sourcemaps (optional)

These give you “post-build” support where call graphs are tractable.

Defer for later phases

  • Native ELF/Mach-O deep reachability (harder due to stripping, inlining, indirect calls, dynamic loading)
  • Highly dynamic languages without strong type info, unless you accept “witness-only” semantics

Your differentiator is proof portability and determinism, not “supports every binary on day one.”


4) Product requirements: what “proof-carrying” means in requirements language

Functional requirements

  • Output must include a reachability subgraph:
    • Nodes = code units (function/method) with stable IDs
    • Edges = call or dispatch edges with type annotations
    • Must include at least one witness path from entrypoint to vulnerable node when REACHABLE
  • Output must be artifact-tied:
    • Evidence must reference artifact digest(s) (source commit, build artifact digest, container image digest)
  • Output must be attestable:
    • Produce a signed attestation (DSSE/in-toto style) attached to the artifact digest
  • Output must be replayable:
    • Provide a “replay recipe” (analyzer versions, configs, vulnerability mapping version, and input digests)

Non-functional requirements

  • Deterministic: repeated runs on same inputs produce identical evidence hash
  • Size-bounded: subgraph evidence must be bounded (e.g., path-based extraction + limited context)
  • Privacy-controllable:
    • Support a mode that avoids embedding raw source content (store pointers/hashes instead)
  • Verifiable offline:
    • Verification and replay must work air-gapped given the snapshot bundle

5) Acceptance criteria (use as Definition of Done)

A feature is “done” only when:

  1. Verifier can validate the attestation signature and confirm the evidence hash matches content.
  2. A second machine can reproduce the same evidence hash given the replay bundle.
  3. Evidence includes at least one witness path for REACHABLE.
  4. Evidence includes explicit assumptions/gates; absence of gating is recorded as an assumption (e.g., “config unknown”).
  5. Evidence is linked to the precise artifact digest being deployed/scanned.

6) Product packaging decisions that create switching cost

These are product decisions that turn engineering into moat:

  • Make “reachability proof” an exportable object, not just a UI view.
  • Provide an API: GET /findings/{id}/proof returning canonical evidence.
  • Support policy gates on:
    • verdict
    • confidence
    • assumption_count
    • inconclusive_reasons
  • Make “proof replay” a one-command workflow in CLI.

Directions for Development Managers

1) Architecture: build a “proof pipeline” with strict boundaries

Implement as composable modules with stable interfaces:

  1. Artifact Resolver

    • Inputs: repo URL/commit, build artifact path, container image digest
    • Output: normalized “artifact record” with digests and metadata
  2. Graph Builder (language-specific adapters)

    • Inputs: artifact record
    • Output: canonical Program Graph
      • Nodes: code units
      • Edges: calls/dispatch
      • Optional: config gates, dependency edges
  3. Vulnerability-to-Code Mapper

    • Inputs: vulnerability record (CVE), package coordinates, symbol metadata (if available)
    • Output: vulnerable node set + mapping confidence
  4. Entrypoint Modeler

    • Inputs: artifact + runtime context (framework detection, routing tables, main methods)
    • Output: entrypoint node set with types (HTTP, RPC, CLI, cron)
  5. Reachability Engine

    • Inputs: graph + entrypoints + vulnerable nodes + constraints
    • Output: witness paths + minimal subgraph extraction
  6. Evidence Canonicalizer

    • Inputs: witness paths + subgraph + metadata
    • Output: canonical JSON (stable ordering, stable IDs), plus content hash
  7. Attestor

    • Inputs: evidence hash + artifact digest
    • Output: signed attestation object (OCI attachable)
  8. Verifier (separate component)

    • Must validate signatures + evidence integrity independently of generator

Critical: generator and verifier must be decoupled to preserve trust.
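
Typed interfaces are one way to make those boundaries enforceable; an illustrative sketch (all names and fields are assumptions derived from the stage descriptions above):

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class ArtifactRecord:
    digests: tuple[str, ...]   # source commit, build artifact, image digest
    metadata: dict

@dataclass(frozen=True)
class ProgramGraph:
    nodes: tuple[str, ...]                    # stable NodeIDs
    edges: tuple[tuple[str, str, str], ...]   # (caller, callee, edge_type)

class GraphBuilder(Protocol):
    """Language-specific adapter: artifact record in, canonical graph out."""
    def build(self, artifact: ArtifactRecord) -> ProgramGraph: ...

class ReachabilityEngine(Protocol):
    """Graph + entrypoints + vulnerable nodes in, witness paths out."""
    def witness_paths(self, graph: ProgramGraph, entrypoints: set[str],
                      vulnerable: set[str]) -> list[list[str]]: ...
```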


2) Evidence model: what to store (and how to keep it stable)

Node identity must be stable across runs

Define a canonical NodeID scheme:

  • Source node ID:
    • {language}:{repo_digest}:{symbol_signature}:{optional_source_location_hash}
  • Post-build node ID:
    • {language}:{artifact_digest}:{symbol_signature}:{optional_offset_or_token}

Avoid raw file paths or non-deterministic compiler offsets as primary IDs unless normalized.
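
A sketch of the source-side scheme (hashing the location, rather than embedding it, is one way to satisfy the normalization rule):

```python
import hashlib

def source_node_id(language: str, repo_digest: str, symbol_signature: str,
                   source_location: str | None = None) -> str:
    """Build a stable NodeID; raw file paths are hashed, never embedded."""
    parts = [language, repo_digest, symbol_signature]
    if source_location is not None:
        parts.append(hashlib.sha256(source_location.encode()).hexdigest()[:16])
    return ":".join(parts)

# e.g., source_node_id("java", "sha256:ab12...", "com.acme.Foo#bar(int)")
```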

Edge identity

{caller_node_id} -> {callee_node_id} : {edge_type}

Edge types matter (direct call, virtual dispatch, reflection, dynamic import, etc.).

Subgraph extraction rule

Store:

  • All nodes/edges on at least one witness path (or k witness paths)
  • Plus bounded context:
    • 1–2 hop neighborhood around the vulnerable node and entrypoint
    • routing edges (HTTP route → handler) where applicable

This makes the proof compact and audit-friendly.
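
The extraction rule fits in a few lines; a sketch assuming an adjacency-set graph representation:

```python
from collections import deque

def bounded_neighborhood(adj: dict[str, set[str]],
                         seeds: set[str], hops: int = 2) -> set[str]:
    """Breadth-first expansion from the seed nodes, capped at `hops` levels."""
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def extract_nodes(adj: dict[str, set[str]], witness_paths: list[list[str]],
                  vulnerable: set[str], entrypoints: set[str]) -> set[str]:
    # Everything on at least one witness path, plus bounded context.
    keep = {n for path in witness_paths for n in path}
    return keep | bounded_neighborhood(adj, vulnerable | entrypoints)
```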

Canonicalization requirements

  • Stable sorting of nodes and edges
  • Canonical JSON serialization (no map-order nondeterminism)
  • Explicit analyzer version + config included in evidence
  • Hash everything that influences results
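
With a canonical serialization convention, the hash itself is trivial; a sketch using sorted keys and fixed separators (any canonicalization scheme works, provided it is versioned and lists are pre-sorted):

```python
import hashlib
import json

def canonicalize(evidence: dict) -> bytes:
    """Deterministic bytes: sorted keys, fixed separators, UTF-8.
    Lists (nodes, edges, paths) must already be in stable order."""
    return json.dumps(evidence, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def evidence_hash(evidence: dict) -> str:
    return "sha256:" + hashlib.sha256(canonicalize(evidence)).hexdigest()
```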

3) Determinism and reproducibility: engineering guardrails

Deterministic computation

  • Avoid parallel graph traversal that yields nondeterministic order without canonical sorting
  • If using concurrency, collect results and sort deterministically before emitting

Repro bundle (“time travel”)

Persist, as digests:

  • Analyzer container/image digest
  • Analyzer config hash
  • Vulnerability mapping dataset version hash
  • Artifact digest(s)
  • Graph builder version hash

A replay must be possible without “calling home.”
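
Structurally, the bundle is just a frozen record of digests; an illustrative sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReplayBundle:
    analyzer_image_digest: str         # container the analysis ran in
    analyzer_config_hash: str
    mapping_dataset_hash: str          # vulnerability mapping version
    artifact_digests: tuple[str, ...]
    graph_builder_hash: str
    schema_version: str
```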

Golden tests

Create fixtures where:

  • Same input graph + mapping → exact evidence hash
  • Regression test for canonicalization changes (version the schema intentionally)
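
A golden test can be as small as pinning the evidence hash of a checked-in fixture; a sketch (load_fixture and build_evidence are hypothetical helpers, and the pinned hash is elided):

```python
def test_evidence_hash_is_stable():
    # load_fixture / build_evidence are hypothetical helpers for this sketch.
    evidence = build_evidence(load_fixture("java-cve-fixture"))
    # Pinned golden value; update only with an intentional schema version bump.
    assert evidence_hash(evidence) == "sha256:..."
```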

4) Attestation format and verification

Attestation contents (minimum)

  • Subject: artifact digest (image digest / build artifact digest)
  • Predicate: reachability evidence hash + metadata
  • Predicate type: reachability (custom) with versioning
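
This maps naturally onto an in-toto-style statement; a sketch (the predicate type URI and predicate fields are assumptions):

```python
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "registry.example/app", "digest": {"sha256": "<image digest>"}},
    ],
    # Custom, versioned predicate type for reachability (assumed URI).
    "predicateType": "https://stella-ops.org/reachability/v1",
    "predicate": {
        "verdict": "REACHABLE",
        "evidence_hash": "sha256:<canonical evidence hash>",
        "schema_version": "1",
    },
}
```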

Verification requirements

  • Verification must run offline
  • It must validate:
    1. signature
    2. subject digest binding
    3. evidence hash matches serialized evidence
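
An offline verifier needs nothing beyond the bundle and a trusted key; a sketch of checks 2 and 3, with check 1 delegated to whatever DSSE/signing library is adopted:

```python
import hashlib

def verify(attestation: dict, evidence_bytes: bytes,
           deployed_digest: str, check_signature) -> bool:
    # 1. Signature: delegated to the chosen DSSE/signing library.
    if not check_signature(attestation):
        return False
    # 2. Subject digest binding: the attestation must name this artifact.
    subjects = {s["digest"]["sha256"] for s in attestation["subject"]}
    if deployed_digest not in subjects:
        return False
    # 3. Evidence integrity: serialized evidence hashes to the claimed value.
    actual = "sha256:" + hashlib.sha256(evidence_bytes).hexdigest()
    return actual == attestation["predicate"]["evidence_hash"]
```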

Storage model

Use content-addressable storage keyed by evidence hash.
The attestation references the hash; the evidence itself is stored separately or embedded (a size tradeoff).


5) Source + post-build support: engineering plan

Unifying principle

Both sources produce the same canonical Program Graph abstraction.

Source analyzers produce:

  • Function/method nodes using language signatures
  • Edges from static analysis IR

Post-build analyzers produce:

  • Nodes from bytecode/assembly symbol tables (where available)
  • Edges from bytecode call instructions / metadata

Phased plan

  1. Implement one source language adapter (fastest way to prove the model)
  2. Implement one post-build adapter where symbols are rich (e.g., Java bytecode)
  3. Ensure the evidence schema and attestation workflow work identically for both
  4. Expand to more ecosystems once the proof pipeline is stable

6) Operational constraints (performance, size, security)

Performance

  • Cache program graphs per artifact digest
  • Cache vulnerability-to-code mapping per package/version
  • Compute reachability on-demand per vulnerability, but reuse graphs

Evidence size

  • Limit witness paths (e.g., up to N shortest paths)
  • Prefer “witness + bounded neighborhood” over exporting full call graph

Security and privacy

  • Provide a “redacted proof mode”
    • include symbol hashes instead of raw names if needed
    • store source locations as hashes/pointers
  • Never embed raw source code unless explicitly enabled
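
Redaction can be a pure transform over the evidence; a sketch using a keyed hash so symbol names resist dictionary lookup (per-tenant salt handling is an assumption):

```python
import hashlib
import hmac

def redact_symbol(name: str, salt: bytes) -> str:
    """Replace a raw symbol name with a keyed hash. The same salt must be
    used across the whole evidence object so edges still line up."""
    return "sym:" + hmac.new(salt, name.encode(), hashlib.sha256).hexdigest()[:24]
```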

7) Definition of Done for the engineering team

A milestone is complete when you can demonstrate:

  1. Generate a reachability proof for a known vulnerable code unit with a witness path.
  2. Serialize a canonical evidence subgraph and compute a stable hash.
  3. Sign the attestation bound to the artifact digest.
  4. Verify the attestation on a clean machine (offline).
  5. Replay the analysis from the replay bundle and reproduce the same evidence hash.

Concrete artifact example (for alignment)

A reachability evidence object should look structurally like:

  • subject: artifact digest(s)
  • claim:
    • verdict: REACHABLE / NOT_PROVEN_REACHABLE / INCONCLUSIVE
    • entrypoints: list of NodeIDs
    • vulnerable_nodes: list of NodeIDs
    • witness_paths: list of paths (each path = ordered NodeIDs)
  • subgraph:
    • nodes: list with stable IDs + metadata
    • edges: list with stable ordering + edge types
  • assumptions:
    • gating conditions, unresolved dynamic dispatch notes, etc.
  • tooling:
    • analyzer name/version/digest
    • config hash
    • mapping dataset hash
  • hashes:
    • evidence content hash
    • schema version

Then wrap and sign it as an attestation tied to the artifact digest.
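
Rendered as data, the same structure might look like this (all values illustrative):

```python
evidence = {
    "subject": ["sha256:<artifact digest>"],
    "claim": {
        "verdict": "REACHABLE",
        "entrypoints": ["java:sha256:...:com.acme.Api#handle(Request)"],
        "vulnerable_nodes": ["java:sha256:...:org.lib.Parser#parse(String)"],
        "witness_paths": [[
            "java:sha256:...:com.acme.Api#handle(Request)",
            "java:sha256:...:com.acme.Router#route(String)",
            "java:sha256:...:org.lib.Parser#parse(String)",
        ]],
    },
    "subgraph": {"nodes": ["..."], "edges": ["..."]},
    "assumptions": ["config unknown: feature flag for parser not resolved"],
    "tooling": {
        "analyzer": {"name": "...", "version": "...", "digest": "sha256:..."},
        "config_hash": "sha256:...",
        "mapping_dataset_hash": "sha256:...",
    },
    "hashes": {"evidence_content_hash": "sha256:...", "schema_version": "1"},
}
```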


The one decision you should force early

Decide (and document) whether your semantics are:

  • Witness-based (“REACHABLE only if we can produce a witness path”), and
  • Conservative on negative claims (“NOT_PROVEN_REACHABLE” is not “unreachable”).

This single decision will keep the system honest, reduce legal/audit risk, and prevent the product from drifting into hand-wavy “trust us” scoring.