docs/product-advisories/unprocessed/19-Dec-2025 - Moat #4.md
## Executive directive

Build **Reachability as Evidence**, not as a UI feature.

Every reachability conclusion must produce a **portable, signed, replayable evidence bundle** that answers:

1. **What vulnerable code unit is being discussed?** (symbol/method/function + version)
2. **What entrypoint is assumed?** (HTTP handler, RPC method, CLI, scheduled job, etc.)
3. **What is the witness?** (a call-path subgraph, not a screenshot)
4. **What assumptions/gates apply?** (config flags, feature toggles, runtime wiring)
5. **Can a third party reproduce it?** (same inputs → same evidence hash)

This must work for **source** and **post-build artifacts**.


---

# Directions for Product Managers

## 1) Define the product contract in one page

### Capability name

**Proof‑carrying reachability**.

### Contract

Given an artifact (source or built) and a vulnerability mapping, Stella Ops outputs:

- **Reachability verdict:** `REACHABLE | NOT_PROVEN_REACHABLE | INCONCLUSIVE`
- **Witness evidence:** a minimal **reachability subgraph** + one or more witness paths
- **Reproducibility bundle:** all inputs and toolchain metadata needed to replay
- **Attestation:** signed statement tied to the artifact digest

### Important language choice

Avoid claiming “unreachable” unless you can prove non-reachability under a formally sound model.

- Use **NOT_PROVEN_REACHABLE** for “no path found under current analysis + assumptions.”
- Use **INCONCLUSIVE** when analysis cannot be performed reliably (missing symbols, obfuscation, unsupported language, dynamic dispatch uncertainty, etc.).

This is essential for credibility and audit use.

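A minimal sketch of these verdict semantics, assuming a hypothetical `AnalysisResult` record emitted by the reachability engine. A witness path is positive evidence, so it yields `REACHABLE` even if other parts of the analysis were incomplete; recorded reliability problems yield `INCONCLUSIVE`; everything else falls back to `NOT_PROVEN_REACHABLE`, never to "unreachable". The ordering of the first two checks is a design choice, not something the contract above mandates.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(str, Enum):
    REACHABLE = "REACHABLE"
    NOT_PROVEN_REACHABLE = "NOT_PROVEN_REACHABLE"
    INCONCLUSIVE = "INCONCLUSIVE"


@dataclass
class AnalysisResult:
    # Hypothetical engine output: each witness path is an ordered list of NodeIDs.
    witness_paths: list[list[str]] = field(default_factory=list)
    # Reasons the analysis could not run reliably (missing symbols, obfuscation, ...).
    inconclusive_reasons: list[str] = field(default_factory=list)


def decide_verdict(result: AnalysisResult) -> Verdict:
    # A concrete witness is proof of reachability regardless of gaps elsewhere.
    if result.witness_paths:
        return Verdict.REACHABLE
    # No witness, and the analysis itself is known to be unreliable.
    if result.inconclusive_reasons:
        return Verdict.INCONCLUSIVE
    # No path under the current model and assumptions: deliberately not "unreachable".
    return Verdict.NOT_PROVEN_REACHABLE


print(decide_verdict(AnalysisResult(witness_paths=[["entry", "vuln"]])).value)  # REACHABLE
print(decide_verdict(AnalysisResult(inconclusive_reasons=["missing symbols"])).value)  # INCONCLUSIVE
print(decide_verdict(AnalysisResult()).value)  # NOT_PROVEN_REACHABLE
```
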

---

## 2) Anchor personas and top workflows

### Primary personas

- Security governance / AppSec: wants fewer false positives and defensible prioritization.
- Compliance/audit: wants evidence and replayability.
- Engineering teams: want specific call paths and what to change.

### Top workflows (must support in MVP)

1. **CI gate with signed verdict**
   - “Block release if any high-severity `REACHABLE` finding is present OR if the `INCONCLUSIVE` count exceeds a threshold.”
2. **Audit replay**
   - “Reproduce the reachability proof for artifact digest X using snapshot Y.”
3. **Release delta**
   - “Show what reachability changed between releases A and B.”

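The CI gate rule in workflow 1 reduces to a small predicate over a release's findings. A minimal sketch, assuming hypothetical `Finding` records with a `severity` label and a `verdict` string, treating both `critical` and `high` as "high severity", and using an illustrative `max_inconclusive` threshold:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str
    severity: str  # assumed scale: "critical" | "high" | "medium" | "low"
    verdict: str   # "REACHABLE" | "NOT_PROVEN_REACHABLE" | "INCONCLUSIVE"


HIGH_SEVERITIES = {"critical", "high"}  # assumption: critical counts as high


def gate(findings: list[Finding], max_inconclusive: int = 5) -> tuple[bool, list[str]]:
    """Return (allowed, reasons); any reason blocks the release."""
    reasons = []
    reachable_high = sorted(
        f.finding_id for f in findings
        if f.verdict == "REACHABLE" and f.severity in HIGH_SEVERITIES
    )
    if reachable_high:
        reasons.append(f"reachable high-severity findings: {', '.join(reachable_high)}")
    inconclusive = sum(1 for f in findings if f.verdict == "INCONCLUSIVE")
    if inconclusive > max_inconclusive:
        reasons.append(f"INCONCLUSIVE count {inconclusive} exceeds threshold {max_inconclusive}")
    return (not reasons, reasons)


findings = [
    Finding("F-1", "high", "REACHABLE"),
    Finding("F-2", "high", "NOT_PROVEN_REACHABLE"),
    Finding("F-3", "low", "INCONCLUSIVE"),
]
allowed, reasons = gate(findings, max_inconclusive=0)
print(allowed)  # False
```
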

---

## 3) Minimum viable scope: pick targets that make “post-build” real early

To satisfy “source and post-build artifacts” without biting off ELF-level complexity first:

### MVP artifact types (recommended)

- **Source repository** for 1–2 languages with mature static IR
- **Post-build intermediate artifacts** that retain symbol structure:
  - Java `.jar`/`.class` files
  - .NET assemblies
  - Python wheels (packaged modules/bytecode)
  - Node bundles with sourcemaps (optional)

These give you “post-build” support where call graphs are tractable.

### Defer for later phases

- Native ELF/Mach-O deep reachability (harder due to stripping, inlining, indirect calls, dynamic loading)
- Highly dynamic languages without strong type info, unless you accept “witness-only” semantics

Your differentiator is proof portability and determinism, not “supports every binary on day one.”


---

## 4) Product requirements: what “proof-carrying” means in requirements language

### Functional requirements

- Output must include a **reachability subgraph**:
  - Nodes = code units (function/method) with stable IDs
  - Edges = call or dispatch edges with type annotations
  - Must include at least one **witness path** from entrypoint to vulnerable node when `REACHABLE`
- Output must be **artifact-tied**:
  - Evidence must reference artifact digest(s) (source commit, build artifact digest, container image digest)
- Output must be **attestable**:
  - Produce a signed attestation (DSSE/in-toto style) attached to the artifact digest
- Output must be **replayable**:
  - Provide a “replay recipe” (analyzer versions, configs, vulnerability mapping version, and input digests)

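One way to make the subgraph and witness-path requirements testable is a structural check: every witness path starts at a declared entrypoint, ends at a vulnerable node, and uses only edges present in the emitted subgraph. A minimal sketch over plain NodeID strings; the function name and input shapes are hypothetical:

```python
def validate_witness_paths(
    verdict: str,
    entrypoints: set[str],
    vulnerable_nodes: set[str],
    edges: set[tuple[str, str]],
    witness_paths: list[list[str]],
) -> list[str]:
    """Return a list of problems; an empty list means the evidence is consistent."""
    problems = []
    if verdict == "REACHABLE" and not witness_paths:
        problems.append("REACHABLE verdict without a witness path")
    for i, path in enumerate(witness_paths):
        if not path:
            problems.append(f"path {i}: empty")
            continue
        if path[0] not in entrypoints:
            problems.append(f"path {i}: does not start at an entrypoint")
        if path[-1] not in vulnerable_nodes:
            problems.append(f"path {i}: does not end at a vulnerable node")
        # Every consecutive hop must be an edge that is actually in the subgraph.
        for caller, callee in zip(path, path[1:]):
            if (caller, callee) not in edges:
                problems.append(f"path {i}: missing edge {caller} -> {callee}")
    return problems


edges = {("http:GET /orders", "svc.load"), ("svc.load", "lib.parse")}
print(validate_witness_paths("REACHABLE", {"http:GET /orders"}, {"lib.parse"}, edges,
                             [["http:GET /orders", "svc.load", "lib.parse"]]))  # []
```

A verifier can run this kind of check without trusting the generator, which is what makes the witness path evidence rather than an assertion.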
### Non-functional requirements

- Deterministic: repeated runs on the same inputs produce an identical evidence hash
- Size-bounded: subgraph evidence must stay bounded (e.g., path-based extraction plus limited context)
- Privacy-controllable:
  - Support a mode that avoids embedding raw source content (store pointers/hashes instead)
- Verifiable offline:
  - Verification and replay must work air-gapped, given the snapshot bundle


---

## 5) Acceptance criteria (use as Definition of Done)

A feature is “done” only when:

1. A **verifier can validate** the attestation signature and confirm that the evidence hash matches the content.
2. A second machine can **reproduce the same evidence hash** given the replay bundle.
3. Evidence includes at least one witness path for `REACHABLE`.
4. Evidence includes explicit assumptions/gates; missing gating information is itself recorded as an assumption (e.g., “config unknown”).
5. Evidence is **linked to the precise artifact digest** being deployed/scanned.


---

## 6) Product packaging decisions that create switching cost

These are product decisions that turn engineering into a moat:

- **Make “reachability proof” an exportable object**, not just a UI view.
- Provide an API: `GET /findings/{id}/proof` returning canonical evidence.
- Support policy gates on:
  - `verdict`
  - `confidence`
  - `assumption_count`
  - `inconclusive_reasons`
- Make “proof replay” a one-command workflow in the CLI.

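Policy gates over these four fields can be expressed as declarative thresholds evaluated against each proof. A minimal sketch; the `Policy` fields, their defaults, the `missing_symbols` reason code, and the dict shape of a proof are illustrative assumptions, not a defined Stella Ops schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    blocked_verdicts: frozenset[str] = frozenset({"REACHABLE"})
    min_confidence: float = 0.0   # proofs below this confidence fail
    max_assumptions: int = 10     # too many assumptions weakens a proof
    blocked_inconclusive_reasons: frozenset[str] = frozenset({"missing_symbols"})


def evaluate(proof: dict, policy: Policy) -> list[str]:
    """Return the violated rules for one proof (empty list = pass)."""
    violations = []
    if proof["verdict"] in policy.blocked_verdicts:
        violations.append(f"verdict {proof['verdict']} is blocked")
    if proof["confidence"] < policy.min_confidence:
        violations.append(f"confidence {proof['confidence']} < {policy.min_confidence}")
    if proof["assumption_count"] > policy.max_assumptions:
        violations.append(f"{proof['assumption_count']} assumptions > {policy.max_assumptions}")
    hit = sorted(set(proof.get("inconclusive_reasons", [])) & policy.blocked_inconclusive_reasons)
    if hit:
        violations.append(f"blocked inconclusive reasons: {', '.join(hit)}")
    return violations


proof = {"verdict": "INCONCLUSIVE", "confidence": 0.4, "assumption_count": 3,
         "inconclusive_reasons": ["missing_symbols"]}
print(evaluate(proof, Policy(min_confidence=0.5)))  # two violations
```
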

---

# Directions for Development Managers

## 1) Architecture: build a “proof pipeline” with strict boundaries

Implement as composable modules with stable interfaces:

1. **Artifact Resolver**
   - Inputs: repo URL/commit, build artifact path, container image digest
   - Output: normalized “artifact record” with digests and metadata

2. **Graph Builder (language-specific adapters)**
   - Inputs: artifact record
   - Output: canonical **Program Graph**
     - Nodes: code units
     - Edges: calls/dispatch
     - Optional: config gates, dependency edges

3. **Vulnerability-to-Code Mapper**
   - Inputs: vulnerability record (CVE), package coordinates, symbol metadata (if available)
   - Output: vulnerable node set + mapping confidence

4. **Entrypoint Modeler**
   - Inputs: artifact + runtime context (framework detection, routing tables, main methods)
   - Output: entrypoint node set with types (HTTP, RPC, CLI, cron)

5. **Reachability Engine**
   - Inputs: graph + entrypoints + vulnerable nodes + constraints
   - Output: witness paths + minimal extracted subgraph

6. **Evidence Canonicalizer**
   - Inputs: witness paths + subgraph + metadata
   - Output: canonical JSON (stable ordering, stable IDs), plus content hash

7. **Attestor**
   - Inputs: evidence hash + artifact digest
   - Output: signed attestation object (OCI-attachable)

8. **Verifier (separate component)**
   - Must validate signatures + evidence integrity independently of the generator

Critical: the generator and verifier must be decoupled to preserve trust.

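The stage boundaries above can be pinned down as typed interfaces so that each language adapter is swappable while the pipeline itself stays language-agnostic. A minimal sketch using `typing.Protocol` (structural typing, so adapters need not inherit from anything); every type and method name here is hypothetical, and the canonicalizer, attestor, and verifier stages are left out for brevity:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ArtifactRecord:
    digest: str  # e.g. "sha256:<hex>" of an image, build artifact, or source snapshot
    kind: str    # e.g. "source", "jar", "dotnet-assembly"


@dataclass(frozen=True)
class ProgramGraph:
    nodes: frozenset[str]
    edges: frozenset[tuple[str, str, str]]  # (caller, callee, edge_type)


class GraphBuilder(Protocol):
    def build(self, artifact: ArtifactRecord) -> ProgramGraph: ...


class VulnMapper(Protocol):
    def vulnerable_nodes(self, vuln_id: str, graph: ProgramGraph) -> set[str]: ...


class EntrypointModeler(Protocol):
    def entrypoints(self, artifact: ArtifactRecord, graph: ProgramGraph) -> set[str]: ...


class ReachabilityEngine(Protocol):
    def witness_paths(self, graph: ProgramGraph, entries: set[str],
                      vulns: set[str]) -> list[list[str]]: ...


def run_pipeline(artifact: ArtifactRecord, vuln_id: str, builder: GraphBuilder,
                 mapper: VulnMapper, modeler: EntrypointModeler,
                 engine: ReachabilityEngine) -> dict:
    """Wire the stages in order; each stage sees only the outputs of earlier ones."""
    graph = builder.build(artifact)
    vulns = mapper.vulnerable_nodes(vuln_id, graph)
    entries = modeler.entrypoints(artifact, graph)
    paths = engine.witness_paths(graph, entries, vulns)
    # Plain data out: canonicalization, attestation, and verification are separate stages.
    return {
        "subject": artifact.digest,
        "vulnerability": vuln_id,
        "entrypoints": sorted(entries),
        "vulnerable_nodes": sorted(vulns),
        "witness_paths": sorted(paths),
    }
```
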

---

## 2) Evidence model: what to store (and how to keep it stable)

### Node identity must be stable across runs

Define a canonical NodeID scheme:

- Source node ID:
  - `{language}:{repo_digest}:{symbol_signature}:{optional_source_location_hash}`
- Post-build node ID:
  - `{language}:{artifact_digest}:{symbol_signature}:{optional_offset_or_token}`

Avoid raw file paths or non-deterministic compiler offsets as primary IDs unless they are normalized.

### Edge identity

`{caller_node_id} -> {callee_node_id} : {edge_type}`

Edge types matter (direct call, virtual dispatch, reflection, dynamic import, etc.).

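A minimal sketch of both identity schemes, assuming digests arrive as already-normalized strings such as `sha256:<hex>` and that the optional source location is hashed rather than stored raw. The helper names, the edge-type vocabulary, and truncating the location hash to 16 hex characters are illustrative assumptions:

```python
import hashlib
from typing import Optional

# Hypothetical edge-type vocabulary; language adapters could extend it.
EDGE_TYPES = {"direct", "virtual", "reflection", "dynamic_import"}


def _short_hash(value: str) -> str:
    # Hash location hints so raw file paths never become part of the identity.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def source_node_id(language: str, repo_digest: str, symbol_signature: str,
                   source_location: Optional[str] = None) -> str:
    parts = [language, repo_digest, symbol_signature]
    if source_location is not None:
        # Caller should pass a normalized, repo-relative location (no absolute paths).
        parts.append(_short_hash(source_location))
    return ":".join(parts)


def post_build_node_id(language: str, artifact_digest: str, symbol_signature: str,
                       offset_or_token: Optional[str] = None) -> str:
    parts = [language, artifact_digest, symbol_signature]
    if offset_or_token is not None:
        # e.g. a metadata token that is stable for a given artifact digest.
        parts.append(offset_or_token)
    return ":".join(parts)


def edge_id(caller: str, callee: str, edge_type: str) -> str:
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {edge_type}")
    return f"{caller} -> {callee} : {edge_type}"


caller = post_build_node_id("java", "sha256:1a2b", "com.example.Api.handle(Ljava/lang/String;)V")
callee = post_build_node_id("java", "sha256:1a2b", "org.lib.Parser.parse([B)V")
print(edge_id(caller, callee, "virtual"))
```

Because every component is derived from digests and signatures, two runs over the same artifact produce byte-identical IDs, which is what lets the evidence hash stay stable.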
### Subgraph extraction rule

Store:

- All nodes/edges on at least one witness path (or up to k witness paths)
- Plus bounded context:
  - 1–2 hop neighborhood around the vulnerable node and entrypoint
  - routing edges (HTTP route → handler) where applicable

This makes the proof compact and audit-friendly.

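A minimal sketch of this extraction rule: a multi-source breadth-first search finds one shortest witness path, and the stored subgraph is that path plus an undirected `hops`-bounded neighborhood around the entrypoint and the vulnerable node. The adjacency-dict graph shape and function names are assumptions; sorting neighbors keeps the chosen path deterministic when several shortest paths exist.

```python
from collections import deque
from typing import Optional


def shortest_witness(adj: dict[str, list[str]], entries: set[str],
                     target: str) -> Optional[list[str]]:
    """Multi-source BFS from all entrypoints; one shortest path to target, or None."""
    parent = {e: None for e in sorted(entries)}
    queue = deque(sorted(entries))
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, [])):  # sorted => deterministic tie-breaking
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def neighborhood(adj: dict[str, list[str]], seeds: set[str], hops: int) -> set[str]:
    """Nodes within `hops` edges of any seed, following edges in both directions."""
    undirected = {}
    for src, dsts in adj.items():
        for dst in dsts:
            undirected.setdefault(src, set()).add(dst)
            undirected.setdefault(dst, set()).add(src)
    seen, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {n for f in frontier for n in undirected.get(f, ())} - seen
        seen |= frontier
    return seen


def extract_subgraph(adj, entries, target, hops=1):
    path = shortest_witness(adj, entries, target)
    if path is None:
        return None
    keep = set(path) | neighborhood(adj, {path[0], target}, hops)
    edges = sorted((s, d) for s, dsts in adj.items() for d in dsts if s in keep and d in keep)
    return {"witness_path": path, "nodes": sorted(keep), "edges": edges}


demo_adj = {"main": ["route"], "route": ["handler"], "handler": ["parse", "log"],
            "parse": ["vuln"], "admin": ["vuln"]}
print(extract_subgraph(demo_adj, {"main"}, "vuln")["witness_path"])
# ['main', 'route', 'handler', 'parse', 'vuln']
```
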
### Canonicalization requirements

- Stable sorting of nodes and edges
- Canonical JSON serialization (no map-order nondeterminism)
- Explicit analyzer version + config included in evidence
- Hash everything that influences results

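A minimal sketch of canonicalization with the Python standard library: sort unordered collections, serialize with sorted keys, compact separators, and UTF-8 output, then hash the exact bytes. This approximates a canonical form; a production schema might instead adopt RFC 8785 (JSON Canonicalization Scheme), which also pins down number formatting. The `subgraph`/`nodes`/`edges` field names are illustrative.

```python
import hashlib
import json


def canonicalize(evidence: dict) -> bytes:
    doc = dict(evidence)  # shallow copy: the caller's object is not mutated
    if "subgraph" in doc:
        sub = doc["subgraph"]
        # Stable ordering for collections whose order carries no meaning.
        doc["subgraph"] = {
            **sub,
            "nodes": sorted(sub.get("nodes", []), key=lambda n: n["id"]),
            "edges": sorted(sub.get("edges", []), key=lambda e: (e["from"], e["to"], e["type"])),
        }
    # sort_keys removes dict-order nondeterminism; separators remove whitespace variance.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def evidence_hash(evidence: dict) -> str:
    return "sha256:" + hashlib.sha256(canonicalize(evidence)).hexdigest()


a = {"tooling": {"analyzer": "x", "version": "1.0"},
     "subgraph": {"nodes": [{"id": "b"}, {"id": "a"}], "edges": []}}
b = {"subgraph": {"edges": [], "nodes": [{"id": "a"}, {"id": "b"}]},
     "tooling": {"version": "1.0", "analyzer": "x"}}
print(evidence_hash(a) == evidence_hash(b))  # True
```

Witness paths are intentionally not sorted internally: their order is the meaning.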

---

## 3) Determinism and reproducibility: engineering guardrails

### Deterministic computation

- Avoid parallel graph traversal that yields nondeterministic order without canonical sorting
- If using concurrency, collect results and sort deterministically before emitting

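For example, when witness searches for several vulnerable nodes run in a thread pool, completion order varies from run to run; collecting everything first and sorting by a stable key restores determinism. A minimal sketch with `concurrent.futures`; `find_paths` is a hypothetical stand-in for the real per-target search:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_paths(target: str) -> list[list[str]]:
    # Hypothetical per-target witness search; real work would traverse the graph.
    return [["entry", target]]


def all_witnesses(targets: list[str], workers: int = 4) -> list[tuple[str, list[str]]]:
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(find_paths, t): t for t in targets}
        # as_completed yields in completion order, which is nondeterministic...
        for fut in as_completed(futures):
            for path in fut.result():
                results.append((futures[fut], path))
    # ...so impose a canonical order before anything is serialized or hashed.
    return sorted(results)


print(all_witnesses(["vuln.b", "vuln.a"]))
# [('vuln.a', ['entry', 'vuln.a']), ('vuln.b', ['entry', 'vuln.b'])]
```
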
### Repro bundle (“time travel”)

Persist, as digests:

- Analyzer container/image digest
- Analyzer config hash
- Vulnerability mapping dataset version hash
- Artifact digest(s)
- Graph builder version hash

A replay must be possible without “calling home.”

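A minimal sketch of a replay manifest holding exactly these digests, plus an offline pre-flight check that compares them with the local environment before a replay starts. The field names and the `diff_environment` helper are hypothetical:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReplayBundle:
    analyzer_image_digest: str
    analyzer_config_hash: str
    mapping_dataset_hash: str
    artifact_digests: tuple[str, ...]
    graph_builder_version_hash: str


def diff_environment(recorded: ReplayBundle, local: ReplayBundle) -> dict[str, tuple]:
    """Fields whose digests differ; a faithful replay requires an empty result."""
    rec, loc = asdict(recorded), asdict(local)
    return {k: (rec[k], loc[k]) for k in sorted(rec) if rec[k] != loc[k]}


recorded = ReplayBundle("sha256:img", "sha256:cfg", "sha256:map1", ("sha256:app",), "sha256:gb")
local = ReplayBundle("sha256:img", "sha256:cfg", "sha256:map2", ("sha256:app",), "sha256:gb")
print(diff_environment(recorded, local))  # only mapping_dataset_hash differs
```

Refusing to replay on any mismatch (rather than warning) keeps a "reproduced" hash meaningful.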
### Golden tests

Create fixtures where:

- The same input graph + mapping → the exact expected evidence hash
- A regression test catches canonicalization changes (version the schema intentionally)

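A minimal golden test, sketched with `unittest` and runnable via `python -m unittest`. It pins the exact canonical bytes rather than only the hash, so a failure shows *what* drifted; in practice the expected value would live in a versioned fixture file next to the schema version. The `canonical_bytes` and `build_evidence` helpers are assumptions that mirror the sorted-keys, compact-separator approach:

```python
import hashlib
import json
import unittest

SCHEMA_VERSION = "1"  # bump deliberately whenever canonical output is meant to change

# Checked-in golden value for the fixture below (normally a fixture file).
GOLDEN = '{"schema":"1","subgraph":{"edges":[["a","b","direct"]],"nodes":["a","b"]}}'


def canonical_bytes(evidence: dict) -> bytes:
    return json.dumps(evidence, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_evidence(nodes, edges) -> dict:
    # Deterministic ordering is applied before serialization.
    return {"schema": SCHEMA_VERSION,
            "subgraph": {"nodes": sorted(nodes), "edges": sorted(edges)}}


class GoldenEvidenceTest(unittest.TestCase):
    def test_canonical_bytes_match_golden(self):
        ev = build_evidence(["b", "a"], [["a", "b", "direct"]])
        self.assertEqual(canonical_bytes(ev).decode("utf-8"), GOLDEN)

    def test_hash_independent_of_input_order(self):
        h1 = hashlib.sha256(canonical_bytes(build_evidence(["a", "b"], []))).hexdigest()
        h2 = hashlib.sha256(canonical_bytes(build_evidence(["b", "a"], []))).hexdigest()
        self.assertEqual(h1, h2)
```
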

---

## 4) Attestation format and verification

### Attestation contents (minimum)

- Subject: artifact digest (image digest / build artifact digest)
- Predicate: reachability evidence hash + metadata
- Predicate type: `reachability` (custom) with versioning

### Verification requirements

- Verification must run offline
- It must validate:
  1) signature
  2) subject digest binding
  3) evidence hash matches serialized evidence

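A minimal offline sketch of all three checks. It builds an in-toto Statement v1 whose subject is the artifact digest, wraps it in a DSSE envelope signed over DSSE's pre-authentication encoding (PAE), and verifies signature, subject binding, and evidence hash in that order. Two loud assumptions: the predicate type URI is a made-up placeholder, and HMAC-SHA256 stands in for a real asymmetric signature (e.g. Ed25519 from a crypto library) purely so the example runs on the standard library; a real verifier should hold only a public key.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"
# Placeholder predicate type; a real deployment would define and version its own URI.
PREDICATE_TYPE = "https://example.invalid/reachability/v1"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def _sign(key: bytes, message: bytes) -> bytes:
    # STAND-IN ONLY: use an asymmetric signature in practice.
    return hmac.new(key, message, hashlib.sha256).digest()


def attest(artifact_name: str, artifact_sha256: str, evidence: bytes, key: bytes) -> dict:
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": artifact_name, "digest": {"sha256": artifact_sha256}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": {"evidenceHash": "sha256:" + hashlib.sha256(evidence).hexdigest()},
    }
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    sig = _sign(key, pae(PAYLOAD_TYPE, payload))
    return {"payloadType": PAYLOAD_TYPE,
            "payload": base64.b64encode(payload).decode("ascii"),
            "signatures": [{"keyid": "demo-key", "sig": base64.b64encode(sig).decode("ascii")}]}


def verify(envelope: dict, artifact_sha256: str, evidence: bytes, key: bytes) -> list:
    """Return the failed checks; an empty list means verified. No network access needed."""
    if envelope.get("payloadType") != PAYLOAD_TYPE:
        return ["payload_type"]
    payload = base64.b64decode(envelope["payload"])
    expected = _sign(key, pae(PAYLOAD_TYPE, payload))
    # 1) signature: nothing inside an unauthenticated payload may be trusted.
    if not any(hmac.compare_digest(expected, base64.b64decode(s["sig"]))
               for s in envelope.get("signatures", [])):
        return ["signature"]
    statement = json.loads(payload)
    failures = []
    # 2) subject digest binding
    if artifact_sha256 not in {s["digest"].get("sha256") for s in statement["subject"]}:
        failures.append("subject")
    # 3) evidence hash matches the serialized evidence supplied for verification
    if statement["predicate"]["evidenceHash"] != "sha256:" + hashlib.sha256(evidence).hexdigest():
        failures.append("evidence_hash")
    return failures
```
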
### Storage model

Use content-addressable storage keyed by evidence hash.

The attestation references the hash; evidence is either stored separately or embedded in the attestation (a size tradeoff).

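A minimal in-memory sketch of content-addressable storage: the key is derived from the bytes themselves, so writes are idempotent and every read can re-hash and reject corrupted content. A real store would sit on disk or in an OCI registry; the class and method names are hypothetical:

```python
import hashlib


class EvidenceStore:
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def key_for(content: bytes) -> str:
        return "sha256:" + hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        key = self.key_for(content)
        self._blobs.setdefault(key, content)  # idempotent: same bytes, same key
        return key

    def get(self, key: str) -> bytes:
        content = self._blobs[key]
        # Integrity check on read: content must still hash to its own key.
        if self.key_for(content) != key:
            raise ValueError(f"corrupted blob for {key}")
        return content
```
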

---

## 5) Source + post-build support: engineering plan

### Unifying principle

Both source and post-build analyzers produce the same canonical Program Graph abstraction.

#### Source analyzers produce:

- Function/method nodes using language signatures
- Edges from static analysis IR

#### Post-build analyzers produce:

- Nodes from bytecode/assembly symbol tables (where available)
- Edges from bytecode call instructions / metadata

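To illustrate the post-build side, the sketch below derives call edges from Python bytecode with the standard `dis` module, treating a `LOAD_GLOBAL` of a module-level function as a direct call edge. This is an over-approximation (a function that is referenced but never called still yields an edge), it only sees top-level functions, and it compiles a source string purely to obtain bytecode for the demo; a real adapter would read compiled modules from the artifact and emit the canonical NodeIDs described earlier.

```python
import dis
import types

DEMO_SOURCE = '''
def parse(data):
    return data.split(",")

def handler(req):
    return parse(req)

def main():
    handler("a,b")
'''


def bytecode_call_edges(source: str, filename: str = "<artifact>") -> list[tuple[str, str, str]]:
    module_code = compile(source, filename, "exec")
    # Top-level function bodies are stored as code-object constants of the module.
    functions = {c.co_name: c for c in module_code.co_consts if isinstance(c, types.CodeType)}
    edges = set()
    for caller, code in functions.items():
        for ins in dis.get_instructions(code):
            # For LOAD_GLOBAL, argval is the name of the global being loaded.
            if ins.opname == "LOAD_GLOBAL" and ins.argval in functions:
                edges.add((caller, ins.argval, "direct"))
    return sorted(edges)  # sorted for deterministic output


print(bytecode_call_edges(DEMO_SOURCE))
# [('handler', 'parse', 'direct'), ('main', 'handler', 'direct')]
```
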
### Practical sequencing (recommended)

1. Implement one source language adapter (the fastest way to prove the model)
2. Implement one post-build adapter where symbols are rich (e.g., Java bytecode)
3. Ensure the evidence schema and attestation workflow work identically for both
4. Expand to more ecosystems once the proof pipeline is stable


---

## 6) Operational constraints (performance, size, security)

### Performance

- Cache program graphs per artifact digest
- Cache vulnerability-to-code mappings per package/version
- Compute reachability on demand per vulnerability, but reuse graphs

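Because a program graph is a pure function of the artifact digest and the graph-builder version, that pair is a safe cache key. A minimal in-process sketch with `functools.lru_cache`; `load_graph` is a hypothetical stand-in for the expensive graph-builder call, and the counter exists only to show the cache being hit:

```python
from functools import lru_cache

BUILD_CALLS = {"count": 0}


@lru_cache(maxsize=64)
def load_graph(artifact_digest: str, builder_version: str) -> frozenset:
    # Expensive in reality; keyed on immutable inputs only, so caching is safe.
    BUILD_CALLS["count"] += 1
    return frozenset({(f"{artifact_digest}:main", f"{artifact_digest}:handler")})


def reachability_for(vuln_id: str, artifact_digest: str, builder_version: str = "gb-1.0") -> int:
    graph = load_graph(artifact_digest, builder_version)  # reused across vulnerabilities
    return len(graph)  # placeholder for the per-vulnerability search


for vuln in ["CVE-A", "CVE-B", "CVE-C"]:
    reachability_for(vuln, "sha256:abc")
print(BUILD_CALLS["count"])  # 1
```

A persistent variant would store serialized graphs under the same key on disk, so that separate CI runs share them.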
### Evidence size

- Limit witness paths (e.g., up to N shortest paths)
- Prefer “witness + bounded neighborhood” over exporting the full call graph

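Capping evidence at N shortest paths can be done with a breadth-first search over partial paths: in an unweighted call graph the FIFO queue holds paths in nondecreasing length, so the first N that reach the target are N shortest simple paths. A minimal sketch; `n` and `max_depth` are illustrative bounds, and the enumeration can still be expensive on dense graphs, which is why `max_depth` exists:

```python
from collections import deque


def n_shortest_paths(adj: dict[str, list[str]], entries: set[str], target: str,
                     n: int = 3, max_depth: int = 12) -> list[list[str]]:
    """Up to n shortest simple paths (by hop count) from any entrypoint to target."""
    found = []
    queue = deque([e] for e in sorted(entries))
    while queue and len(found) < n:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        if len(path) > max_depth:  # cap on path length, in nodes
            continue
        for nxt in sorted(adj.get(node, [])):  # sorted => deterministic order
            if nxt not in path:  # simple paths only (no cycles)
                queue.append(path + [nxt])
    return found


adj = {"main": ["a", "b"], "a": ["vuln"], "b": ["c", "vuln"], "c": ["vuln"]}
print(n_shortest_paths(adj, {"main"}, "vuln", n=2))
# [['main', 'a', 'vuln'], ['main', 'b', 'vuln']]
```
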
### Security and privacy

- Provide a “redacted proof mode”:
  - include symbol hashes instead of raw names if needed
  - store source locations as hashes/pointers
- Never embed raw source code unless explicitly enabled

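A minimal sketch of redacted proof mode: NodeIDs, symbol names, and source locations are replaced by keyed hashes (HMAC-SHA256), so the proof keeps its graph structure and the key holder can still map tokens back, while outsiders cannot confirm a guessed symbol name by hashing it themselves. Raw source is dropped unless explicitly enabled. The per-tenant key, the proof's dict shape, and the 24-character token length are assumptions:

```python
import hashlib
import hmac


def redact_symbol(value: str, key: bytes) -> str:
    # Keyed hash: equal inputs map to equal tokens, so the graph stays linkable,
    # but without the key nobody can confirm a guess such as "org.lib.Parser.parse".
    return "h:" + hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:24]


def redact_proof(proof: dict, key: bytes, include_source: bool = False) -> dict:
    # NodeIDs embed symbol signatures, so they are redacted too and edges remapped.
    id_map = {node["id"]: redact_symbol(node["id"], key) for node in proof["nodes"]}
    nodes = []
    for node in proof["nodes"]:
        n = dict(node, id=id_map[node["id"]], symbol=redact_symbol(node["symbol"], key))
        if "location" in n:
            n["location"] = redact_symbol(n["location"], key)  # pointer/hash only
        if not include_source:
            n.pop("source", None)  # raw source only when explicitly enabled
        nodes.append(n)
    edges = [{**e, "from": id_map[e["from"]], "to": id_map[e["to"]]}
             for e in proof.get("edges", [])]
    return {**proof, "nodes": nodes, "edges": edges, "redacted": True}
```
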

---

## 7) Definition of Done for the engineering team

A milestone is complete when you can demonstrate:

1. Generate a reachability proof for a known vulnerable code unit with a witness path.
2. Serialize a canonical evidence subgraph and compute a stable hash.
3. Sign the attestation bound to the artifact digest.
4. Verify the attestation on a clean machine (offline).
5. Replay the analysis from the replay bundle and reproduce the same evidence hash.


---

# Concrete artifact example (for alignment)

A reachability evidence object should look structurally like:

- `subject`: artifact digest(s)
- `claim`:
  - `verdict`: REACHABLE / NOT_PROVEN_REACHABLE / INCONCLUSIVE
  - `entrypoints`: list of NodeIDs
  - `vulnerable_nodes`: list of NodeIDs
  - `witness_paths`: list of paths (each path = ordered NodeIDs)
- `subgraph`:
  - `nodes`: list with stable IDs + metadata
  - `edges`: list with stable ordering + edge types
- `assumptions`:
  - gating conditions, unresolved dynamic dispatch notes, etc.
- `tooling`:
  - analyzer name/version/digest
  - config hash
  - mapping dataset hash
- `hashes`:
  - evidence content hash
  - schema version

Then wrap and sign it as an attestation tied to the artifact digest.

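For concreteness, here is one way that structure could be instantiated. Every digest, NodeID, name, and value below is made up for illustration. Note that the content hash cannot be inside the bytes it hashes, so this sketch computes it over the whole object before `hashes.evidence_hash` is added; a verifier removes that one field and recomputes.

```python
import hashlib
import json

# Hypothetical NodeIDs; "5f1c" is a truncated placeholder digest.
E = "java:sha256:5f1c:com.example.OrderController.get(Ljava/lang/String;)V"
M = "java:sha256:5f1c:com.example.OrderService.load(Ljava/lang/String;)V"
V = "java:sha256:5f1c:org.yaml.Parser.parse(Ljava/lang/String;)V"

evidence = {
    "subject": [{"name": "registry.example/orders", "digest": {"sha256": "5f1c"}}],
    "claim": {
        "verdict": "REACHABLE",
        "entrypoints": [E],
        "vulnerable_nodes": [V],
        "witness_paths": [[E, M, V]],
    },
    "subgraph": {
        "nodes": [{"id": n} for n in sorted([E, M, V])],
        "edges": [
            {"from": E, "to": M, "type": "direct"},
            {"from": M, "to": V, "type": "virtual"},
        ],
    },
    "assumptions": ["feature flag orders.yaml-import assumed enabled (config unknown)"],
    "tooling": {"analyzer": "example-analyzer", "version": "0.1.0",
                "config_hash": "sha256:c0f1", "mapping_dataset_hash": "sha256:m4p5"},
    "hashes": {"schema_version": "1"},
}

# Hash the canonical bytes first, then attach the hash next to the schema version.
body = json.dumps(evidence, sort_keys=True, separators=(",", ":")).encode("utf-8")
evidence["hashes"]["evidence_hash"] = "sha256:" + hashlib.sha256(body).hexdigest()
print(json.dumps(evidence, indent=2, sort_keys=True))
```
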

---

## The one decision you should force early

Decide (and document) whether your semantics are:

- **Witness-based** (“REACHABLE only if we can produce a witness path”), and
- **Conservative on negative claims** (“NOT_PROVEN_REACHABLE” is not “unreachable”).

This single decision will keep the system honest, reduce legal/audit risk, and prevent the product from drifting into hand-wavy “trust us” scoring.