up
Some checks failed
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
@@ -652,18 +652,95 @@ Signals APIs (base path: `/signals`) provide deterministic ingestion + scoring f

| Method | Path | Scope | Notes |
|--------|------|-------|-------|
| `POST` | `/signals/callgraphs` | `signals:write` | Ingest a callgraph artifact (base64 JSON); response includes `graphHash` (sha256) and CAS URIs. |
| `POST` | `/signals/runtime-facts` | `signals:write` | Ingest runtime hit events (JSON). |
| `POST` | `/signals/runtime-facts/ndjson` | `signals:write` | Stream NDJSON events (optional gzip) with subject in query params. |
| `POST` | `/signals/callgraphs` | `signals:write` | Ingest a callgraph artifact (richgraph-v1 JSON); response includes `graphHash` (BLAKE3) and CAS URIs. |
| `POST` | `/signals/runtime-facts` | `signals:write` | Ingest runtime hit events (JSON) with `symbolId`, `codeId`, `hitCount`, `loaderBase`. |
| `POST` | `/signals/runtime-facts/ndjson` | `signals:write` | Stream NDJSON events (optional gzip) with `scanId`/`imageDigest` in query params. |
| `POST` | `/signals/unknowns` | `signals:write` | Ingest unresolved symbols/edges; influences `unknownsPressure`. |
| `GET` | `/signals/facts/{subjectKey}` | `signals:read` | Fetch `ReachabilityFactDocument` including `metadata.fact.digest` and per-target `states[]`. |
| `GET` | `/signals/facts/{subjectKey}` | `signals:read` | Fetch `ReachabilityFactDocument` including `metadata.fact.digest`, per-target `states[]`, and `latticeState`. |
| `POST` | `/signals/reachability/recompute` | `signals:admin` | Recompute reachability for explicit targets and blocked edges. |

**Callgraph ingestion request:**

```json
{
  "schema": "richgraph-v1",
  "analyzer": {"name": "scanner.java", "version": "1.2.0", "toolchain_digest": "sha256:..."},
  "nodes": [
    {
      "id": "sym:java:...",
      "symbol_id": "sym:java:...",
      "code_id": "code:java:...",
      "lang": "java",
      "kind": "method",
      "display": "com.example.Foo.bar()",
      "purl": "pkg:maven/com.example/foo@1.0.0",
      "symbol_digest": "sha256:...",
      "symbol": {"demangled": "com.example.Foo.bar()", "source": "DWARF", "confidence": 0.98}
    }
  ],
  "edges": [{"from": "sym:java:...", "to": "sym:java:...", "kind": "call", "purl": "pkg:maven/...", "symbol_digest": "sha256:...", "confidence": 0.92}],
  "roots": [{"id": "sym:java:...", "phase": "runtime", "source": "main"}]
}
```

**Callgraph ingestion response:**

```json
{
  "graphHash": "blake3:a1b2c3d4e5f6...",
  "casUri": "cas://reachability/graphs/a1b2c3d4e5f6...",
  "dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
  "nodeCount": 1247,
  "edgeCount": 3891
}
```
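
In the sample response, `casUri` embeds the hex part of `graphHash`, and `dsseUri` is the same URI with a `.dsse` suffix. A client could sanity-check that relationship before following the URIs. The following is a minimal Python sketch under that assumption; the function name `check_callgraph_response`, the fixed URI prefix, and the short hex digest in the sample are illustrative and not part of any StellaOps SDK.

```python
import json

# Prefix copied from the sample response; treating it as fixed is an assumption.
CAS_GRAPH_PREFIX = "cas://reachability/graphs/"


def check_callgraph_response(raw: str) -> dict:
    """Parse a callgraph ingestion response and check its URIs against graphHash."""
    doc = json.loads(raw)
    algorithm, _, hex_digest = doc["graphHash"].partition(":")
    if algorithm != "blake3" or not hex_digest:
        raise ValueError(f"unexpected graphHash: {doc['graphHash']!r}")
    expected_cas = CAS_GRAPH_PREFIX + hex_digest
    if doc["casUri"] != expected_cas:
        raise ValueError("casUri does not match graphHash")
    if doc["dsseUri"] != expected_cas + ".dsse":
        raise ValueError("dsseUri does not match graphHash")
    return doc


# Hypothetical response with a shortened digest, for illustration only.
sample = json.dumps({
    "graphHash": "blake3:a1b2c3d4e5f6",
    "casUri": "cas://reachability/graphs/a1b2c3d4e5f6",
    "dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6.dsse",
    "nodeCount": 1247,
    "edgeCount": 3891,
})
checked = check_callgraph_response(sample)
```

The check is purely structural. It does not recompute the BLAKE3 hash or verify the DSSE envelope; the `graph verify` CLI command described in this reference covers that.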

**Runtime facts NDJSON fields:**

| Field | Required | Description |
|-------|----------|-------------|
| `symbolId` | Yes | Canonical `sym:{lang}:{base64url}` |
| `codeId` | No | `code:{lang}:{base64url}` for stripped binaries |
| `hitCount` | No | Number of observed invocations |
| `loaderBase` | No | Memory address base for position-independent code |
| `processId` | No | OS process identifier |
| `containerId` | No | Container runtime identifier |
| `observedAt` | No | ISO-8601 UTC timestamp |
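
To illustrate the field table, here is a minimal Python sketch that serializes runtime hit events as NDJSON (one JSON object per line) with optional gzip, and rejects events missing the only required field, `symbolId`. The helper name `encode_runtime_facts` and the identifier values in the sample events are hypothetical; transport, authentication, and the `scanId`/`imageDigest` query parameters are left out.

```python
import gzip
import json
from typing import Iterable


def encode_runtime_facts(events: Iterable[dict], compress: bool = False) -> bytes:
    """Serialize runtime hit events as NDJSON, one JSON object per line."""
    lines = []
    for index, event in enumerate(events):
        # symbolId is the only field the table marks as required.
        if not event.get("symbolId"):
            raise ValueError(f"event {index}: symbolId is required")
        lines.append(json.dumps(event, separators=(",", ":")))
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    # The endpoint notes describe gzip as optional.
    return gzip.compress(payload) if compress else payload


# Placeholder identifiers, not real symbol IDs.
events = [
    {"symbolId": "sym:java:example1", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"},
    {"symbolId": "sym:java:example2", "codeId": "code:java:example2"},
]
body = encode_runtime_facts(events)
```

Calling `encode_runtime_facts(events, compress=True)` produces the gzip variant of the same payload.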

**Reachability facts response (excerpt):**

```json
{
  "subjectKey": "scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228",
  "metadata": {"fact": {"digest": "sha256:abc123...", "version": 3}},
  "states": [
    {
      "symbol": "sym:java:...",
      "latticeState": "CR",
      "bucket": "runtime",
      "confidence": 0.92,
      "score": 0.78,
      "path": ["sym:java:main...", "sym:java:log4j..."],
      "evidence": {
        "static": {"graphHash": "blake3:...", "pathLength": 3},
        "runtime": {"hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"}
      }
    }
  ],
  "score": 0.78,
  "aggregateTier": "T2",
  "riskScore": 0.65
}
```

**Lattice states:** `U` (Unknown), `SR` (StaticallyReachable), `SU` (StaticallyUnreachable), `RO` (RuntimeObserved), `RU` (RuntimeUnobserved), `CR` (ConfirmedReachable), `CU` (ConfirmedUnreachable), `X` (Contested).

Docs & samples:
- `docs/api/signals/reachability-contract.md`
- `docs/api/signals/samples/callgraph-sample.json`
- `docs/api/signals/samples/facts-sample.json`
- `docs/reachability/lattice.md`
- `docs/reachability/function-level-evidence.md`

### 2.9 CVSS Receipts (Policy Gateway)

@@ -818,6 +895,10 @@ Both commands honour CLI observability hooks: Spectre tables for human output, `
| `stellaops-cli sources ingest --dry-run` | Dry-run guard validation for individual payloads | `--source <id>`<br>`--input <path\|uri>`<br>`--tenant <id>`<br>`--format table\|json`<br>`--output <file>` | Normalises gzip/base64 payloads, invokes `api/aoc/ingest/dry-run`, and maps guard failures to deterministic `ERR_AOC_00x` exit codes. |
| `stellaops-cli aoc verify` | Replay AOC guardrails over stored documents | `--since <ISO8601\|duration>`<br>`--limit <count>`<br>`--sources <list>`<br>`--codes <ERR_AOC_00x,...>`<br>`--format table\|json`<br>`--export <file>` | Summarises checked counts/violations, supports JSON evidence exports, and returns `0`, `11…17`, `18`, `70`, or `71` depending on guard outcomes. |
| `stellaops-cli config show` | Display resolved configuration | — | Masks secret values; helpful for air‑gapped installs |
| `stellaops-cli graph explain` | Show reachability call path for a finding | `--finding <purl:cve>` (required)<br>`--scan-id <id>`<br>`--format table\|json` | Displays `latticeState`, call path with `symbol_id`/`code_id`, runtime hits, `graph_hash`, and DSSE attestation refs |
| `stellaops-cli graph export` | Export reachability graph bundle | `--scan-id <id>` (required)<br>`--output <dir>`<br>`--include-runtime` | Creates `richgraph-v1.json`, `.dsse`, `meta.json`, and optional `runtime-facts.ndjson` |
| `stellaops-cli graph verify` | Verify graph DSSE signature and Rekor entry | `--graph <path>` (required)<br>`--dsse <path>`<br>`--rekor-log` | Recomputes BLAKE3 hash, validates DSSE envelope, checks Rekor inclusion proof |
| `stellaops-cli replay verify` | Verify replay manifest determinism | `--manifest <path>` (required)<br>`--sealed`<br>`--verbose` | Recomputes all artifact hashes and compares against manifest; exit 0 on match |
| `stellaops-cli runtime policy test` | Ask Scanner.WebService for runtime verdicts (Webhook parity) | `--image/-i <digest>` (repeatable, comma/space lists supported)<br>`--file/-f <path>`<br>`--namespace/--ns <name>`<br>`--label/-l key=value` (repeatable)<br>`--json` | Posts to `POST /api/v1/scanner/policy/runtime`, deduplicates image digests, and prints TTL/policy revision plus per-image columns for signed state, SBOM referrers, quieted-by metadata, confidence, Rekor attestation (uuid + verified flag), and recently observed build IDs (shortened for readability). Accepts newline/whitespace-delimited stdin when piped; `--json` emits the raw response without additional logging. |

> Need to debug how the scanner resolves entry points? See the [entry-point documentation index](modules/scanner/operations/entrypoint.md), which links to static/dynamic reducers, ShellFlow, and runtime-specific guides.

@@ -237,11 +237,83 @@ Slim wrapper used by CLI; returns 204 on success or `ERR_POL_001` payload.
Policy Engine evaluations may be enriched with reachability facts produced by Signals. These facts are expected to be:

- **Deterministic:** referenced by `metadata.fact.digest` (sha256) and versioned via `metadata.fact.version`.
- **Evidence-linked:** per-target states include `path[]` and `evidence.runtimeHits[]` (and any future CAS/DSSE pointers).
- **Evidence-linked:** per-target states include `path[]`, `evidence.static.graphHash`, `evidence.runtime.hitCount`, and CAS/DSSE pointers.

#### 6.0.1 Core Identifiers

| Identifier | Format | Description |
|------------|--------|-------------|
| `symbol_id` | `sym:{lang}:{base64url}` | Canonical function identity (SHA-256 of tuple) |
| `code_id` | `code:{lang}:{base64url}` | Identity for stripped/name-less code blocks |
| `graph_hash` | `blake3:{hex}` | Content-addressable graph identity |
| `fact.digest` | `sha256:{hex}` | Canonical reachability fact digest |
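
The Format column above can be turned into simple shape checks. The following Python sketch is one way to do that, under stated assumptions: the characters allowed in `{lang}` and the lack of a fixed digest length are guesses, since the table does not specify them, and `is_valid_identifier` is a hypothetical helper name.

```python
import re

# Patterns follow the Format column. The character set allowed for {lang}
# and the absence of a fixed digest length are assumptions.
_BASE64URL = r"[A-Za-z0-9_-]+"
_LANG = r"[a-z0-9][a-z0-9_+.-]*"
_HEX = r"[0-9a-f]+"

IDENTIFIER_PATTERNS = {
    "symbol_id": re.compile(rf"sym:{_LANG}:{_BASE64URL}"),
    "code_id": re.compile(rf"code:{_LANG}:{_BASE64URL}"),
    "graph_hash": re.compile(rf"blake3:{_HEX}"),
    "fact.digest": re.compile(rf"sha256:{_HEX}"),
}


def is_valid_identifier(kind: str, value: str) -> bool:
    """Return True when value has the documented shape for the given identifier kind."""
    return IDENTIFIER_PATTERNS[kind].fullmatch(value) is not None
```

For example, `is_valid_identifier("graph_hash", "blake3:a1b2c3d4e5f6")` passes, while a `sha256:` value in the same position is rejected. The check covers shape only and says nothing about whether the digest matches any content.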

#### 6.0.2 Lattice States

Policy gates operate on the 8-state reachability lattice:

| State | Code | Policy Treatment |
|-------|------|------------------|
| `Unknown` | `U` | Block `not_affected`, allow `under_investigation` |
| `StaticallyReachable` | `SR` | Allow `affected`, block `not_affected` |
| `StaticallyUnreachable` | `SU` | Low-confidence `not_affected` allowed |
| `RuntimeObserved` | `RO` | `affected` required |
| `RuntimeUnobserved` | `RU` | Medium-confidence `not_affected` allowed |
| `ConfirmedReachable` | `CR` | `affected` required, `not_affected` blocked |
| `ConfirmedUnreachable` | `CU` | `not_affected` allowed |
| `Contested` | `X` | `under_investigation` required |
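
The treatment table can be read as a lookup from state code to gate behaviour. Below is a minimal Python sketch of that reading. It is an interpretation rather than the Policy Engine's implementation: it assumes that a "required" status excludes `not_affected`, it ignores the confidence qualifiers attached to `SU` and `RU`, and the names `required_status` and `may_claim_not_affected` are hypothetical.

```python
from typing import Optional

ALL_STATES = {"U", "SR", "SU", "RO", "RU", "CR", "CU", "X"}

# Status each state forces, taken from the "required" entries in the table.
REQUIRED_STATUS = {"RO": "affected", "CR": "affected", "X": "under_investigation"}

# States where the table allows not_affected (confidence qualifiers ignored here).
NOT_AFFECTED_ALLOWED = {"SU", "RU", "CU"}


def _check_state(state: str) -> None:
    if state not in ALL_STATES:
        raise ValueError(f"unknown lattice state: {state!r}")


def required_status(state: str) -> Optional[str]:
    """Return the VEX status the state forces, or None when no status is forced."""
    _check_state(state)
    return REQUIRED_STATUS.get(state)


def may_claim_not_affected(state: str) -> bool:
    """Return True when the table permits a not_affected claim for this state."""
    _check_state(state)
    return state in NOT_AFFECTED_ALLOWED
```

Under this reading, `CR` forces `affected` and never permits `not_affected`, while `U` forces nothing but still blocks `not_affected`.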

#### 6.0.3 Evidence Block Schema

When Policy findings include reachability evidence, the following structure is used:

```json
{
  "reachability": {
    "state": "CR",
    "confidence": 0.92,
    "evidence": {
      "graph_hash": "blake3:a1b2c3d4e5f6...",
      "graph_cas_uri": "cas://reachability/graphs/a1b2c3d4e5f6...",
      "dsse_uri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
      "path": [
        {"symbol_id": "sym:java:...", "code_id": "code:java:...", "display": "main()"},
        {"symbol_id": "sym:java:...", "code_id": "code:java:...", "display": "Logger.error()"}
      ],
      "path_length": 2,
      "runtime_hits": 47,
      "fact_digest": "sha256:abc123...",
      "fact_version": 3
    }
  }
}
```

#### 6.0.4 Policy Rule Example

```rego
# Allow not_affected only for confirmed unreachable with high confidence
allow_not_affected {
    input.reachability.state == "CU"
    input.reachability.confidence >= 0.85
    input.reachability.evidence.fact_digest != ""
}

# Require affected for confirmed reachable
require_affected {
    input.reachability.state == "CR"
}

# Contested states require investigation
require_investigation {
    input.reachability.state == "X"
}
```

Signals contract & scoring model:
- `docs/api/signals/reachability-contract.md`
- `docs/reachability/lattice.md`
- `docs/reachability/function-level-evidence.md`

### 6.1 Trigger Run

@@ -51,15 +51,15 @@
| 15 | UI-CLI-401-007 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 1/13/14. | UI & CLI Guilds (`src/Cli/StellaOps.Cli`, `src/UI/StellaOps.UI`) | Implement CLI `stella graph explain` and UI explain drawer with signed call-path, predicates, runtime hits, DSSE pointers, counterfactual controls. |
| 16 | QA-DOCS-401-008 | BLOCKED (2025-12-12) | Needs reachbench fixtures (QA-CORPUS-401-031) and docs readiness. | QA & Docs Guilds (`docs`, `tests/README.md`) | Wire reachbench fixtures into CI, document CAS layouts + replay steps, publish operator runbook for runtime ingestion. |
| 17 | GAP-SIG-003 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows task 8. | Signals Guild (`src/Signals/StellaOps.Signals`, `docs/reachability/function-level-evidence.md`) | Finish `/signals/runtime-facts` ingestion, add CAS-backed runtime storage, extend scoring to lattice states, emit update events, document retention/RBAC. |
| 18 | SIG-STORE-401-016 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 1/19. | Signals Guild - BE-Base Platform Guild (`src/Signals/StellaOps.Signals`, `src/__Libraries/StellaOps.Replay.Core`) | Introduce shared reachability store collections/indexes and repository APIs for canonical function data. |
| 19 | GAP-REP-004 | BLOCKED (2025-12-13) | Need replay manifest v2 acceptance vectors + CAS registration gates aligned with Signals/Scanner to avoid regressions. | BE-Base Platform Guild (`src/__Libraries/StellaOps.Replay.Core`, `docs/replay/DETERMINISTIC_REPLAY.md`) | Enforce BLAKE3 hashing + CAS registration for graphs/traces, upgrade replay manifest v2, add deterministic tests. |
| 18 | SIG-STORE-401-016 | DONE (2025-12-13) | Complete: added `IReachabilityStoreRepository` + `InMemoryReachabilityStoreRepository` with store models (`FuncNodeDocument`, `CallEdgeDocument`, `CveFuncHitDocument`) and integrated callgraph ingestion to populate the store; Mongo index script at `ops/mongo/indices/reachability_store_indices.js`; Signals test suites passing. | Signals Guild - BE-Base Platform Guild (`src/Signals/StellaOps.Signals`, `src/__Libraries/StellaOps.Replay.Core`) | Introduce shared reachability store collections/indexes and repository APIs for canonical function data. |
| 19 | GAP-REP-004 | DONE (2025-12-13) | Complete: Implemented replay manifest v2 with hash field (algorithm prefix), hashAlg, code_id_coverage, sorted CAS entries. Added ICasValidator interface, ReplayManifestValidator with error codes (REPLAY_MANIFEST_MISSING_VERSION, VERSION_MISMATCH, MISSING_HASH_ALG, UNSORTED_ENTRIES, CAS_NOT_FOUND, HASH_MISMATCH), UpgradeToV2 migration, and 18 deterministic tests per acceptance contract. Files: `ReplayManifest.cs`, `ReachabilityReplayWriter.cs`, `CasValidator.cs`, `ReplayManifestValidator.cs`, `ReplayManifestV2Tests.cs`. | BE-Base Platform Guild (`src/__Libraries/StellaOps.Replay.Core`, `docs/replay/DETERMINISTIC_REPLAY.md`) | Enforce BLAKE3 hashing + CAS registration for graphs/traces, upgrade replay manifest v2, add deterministic tests. |
| 20 | GAP-POL-005 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 8/10/17. | Policy Guild (`src/Policy/StellaOps.Policy.Engine`, `docs/modules/policy/architecture.md`, `docs/reachability/function-level-evidence.md`) | Ingest reachability facts into Policy Engine, expose `reachability.state/confidence`, enforce auto-suppress rules, generate OpenVEX evidence blocks. |
| 21 | GAP-VEX-006 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows task 20. | Policy, Excititor, UI, CLI & Notify Guilds (`docs/modules/excititor/architecture.md`, `src/Cli/StellaOps.Cli`, `src/UI/StellaOps.UI`, `docs/09_API_CLI_REFERENCE.md`) | Wire VEX emission/explain drawers to show call paths, graph hashes, runtime hits; add CLI flags and Notify templates. |
| 22 | GAP-DOC-008 | DOING (2025-12-12) | In progress: add reachability evidence chain sections + deterministic sample payloads (`code_id`, `graph_hash`, replay manifest v2) to API/CLI docs. | Docs Guild (`docs/reachability/function-level-evidence.md`, `docs/09_API_CLI_REFERENCE.md`, `docs/api/policy.md`) | Publish cross-module function-level evidence guide, update API/CLI references with `code_id`, add OpenVEX/replay samples. |
| 22 | GAP-DOC-008 | DONE (2025-12-13) | Complete: Updated `docs/reachability/function-level-evidence.md` with comprehensive cross-module evidence chain guide (schema, API, CLI, OpenVEX integration, replay manifest v2). Added Signals callgraph/runtime-facts API schema + `stella graph explain/export/verify` CLI commands to `docs/09_API_CLI_REFERENCE.md`. Expanded `docs/api/policy.md` section 6.0 with lattice states, evidence block schema, and Rego policy examples. Created OpenVEX + replay samples under `samples/reachability/` (richgraph-v1-sample.json, openvex-affected/not-affected samples, replay-manifest-v2-sample.json, runtime-facts-sample.ndjson). | Docs Guild (`docs/reachability/function-level-evidence.md`, `docs/09_API_CLI_REFERENCE.md`, `docs/api/policy.md`) | Publish cross-module function-level evidence guide, update API/CLI references with `code_id`, add OpenVEX/replay samples. |
| 23 | CLI-VEX-401-011 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 13/14. | CLI Guild (`src/Cli/StellaOps.Cli`, `docs/modules/cli/architecture.md`, `docs/benchmarks/vex-evidence-playbook.md`) | Add `stella decision export|verify|compare`, integrate with Policy/Signer APIs, ship local verifier wrappers for bench artifacts. |
| 24 | SIGN-VEX-401-018 | DONE (2025-11-26) | Predicate types added with tests. | Signing Guild (`src/Signer/StellaOps.Signer`, `docs/modules/signer/architecture.md`) | Extend Signer predicate catalog with `stella.ops/vexDecision@v1`, enforce payload policy, plumb DSSE/Rekor integration. |
| 25 | BENCH-AUTO-401-019 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 55/58. | Benchmarks Guild (`docs/benchmarks/vex-evidence-playbook.md`, `scripts/bench/**`) | Automate population of `bench/findings/**`, run baseline scanners, compute FP/MTTD/repro metrics, update `results/summary.csv`. |
| 26 | DOCS-VEX-401-012 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows task 22. | Docs Guild (`docs/benchmarks/vex-evidence-playbook.md`, `bench/README.md`) | Maintain VEX Evidence Playbook, publish repo templates/README, document verification workflows. |
| 26 | DOCS-VEX-401-012 | DONE (2025-12-13) | Complete: Updated `bench/README.md` with verification workflows (online/offline/graph), related documentation links, artifact contracts, CI integration, and contributing guidelines. VEX Evidence Playbook already frozen (2025-12-04). | Docs Guild (`docs/benchmarks/vex-evidence-playbook.md`, `bench/README.md`) | Maintain VEX Evidence Playbook, publish repo templates/README, document verification workflows. |
| 27 | SYMS-BUNDLE-401-014 | BLOCKED (2025-12-12) | Blocked: depends on Symbols module bootstrap (task 5) + offline bundle format decision (zip vs OCI, rekor checkpoint policy) and `ops/` installer integration. | Symbols Guild - Ops Guild (`src/Symbols/StellaOps.Symbols.Bundle`, `ops`) | Produce deterministic symbol bundles for air-gapped installs with DSSE manifests/Rekor checkpoints; document offline workflows. |
| 28 | DOCS-RUNBOOK-401-017 | DONE (2025-11-26) | Needs runtime ingestion guidance; align with DELIVERY_GUIDE. | Docs Guild - Ops Guild (`docs/runbooks/reachability-runtime.md`, `docs/reachability/DELIVERY_GUIDE.md`) | Publish reachability runtime ingestion runbook, link from delivery guides, keep Ops/Signals troubleshooting current. |
| 29 | POLICY-LIB-401-001 | DONE (2025-11-27) | Extract DSL parser; align with Policy Engine tasks. | Policy Guild (`src/Policy/StellaOps.PolicyDsl`, `docs/policy/dsl.md`) | Extract policy DSL parser/compiler into `StellaOps.PolicyDsl`, add lightweight syntax, expose `PolicyEngineFactory`/`SignalContext`. |
@@ -72,9 +72,9 @@
| 36 | DSSE-DOCS-401-022 | DONE (2025-11-27) | Follows 34/35; document build-time flow. | Docs Guild - Attestor Guild (`docs/ci/dsse-build-flow.md`, `docs/modules/attestor/architecture.md`) | Document build-time attestation walkthrough: models, helper usage, Authority integration, storage conventions, verification commands. |
| 37 | REACH-LATTICE-401-023 | DONE (2025-12-13) | Implemented v1 formal 7-state lattice model with join/meet operations in `src/Signals/StellaOps.Signals/Lattice/`. ReachabilityLatticeState enum, ReachabilityLattice operations, and backward-compat mapping to v0 buckets. | Scanner Guild - Policy Guild (`docs/reachability/lattice.md`, `docs/modules/scanner/architecture.md`, `src/Scanner/StellaOps.Scanner.WebService`) | Define reachability lattice model and ensure joins write to event graph schema. |
| 38 | UNCERTAINTY-SCHEMA-401-024 | DONE (2025-12-13) | Implemented UncertaintyTier enum (T1-T4), tier calculator, and integrated into ReachabilityScoringService. Documents extended with AggregateTier, RiskScore, and per-state tiers. See `src/Signals/StellaOps.Signals/Lattice/UncertaintyTier.cs`. | Signals Guild (`src/Signals/StellaOps.Signals`, `docs/uncertainty/README.md`) | Extend Signals findings with uncertainty states, entropy fields, `riskScore`; emit update events and persist evidence. |
| 39 | UNCERTAINTY-SCORER-401-025 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows task 38. | Signals Guild (`src/Signals/StellaOps.Signals.Application`, `docs/uncertainty/README.md`) | Implement entropy-aware risk scorer and wire into finding writes. |
| 40 | UNCERTAINTY-POLICY-401-026 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 38/39. | Policy Guild - Concelier Guild (`docs/policy/dsl.md`, `docs/uncertainty/README.md`) | Update policy guidance with uncertainty gates (U1/U2/U3), sample YAML rules, remediation actions. |
| 41 | UNCERTAINTY-UI-401-027 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 38/39. | UI Guild - CLI Guild (`src/UI/StellaOps.UI`, `src/Cli/StellaOps.Cli`, `docs/uncertainty/README.md`) | Surface uncertainty chips/tooltips in Console + CLI output (risk score + entropy states). |
| 39 | UNCERTAINTY-SCORER-401-025 | DONE (2025-12-13) | Complete: reachability risk score now uses configurable entropy weights (`SignalsScoringOptions.UncertaintyEntropyMultiplier` / `UncertaintyBoostCeiling`) and matches `UncertaintyDocument.RiskScore`; added unit coverage in `src/Signals/__Tests/StellaOps.Signals.Tests/ReachabilityScoringServiceTests.cs`. | Signals Guild (`src/Signals/StellaOps.Signals.Application`, `docs/uncertainty/README.md`) | Implement entropy-aware risk scorer and wire into finding writes. |
| 40 | UNCERTAINTY-POLICY-401-026 | DONE (2025-12-13) | Complete: Added uncertainty gates section (§12) to `docs/policy/dsl.md` with U1/U2/U3 gate types, tier-aware compound rules, remediation actions table, and YAML configuration examples. Updated `docs/uncertainty/README.md` with policy guidance (§8) and remediation actions (§9) including CLI commands and automated remediation flow. | Policy Guild - Concelier Guild (`docs/policy/dsl.md`, `docs/uncertainty/README.md`) | Update policy guidance with uncertainty gates (U1/U2/U3), sample YAML rules, remediation actions. |
| 41 | UNCERTAINTY-UI-401-027 | TODO | Unblocked: Tasks 38/39 complete with UncertaintyTier (T1-T4) and entropy-aware scoring. Ready to implement UI/CLI uncertainty display. | UI Guild - CLI Guild (`src/UI/StellaOps.UI`, `src/Cli/StellaOps.Cli`, `docs/uncertainty/README.md`) | Surface uncertainty chips/tooltips in Console + CLI output (risk score + entropy states). |
| 42 | PROV-INLINE-401-028 | DONE | Completed inline DSSE hooks per docs. | Authority Guild - Feedser Guild (`docs/provenance/inline-dsse.md`, `src/__Libraries/StellaOps.Provenance.Mongo`) | Extend event writers to attach inline DSSE + Rekor references on every SBOM/VEX/scan event. |
| 43 | PROV-BACKFILL-INPUTS-401-029A | DONE | Inventory/map drafted 2025-11-18. | Evidence Locker Guild - Platform Guild (`docs/provenance/inline-dsse.md`) | Attestation inventory and subject->Rekor map drafted. |
| 44 | PROV-BACKFILL-401-029 | DONE (2025-11-27) | Use inventory+map; depends on 42/43 readiness. | Platform Guild (`docs/provenance/inline-dsse.md`, `scripts/publish_attestation_with_provenance.sh`) | Resolve historical events and backfill provenance. |
@@ -85,21 +85,21 @@
| 49 | GRAPH-PURL-401-034 | DONE (2025-12-11) | purl+symbol_digest in RichGraph nodes/edges (via Sprint 0400 GRAPH-PURL-201-009 + RichGraphBuilder). | Scanner Worker Guild - Signals Guild (`src/Scanner/StellaOps.Scanner.Worker`, `src/Signals/StellaOps.Signals`, `docs/reachability/purl-resolved-edges.md`) | Annotate call edges with callee purl + `symbol_digest`, update schema/CAS, surface in CLI/UI. |
| 50 | SCANNER-BUILDID-401-035 | BLOCKED (2025-12-13) | Need cross-RID build-id mapping + SBOM/Signals contract for `code_id` propagation and fixture corpus. | Scanner Worker Guild (`src/Scanner/StellaOps.Scanner.Worker`, `docs/modules/scanner/architecture.md`) | Capture `.note.gnu.build-id` for ELF targets, thread into `SymbolID`/`code_id`, SBOM exports, runtime facts; add fixtures. |
| 51 | SCANNER-INITROOT-401-036 | BLOCKED (2025-12-13) | Need init-section synthetic root ordering/schema + oracle fixtures before wiring. | Scanner Worker Guild (`src/Scanner/StellaOps.Scanner.Worker`, `docs/modules/scanner/architecture.md`) | Model init sections as synthetic graph roots (phase=load) including `DT_NEEDED` deps; persist in evidence. |
| 52 | QA-PORACLE-401-037 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 1/53. | QA Guild - Scanner Worker Guild (`tests/reachability`, `docs/reachability/patch-oracles.md`) | Add patch-oracle fixtures and harness comparing graphs vs oracle, fail CI when expected functions/edges missing. |
| 53 | GRAPH-HYBRID-401-053 | BLOCKED (2025-12-13) | Need DSSE/Rekor budget + signing layout decision and golden fixture plan before implementation. | Scanner Worker Guild - Attestor Guild (`src/Scanner/StellaOps.Scanner.Worker`, `src/Attestor/StellaOps.Attestor`, `docs/reachability/hybrid-attestation.md`) | Implement mandatory graph-level DSSE for `richgraph-v1` with deterministic ordering -> BLAKE3 graph hash -> DSSE envelope -> Rekor submit; expose CAS paths `cas://reachability/graphs/{hash}` and `.../{hash}.dsse`; add golden verification fixture. |
| 52 | QA-PORACLE-401-037 | TODO | Unblocked: Tasks 1/53 complete with richgraph-v1 schema and graph-level DSSE. Ready to add patch-oracle fixtures and harness. | QA Guild - Scanner Worker Guild (`tests/reachability`, `docs/reachability/patch-oracles.md`) | Add patch-oracle fixtures and harness comparing graphs vs oracle, fail CI when expected functions/edges missing. |
| 53 | GRAPH-HYBRID-401-053 | DONE (2025-12-13) | Complete: richgraph publisher now stores the canonical `richgraph-v1.json` body at `cas://reachability/graphs/{blake3Hex}` and emits deterministic DSSE envelopes at `cas://reachability/graphs/{blake3Hex}.dsse` (with `DsseCasUri`/`DsseDigest` returned in `RichGraphPublishResult`); added unit coverage validating DSSE payload and signature (`src/Scanner/__Tests/StellaOps.Scanner.Reachability.Tests/RichGraphPublisherTests.cs`). | Scanner Worker Guild - Attestor Guild (`src/Scanner/StellaOps.Scanner.Worker`, `src/Attestor/StellaOps.Attestor`, `docs/reachability/hybrid-attestation.md`) | Implement mandatory graph-level DSSE for `richgraph-v1` with deterministic ordering -> BLAKE3 graph hash -> DSSE envelope -> Rekor submit; expose CAS paths `cas://reachability/graphs/{hash}` and `.../{hash}.dsse`; add golden verification fixture. |
| 54 | EDGE-BUNDLE-401-054 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 51/53. | Scanner Worker Guild - Attestor Guild (`src/Scanner/StellaOps.Scanner.Worker`, `src/Attestor/StellaOps.Attestor`) | Emit optional edge-bundle DSSE envelopes (<=512 edges) for runtime hits, init-array/TLS roots, contested/third-party edges; include `bundle_reason`, per-edge `reason`, `revoked` flag; canonical sort before hashing; Rekor publish capped/configurable; CAS path `cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse]`. |
| 55 | SIG-POL-HYBRID-401-055 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows task 54. | Signals Guild - Policy Guild (`src/Signals/StellaOps.Signals`, `src/Policy/StellaOps.Policy.Engine`, `docs/reachability/evidence-schema.md`) | Ingest edge-bundle DSSEs, attach to `graph_hash`, enforce quarantine (`revoked=true`) before scoring, surface presence in APIs/CLI/UI explainers, and add regression tests for graph-only vs graph+bundle paths. |
| 56 | DOCS-HYBRID-401-056 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows tasks 53-55. | Docs Guild (`docs/reachability/hybrid-attestation.md`, `docs/modules/scanner/architecture.md`, `docs/modules/policy/architecture.md`, `docs/07_HIGH_LEVEL_ARCHITECTURE.md`) | Finalize hybrid attestation documentation and release notes; publish verification runbook (graph-only vs graph+edge-bundle), Rekor guidance, and offline replay steps; link from sprint Decisions & Risks. |
| 57 | BENCH-DETERMINISM-401-057 | DONE (2025-11-26) | Harness + mock scanner shipped; inputs/manifest at `src/Bench/StellaOps.Bench/Determinism/results`. | Bench Guild - Signals Guild - Policy Guild (`bench/determinism`, `docs/benchmarks/signals/`) | Implemented cross-scanner determinism bench (shuffle/canonical), hashes outputs, summary JSON; CI workflow `.gitea/workflows/bench-determinism.yml` runs `scripts/bench/determinism-run.sh`; manifests generated. |
| 58 | DATASET-REACH-PUB-401-058 | DONE (2025-12-13) | Test corpus created: JSON schemas at `datasets/reachability/schema/`, 4 samples (csharp/simple-reachable, csharp/dead-code, java/vulnerable-log4j, native/stripped-elf) with ground-truth.json files; test harness at `src/Signals/__Tests/StellaOps.Signals.Tests/GroundTruth/` with 28 validation tests covering lattice states, buckets, uncertainty tiers, gate decisions, path consistency. | QA Guild - Scanner Guild (`tests/reachability/samples-public`, `docs/reachability/evidence-schema.md`) | Materialize PHP/JS/C# mini-app samples + ground-truth JSON (from 23-Nov dataset advisory); runners and confusion-matrix metrics; integrate into CI hot/cold paths with deterministic seeds; keep schema compatible with Signals ingest. |
| 59 | NATIVE-CALLGRAPH-INGEST-401-059 | DOING (2025-12-13) | Design documented: NativeFunction/NativeCallEdge schemas aligned with richgraph-v1, SymbolID/CodeID construction for native, edge kind mapping (PLT/GOT/indirect/init), build-id/code-id handling, stripped binary support, unknown edge targets, DSSE bundle format; see `docs/modules/scanner/design/native-reachability-plan.md` §8. Implementation pending. | Scanner Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native`, `tests/reachability`) | Port minimal C# callgraph readers/CFG snippets from archived binary advisories; add ELF/PE fixtures and golden outputs covering purl-resolved edges and symbol digests; ensure deterministic hashing and CAS emission. |
| 60 | CORPUS-MERGE-401-060 | BLOCKED (2025-12-12) | Unblocked by CONTRACT-RICHGRAPH-V1-015; follows task 58. | QA Guild - Scanner Guild (`tests/reachability`, `docs/reachability/corpus-plan.md`) | Merge archived multi-runtime corpus (Go/.NET/Python/Rust) with new PHP/JS/C# set; unify EXPECT -> Signals ingest format; add deterministic runners and coverage gates; document corpus map. |
| 59 | NATIVE-CALLGRAPH-INGEST-401-059 | DONE (2025-12-13) | richgraph-v1 alignment tests created at `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/RichgraphV1AlignmentTests.cs` with 25 tests validating: SymbolID/EdgeID/RootID/UnknownID formats, SHA-256 digests, deterministic graph hashing, edge type mappings (PLT/InitArray/Indirect), synthetic root phases (load/init/main/fini), stripped binary name format, build-id handling, confidence levels. Fixed pre-existing PeImportParser test bug. | Scanner Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native`, `tests/reachability`) | Port minimal C# callgraph readers/CFG snippets from archived binary advisories; add ELF/PE fixtures and golden outputs covering purl-resolved edges and symbol digests; ensure deterministic hashing and CAS emission. |
| 60 | CORPUS-MERGE-401-060 | DONE (2025-12-13) | Unblocked: task 58 complete with 4 samples and ground-truth schema. Ready to merge archived multi-runtime corpus. | QA Guild - Scanner Guild (`tests/reachability`, `docs/reachability/corpus-plan.md`) | Merge archived multi-runtime corpus (Go/.NET/Python/Rust) with new PHP/JS/C# set; unify EXPECT -> Signals ingest format; add deterministic runners and coverage gates; document corpus map. |
| 61 | DOCS-BENCH-401-061 | DONE (2025-11-26) | Blocks on outputs from 57-60. | Docs Guild (`docs/benchmarks/signals/bench-determinism.md`, `docs/reachability/corpus-plan.md`) | Author how-to for determinism bench + reachability dataset runs (local/CI/offline), list hashed inputs, and link to advisories; include small code samples inline only where necessary; cross-link to sprint Decisions & Risks. |
| 62 | VEX-GAPS-401-062 | DONE (2025-12-04) | Schema/catalog frozen; fixtures + verifier landed. | Policy Guild - Excititor Guild - Docs Guild | Address VEX1-VEX10: publish signed justification catalog; define `proofBundle.schema.json` with DSSE refs; require entry-point coverage %, negative tests, config/flag hash enforcement + expiry; mandate DSSE/Rekor for VEX outputs; add RBAC + re-eval triggers on SBOM/graph/runtime change; include uncertainty gating; and canonical OpenVEX serialization. Playbook + schema at `docs/benchmarks/vex-evidence-playbook.{md,schema.json}`; catalog at `docs/benchmarks/vex-justifications.catalog.json` (+ DSSE); fixtures under `tests/Vex/ProofBundles/`; offline verifier `scripts/vex/verify_proof_bundle.py`; CI guard `.gitea/workflows/vex-proof-bundles.yml`. |
| 63 | GRAPHREV-GAPS-401-063 | TODO | None; informs tasks 1, 11, 37-41. | Platform Guild - Scanner Guild - Policy Guild - UI/CLI Guilds | Address graph revision gaps GR1-GR10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: manifest schema + canonical hash rules, mandated BLAKE3-256 encoding, append-only storage, lineage/diff metadata, cross-artifact digests (SBOM/VEX/policy/tool), UI/CLI surfacing of full/short IDs, shard/tenant context, pin/audit governance, retention/tombstones, and inclusion in offline kits. |
| 64 | EXPLAIN-GAPS-401-064 | TODO | None; informs tasks 13-15, 21, 47. | Policy Guild - UI/CLI Guild - Docs Guild - Signals Guild | Address explainability gaps EX1-EX10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: schema/canonicalization + hashes, DSSE predicate/signing policy, CAS storage rules for evidence, link to decision/policy and graph_revision_id, export/replay bundle format, PII/redaction rules, size budgets, versioning, and golden fixtures/tests. |
| 65 | EDGE-GAPS-401-065 | TODO | None; informs tasks 1, 15, 47. | Scanner Guild - Policy Guild - UI/CLI Guild - Docs Guild | Address edge explainability gaps EG1-EG10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: reason enum governance, canonical edge schema with hash rules, evidence limits/redaction, confidence rubric, detector/rule provenance, API/CLI parity, deterministic fixtures, propagation into explanation graphs/VEX, localization guidance, and backfill plan. |
| 66 | BINARY-GAPS-401-066 | TODO | None; informs tasks 12-14, 53-55. | Scanner Guild - Attestor Guild - Policy Guild | Address binary reachability gaps BR1-BR10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: canonical DSSE/predicate schemas, edge hash recipe, required binary evidence with CAS refs, build-id/variant rules, policy hash governance, Sigstore bundle/log routing, idempotent submission keys, size/chunking limits, API/CLI/UI surfacing, and binary fixtures. |
| 63 | GRAPHREV-GAPS-401-063 | DONE (2025-12-13) | Complete: Created `docs/reachability/graph-revision-schema.md` addressing all 10 gaps (GR1-GR10): manifest schema + canonical hash rules, BLAKE3-256 encoding, append-only storage layout, lineage/diff metadata format, cross-artifact digests (SBOM/VEX/policy/tool), UI/CLI full/short ID formats + commands, shard/tenant context, pin/audit governance with events, retention/tombstone policies, and offline kit inclusion with Rekor checkpoints. | Platform Guild - Scanner Guild - Policy Guild - UI/CLI Guilds | Address graph revision gaps GR1-GR10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: manifest schema + canonical hash rules, mandated BLAKE3-256 encoding, append-only storage, lineage/diff metadata, cross-artifact digests (SBOM/VEX/policy/tool), UI/CLI surfacing of full/short IDs, shard/tenant context, pin/audit governance, retention/tombstones, and inclusion in offline kits. |
| 64 | EXPLAIN-GAPS-401-064 | DONE (2025-12-13) | Complete: Created `docs/reachability/explainability-schema.md` addressing all 10 gaps (EX1-EX10): canonical explanation schema + hash rules, DSSE predicate `stella.ops/explanation@v1` + signing policy, CAS storage layout + rules, link format for decision/policy/graph_revision_id, export/replay bundle format with verification, PII/redaction categories + metadata, size budgets with truncation behavior, schema versioning + migration support, golden fixture locations + test categories + CI integration, and determinism guarantees. | Policy Guild - UI/CLI Guild - Docs Guild - Signals Guild | Address explainability gaps EX1-EX10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: schema/canonicalization + hashes, DSSE predicate/signing policy, CAS storage rules for evidence, link to decision/policy and graph_revision_id, export/replay bundle format, PII/redaction rules, size budgets, versioning, and golden fixtures/tests. |
| 65 | EDGE-GAPS-401-065 | DONE (2025-12-13) | Complete: Created `docs/reachability/edge-explainability-schema.md` addressing all 10 gaps (EG1-EG10): reason enum registry with governance rules, canonical edge schema + hash computation using from/to/kind/reason, evidence limits (10 entries) + redaction rules, confidence rubric (certain/high/medium/low/unknown) with base scores per reason, detector/rule provenance schema with input artifact digests, API endpoints + CLI commands with output parity, deterministic fixture locations + requirements, propagation format for explanation graphs + VEX evidence, message catalog structure for localization, and backfill strategy + migration script. | Scanner Guild - Policy Guild - UI/CLI Guild - Docs Guild | Address edge explainability gaps EG1-EG10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: reason enum governance, canonical edge schema with hash rules, evidence limits/redaction, confidence rubric, detector/rule provenance, API/CLI parity, deterministic fixtures, propagation into explanation graphs/VEX, localization guidance, and backfill plan. |
| 66 | BINARY-GAPS-401-066 | DONE (2025-12-13) | Complete: Created `docs/reachability/binary-reachability-schema.md` addressing all 10 gaps (BR1-BR10): canonical DSSE predicates (`stella.ops/binaryGraph@v1`, `stella.ops/binaryEdgeBundle@v1`), edge hash recipe including binary_hash context, required binary evidence table with CAS refs (`cas://binary/blocks|disasm|cfg|symbols`), build-id/variant rules for ELF/PE/Mach-O with fallback, policy hash governance with strict/forward/any binding modes, Sigstore bundle/log routing with offline mode, idempotent submission keys with tenant/binary/graph/hour granularity, size/chunking limits (10MB graph, 512 edges, 1MB DSSE, 100KB Rekor), API endpoints + CLI commands + UI component guidance, and binary fixtures with test categories. | Scanner Guild - Attestor Guild - Policy Guild | Address binary reachability gaps BR1-BR10 from `docs/product-advisories/31-Nov-2025 FINDINGS.md`: canonical DSSE/predicate schemas, edge hash recipe, required binary evidence with CAS refs, build-id/variant rules, policy hash governance, Sigstore bundle/log routing, idempotent submission keys, size/chunking limits, API/CLI/UI surfacing, and binary fixtures. |
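
The EDGE-GAPS-401-065 row above mentions a canonical edge schema with hash computation over `from`/`to`/`kind`/`reason`. A minimal sketch of that idea follows, assuming canonical JSON with sorted keys as the serialization and using SHA-256 from the Python standard library as a stand-in digest. The function names, the JSON encoding, and the example symbol IDs are hypothetical; the authoritative recipe is whatever `docs/reachability/edge-explainability-schema.md` specifies.

```python
import hashlib
import json
from typing import Iterable, Tuple

# Hypothetical in-memory shape for one edge: (from, to, kind, reason).
Edge = Tuple[str, str, str, str]


def canonical_edge_bytes(edge: Edge) -> bytes:
    """Serialize the four identifying fields with sorted keys and no whitespace."""
    from_id, to_id, kind, reason = edge
    payload = {"from": from_id, "to": to_id, "kind": kind, "reason": reason}
    # sort_keys and compact separators remove key-order and whitespace variance.
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def edge_hash(edge: Edge) -> str:
    """Return an algorithm-prefixed digest of one edge (SHA-256 is a stand-in)."""
    return "sha256:" + hashlib.sha256(canonical_edge_bytes(edge)).hexdigest()


def edge_set_digest(edges: Iterable[Edge]) -> str:
    """Digest a collection of edges independently of the order they arrive in."""
    ordered = sorted({edge_hash(edge) for edge in edges})
    return "sha256:" + hashlib.sha256("\n".join(ordered).encode("utf-8")).hexdigest()
```

Sorting the per-edge hashes before the final digest is one simple way to make the result independent of traversal order, which is the property the determinism tasks in this sprint keep returning to.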
## Wave Coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
@@ -127,9 +127,9 @@
## Action Tracker
| # | Action | Owner | Due (UTC) | Status | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Capture checkpoint dates after Sprint 0400 closure signal. | Planning | 2025-12-15 | TODO | Waiting on Sprint 0400 readiness update. |
| 1 | Capture checkpoint dates after Sprint 0400 closure signal. | Planning | 2025-12-15 | DONE (2025-12-13) | Archived Sprint 0400 is marked closed (2025-12-11); checkpoints captured and reflected under Upcoming Checkpoints. |
| 2 | Confirm CAS hash alignment (BLAKE3 + sha256 addressing) across Scanner/Replay/Signals. | Platform Guild | 2025-12-10 | DONE (2025-12-10) | CONTRACT-RICHGRAPH-V1-015 adopted; BLAKE3 graph_hash live in Scanner/Replay per GRAPH-CAS-401-001. |
| 3 | Schedule richgraph-v1 schema/hash alignment and rebaseline sprint dates. | Planning - Platform Guild | 2025-12-15 | TODO (slipped) | Rebaseline sprint dates after 2025-12-10 alignment; align with new checkpoints on 2025-12-15/18. |
| 3 | Schedule richgraph-v1 schema/hash alignment and rebaseline sprint dates. | Planning - Platform Guild | 2025-12-15 | DONE (2025-12-12) | Rebaselined checkpoints post 2025-12-10 alignment; updated 2025-12-15/18 readiness reviews (see Execution Log 2025-12-12). |
| 4 | Signals ingestion/probe readiness checkpoint for tasks 8-10, 17-18. | Signals Guild - Planning | 2025-12-18 | TODO | Assess runtime ingestion/probe readiness and flip task statuses to DOING/BLOCKED accordingly. |
## Decisions & Risks
@@ -153,8 +153,20 @@
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-13 | Documented designs for DOING tasks (37, 38, 48, 58, 59): (1) v1 formal 7-state lattice model with join/meet rules at `docs/reachability/lattice.md` §9; (2) U4 tier and T1-T4 formalized tier definitions at `docs/uncertainty/README.md` §1.1, §5-7; (3) policy gate specification with three gate types at `docs/reachability/policy-gate.md`; (4) ground truth schema for test datasets at `docs/reachability/ground-truth-schema.md`; (5) native callgraph schema alignment with richgraph-v1 at `docs/modules/scanner/design/native-reachability-plan.md` §8. All designs synchronized with existing contracts (richgraph-v1, evidence-schema). Implementation pending for all. | Implementer |
| 2025-12-13 | Marked SCANNER-NATIVE-401-015, GAP-REP-004, SCANNER-BUILDID-401-035, SCANNER-INITROOT-401-036, and GRAPH-HYBRID-401-053 as BLOCKED pending contracts on native lifters/toolchains, replay manifest v2 acceptance vectors/CAS gates, cross-RID build-id/code_id propagation, init synthetic-root schema/oracles, and graph-level DSSE/Rekor budget + golden fixtures. | Planning |
| 2025-12-13 | Unblocked tasks 40/41/52: (1) Task 40 (UNCERTAINTY-POLICY-401-026) now TODO - dependencies 38/39 complete with UncertaintyTier (T1-T4) and entropy-aware scoring. (2) Task 41 (UNCERTAINTY-UI-401-027) now TODO - same dependencies. (3) Task 52 (QA-PORACLE-401-037) now TODO - dependencies 1/53 complete with richgraph-v1 schema and graph-level DSSE. | Implementer |
| 2025-12-13 | Completed CORPUS-MERGE-401-060: migrated `tests/reachability/corpus` from legacy `expect.yaml` to `ground-truth.json` (Reachbench truth schema v1) with updated deterministic manifest generator (`tests/reachability/scripts/update_corpus_manifest.py`) and fixture validation (`tests/reachability/StellaOps.Reachability.FixtureTests/CorpusFixtureTests.cs`). Added cross-dataset coverage gates (`tests/reachability/StellaOps.Reachability.FixtureTests/FixtureCoverageTests.cs`), a deterministic manifest runner for corpus + public samples + reachbench (`tests/reachability/runners/run_all.{sh,ps1}`), and updated corpus map documentation (`docs/reachability/corpus-plan.md`). Fixture tests passing. | Implementer |
| 2025-12-13 | Started CORPUS-MERGE-401-060: unifying `tests/reachability/corpus` and `tests/reachability/samples-public` on a single ground-truth/manifest contract, adding deterministic runners + coverage gates, and updating `docs/reachability/corpus-plan.md`. | Implementer |
| 2025-12-13 | Completed GRAPH-HYBRID-401-053: richgraph CAS publisher now stores canonical JSON bodies and emits deterministic graph DSSE envelopes under `cas://reachability/graphs/{blake3Hex}.dsse`; `RichGraphPublishResult` includes DSSE pointers and tests validate the DSSE payload/signature (`src/Scanner/__Libraries/StellaOps.Scanner.Reachability/ReachabilityRichGraphPublisher.cs`, `src/Scanner/__Tests/StellaOps.Scanner.Reachability.Tests/RichGraphPublisherTests.cs`). | Implementer |
| 2025-12-13 | Completed SIG-STORE-401-016 and UNCERTAINTY-SCORER-401-025: added shared reachability store (func_nodes/call_edges/cve_func_hits) repository APIs + Mongo index script (`ops/mongo/indices/reachability_store_indices.js`), integrated store population during callgraph ingestion, and aligned entropy-aware risk scoring so `ReachabilityFactDocument.RiskScore` matches `UncertaintyDocument.RiskScore` with configurable weights; Signals + reachability integration tests passing. | Implementer |
| 2025-12-13 | Started SIG-STORE-401-016 and UNCERTAINTY-SCORER-401-025: implementing reachability store collections/indexes + repository APIs and entropy-aware risk scoring in `src/Signals/StellaOps.Signals`. | Implementer |
| 2025-12-13 | Completed GAP-REP-004: Implemented replay manifest v2 in `src/__Libraries/StellaOps.Replay.Core`. (1) Added `hash` field with algorithm prefix (blake3:..., sha256:...) to ReplayManifest.cs. (2) Added `code_id_coverage` section for stripped binary handling. (3) Created `ICasValidator` interface and `InMemoryCasValidator` for CAS reference validation. (4) Created `ReplayManifestValidator` with error codes per acceptance contract (MISSING_VERSION, VERSION_MISMATCH, MISSING_HASH_ALG, UNSORTED_ENTRIES, CAS_NOT_FOUND, HASH_MISMATCH). (5) Added `UpgradeToV2` migration helper. (6) Added 18 tests covering all v2 acceptance vectors. Also unblocked Task 18 (SIG-STORE-401-016). | Implementer |
| 2025-12-13 | Unblocked tasks 19/26/39/53/60: (1) Created `docs/replay/replay-manifest-v2-acceptance.md` with acceptance vectors, CAS registration gates, test fixtures, and migration path for Task 19. (2) Updated `bench/README.md` with verification workflows, artifact contracts, and CI integration for Task 26 (DONE). (3) Froze section 8 of `docs/reachability/hybrid-attestation.md` with DSSE/Rekor budget by tier, CAS signing layout, CLI UX, and golden fixture plan for Task 53. (4) Marked Tasks 39 and 60 as TODO since their dependencies (38 and 58) are complete. | Docs Guild |
| 2025-12-13 | Completed BINARY-GAPS-401-066: Created `docs/reachability/binary-reachability-schema.md` addressing all 10 binary reachability gaps (BR1-BR10) from November 2025 product findings. Document specifies: DSSE predicates (`stella.ops/binaryGraph@v1`, `stella.ops/binaryEdgeBundle@v1`), edge hash recipe with binary_hash context, required evidence table with CAS refs, build-id/variant rules for ELF/PE/Mach-O, policy hash governance with binding modes, Sigstore routing with offline mode, idempotent submission keys, size/chunking limits, API/CLI/UI guidance, and binary fixture requirements with test categories. | Docs Guild |
| 2025-12-13 | Completed tasks 37/38/48/58/59: implemented reachability lattice + uncertainty tiers + policy gate evaluator, published ground-truth schema/tests, and added richgraph-v1 native alignment tests; docs synced (`docs/reachability/lattice.md`, `docs/uncertainty/README.md`, `docs/reachability/policy-gate.md`, `docs/reachability/ground-truth-schema.md`, `docs/modules/scanner/design/native-reachability-plan.md`). | Implementer |
| 2025-12-13 | Regenerated deterministic reachbench/corpus manifest hashes with offline scripts (`tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`, `tests/reachability/corpus/update_manifest.py`) and verified reachability test suites (Policy Engine, Scanner Reachability, FixtureTests, Signals Reachability, ScannerSignals Integration) passing. | Implementer |
| 2025-12-13 | Closed Action Tracker items #1/#3: captured Sprint 0400 closure signal (archived sprint closed 2025-12-11) and marked richgraph alignment/rebaseline action complete. | Planning |
| 2025-12-13 | Started GAP-REP-004: implementing replay manifest v2 acceptance contract (hash fields, CAS registration gates, deterministic vectors) per `docs/replay/replay-manifest-v2-acceptance.md`. | Implementer |
| 2025-12-13 | Marked SCANNER-NATIVE-401-015, SCANNER-BUILDID-401-035, and SCANNER-INITROOT-401-036 as BLOCKED pending contracts on native lifters/toolchains, cross-RID build-id/code_id propagation, and init synthetic-root schema/oracles. | Planning |
| 2025-12-12 | Normalized sprint header/metadata formatting and aligned Action Tracker status labels to `TODO`/`DONE`; no semantic changes. | Project Mgmt |
| 2025-12-12 | Rebaselined reachability wave: marked tasks 6/8/13-18/20-21/23/25-26/39-41/46-47/52/54-56/60 as BLOCKED pending upstream deps; set Wave 0401 status to DOING post richgraph alignment so downstream work can queue cleanly. | Planning |
| 2025-12-12 | RecordModeService bumped to replay manifest v2 (hashAlg fields, BLAKE3 graph hashes) and ReachabilityReplayWriter now emits hashAlg for graphs/traces; added synthetic runtime probe endpoint to Signals with deterministic builder + tests. | Implementer |
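
The GAP-REP-004 entry in the log above lists the validator error codes for replay manifest v2 (MISSING_VERSION, VERSION_MISMATCH, MISSING_HASH_ALG, UNSORTED_ENTRIES, CAS_NOT_FOUND, HASH_MISMATCH). The sketch below shows how such checks might be ordered. It is not the `ReplayManifestValidator` from `StellaOps.Replay.Core`: the manifest keys (`schemaVersion`, `entries`, `casUri`, `hash`), the expected version string, and the dictionary standing in for a CAS are all assumptions made for illustration. Only SHA-256 content is re-hashed, because BLAKE3 is not in the Python standard library.

```python
import hashlib
from typing import Dict, List

EXPECTED_VERSION = "2"  # assumption: the v2 manifest identifies itself as "2"
KNOWN_ALGORITHMS = ("blake3", "sha256")


def validate_manifest(manifest: dict, cas: Dict[str, bytes]) -> List[str]:
    """Return the error codes raised by a manifest, in the order they are found."""
    errors: List[str] = []

    version = manifest.get("schemaVersion")
    if version is None:
        errors.append("MISSING_VERSION")
    elif str(version) != EXPECTED_VERSION:
        errors.append("VERSION_MISMATCH")

    entries = manifest.get("entries", [])
    uris = [entry.get("casUri", "") for entry in entries]
    if uris != sorted(uris):
        errors.append("UNSORTED_ENTRIES")

    for entry in entries:
        # Hashes are expected to carry an algorithm prefix, e.g. "sha256:<hex>".
        algorithm, separator, hex_digest = entry.get("hash", "").partition(":")
        if not separator or algorithm not in KNOWN_ALGORITHMS:
            errors.append("MISSING_HASH_ALG")
            continue
        blob = cas.get(entry.get("casUri", ""))
        if blob is None:
            errors.append("CAS_NOT_FOUND")
        elif algorithm == "sha256" and hashlib.sha256(blob).hexdigest() != hex_digest:
            # BLAKE3 entries are only checked for presence in this sketch.
            errors.append("HASH_MISMATCH")
    return errors
```

Returning a list of codes rather than failing on the first problem keeps a single validation pass useful for acceptance vectors that expect several errors at once.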
@@ -23,30 +23,30 @@
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | SCAN-JAVA-403-001 | TODO | Decide nested locator scheme (Action 1), then implement. | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Scan embedded libraries inside archives**: extend `JavaLanguageAnalyzer` to enumerate and parse Maven coordinates from embedded JARs in `BOOT-INF/lib/**.jar`, `WEB-INF/lib/**.jar`, `APP-INF/lib/**.jar`, and `lib/**.jar` *without extracting to disk*. Emit one component per discovered embedded artifact (PURL-based when possible). Evidence locators must represent nesting deterministically (e.g., `outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties`). Enforce size/time bounds (skip embedded jars above a configured size threshold; record `embeddedScanSkipped=true` + reason metadata). |
| 2 | SCAN-JAVA-403-002 | TODO | After task 1 skeleton lands, add `pom.xml` fallback and coverage fixtures. | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Add `pom.xml` fallback when `pom.properties` is missing**: detect and parse `META-INF/maven/**/pom.xml` (both top-level archives and embedded jars). Prefer `pom.properties` when both exist; otherwise derive `groupId/artifactId/version/packaging/name` from `pom.xml` and emit `pkg:maven/...` PURLs. Evidence must include sha256 of the parsed `pom.xml` entry. If `pom.xml` is present but coordinates are incomplete, emit a component with explicit key (no PURL) carrying `manifestTitle/manifestVersion` and an `unresolvedCoordinates=true` marker (do not guess a Maven PURL). |
| 3 | SCAN-JAVA-403-003 | TODO | Requires agreement on multi-module precedence (Interlock 2). | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Parse all discovered Gradle lockfiles deterministically**: update `JavaLockFileCollector` to parse lockfiles from `JavaBuildFileDiscovery` results (not only root `gradle.lockfile` and `gradle/dependency-locks`). Preserve the lockfile-relative path as `lockLocator` and include module context in metadata (e.g., `lockModulePath`). Deduplicate identical GAVs deterministically (stable overwrite rules documented in code + tested). |
| 4 | SCAN-JAVA-403-004 | TODO | Decide runtime component identity strategy (Action 2). | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Emit runtime image components**: when `JavaWorkspaceNormalizer` identifies a runtime image, emit a `java-runtime` component (explicit key or PURL per decision) with metadata `java.version`, `java.vendor`, and `runtimeImagePath` (relative). Evidence must reference the `release` file. Ensure deterministic ordering and do not double-count multiple identical runtime images (same version+vendor+relative path). |
| 5 | SCAN-JAVA-403-005 | TODO | After task 1 or 2, wire bytecode JNI analysis once per scan. | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Replace naive JNI string scanning with bytecode-based JNI analysis**: integrate `Internal/Jni/JavaJniAnalyzer` into `JavaLanguageAnalyzer` so JNI usage metadata is derived from parsed method invocations and native method flags (not raw ASCII search). Output must be bounded and deterministic: emit counts + top-N stable samples (e.g., `jni.edgeCount`, `jni.targetLibraries`, `jni.reasons`). Do not emit full class lists unbounded. |
| 6 | SCAN-JAVA-403-006 | TODO | Parallel with tasks 1–5; keep fixtures minimal. | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Java.Tests`) | **Add fixtures + golden outputs for new detection paths**: introduce fixtures covering (a) fat JAR with embedded libs under `BOOT-INF/lib`, (b) WAR with embedded libs under `WEB-INF/lib`, (c) artifact containing only `pom.xml` (no `pom.properties`), (d) multi-module Gradle lockfile layout, and (e) runtime image directory with `release`. Add/extend `JavaLanguageAnalyzerTests.cs` golden harness assertions proving embedded components are emitted with correct nested locators and stable ordering. |
| 7 | SCAN-JAVA-403-007 | TODO | After tasks 1–2 land, wire perf guard. | Bench Guild (`src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Add benchmark scenario for fat-archive scanning**: add a deterministic bench case that scans a representative fat JAR fixture and reports component count + elapsed time. Establish a baseline ceiling and ensure CI can run it offline. |
| 8 | SCAN-JAVA-403-008 | TODO | After tasks 1–5 land, document final contract. | Docs Guild + Java Analyzer Guild (`docs/modules/scanner`, `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Document Java analyzer detection contract**: update `docs/modules/scanner/architecture.md` (or add a Java analyzer sub-doc under `docs/modules/scanner/`) describing: embedded jar scanning rules, nested evidence locator format, lock precedence rules, runtime component emission, JNI metadata semantics, and known limitations (e.g., shaded jars with stripped Maven metadata remain best-effort). Link this sprint from the doc’s “evidence & determinism” area. |
| 1 | SCAN-JAVA-403-001 | DONE | Embedded scan ships with bounds + nested locators; fixtures/goldens in task 6 validate. | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Scan embedded libraries inside archives**: extend `JavaLanguageAnalyzer` to enumerate and parse Maven coordinates from embedded JARs in `BOOT-INF/lib/**.jar`, `WEB-INF/lib/**.jar`, `APP-INF/lib/**.jar`, and `lib/**.jar` *without extracting to disk*. Emit one component per discovered embedded artifact (PURL-based when possible). Evidence locators must represent nesting deterministically (e.g., `outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties`). Enforce size/time bounds (skip embedded jars above a configured size threshold; record `embeddedScanSkipped=true` + reason metadata). |
| 2 | SCAN-JAVA-403-002 | DONE | `pom.xml` fallback implemented for archives + embedded jars; explicit-key unresolved when incomplete. | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Add `pom.xml` fallback when `pom.properties` is missing**: detect and parse `META-INF/maven/**/pom.xml` (both top-level archives and embedded jars). Prefer `pom.properties` when both exist; otherwise derive `groupId/artifactId/version/packaging/name` from `pom.xml` and emit `pkg:maven/...` PURLs. Evidence must include sha256 of the parsed `pom.xml` entry. If `pom.xml` is present but coordinates are incomplete, emit a component with explicit key (no PURL) carrying `manifestTitle/manifestVersion` and an `unresolvedCoordinates=true` marker (do not guess a Maven PURL). |
| 3 | SCAN-JAVA-403-003 | BLOCKED | Needs an explicit, documented precedence rule for multi-module lock sources (Interlock 2). | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Parse all discovered Gradle lockfiles deterministically**: update `JavaLockFileCollector` to parse lockfiles from `JavaBuildFileDiscovery` results (not only root `gradle.lockfile` and `gradle/dependency-locks`). Preserve the lockfile-relative path as `lockLocator` and include module context in metadata (e.g., `lockModulePath`). Deduplicate identical GAVs deterministically (stable overwrite rules documented in code + tested). |
| 4 | SCAN-JAVA-403-004 | BLOCKED | Needs runtime component identity decision (Action 2) to avoid false vuln matches. | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Emit runtime image components**: when `JavaWorkspaceNormalizer` identifies a runtime image, emit a `java-runtime` component (explicit key or PURL per decision) with metadata `java.version`, `java.vendor`, and `runtimeImagePath` (relative). Evidence must reference the `release` file. Ensure deterministic ordering and do not double-count multiple identical runtime images (same version+vendor+relative path). |
| 5 | SCAN-JAVA-403-005 | DONE | Bytecode JNI metadata integrated and bounded; tests updated. | Java Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Replace naive JNI string scanning with bytecode-based JNI analysis**: integrate `Internal/Jni/JavaJniAnalyzer` into `JavaLanguageAnalyzer` so JNI usage metadata is derived from parsed method invocations and native method flags (not raw ASCII search). Output must be bounded and deterministic: emit counts + top-N stable samples (e.g., `jni.edgeCount`, `jni.targetLibraries`, `jni.reasons`). Do not emit full class lists unbounded. |
| 6 | SCAN-JAVA-403-006 | BLOCKED | Embedded/pomxml goldens landed; lock+runtime fixtures await tasks 3/4 decisions. | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Java.Tests`) | **Add fixtures + golden outputs for new detection paths**: introduce fixtures covering (a) fat JAR with embedded libs under `BOOT-INF/lib`, (b) WAR with embedded libs under `WEB-INF/lib`, (c) artifact containing only `pom.xml` (no `pom.properties`), (d) multi-module Gradle lockfile layout, and (e) runtime image directory with `release`. Add/extend `JavaLanguageAnalyzerTests.cs` golden harness assertions proving embedded components are emitted with correct nested locators and stable ordering. |
| 7 | SCAN-JAVA-403-007 | DONE | Added `java_fat_archive` scenario + fixture `samples/runtime/java-fat-archive`; baseline row pending in follow-up. | Bench Guild (`src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Add benchmark scenario for fat-archive scanning**: add a deterministic bench case that scans a representative fat JAR fixture and reports component count + elapsed time. Establish a baseline ceiling and ensure CI can run it offline. |
| 8 | SCAN-JAVA-403-008 | DONE | Added Java analyzer contract doc + linked from scanner architecture; cross-analyzer contract cleaned. | Docs Guild + Java Analyzer Guild (`docs/modules/scanner`, `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java`) | **Document Java analyzer detection contract**: update `docs/modules/scanner/architecture.md` (or add a Java analyzer sub-doc under `docs/modules/scanner/`) describing: embedded jar scanning rules, nested evidence locator format, lock precedence rules, runtime component emission, JNI metadata semantics, and known limitations (e.g., shaded jars with stripped Maven metadata remain best-effort). Link this sprint from the doc's `evidence & determinism` area. |
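
Task 1 in the table above requires evidence locators that represent archive nesting deterministically, in the form `outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties`. The following sketch shows one way to build and split such locators. The helper names, the normalization rules (forward slashes, no leading or trailing slash), and the `com.example/demo` coordinates in the usage lines are hypothetical and are not taken from the analyzer's `BuildLocator`.

```python
from typing import List

SEPARATOR = "!"  # nesting separator used in the sprint's locator examples


def build_locator(*segments: str) -> str:
    """Join archive path segments, outermost first, into one nested locator."""
    if not segments:
        raise ValueError("at least one segment is required")
    normalized = []
    for segment in segments:
        # Normalize path separators so the same entry always yields the same locator.
        cleaned = segment.replace("\\", "/").strip("/")
        if not cleaned or SEPARATOR in cleaned:
            raise ValueError(f"invalid locator segment: {segment!r}")
        normalized.append(cleaned)
    return SEPARATOR.join(normalized)


def split_locator(locator: str) -> List[str]:
    """Split a nested locator back into its segments, outermost first."""
    parts = locator.split(SEPARATOR)
    if any(not part for part in parts):
        raise ValueError(f"malformed locator: {locator!r}")
    return parts


locator = build_locator(
    "outer.jar",
    "BOOT-INF/lib/inner.jar",
    "META-INF/maven/com.example/demo/pom.properties",
)
print(locator)
print(len(split_locator(locator)) - 1)  # nesting depth below the outer archive
```

Rejecting segments that already contain `!` keeps the format unambiguous for exporters; whether the real scheme escapes such names instead is a decision this sketch does not cover.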
## Wave Coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
| --- | --- | --- | --- | --- |
| A: Embedded Inventory | Java Analyzer Guild + QA Guild | Locator decision (Action 1) | TODO | Enables detection of fat JAR/WAR embedded libs. |
| B: Coordinates Fallback | Java Analyzer Guild + QA Guild | None | TODO | `pom.xml` fallback for Maven coordinates when properties missing. |
| C: Lock Coverage | Java Analyzer Guild + QA Guild | Precedence decision (Interlock 2) | TODO | Multi-module Gradle lock ingestion improvements. |
| D: Runtime & JNI Context | Java Analyzer Guild + QA Guild | Runtime identity decision (Action 2) | TODO | Runtime component emission + JNI bytecode integration. |
| E: Bench & Docs | Bench Guild + Docs Guild | Waves A–D | TODO | Perf ceiling + contract documentation. |
| A: Embedded Inventory | Java Analyzer Guild + QA Guild | Locator decision (Action 1) | DOING | Enables detection of fat JAR/WAR embedded libs. |
| B: Coordinates Fallback | Java Analyzer Guild + QA Guild | None | DOING | `pom.xml` fallback for Maven coordinates when properties missing. |
| C: Lock Coverage | Java Analyzer Guild + QA Guild | Precedence decision (Interlock 2) | BLOCKED | Multi-module Gradle lock ingestion improvements. |
| D: Runtime & JNI Context | Java Analyzer Guild + QA Guild | Runtime identity decision (Action 2) | DOING | JNI bytecode integration in progress; runtime emission blocked. |
| E: Bench & Docs | Bench Guild + Docs Guild | Waves A-D | TODO | Perf ceiling + contract documentation. |
|
||||
|
||||
## Wave Detail Snapshots
- **Wave A:** Embedded JAR enumeration + nested evidence locators; fixtures prove fat-archive dependency visibility.
- **Wave B:** `pom.xml` fallback emits Maven PURLs when properties missing; explicit-key “unknown coords” component when insufficient data.
- **Wave B:** `pom.xml` fallback emits Maven PURLs when properties missing; explicit-key `unknown coords` component when insufficient data.
- **Wave C:** Broader Gradle lock ingestion across multi-module layouts; deterministic de-dupe rules and module-context metadata.
- **Wave D:** Runtime image component emitted from `release`; JNI metadata uses bytecode parsing with bounded output.
- **Wave E:** Offline benchmark + documented “what the analyzer promises” contract.
- **Wave E:** Offline benchmark + documented `what the analyzer promises` contract.

## Interlocks
- Evidence locator format must be stable across analyzers and safe for downstream consumers (CLI/UI/export). (Action 1)
@@ -64,23 +64,28 @@
## Action Tracker
| # | Action | Owner | Due (UTC) | Status | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Decide and document nested evidence locator scheme for embedded JAR entries (`outer!inner!path`). | Project Mgmt + Java Analyzer Guild | 2025-12-13 | Open | Must be stable, deterministic, and parseable by exporters. |
| 1 | Decide and document nested evidence locator scheme for embedded JAR entries (`outer!inner!path`). | Project Mgmt + Java Analyzer Guild | 2025-12-13 | Implemented (pending approval) | Implemented via nested `!` locators (consistent with existing `BuildLocator`); covered by new goldens. |
| 2 | Decide runtime component identity approach (explicit key vs PURL scheme; if PURL, specify qualifiers). | Project Mgmt + Scanner Guild | 2025-12-13 | Open | Avoid false vuln matches; prefer explicit-key if uncertain. |
| 3 | Define embedded-scan bounds (max embedded jars per archive, max embedded jar size) and required metadata when skipping. | Java Analyzer Guild + Security Guild | 2025-12-13 | Open | Must prevent resource exhaustion from untrusted artifacts. |
| 3 | Define embedded-scan bounds (max embedded jars per archive, max embedded jar size) and required metadata when skipping. | Java Analyzer Guild + Security Guild | 2025-12-13 | DONE | Implemented hard bounds + deterministic skip markers; documented in `docs/modules/scanner/analyzers-java.md`. |

## Decisions & Risks
- **Decision (pending):** Embedded locator format and runtime identity strategy (see Action Tracker 1–2).
- **Decision (pending):** Embedded locator format and runtime identity strategy (see Action Tracker 1-2).
- **Note:** This sprint proceeds using the existing Java analyzer locator convention (`archiveRelativePath!entryPath`), extended by nesting additional `!` separators for embedded jars.
- **Note:** Unresolved `pom.xml` coordinates emit an explicit-key component via `LanguageExplicitKey.Create("java","maven",...)` with `purl=null` and `version=null` (metadata still carries `manifestVersion`).
- **Blockers:** `SCAN-JAVA-403-003` (lock precedence) and `SCAN-JAVA-403-004` (runtime identity).
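
The nested locator convention noted above (`archiveRelativePath!entryPath`, with one extra `!` per embedded jar) can be illustrated with a minimal Python sketch. It is not the analyzer's C# `BuildLocator`; the names `NestedLocator`, `build_locator`, and `parse_locator` are hypothetical, and rejecting segments that contain `!` is an assumption about how ambiguity might be avoided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NestedLocator:
    """Hypothetical parsed form of a `!`-separated evidence locator."""
    archives: tuple[str, ...]  # outermost archive first, then each embedded jar
    entry_path: str            # path of the entry inside the innermost archive


def build_locator(archives: list[str], entry_path: str) -> str:
    """Join the archive chain and the entry path with `!` separators."""
    parts = [*archives, entry_path]
    # Assumption: a segment containing '!' would make the locator ambiguous.
    if not archives or any(not p or "!" in p for p in parts):
        raise ValueError("segments must be non-empty and must not contain '!'")
    return "!".join(parts)


def parse_locator(locator: str) -> NestedLocator:
    """Split a locator back into its archive chain and final entry path."""
    parts = locator.split("!")
    if len(parts) < 2 or any(not p for p in parts):
        raise ValueError(f"not a nested locator: {locator!r}")
    return NestedLocator(archives=tuple(parts[:-1]), entry_path=parts[-1])
```

Because building and parsing are exact inverses under these assumptions, a golden test can assert the literal locator string, which is the kind of check risk R2 asks for.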

| Risk ID | Risk | Impact | Likelihood | Mitigation | Owner | Trigger / Signal |
| --- | --- | --- | --- | --- | --- | --- |
| R1 | Embedded jar scanning increases CPU/memory and can be abused by large payloads. | High | Medium | Hard limits + streaming where possible; deterministic skip markers; add perf bench. | Java Analyzer Guild | Bench regression; OOM/timeout in CI; unusually large jar fixtures. |
| R2 | Nested locator format breaks downstream tooling expectations (export/UI). | Medium | Medium | Decide format up-front; add tests that assert exact locator strings; document contract. | Project Mgmt | Export bundle consumers fail parsing; UI shows confusing paths. |
| R3 | `pom.xml` parsing yields partial/incorrect coordinates (parent inheritance not available). | Medium | Medium | Only emit Maven PURL when `groupId/artifactId/version` are present; otherwise explicit-key component with `unresolvedCoordinates=true`. | Java Analyzer Guild | Golden fixtures show non-deterministic/missing coordinates. |
| R4 | Multi-module lock ingestion causes duplicate “declared-only” components or unstable overwrite rules. | Medium | Medium | Define precedence; stable sort and deterministic overwrite; fixture covering duplicates. | Java Analyzer Guild | Flaky tests; differing outputs depending on directory order. |
| R5 | Runtime “PURL” choice creates false vuln matches for Java runtimes. | High | Low/Medium | Prefer explicit-key component unless a vetted PURL scheme is agreed. | Scanner Guild | Vuln matches spike for runtime-only components without evidence. |
| R4 | Multi-module lock ingestion causes duplicate `declared-only` components or unstable overwrite rules. | Medium | Medium | Define precedence; stable sort and deterministic overwrite; fixture covering duplicates. | Java Analyzer Guild | Flaky tests; differing outputs depending on directory order. |
| R5 | Runtime `PURL` choice creates false vuln matches for Java runtimes. | High | Low/Medium | Prefer explicit-key component unless a vetted PURL scheme is agreed. | Scanner Guild | Vuln matches spike for runtime-only components without evidence. |
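
The R3 mitigation (emit a Maven PURL only when `groupId`, `artifactId`, and `version` are all present, otherwise an explicit-key component with `unresolvedCoordinates=true`) can be sketched in Python as follows. This is an illustration, not the analyzer's C# code: `component_from_pom`, the dictionary shape, and the `java::maven::<locator>` key format are hypothetical, and treating `${...}` placeholders as missing is an assumption.

```python
import xml.etree.ElementTree as ET

POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def _child_text(project, tag):
    # Look for a direct child of <project>, with or without the POM namespace.
    node = project.find(POM_NS + tag)
    if node is None:
        node = project.find(tag)
    text = (node.text or "").strip() if node is not None else ""
    # Assumption: a `${...}` placeholder needs property/parent resolution,
    # which is unavailable here, so it counts as missing.
    return None if not text or "${" in text else text


def component_from_pom(pom_xml: str, locator: str) -> dict:
    """Return a PURL-identified component, or an explicit-key fallback."""
    project = ET.fromstring(pom_xml)
    group, artifact, version = (
        _child_text(project, tag) for tag in ("groupId", "artifactId", "version")
    )
    if group and artifact and version:
        return {
            "purl": f"pkg:maven/{group}/{artifact}@{version}",
            "version": version,
            "metadata": {},
        }
    return {
        "purl": None,
        "version": None,
        "explicitKey": f"java::maven::{locator}",  # hypothetical key format
        "metadata": {"unresolvedCoordinates": "true"},
    }
```

The sketch deliberately ignores the `<parent>` element, so coordinates that would only resolve through parent inheritance fall into the explicit-key branch rather than producing a guessed PURL.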

## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-12 | Sprint created to close Java analyzer detection gaps (embedded libs, `pom.xml` fallback, lock coverage, runtime images, JNI integration) with fixtures/bench/docs expectations. | Project Mgmt |
| 2025-12-13 | Set tasks 1/2/5 to DOING; marked tasks 3/4 BLOCKED pending precedence/runtime identity decisions; started implementation work. | Java Analyzer Guild |
| 2025-12-13 | DONE: embedded jar scan + `pom.xml` fallback + JNI bytecode metadata; added goldens for fat JAR/WAR/pomxml-only; added bench scenario + Java analyzer contract docs; task 6 remains BLOCKED on tasks 3/4. | Java Analyzer Guild |
@@ -20,14 +20,14 @@
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | SCAN-PY-405-001 | TODO | Approve identity/precedence rules (Actions 1–2). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Wire layout-aware discovery into `PythonLanguageAnalyzer`**: stop treating “any `*.dist-info` anywhere” as an installed package source. Use `PythonInputNormalizer` + `PythonVirtualFileSystem` + `PythonPackageDiscovery` as the first-pass inventory (site-packages, editable paths, wheels, zipapps, container layer roots). Ensure deterministic path precedence (later/higher-confidence wins) and bounded scanning (no unbounded full-tree recursion for patterns). Emit package-kind + confidence metadata (`pkg.kind`, `pkg.confidence`, `pkg.location`) for every component. |
| 2 | SCAN-PY-405-002 | TODO | After task 1, define dist-info/egg-info enrichment rules. | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Preserve dist-info “deep evidence” while expanding coverage**: for any discovered package with a real `*.dist-info`/`*.egg-info`, continue to enrich with `PythonDistributionLoader` evidence (METADATA/RECORD/WHEEL/entrypoints, RECORD verification stats). For packages discovered without dist-info (e.g., Poetry editable, vendored, zipapp), emit components using `AddFromExplicitKey` with stable identity rules (Action 1) and evidence pointing to the originating file(s) (`pyproject.toml`, lockfile, archive path). |
| 3 | SCAN-PY-405-003 | TODO | Decide lock precedence + supported formats scope (Action 2). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Expand lockfile/requirements detection and parsing**: upgrade `PythonLockFileCollector` to (a) discover lock/requirements files deterministically (root + nested common paths), (b) support `-r/--requirement` includes with cycle detection, (c) correctly handle editable `-e/--editable` lines, (d) parse PEP 508 specifiers (not only `==/===`) and `name @ url` direct references, and (e) include Pipenv `develop` section. Add opt-in support for at least one modern lock (`uv.lock` or `pdm.lock`) with deterministic record ordering and explicit “unsupported line” counters. |
| 4 | SCAN-PY-405-004 | TODO | Requires container overlay decision (Action 3). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Correct container-layer inventory semantics**: when scanning raw OCI layer trees (`layers/`, `.layers/`, `layer*/`), honor whiteouts/overlay ordering so removed packages are not reported. Use/extend `Internal/Packaging/Adapters/ContainerLayerAdapter` semantics as the source of truth for precedence. Emit explicit metadata markers when inventory is partial due to missing overlay context (e.g., `container.overlayIncomplete=true`). |
| 5 | SCAN-PY-405-005 | TODO | Decide representation for vendored deps (Action 4). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Surface vendored (bundled) Python deps**: integrate `VendoredPackageDetector` so known vendoring patterns (`*_vendor`, `third_party`, `requests.packages`, etc.) are detected. Emit either (a) separate “embedded” components with bounded evidence locators (preferred) or (b) a bounded metadata summary on the parent package (`vendored.detected=true`, `vendored.packages`, `vendored.paths`). Never emit unbounded file/module lists; cap to top-N deterministic samples. |
| 6 | SCAN-PY-405-006 | TODO | After task 1–3, decide “used-by-entrypoint” upgrade approach (Interlock 4). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Improve “used by entrypoint” and scope classification**: today `usedByEntrypoint` primarily comes from RECORD/script hints. Extend this by optionally mapping source-tree imports (`PythonImportAnalysis`) and/or runtime evidence (`PythonRuntimeEvidenceCollector`) to packages (via `TopLevelModules`) so “likely used” can be signaled deterministically (bounded, opt-in). Add `scope` metadata using `PythonScopeClassifier` (prod/dev/docs/build) based on lock sections and requirements file names. |
| 7 | SCAN-PY-405-007 | TODO | Parallel with tasks 1–6; fixtures first. | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Python.Tests`) | **Fixtures + golden outputs**: add fixtures proving new detection paths: (a) conda env (`conda-meta/*.json`) without dist-info, (b) requirements with `-r` includes + `-e .` editable, (c) Pipfile.lock with `default` + `develop`, (d) wheel file in workspace (no extraction), (e) zipapp/pyz with embedded requirements, (f) container layers with whiteouts hiding a dist-info dir, (g) vendored dependency directory under a package. Extend `PythonLanguageAnalyzerTests.cs` to assert deterministic ordering, stable identities, and bounded metadata. |
| 8 | SCAN-PY-405-008 | TODO | After core behavior lands, update docs + perf guard. | Docs Guild + Bench Guild (`docs/modules/scanner`, `src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Document + benchmark Python analyzer contract**: update `docs/modules/scanner/architecture.md` (or add a Python analyzer sub-doc) describing detection sources & precedence, lock parsing rules, container overlay semantics, vendoring representation, and identity rules for non-versioned components. Add a deterministic offline bench scanning a representative fixture (many packages + lockfiles) and record baseline ceilings (time + components count). |
| 1 | SCAN-PY-405-001 | DONE | Implement VFS/discovery pipeline; then codify identity/precedence in tests. | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Wire layout-aware discovery into `PythonLanguageAnalyzer`**: stop treating "any `*.dist-info` anywhere" as an installed package source. Use `PythonInputNormalizer` + `PythonVirtualFileSystem` + `PythonPackageDiscovery` as the first-pass inventory (site-packages, editable paths, wheels, zipapps, container layer roots). Ensure deterministic path precedence (later/higher-confidence wins) and bounded scanning (no unbounded full-tree recursion for patterns). Emit package-kind + confidence metadata (`pkg.kind`, `pkg.confidence`, `pkg.location`) for every component. |
| 2 | SCAN-PY-405-002 | BLOCKED | Blocked on Action 1 identity scheme for non-versioned explicit keys. | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Preserve dist-info "deep evidence" while expanding coverage**: for any discovered package with a real `*.dist-info`/`*.egg-info`, continue to enrich with `PythonDistributionLoader` evidence (METADATA/RECORD/WHEEL/entrypoints, RECORD verification stats). For packages discovered without dist-info (e.g., Poetry editable, vendored, zipapp), emit components using `AddFromExplicitKey` with stable identity rules (Action 1) and evidence pointing to the originating file(s) (`pyproject.toml`, lockfile, archive path). |
| 3 | SCAN-PY-405-003 | BLOCKED | Await Action 2 (lock/requirements precedence + supported formats scope). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Expand lockfile/requirements detection and parsing**: upgrade `PythonLockFileCollector` to (a) discover lock/requirements files deterministically (root + nested common paths), (b) support `-r/--requirement` includes with cycle detection, (c) correctly handle editable `-e/--editable` lines, (d) parse PEP 508 specifiers (not only `==/===`) and `name @ url` direct references, and (e) include Pipenv `develop` section. Add opt-in support for at least one modern lock (`uv.lock` or `pdm.lock`) with deterministic record ordering and explicit "unsupported line" counters. |
| 4 | SCAN-PY-405-004 | BLOCKED | Await Action 3 (container overlay handling contract). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Correct container-layer inventory semantics**: when scanning raw OCI layer trees (`layers/`, `.layers/`, `layer*/`), honor whiteouts/overlay ordering so removed packages are not reported. Use/extend `Internal/Packaging/Adapters/ContainerLayerAdapter` semantics as the source of truth for precedence. Emit explicit metadata markers when inventory is partial due to missing overlay context (e.g., `container.overlayIncomplete=true`). |
| 5 | SCAN-PY-405-005 | BLOCKED | Await Action 4 (vendored deps representation contract). | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Surface vendored (bundled) Python deps**: integrate `VendoredPackageDetector` so known vendoring patterns (`*_vendor`, `third_party`, `requests.packages`, etc.) are detected. Emit either (a) separate "embedded" components with bounded evidence locators (preferred) or (b) a bounded metadata summary on the parent package (`vendored.detected=true`, `vendored.packages`, `vendored.paths`). Never emit unbounded file/module lists; cap to top-N deterministic samples. |
| 6 | SCAN-PY-405-006 | BLOCKED | Await Interlock 4 decision on "used-by-entrypoint" semantics. | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Improve "used by entrypoint" and scope classification**: today `usedByEntrypoint` primarily comes from RECORD/script hints. Extend this by optionally mapping source-tree imports (`PythonImportAnalysis`) and/or runtime evidence (`PythonRuntimeEvidenceCollector`) to packages (via `TopLevelModules`) so "likely used" can be signaled deterministically (bounded, opt-in). Add `scope` metadata using `PythonScopeClassifier` (prod/dev/docs/build) based on lock sections and requirements file names. |
| 7 | SCAN-PY-405-007 | BLOCKED | Blocked on Actions 2-4 for remaining fixtures (requirements/includes/editables, whiteouts, vendoring). | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Python.Tests`) | **Fixtures + golden outputs**: add fixtures proving new detection paths: (a) conda env (`conda-meta/*.json`) without dist-info, (b) requirements with `-r` includes + `-e .` editable, (c) Pipfile.lock with `default` + `develop`, (d) wheel file in workspace (no extraction), (e) zipapp/pyz with embedded requirements, (f) container layers with whiteouts hiding a dist-info dir, (g) vendored dependency directory under a package. Extend `PythonLanguageAnalyzerTests.cs` to assert deterministic ordering, stable identities, and bounded metadata. |
| 8 | SCAN-PY-405-008 | DONE | After core behavior lands, update docs + perf guard. | Docs Guild + Bench Guild (`docs/modules/scanner`, `src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Document + benchmark Python analyzer contract**: update `docs/modules/scanner/architecture.md` (or add a Python analyzer sub-doc) describing detection sources & precedence, lock parsing rules, container overlay semantics, vendoring representation, and identity rules for non-versioned components. Add a deterministic offline bench scanning a representative fixture (many packages + lockfiles) and record baseline ceilings (time + components count). |
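
Task 3 above asks for `-r/--requirement` includes with cycle detection and explicit handling of `-e/--editable` lines. A minimal Python sketch of that traversal follows. It is not `PythonLockFileCollector`: the function name, the in-memory `files` mapping (a stand-in for real file access), and the reason strings are hypothetical, and pip's requirements grammar has more options than are handled here.

```python
import posixpath


def collect_requirements(files: dict[str, str], root: str):
    """Follow `-r` includes from `root`; return (entries, skipped).

    `files` maps normalized relative paths to file contents.
    Entries are `(source_file, kind, value)`; skipped are `(path, reason)`.
    """
    entries: list[tuple[str, str, str]] = []
    skipped: list[tuple[str, str]] = []
    visited: set[str] = set()

    def visit(path: str) -> None:
        if path in visited:
            # A cycle or a repeated include: record it and do not re-enter.
            skipped.append((path, "already-visited"))
            return
        visited.add(path)
        if path not in files:
            skipped.append((path, "missing"))
            return
        for raw in files[path].splitlines():
            line = raw.split(" #", 1)[0].strip()  # drop trailing comments
            if not line or line.startswith("#"):
                continue
            tokens = line.split(None, 1)
            if tokens[0] in ("-r", "--requirement") and len(tokens) == 2:
                # Includes resolve relative to the including file's directory.
                base = posixpath.dirname(path)
                visit(posixpath.normpath(posixpath.join(base, tokens[1].strip())))
            elif tokens[0] in ("-e", "--editable") and len(tokens) == 2:
                entries.append((path, "editable", tokens[1].strip()))
            else:
                entries.append((path, "requirement", line))

    visit(posixpath.normpath(root))
    return entries, skipped
```

Entries come out in file order with includes expanded in place, so the result depends only on file contents and not on directory enumeration order. Counting the `skipped` list is one way to surface the "unsupported line" style counters the task mentions.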

## Wave Coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
@@ -67,7 +67,13 @@
| 4 | Decide how vendored deps are represented (separate embedded components vs parent-only metadata) and how to avoid false vuln matches. | Project Mgmt + Python Analyzer Guild | 2025-12-13 | Open | Prefer separate components only when identity/version is defensible; otherwise bounded metadata summary. |

## Decisions & Risks
- **Decision (pending):** Identity scheme for non-versioned components, lock precedence, and container overlay expectations (Action Tracker 1–3).
- **Decision (pending):** Identity scheme for non-versioned components, lock precedence, and container overlay expectations (Action Tracker 1-3).
- **BLOCKED:** `SCAN-PY-405-002` needs an approved explicit-key identity scheme (Action Tracker 1) before emitting non-versioned components (vendored/local/zipapp/project).
- **BLOCKED:** `SCAN-PY-405-003` awaits lock/requirements precedence + supported formats scope (Action Tracker 2).
- **BLOCKED:** `SCAN-PY-405-004` awaits container overlay handling contract for raw `layers/` inputs (Action Tracker 3).
- **BLOCKED:** `SCAN-PY-405-005` awaits vendored deps representation contract (Action Tracker 4).
- **BLOCKED:** `SCAN-PY-405-006` awaits Interlock 4 decision on "used-by-entrypoint" semantics (avoid turning heuristics into truth).
- **BLOCKED:** `SCAN-PY-405-007` awaits Actions 2-4 to fixture remaining semantics (includes/editables, overlay/whiteouts, vendoring).

| Risk ID | Risk | Impact | Likelihood | Mitigation | Owner | Trigger / Signal |
| --- | --- | --- | --- | --- | --- | --- |
@@ -81,4 +87,12 @@
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-12 | Sprint created to close Python analyzer detection gaps (layout-aware discovery, lockfile expansion, container overlay correctness, vendoring signals, optional usage/scope improvements) with fixtures/bench/docs expectations. | Project Mgmt |
| 2025-12-13 | Started SCAN-PY-405-001 (wire VFS/discovery into PythonLanguageAnalyzer). | Python Analyzer Guild |
| 2025-12-13 | Completed SCAN-PY-405-001 (layout-aware VFS-based discovery; pkg.kind/pkg.confidence/pkg.location metadata; deterministic archive roots; updated goldens + tests). | Python Analyzer Guild |
| 2025-12-13 | Started SCAN-PY-405-002 (preserve/enrich dist-info evidence across discovered sources). | Python Analyzer Guild |
| 2025-12-13 | Enforced identity safety for editable lock entries (explicit-key, no `@editable` PURLs, host-path scrubbing) and updated layered fixture to prove `layers/`, `.layers/`, and `layer*/` discovery. | Implementer |
| 2025-12-13 | Added `PythonDistributionVfsLoader` for archive dist-info enrichment (RECORD verification + metadata parity for wheels/zipapps); task remains blocked on explicit-key identity scheme (Action Tracker 1). | Implementer |
| 2025-12-13 | Marked SCAN-PY-405-003 through SCAN-PY-405-007 as `BLOCKED` pending Actions 2-4; synced statuses to `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/TASKS.md`. | Implementer |
| 2025-12-13 | Started SCAN-PY-405-008 (document current Python analyzer contract and extend deterministic offline bench coverage). | Implementer |
| 2025-12-13 | Completed SCAN-PY-405-008 (added Python analyzer contract doc + linked from Scanner architecture; extended analyzer microbench config and refreshed baseline; fixed Node analyzer empty-root guard to unblock bench runs from repo root). | Implementer |
@@ -20,17 +20,17 @@
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | SCAN-NODE-406-001 | TODO | Decide identity/declared-only scheme (Action 1). | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Emit declared-only components**: `NodeLockData.LoadAsync` already builds `DeclaredPackages` from lockfiles + `package.json`, but `NodeLanguageAnalyzer` never emits them. Add a deterministic “declared-only emission” pass that emits components for any `DeclaredPackages` entry not backed by on-disk inventory. Must include: `declaredOnly=true`, `declared.source` (`package.json|package-lock.json|yarn.lock|pnpm-lock.yaml`), `declared.locator` (stable), `declared.versionSpec` (original range/tag), `declared.scope` (prod/dev/peer/optional if known), and `declared.resolvedVersion` (only when lock provides concrete). **Critical:** do not emit `pkg:npm/...@<range>` PURLs; use `AddFromExplicitKey` when version is not a concrete resolved version. |
| 2 | SCAN-NODE-406-002 | TODO | After Action 2, implement + fixtures. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Multi-version lock correctness**: fix `NodeLockData` to support multiple versions per package name and match lock entries by `(name, resolvedVersion)` when the on-disk package.json has a concrete version. Add a `TryGet(relativePath, name, version)` overload (or equivalent) so lock metadata (`integrity`, `resolved`, `scope`) attaches to the correct package instance. Replace/augment `_byName` with a deterministic `(name@version)->entry` map for yarn/pnpm sources. |
| 3 | SCAN-NODE-406-003 | TODO | No external YAML libs; keep deterministic. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Support Yarn Berry (v2/v3) lock format**: extend `NodeLockData.LoadYarnLock` to parse modern `yarn.lock` entries that use `resolution:` / `checksum:` / `linkType:` (and may not have `resolved`/`integrity`). Map `checksum` to an integrity-like field (metadata/evidence) and preserve the raw locator key as `lockLocator`. Ensure multiple versions of the same package are preserved (Task 2). Add fixtures covering Yarn v1 and Yarn v3 lock styles. |
| 4 | SCAN-NODE-406-004 | TODO | Align with Action 3 on “integrity missing” semantics. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Harden pnpm lock parsing**: extend `LoadPnpmLock` to handle packages that have no `integrity` (workspace/file/link/git) without silently dropping them. Emit declared-only entries with `declared.resolvedVersion` (if known) and `lockIntegrityMissing=true` + reason. Add support for newer pnpm layouts (`snapshots:`) when present, while keeping parsing bounded and deterministic. |
| 5 | SCAN-NODE-406-005 | TODO | After task 2, fix path name extraction and tests. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Fix `package-lock.json` nested node_modules naming**: `ExtractNameFromPath` mis-identifies `node_modules/parent/node_modules/child` unless `name` is present. Update extraction to select the last package segment after the last `node_modules` (incl. scoped packages). Add tests that prove nested dependencies are keyed correctly and lock metadata is attached to the right on-disk package. |
| 6 | SCAN-NODE-406-006 | TODO | Decide workspace glob support (Action 2). | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Improve workspace discovery**: `NodeWorkspaceIndex` only supports patterns ending with `/*`. Extend it to support at least `**`-style patterns used in monorepos (e.g., `packages/**`, `apps/*`, `tools/*`). Ensure expansion is deterministic and safe (bounds on directory traversal; ignore `node_modules`). Add fixtures for multi-depth workspace patterns. |
| 7 | SCAN-NODE-406-007 | TODO | After task 6, add scope index for workspaces. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Workspace-aware dependency scopes**: `NodeDependencyIndex` reads only root `package.json`. Extend scope classification to include workspace member manifests so `scope`/`riskLevel` metadata is correct for workspace packages. Must preserve precedence rules (root vs workspace vs lock) and be deterministic. |
| 8 | SCAN-NODE-406-008 | TODO | Requires Action 4 decision on import scanning bounds. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Import scanning correctness + bounds**: `NodeImportWalker` uses `ParseScript` which misses ESM `import` syntax and fails on TS. Improve by attempting `ParseModule` when script parse fails, and add a bounded heuristic fallback for TS (`import ... from`, `export ... from`) when AST parsing fails. Also bound `AttachImports` so it does not recursively scan every file inside `node_modules` trees by default; restrict to source roots/workspace members and/or cap by file count and total bytes, emitting `importScanSkipped=true` + counters when capped. |
| 9 | SCAN-NODE-406-009 | TODO | After task 1, adopt consistent evidence hashing. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Deterministic evidence hashing for on-disk `package.json`**: today tar/zip packages attach `PackageSha256`, but on-disk packages typically do not. Compute sha256 for `package.json` contents for installed packages (bounded: only package.json, not full dir) and attach to root evidence consistently. Do not hash large files; do not add unbounded IO. |
| 10 | SCAN-NODE-406-010 | TODO | Parallel with tasks 1–9; fixtures first. | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Node.Tests`) | **Fixtures + golden outputs**: add/extend fixtures proving: (a) lock-only project (no node_modules) emits declared-only components, (b) Yarn v3 lock parses + multi-version packages preserved, (c) pnpm lock with workspace/link deps doesn’t silently drop, (d) package-lock nested node_modules naming is correct, (e) workspace glob patterns beyond `/*`, (f) container layout where app `package.json` is not at root (e.g., `/app/package.json` inside a layer root) still emits the app component, (g) ESM + TS import scanning captures imports (bounded) and emits deterministic evidence. Update `NodeLanguageAnalyzerTests.cs` and targeted unit tests (`NodeLockDataTests.cs`, `NodePackageCollectorTests.cs`) to assert deterministic ordering and identity rules. |
| 11 | SCAN-NODE-406-011 | TODO | After core behavior lands, update docs + perf guard. | Docs Guild + Bench Guild (`docs/modules/scanner`, `src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Document + benchmark Node analyzer contract**: document precedence (installed vs declared), identity rules for unresolved versions, Yarn/pnpm lock parsing guarantees/limits, workspace discovery rules, import scanning bounds/semantics, and container layout assumptions. Add a deterministic offline bench that scans a representative fixture (workspace + lock-only + import scan enabled) and records elapsed time + component counts (and file-scan counters) with a baseline ceiling. |
| 1 | SCAN-NODE-406-001 | DONE | Emission + tests landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Emit declared-only components**: `NodeLockData.LoadAsync` already builds `DeclaredPackages` from lockfiles + `package.json`, but `NodeLanguageAnalyzer` never emits them. Add a deterministic "declared-only emission" pass that emits components for any `DeclaredPackages` entry not backed by on-disk inventory. Must include: `declaredOnly=true`, `declared.source` (`package.json|package-lock.json|yarn.lock|pnpm-lock.yaml`), `declared.locator` (stable), `declared.versionSpec` (original range/tag), `declared.scope` (prod/dev/peer/optional if known), and `declared.resolvedVersion` (only when lock provides concrete). **Critical:** do not emit `pkg:npm/...@<range>` PURLs; use `AddFromExplicitKey` when version is not a concrete resolved version. |
| 2 | SCAN-NODE-406-002 | DONE | Multi-version matching + tests landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Multi-version lock correctness**: fix `NodeLockData` to support multiple versions per package name and match lock entries by `(name, resolvedVersion)` when the on-disk package.json has a concrete version. Add a `TryGet(relativePath, name, version)` overload (or equivalent) so lock metadata (`integrity`, `resolved`, `scope`) attaches to the correct package instance. Replace/augment `_byName` with a deterministic `(name@version)->entry` map for yarn/pnpm sources. |
| 3 | SCAN-NODE-406-003 | DONE | Yarn Berry parsing + tests landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Support Yarn Berry (v2/v3) lock format**: extend `NodeLockData.LoadYarnLock` to parse modern `yarn.lock` entries that use `resolution:` / `checksum:` / `linkType:` (and may not have `resolved`/`integrity`). Map `checksum` to an integrity-like field (metadata/evidence) and preserve the raw locator key as `lockLocator`. Ensure multiple versions of the same package are preserved (Task 2). Add fixtures covering Yarn v1 and Yarn v3 lock styles. |
| 4 | SCAN-NODE-406-004 | DONE | pnpm hardening + tests landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Harden pnpm lock parsing**: extend `LoadPnpmLock` to handle packages that have no `integrity` (workspace/file/link/git) without silently dropping them. Emit declared-only entries with `declared.resolvedVersion` (if known) and `lockIntegrityMissing=true` + reason. Add support for newer pnpm layouts (`snapshots:`) when present, while keeping parsing bounded and deterministic. |
| 5 | SCAN-NODE-406-005 | DONE | Nested node_modules naming + tests landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Fix `package-lock.json` nested node_modules naming**: `ExtractNameFromPath` mis-identifies `node_modules/parent/node_modules/child` unless `name` is present. Update extraction to select the last package segment after the last `node_modules` (incl. scoped packages). Add tests that prove nested dependencies are keyed correctly and lock metadata is attached to the right on-disk package. |
| 6 | SCAN-NODE-406-006 | DONE | Bounded `*`/`**` workspace expansion + tests landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Improve workspace discovery**: `NodeWorkspaceIndex` only supports patterns ending with `/*`. Extend it to support at least `**`-style patterns used in monorepos (e.g., `packages/**`, `apps/*`, `tools/*`). Ensure expansion is deterministic and safe (bounds on directory traversal; ignore `node_modules`). Add fixtures for multi-depth workspace patterns. |
| 7 | SCAN-NODE-406-007 | DONE | Workspace-aware scopes + tests landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Workspace-aware dependency scopes**: `NodeDependencyIndex` reads only root `package.json`. Extend scope classification to include workspace member manifests so `scope`/`riskLevel` metadata is correct for workspace packages. Must preserve precedence rules (root vs workspace vs lock) and be deterministic. |
| 8 | SCAN-NODE-406-008 | DONE | ESM/TS parsing + bounded import scan landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Import scanning correctness + bounds**: `NodeImportWalker` uses `ParseScript` which misses ESM `import` syntax and fails on TS. Improve by attempting `ParseModule` when script parse fails, and add a bounded heuristic fallback for TS (`import ... from`, `export ... from`) when AST parsing fails. Also bound `AttachImports` so it does not recursively scan every file inside `node_modules` trees by default; restrict to source roots/workspace members and/or cap by file count and total bytes, emitting `importScanSkipped=true` + counters when capped. |
| 9 | SCAN-NODE-406-009 | DONE | On-disk `package.json` hashing + fixtures landed. | Node Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node`) | **Deterministic evidence hashing for on-disk `package.json`**: today tar/zip packages attach `PackageSha256`, but on-disk packages typically do not. Compute sha256 for `package.json` contents for installed packages (bounded: only package.json, not full dir) and attach to root evidence consistently. Do not hash large files; do not add unbounded IO. |
| 10 | SCAN-NODE-406-010 | DONE | Lock-only lockfile fixtures (package-lock/yarn-berry/pnpm) + workspace glob fixture + container app-root discovery; goldens updated. | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Node.Tests`) | **Fixtures + golden outputs**: add/extend fixtures proving: (a) lock-only project (no node_modules) emits declared-only components, (b) Yarn v3 lock parses and multi-version packages are preserved, (c) pnpm lock with workspace/link deps does not silently drop them, (d) package-lock nested node_modules naming is correct, (e) workspace glob patterns beyond `/*` are supported, (f) container layout where app `package.json` is not at root (e.g., `/app/package.json` inside a layer root) still emits the app component, (g) ESM + TS import scanning captures imports (bounded) and emits deterministic evidence. Update `NodeLanguageAnalyzerTests.cs` and targeted unit tests (`NodeLockDataTests.cs`, `NodePackageCollectorTests.cs`) to assert deterministic ordering and identity rules. |
| 11 | SCAN-NODE-406-011 | DONE | Docs + offline bench scenario (`node_detection_gaps_fixture`) landed; Prom/JSON record import-scan counters. | Docs Guild + Bench Guild (`docs/modules/scanner`, `src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Document + benchmark Node analyzer contract**: document precedence (installed vs declared), identity rules for unresolved versions, Yarn/pnpm lock parsing guarantees/limits, workspace discovery rules, import scanning bounds/semantics, and container layout assumptions. Add a deterministic offline bench that scans a representative fixture (workspace + lock-only + import scan enabled) and records elapsed time + component counts (and file-scan counters) with a baseline ceiling. |
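
Task 5 above asks for the package name to be taken from the segment after the last `node_modules` in a `package-lock.json` path, including scoped packages. The following is a minimal Python sketch of that rule, not the analyzer's C# `ExtractNameFromPath`; the function name `extract_name_from_path` and its `None` return for unrecognized paths are assumptions made for illustration.

```python
from typing import Optional


def extract_name_from_path(lock_path: str) -> Optional[str]:
    """Derive a package name from a package-lock `packages` key (illustrative only)."""
    # Normalize separators so the result does not depend on the host OS.
    segments = [s for s in lock_path.replace("\\", "/").split("/") if s]

    # Locate the LAST "node_modules" segment; nested installs have several.
    last = None
    for index, segment in enumerate(segments):
        if segment == "node_modules":
            last = index
    if last is None or last + 1 >= len(segments):
        return None  # not an installed-package path (e.g. a workspace folder)

    first = segments[last + 1]
    if first.startswith("@"):
        # Scoped package: the name spans two segments, "@scope/name".
        if last + 2 >= len(segments):
            return None
        return f"{first}/{segments[last + 2]}"
    return first
```

With this rule, `node_modules/parent/node_modules/child` is keyed as `child` rather than `parent`, which is the mis-identification the task describes.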
## Wave Coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
@@ -64,10 +64,10 @@
## Action Tracker
| # | Action | Owner | Due (UTC) | Status | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Decide explicit-key identity scheme for declared-only Node deps (ranges/tags/git/file/workspace) and document it. | Project Mgmt + Scanner Guild | 2025-12-13 | Open | Must not collide with concrete `pkg:npm/...@<version>` PURLs; must be stable across OS paths. |
| 2 | Decide workspace glob expansion rules (supported patterns, bounds, excluded dirs like `node_modules`). | Project Mgmt + Node Analyzer Guild | 2025-12-13 | Open | Keep deterministic and safe under untrusted inputs. |
| 3 | Decide lock metadata precedence when multiple sources exist and when lock lacks integrity/resolution. | Project Mgmt + Node Analyzer Guild | 2025-12-13 | Open | Must be explicit and test-covered; never depend on file traversal order. |
| 4 | Decide import-scanning policy: default enabled/disabled, scope (workspace only vs all packages), and caps to enforce. | Project Mgmt + Node Analyzer Guild | 2025-12-13 | Open | Must prevent runaway scans; skipped scans must be auditable. |
| 1 | Decide explicit-key identity scheme for declared-only Node deps (ranges/tags/git/file/workspace) and document it. | Project Mgmt + Scanner Guild | 2025-12-13 | Done | Implemented via `LanguageExplicitKey` in `docs/modules/scanner/language-analyzers-contract.md`; Node specifics in `docs/modules/scanner/analyzers-node.md`. |
| 2 | Decide workspace glob expansion rules (supported patterns, bounds, excluded dirs like `node_modules`). | Project Mgmt + Node Analyzer Guild | 2025-12-13 | Done | Supports `*` + `**`, skips `node_modules`, bounded traversal; documented in `docs/modules/scanner/analyzers-node.md`. |
| 3 | Decide lock metadata precedence when multiple sources exist and when lock lacks integrity/resolution. | Project Mgmt + Node Analyzer Guild | 2025-12-13 | Done | Precedence: path match > `(name,version)` > name-only; documented in `docs/modules/scanner/analyzers-node.md`. |
| 4 | Decide import-scanning policy: default enabled/disabled, scope (workspace only vs all packages), and caps to enforce. | Project Mgmt + Node Analyzer Guild | 2025-12-13 | Done | Scope: root + workspace members only; caps + skip markers; bench exports `node.importScan.*` metrics (see `docs/modules/scanner/analyzers-node.md`). |
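
Action 3 records the lock metadata precedence as path match > `(name,version)` > name-only. A minimal Python sketch of a lookup honoring that order follows. The `LockEntry` and `LockIndex` names are hypothetical, and the choice to return a name-only match only when a single version exists is an assumption, not something the tracker states.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LockEntry:
    name: str
    version: str
    path: Optional[str] = None       # lock-relative install path, if the lock records one
    integrity: Optional[str] = None


class LockIndex:
    def __init__(self, entries: Iterable[LockEntry]) -> None:
        # Sort first so the index never depends on file traversal order.
        ordered = sorted(entries, key=lambda e: (e.name, e.version, e.path or ""))
        self._by_path: Dict[str, LockEntry] = {}
        self._by_name_version: Dict[Tuple[str, str], LockEntry] = {}
        self._by_name: Dict[str, List[LockEntry]] = {}
        for entry in ordered:
            if entry.path is not None:
                self._by_path.setdefault(entry.path, entry)
            self._by_name_version.setdefault((entry.name, entry.version), entry)
            self._by_name.setdefault(entry.name, []).append(entry)

    def try_get(self, relative_path: Optional[str], name: str,
                version: Optional[str]) -> Optional[LockEntry]:
        # 1) An exact path match wins.
        if relative_path in self._by_path:
            return self._by_path[relative_path]
        # 2) Then (name, version) when the on-disk manifest has a concrete version.
        if version is not None and (name, version) in self._by_name_version:
            return self._by_name_version[(name, version)]
        # 3) Name-only fallback, assumed here to apply only when unambiguous.
        candidates = self._by_name.get(name, [])
        return candidates[0] if len(candidates) == 1 else None
```

Keeping three separate maps makes each precedence level an explicit, testable step instead of an accident of insertion order.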
## Decisions & Risks
- **Decision (pending):** Declared-only identity scheme, workspace glob bounds, lock precedence, and import scanning caps (Action Tracker 1–4).
@@ -84,4 +84,12 @@
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-12 | Sprint created to close Node analyzer detection gaps (declared-only emission, multi-version lock fidelity, Yarn Berry/pnpm parsing, workspace glob support, import scanning correctness/bounds, deterministic evidence hashing) with fixtures/bench/docs expectations. | Project Mgmt |
| 2025-12-13 | Completed Wave A/B tasks 406-001..406-005 (declared-only emission, multi-version lock fidelity, Yarn Berry parsing, pnpm integrity-missing + snapshots, nested package-lock naming) with regression tests. | Implementer |
| 2025-12-13 | Completed task 406-006 (bounded `*`/`**` workspace expansion; skips `node_modules`) with unit tests. | Implementer |
| 2025-12-13 | Completed task 406-007 (workspace-aware dependency scopes) with fixture update + tests. | Implementer |
| 2025-12-13 | Completed task 406-008 (ESM/TS import scanning + bounds) with fixture update + tests. | Implementer |
| 2025-12-13 | Completed task 406-009 (on-disk `package.json` sha256 evidence) with fixture updates. | Implementer |
| 2025-12-13 | Updated declared-only emission to use the cross-analyzer explicit-key format and expanded fixtures for `layers/`, `.layers/`, and `layer*/` discovery. | Implementer |
| 2025-12-13 | Completed task 406-010 (fixtures + goldens: lock-only package-lock/yarn-berry/pnpm, workspace globs, container app-root discovery) with regression tests. | Implementer |
| 2025-12-13 | Completed task 406-011 (docs + offline bench: `docs/modules/scanner/analyzers-node.md`, scenario `node_detection_gaps_fixture`, import-scan metrics) with bench/test coverage. | Implementer |
@@ -20,20 +20,20 @@
- `docs/modules/scanner/architecture.md`
- `src/Scanner/AGENTS.md`
- `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang/AGENTS.md`
- **Missing today (must be created before tasks flip to DOING):** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md`
- `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md` (created 2025-12-13)
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | SCAN-BUN-407-001 | TODO | Decide container root discovery contract (Action 2). | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Container-layer aware project discovery**: extend `Internal/BunProjectDiscoverer.cs` to discover Bun project roots not only under `context.RootPath`, but also under common OCI unpack layouts used elsewhere in scanner: `layers/*`, `.layers/*`, and `layer*` direct children. Do not skip hidden roots wholesale: `.layers` must be included. Keep traversal bounded and deterministic: (a) stable ordering of enumerated directories, (b) explicit depth caps per root, (c) hard cap on total discovered roots, (d) must never recurse into `node_modules/` and must skip large/non-project dirs deterministically. Acceptance: new fixture `lang/bun/container-layers` proves a Bun project placed under `.layers/layer0/app` is found and scanned. |
| 2 | SCAN-BUN-407-002 | TODO | Decide identity rules for non-concrete versions (Action 1). | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Declared-only fallback for bun markers**: if `BunProjectDiscoverer` identifies a project root (via `bunfig.toml`/`package.json`/etc.) but `BunInputNormalizer` returns `None` (no `node_modules`, no `bun.lock`), emit declared-only components from `package.json` dependencies. Requirements: (a) do not emit `pkg:npm/...@<range>` PURLs for version ranges/tags; use `AddFromExplicitKey` when version is not a concrete resolved version, (b) include deterministic metadata `declaredOnly=true`, `declared.source=package.json`, `declared.locator=<relative>#<section>`, `declared.versionSpec=<original>`, `declared.scope=<prod|dev|peer|optional>`, and (c) include root package.json evidence with sha256 (bounded). Acceptance: new fixture `lang/bun/bunfig-only` emits declared-only components for both `dependencies` and `devDependencies` with safe identities. |
| 3 | SCAN-BUN-407-003 | TODO | Decide dev/optional/peer semantics for bun.lock v1 (Action 3). | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **bun.lock v1 graph enrichment (dev/optional/peer + edges)**: upgrade `Internal/BunLockParser.cs` to preserve dependency edges from bun.lock v1 array form (capture dependency value/specifier, not only names) and to parse optional peer information when present. Build a bounded dependency graph that starts from root `package.json` declarations (prod/dev/optional/peer) and propagates reachability to lock entries, marking `BunLockEntry.IsDev/IsOptional/IsPeer` deterministically. If the graph cannot disambiguate (multiple versions/specifier mismatch), do not guess; emit `scopeUnknown=true` and keep `IsDev=false` unless positively proven. Acceptance: add fixture `lang/bun/lockfile-dev-classification` demonstrating: (a) dev-only packages are tagged `dev=true` and are excluded when `includeDev=false`, (b) prod packages remain untagged, (c) the decision is stable across OS/filesystem ordering. |
| 4 | SCAN-BUN-407-004 | TODO | After task 3 lands, wire filter & metadata into emission. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Make `includeDev` meaningful**: `Internal/BunLockInventory.cs` currently filters by `entry.IsDev`, but bun.lock array parsing sets `IsDev=false` always. After graph enrichment (Task 3), implement deterministic filtering for lockfile-only scans and ensure installed scans also carry dev/optional/peer metadata when lock data is present. Acceptance: tests show dev filtering affects output only when the analyzer can prove dev reachability; otherwise outputs remain but are marked `scopeUnknown=true`. |
| 5 | SCAN-BUN-407-005 | TODO | Decide patch-keying and path normalization (Action 4). | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Version-specific patch mapping + no absolute paths**: fix `Internal/BunWorkspaceHelper.cs` so `patchedDependencies` keys preserve version specificity (`name@version`), and patch-directory discovery emits **relative** deterministic paths (relative to project root) rather than absolute OS paths. Update `BunLanguageAnalyzer` patch application so it first matches `name@version`, then falls back to `name` only when unambiguous. Acceptance: add fixture `lang/bun/patched-multi-version` with two patch files for the same package name at different versions; output marks only the correct version as patched and never includes absolute paths. |
| 6 | SCAN-BUN-407-006 | TODO | Align locator conventions with Node analyzer (Action 2). | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Evidence strengthening + locator precision**: improve `Internal/BunPackage.CreateEvidence()` so evidence locators are stable and specific: (a) package.json evidence includes sha256 (bounded; if skipped, emit `packageJsonHashSkipped=true` with reason), (b) bun.lock evidence uses locator `bun.lock#packages/<name@version>` (or another agreed deterministic locator format) instead of plain `bun.lock`, (c) optionally include lockfile sha256 once per project root in a synthetic “bun.lock evidence record” component or via repeated evidence with identical sha256 (bounded). Acceptance: update existing Bun fixtures’ goldens to reflect deterministic hashing and locator formats, with no nondeterministic absolute paths. |
| 7 | SCAN-BUN-407-007 | TODO | Decide identity rules for non-npm sources (Action 1). | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Identity safety for non-npm sources**: `Internal/BunPackage.BuildPurl()` always emits `pkg:npm/<name>@<version>`. Define and implement rules for `SourceType != npm` (git/file/link/workspace/tarball/custom-registry): when `version` is not a concrete registry version, emit `AddFromExplicitKey` (no PURL) and preserve the original specifier/resolved URL in metadata. If a PURL is emitted, it must be valid and must not embed raw specifiers like `workspace:*` as a “version”. Acceptance: add fixture `lang/bun/non-concrete-versions` demonstrating safe identities for `workspace:*` / `link:` / `file:` styles (if representable in bun.lock), with deterministic explicit keys and clear metadata markers. |
| 8 | SCAN-BUN-407-008 | TODO | After tasks 1–7, document analyzer contract. | Docs Guild + Bun Analyzer Guild | **Document Bun analyzer detection contract**: add/update `docs/modules/scanner/analyzers-bun.md` (or the closest existing scanner doc) describing: what artifacts are used (node_modules, bun.lock, package.json), precedence rules, identity rules (PURL vs explicit-key), dev/optional/peer semantics, container layer root handling, and bounds (depth/roots/files/hash limits). Link this sprint from the doc and add a brief “known limitations” section (e.g., bun.lockb unsupported). |
| 9 | SCAN-BUN-407-009 | TODO | Optional; only if perf regression risk materializes. | Bench Guild (`src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Offline benchmark**: add a deterministic bench that scans a representative Bun monorepo fixture (workspaces + many packages) and records elapsed time + component counts. Establish a ceiling and guard against regressions. |
| 1 | SCAN-BUN-407-001 | DONE | Fixture `lang/bun/container-layers` + determinism test passing. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Container-layer aware project discovery**: extend `Internal/BunProjectDiscoverer.cs` to discover Bun project roots not only under `context.RootPath`, but also under common OCI unpack layouts used elsewhere in scanner: `layers/*`, `.layers/*`, and `layer*` direct children. Do not skip hidden roots wholesale: `.layers` must be included. Keep traversal bounded and deterministic: (a) stable ordering of enumerated directories, (b) explicit depth caps per root, (c) hard cap on total discovered roots, (d) must never recurse into `node_modules/` and must skip large/non-project dirs deterministically. Acceptance: new fixture `lang/bun/container-layers` proves a Bun project placed under `.layers/layer0/app` is found and scanned. |
| 2 | SCAN-BUN-407-002 | DONE | Fixture `lang/bun/bunfig-only` + determinism test passing. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Declared-only fallback for bun markers**: if `BunProjectDiscoverer` identifies a project root (via `bunfig.toml`/`package.json`/etc.) but `BunInputNormalizer` returns `None` (no `node_modules`, no `bun.lock`), emit declared-only components from `package.json` dependencies. Requirements: (a) do not emit `pkg:npm/...@<range>` PURLs for version ranges/tags; use `AddFromExplicitKey` when version is not a concrete resolved version, (b) include deterministic metadata `declaredOnly=true`, `declared.source=package.json`, `declared.locator=<relative>#<section>`, `declared.versionSpec=<original>`, `declared.scope=<prod|dev|peer|optional>`, and (c) include root package.json evidence with sha256 (bounded). Acceptance: new fixture `lang/bun/bunfig-only` emits declared-only components for both `dependencies` and `devDependencies` with safe identities. |
| 3 | SCAN-BUN-407-003 | DONE | Fixture `lang/bun/lockfile-dev-classification` passing. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **bun.lock v1 graph enrichment (dev/optional/peer + edges)**: upgrade `Internal/BunLockParser.cs` to preserve dependency edges from bun.lock v1 array form (capture dependency value/specifier, not only names) and to parse optional peer information when present. Build a bounded dependency graph that starts from root `package.json` declarations (prod/dev/optional/peer) and propagates reachability to lock entries, marking `BunLockEntry.IsDev/IsOptional/IsPeer` deterministically. If the graph cannot disambiguate (multiple versions/specifier mismatch), do not guess; emit `scopeUnknown=true` and keep `IsDev=false` unless positively proven. Acceptance: add fixture `lang/bun/lockfile-dev-classification` demonstrating: (a) dev-only packages are tagged `dev=true` and are excluded when `includeDev=false`, (b) prod packages remain untagged, (c) the decision is stable across OS/filesystem ordering. |
| 4 | SCAN-BUN-407-004 | DONE | Dev filter verified via fixture goldens. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Make `includeDev` meaningful**: `Internal/BunLockInventory.cs` currently filters by `entry.IsDev`, but bun.lock array parsing sets `IsDev=false` always. After graph enrichment (Task 3), implement deterministic filtering for lockfile-only scans and ensure installed scans also carry dev/optional/peer metadata when lock data is present. Acceptance: tests show dev filtering affects output only when the analyzer can prove dev reachability; otherwise outputs remain but are marked `scopeUnknown=true`. |
| 5 | SCAN-BUN-407-005 | DONE | Fixture `lang/bun/patched-multi-version` passing. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Version-specific patch mapping + no absolute paths**: fix `Internal/BunWorkspaceHelper.cs` so `patchedDependencies` keys preserve version specificity (`name@version`), and patch-directory discovery emits **relative** deterministic paths (relative to project root) rather than absolute OS paths. Update `BunLanguageAnalyzer` patch application so it first matches `name@version`, then falls back to `name` only when unambiguous. Acceptance: add fixture `lang/bun/patched-multi-version` with two patch files for the same package name at different versions; output marks only the correct version as patched and never includes absolute paths. |
| 6 | SCAN-BUN-407-006 | DONE | Goldens updated; bounded sha256 + lock locators added. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Evidence strengthening + locator precision**: improve `Internal/BunPackage.CreateEvidence()` so evidence locators are stable and specific: (a) package.json evidence includes sha256 (bounded; if skipped, emit `packageJson.hashSkipped=true` + `packageJson.hashSkipReason=<...>`), (b) bun.lock evidence uses locator `<lockfileRelativePath>:packages[<name>@<version>]` instead of plain `bun.lock`, (c) include lockfile sha256 once per project root via repeated evidence sha256 (bounded). Acceptance: update existing Bun fixtures’ goldens to reflect deterministic hashing and locator formats, with no nondeterministic absolute paths. |
| 7 | SCAN-BUN-407-007 | DONE | Fixture `lang/bun/non-concrete-versions` passing. | Bun Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun`) | **Identity safety for non-npm sources**: `Internal/BunPackage.BuildPurl()` always emits `pkg:npm/<name>@<version>`. Define and implement rules for `SourceType != npm` (git/file/link/workspace/tarball/custom-registry): when `version` is not a concrete registry version, emit `AddFromExplicitKey` (no PURL) and preserve the original specifier/resolved URL in metadata. If a PURL is emitted, it must be valid and must not embed raw specifiers like `workspace:*` as a “version”. Acceptance: add fixture `lang/bun/non-concrete-versions` demonstrating safe identities for `workspace:*` / `link:` / `file:` styles (if representable in bun.lock), with deterministic explicit keys and clear metadata markers. |
| 8 | SCAN-BUN-407-008 | DONE | Doc `docs/modules/scanner/analyzers-bun.md` published and sprint linked. | Docs Guild + Bun Analyzer Guild | **Document Bun analyzer detection contract**: add/update `docs/modules/scanner/analyzers-bun.md` (or the closest existing scanner doc) describing: what artifacts are used (node_modules, bun.lock, package.json), precedence rules, identity rules (PURL vs explicit-key), dev/optional/peer semantics, container layer root handling, and bounds (depth/roots/files/hash limits). Link this sprint from the doc and add a brief “known limitations” section (e.g., bun.lockb unsupported). |
| 9 | SCAN-BUN-407-009 | DONE | Added scenario `bun_multi_workspace_fixture` in analyzer microbench harness. | Bench Guild (`src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Offline benchmark**: add a deterministic bench that scans a representative Bun monorepo fixture (workspaces + many packages) and records elapsed time + component counts. Establish a ceiling and guard against regressions. |
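
Task 5 above specifies that patch application first matches `name@version`, then falls back to `name` only when unambiguous, and that emitted patch paths are project-relative. The Python sketch below illustrates both rules under stated assumptions: `resolve_patch` and `to_project_relative` are hypothetical helpers (the real logic lives in the C# `BunWorkspaceHelper`), and "unambiguous" is interpreted as "exactly one version of that package is present".

```python
import posixpath
from typing import Dict, Optional, Set


def to_project_relative(patch_path: str, project_root: str) -> str:
    """Return a forward-slash path relative to the project root (illustrative)."""
    normalized = patch_path.replace("\\", "/")
    if posixpath.isabs(normalized):
        # Strip the root so no absolute OS path leaks into output.
        return posixpath.relpath(normalized, project_root.replace("\\", "/"))
    return posixpath.normpath(normalized)


def resolve_patch(patched: Dict[str, str], name: str, version: str,
                  installed_versions: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the patch for one package instance, preferring the version-specific key."""
    exact = patched.get(f"{name}@{version}")
    if exact is not None:
        return exact
    # Name-only fallback: assumed to apply only when a single version exists,
    # so a patch is never attributed to the wrong version.
    if len(installed_versions.get(name, set())) == 1:
        return patched.get(name)
    return None
```

In a multi-version layout such as the `lang/bun/patched-multi-version` fixture, this ordering means only the instance whose `name@version` key matches is marked as patched.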
## Wave Coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
@@ -67,14 +67,14 @@
## Action Tracker
| # | Action | Owner | Due (UTC) | Status | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | Decide explicit-key identity scheme for Bun declared-only and non-npm sources (ranges/tags/git/file/link/workspace). | Project Mgmt + Scanner Guild | 2025-12-13 | Open | Must not collide with concrete `pkg:npm/...@<version>` identities; must be stable across OS paths. |
| 2 | Decide and document container layer root discovery rules for Bun analyzer (parity with Node’s `layers/.layers/layer*` conventions, depth/roots bounds). | Project Mgmt + Bun Analyzer Guild | 2025-12-13 | Open | Must prevent runaway scans on untrusted rootfs layouts; must be fixture-tested. |
| 3 | Decide bun.lock v1 scope derivation rules (dev/optional/peer) and how uncertainty is represented (`scopeUnknown` markers). | Project Mgmt + Bun Analyzer Guild | 2025-12-13 | Open | Must be deterministic; avoid false “dev=false” claims when graph is ambiguous. |
| 4 | Decide patched dependency keying and deterministic path normalization (relative path base, name@version precedence, fallback rules). | Project Mgmt + Bun Analyzer Guild + Security Guild | 2025-12-13 | Open | Must avoid absolute path leakage; ensure correct version-specific patch attribution. |
| 5 | Create missing module charter: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md`. | Project Mgmt | 2025-12-13 | Open | Required before implementation tasks can enter DOING per global charter. |
| 1 | Decide explicit-key identity scheme for Bun declared-only and non-npm sources (ranges/tags/git/file/link/workspace). | Project Mgmt + Scanner Guild | 2025-12-13 | Done | Implemented per `docs/modules/scanner/language-analyzers-contract.md`. |
| 2 | Decide and document container layer root discovery rules for Bun analyzer (parity with Node's `layers/.layers/layer*` conventions, depth/roots bounds). | Project Mgmt + Bun Analyzer Guild | 2025-12-13 | Done | Implemented per `docs/modules/scanner/language-analyzers-contract.md`; validated by fixture `lang/bun/container-layers`. |
| 3 | Decide bun.lock v1 scope derivation rules (dev/optional/peer) and how uncertainty is represented (`scopeUnknown` markers). | Project Mgmt + Bun Analyzer Guild | 2025-12-13 | Done | Implemented in `Internal/BunLockScopeClassifier.cs` with `scopeUnknown=true` for ambiguity. |
| 4 | Decide patched dependency keying and deterministic path normalization (relative path base, name@version precedence, fallback rules). | Project Mgmt + Bun Analyzer Guild + Security Guild | 2025-12-13 | Done | Implemented in `Internal/BunWorkspaceHelper.cs` (version-specific keys; project-relative patch paths). |
| 5 | Create missing module charter: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md`. | Project Mgmt | 2025-12-13 | Done | Created `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md`. |
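
Action 3 settles on deriving dev scope from a bounded reachability graph and marking ambiguity with `scopeUnknown=true` instead of guessing. The Python sketch below shows one way such a classifier could work. It is not the C# `BunLockScopeClassifier`: the `classify` function, the `name@version` key shape, the `max_nodes` bound, and the treatment of unreachable entries as unknown are all assumptions for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def classify(lock: Dict[str, List[str]], prod_roots: Iterable[str],
             dev_roots: Iterable[str], max_nodes: int = 10_000) -> Dict[str, Dict[str, bool]]:
    """lock maps "name@version" to the dependency NAMES of that entry."""
    by_name: Dict[str, List[str]] = {}
    for key in sorted(lock):  # sorted for OS-independent ordering
        by_name.setdefault(key.rsplit("@", 1)[0], []).append(key)

    unknown: Set[str] = set()

    def reach(roots: Iterable[str]) -> Set[str]:
        seen: Set[str] = set()
        queue = deque(sorted(roots))
        while queue and len(seen) < max_nodes:  # bounded traversal
            candidates = by_name.get(queue.popleft(), [])
            if len(candidates) != 1:
                # Several versions share the name: do not guess which edge applies.
                unknown.update(candidates)
                continue
            key = candidates[0]
            if key not in seen:
                seen.add(key)
                queue.extend(sorted(lock[key]))
        return seen

    prod, dev = reach(prod_roots), reach(dev_roots)
    result: Dict[str, Dict[str, bool]] = {}
    for key in sorted(lock):
        if key in prod:
            result[key] = {"dev": False, "scopeUnknown": False}
        elif key in dev and key not in unknown:
            result[key] = {"dev": True, "scopeUnknown": False}
        else:
            # Ambiguous or unreachable: keep dev=False and flag the uncertainty.
            result[key] = {"dev": False, "scopeUnknown": True}
    return result
```

A package reachable from both production and dev roots stays untagged, so `includeDev=false` can only remove entries that are positively proven dev-only.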
## Decisions & Risks
- **Decision (pending):** identity scheme, container discovery, scope derivation, patch rules (Action Tracker 1–4).
- Decisions implemented per `docs/modules/scanner/language-analyzers-contract.md` and documented in `docs/modules/scanner/analyzers-bun.md`.
| Risk ID | Risk | Impact | Likelihood | Mitigation | Owner | Trigger / Signal |
| --- | --- | --- | --- | --- | --- | --- |
@@ -88,4 +88,7 @@
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-12 | Sprint created to close Bun analyzer detection gaps (container-layer discovery, declared-only fallback, bun.lock scope graph, version-specific patches, evidence hashing, identity safety) with fixtures/docs/bench expectations. | Project Mgmt |
| 2025-12-13 | Completed SCAN-BUN-407-001 and SCAN-BUN-407-002 with new fixtures (`lang/bun/container-layers`, `lang/bun/bunfig-only`) and deterministic goldens; aligned explicit-key behavior with `docs/modules/scanner/language-analyzers-contract.md`. | Bun Analyzer Guild |
| 2025-12-13 | Completed SCAN-BUN-407-003 through SCAN-BUN-407-008 (scope graph + dev filtering, version-specific patch mapping, bounded sha256 evidence, non-concrete identity safety, and Bun analyzer contract doc). | Bun Analyzer Guild |
| 2025-12-13 | Completed SCAN-BUN-407-009 by wiring the Bun analyzer into the scanner analyzer microbench harness and adding scenario `bun_multi_workspace_fixture`. | Bench Guild |
@@ -27,15 +27,15 @@
- .NET: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.DotNet/AGENTS.md`
- Python: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/AGENTS.md`
- Node: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Node/AGENTS.md`
- **Missing today:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md` (Action 4)
- Bun: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md` (created 2025-12-13; Action 4)
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | SCAN-PROG-408-001 | TODO | Requires Action 1. | Scanner Guild + Security Guild + Export/UI/CLI Consumers | **Freeze cross-analyzer identity safety contract**: define a single, documented rule-set for when an analyzer emits (a) a concrete PURL and (b) an explicit-key component. Must cover: version ranges/tags, local paths, workspace/link/file deps, git deps, and “unknown” versions. Output: a canonical doc under `docs/modules/scanner/` (path chosen in Action 1) + per-analyzer unit tests asserting “no invalid PURLs” for declared-only / non-concrete inputs. |
| 2 | SCAN-PROG-408-002 | TODO | Requires Action 2. | Scanner Guild + Export/UI/CLI Consumers | **Freeze cross-analyzer evidence locator contract**: define deterministic locator formats for (a) lockfile entries, (b) nested artifacts (e.g., Java “outer!inner!path”), and (c) derived evidence records. Output: canonical doc + at least one golden fixture per analyzer asserting exact locator strings and bounded evidence sizes. |
| 3 | SCAN-PROG-408-003 | TODO | Requires Action 3. | Scanner Guild | **Freeze container layout discovery contract**: define which analyzers must discover projects under `layers/`, `.layers/`, and `layer*/` layouts, how ordering/whiteouts are handled (where applicable), and bounds (depth/roots/files). Output: canonical doc + fixtures proving parity for Node/Bun/Python (and any Java/.NET container behaviors where relevant). |
| 4 | SCAN-PROG-408-004 | TODO | None. | Project Mgmt + Scanner Guild | **Create missing Bun analyzer charter**: add `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md` synthesizing constraints from `docs/modules/scanner/architecture.md` and this sprint + `SPRINT_0407_0001_0001_scanner_bun_detection_gaps.md`. Must include: allowed directories, test strategy, determinism rules, identity/evidence conventions, and “no absolute paths” requirement. |
| 1 | SCAN-PROG-408-001 | DOING | Requires Action 1. | Scanner Guild + Security Guild + Export/UI/CLI Consumers | **Freeze cross-analyzer identity safety contract**: define a single, documented rule-set for when an analyzer emits (a) a concrete PURL and (b) an explicit-key component. Must cover: version ranges/tags, local paths, workspace/link/file deps, git deps, and "unknown" versions. Output: a canonical doc under `docs/modules/scanner/` (path chosen in Action 1) + per-analyzer unit tests asserting "no invalid PURLs" for declared-only / non-concrete inputs. |
| 2 | SCAN-PROG-408-002 | DOING | Requires Action 2. | Scanner Guild + Export/UI/CLI Consumers | **Freeze cross-analyzer evidence locator contract**: define deterministic locator formats for (a) lockfile entries, (b) nested artifacts (e.g., Java "outer!inner!path"), and (c) derived evidence records. Output: canonical doc + at least one golden fixture per analyzer asserting exact locator strings and bounded evidence sizes. |
| 3 | SCAN-PROG-408-003 | DOING | Requires Action 3. | Scanner Guild | **Freeze container layout discovery contract**: define which analyzers must discover projects under `layers/`, `.layers/`, and `layer*/` layouts, how ordering/whiteouts are handled (where applicable), and bounds (depth/roots/files). Output: canonical doc + fixtures proving parity for Node/Bun/Python (and any Java/.NET container behaviors where relevant). |
|
||||
| 4 | SCAN-PROG-408-004 | DONE | Unblocks Bun sprint DOING. | Project Mgmt + Scanner Guild | **Create missing Bun analyzer charter**: add `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md` synthesizing constraints from `docs/modules/scanner/architecture.md` and this sprint + `SPRINT_0407_0001_0001_scanner_bun_detection_gaps.md`. Must include: allowed directories, test strategy, determinism rules, identity/evidence conventions, and "no absolute paths" requirement. |
|
||||
| 5 | SCAN-PROG-408-JAVA | TODO | Actions 1–2 recommended before emission format changes. | Java Analyzer Guild + QA Guild | **Implement all Java gaps** per `docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md`: (a) embedded libs inside fat archives without extraction, (b) `pom.xml` fallback when properties missing, (c) multi-module Gradle lock discovery + deterministic precedence, (d) runtime image component emission from `release`, (e) replace JNI string scanning with bytecode-based JNI analysis. Acceptance: Java analyzer tests + new fixtures/goldens; bounded scanning with explicit skipped markers. |
|
||||
| 6 | SCAN-PROG-408-DOTNET | TODO | Actions 1–2 recommended before adding declared-only identities. | .NET Analyzer Guild + QA Guild | **Implement all .NET gaps** per `docs/implplan/SPRINT_0404_0001_0001_scanner_dotnet_detection_gaps.md`: (a) declared-only fallback when no deps.json, (b) non-colliding identity for unresolved versions, (c) deterministic merge of declared vs installed packages, (d) bounded bundling signals, (e) optional declared edges provenance, (f) fixtures/docs (and optional bench). Acceptance: `.NET` analyzer emits components for source trees with lock/build files; no restore/MSBuild execution; deterministic outputs. |
|
||||
| 7 | SCAN-PROG-408-PYTHON | TODO | Actions 1–3 recommended before overlay/identity changes. | Python Analyzer Guild + QA Guild | **Implement all Python gaps** per `docs/implplan/SPRINT_0405_0001_0001_scanner_python_detection_gaps.md`: (a) layout-aware discovery (avoid “any dist-info anywhere”), (b) expanded lock/requirements parsing (includes/editables/PEP508/direct refs), (c) correct container overlay/whiteout semantics (or explicit overlayIncomplete markers), (d) vendored dependency surfacing with safe identity rules, (e) optional used-by signals (bounded/opt-in), (f) fixtures/docs/bench. Acceptance: deterministic fixtures for lock formats and container overlays; no invalid “editable-as-version” PURLs per Action 1. |
|
||||
@@ -72,10 +72,10 @@
|
||||
## Action Tracker
|
||||
| # | Action | Owner | Due (UTC) | Status | Notes |
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
| 1 | Choose canonical doc path + define explicit-key identity recipe across analyzers. | Project Mgmt + Scanner Guild + Security Guild | 2025-12-13 | Open | Must prevent collisions with concrete PURLs; must be OS-path stable and deterministic. |
|
||||
| 2 | Define evidence locator formats (lock entries, nested artifacts, derived evidence) and required hashing rules/bounds. | Project Mgmt + Scanner Guild + Export/UI/CLI Consumers | 2025-12-13 | Open | Must be parseable and stable; add golden fixtures asserting exact strings. |
|
||||
| 3 | Define container layer/rootfs discovery + overlay semantics contract and bounds. | Project Mgmt + Scanner Guild | 2025-12-13 | Open | Align Node/Bun/Python; clarify when overlayIncomplete markers are required. |
|
||||
| 4 | Create `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md` and link it from Bun sprint prerequisites. | Project Mgmt | 2025-12-13 | Open | Required before Bun implementation tasks can flip to DOING. |
|
||||
| 1 | Choose canonical doc path + define explicit-key identity recipe across analyzers. | Project Mgmt + Scanner Guild + Security Guild | 2025-12-13 | In Progress | Doc: `docs/modules/scanner/language-analyzers-contract.md`; Node/Bun/Python updated to emit explicit-key for non-concrete identities with tests/fixtures. |
|
||||
| 2 | Define evidence locator formats (lock entries, nested artifacts, derived evidence) and required hashing rules/bounds. | Project Mgmt + Scanner Guild + Export/UI/CLI Consumers | 2025-12-13 | In Progress | Doc: `docs/modules/scanner/language-analyzers-contract.md`; Node/Bun/Python fixtures assert locator formats (lock entries, nested artifacts, derived evidence). |
|
||||
| 3 | Define container layer/rootfs discovery + overlay semantics contract and bounds. | Project Mgmt + Scanner Guild | 2025-12-13 | In Progress | Doc: `docs/modules/scanner/language-analyzers-contract.md`; fixtures now cover Node/Bun/Python parity for `layers/`, `.layers/`, and `layer*/`. |
|
||||
| 4 | Create `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md` and link it from Bun sprint prerequisites. | Project Mgmt | 2025-12-13 | Done | Created `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md`; updated Bun sprint prerequisites. |
|
||||
|
||||
## Decisions & Risks
|
||||
- **Decision (pending):** cross-analyzer identity/evidence/container contracts (Actions 1–3).
|
||||
@@ -92,4 +92,7 @@
|
||||
| Date (UTC) | Update | Owner |
|
||||
| --- | --- | --- |
|
||||
| 2025-12-12 | Program sprint created to coordinate implementation of all language analyzer detection gaps (Java/.NET/Python/Node/Bun) with shared contracts and acceptance evidence. | Project Mgmt |
|
||||
| 2025-12-13 | Created Bun analyzer charter (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Bun/AGENTS.md`); updated Bun sprint prerequisites; marked SCAN-PROG-408-004 complete. | Project Mgmt |
|
||||
| 2025-12-13 | Set SCAN-PROG-408-001..003 to DOING; started Actions 1-3 (identity/evidence/container contracts). | Scanner Guild |
|
||||
| 2025-12-13 | Implemented Node/Python contract compliance (explicit-key for declared-only, tarball/git/file/workspace classification; Python editable lock entries now explicit-key with host-path scrubbing) and extended fixtures for `.layers`/`layers`/`layer*`; Node + Python test suites passing. | Implementer |
|
||||
|
||||
|
||||
@@ -28,6 +28,7 @@
|
||||
| 6 | SCAN-NL-0409-006 | DONE | — | Scanner · Backend | RPM sqlite read path: avoid `SELECT *` and column-scanning where feasible (schema probe + targeted column selection). Add unit coverage for schema variants. |
|
||||
| 7 | SCAN-NL-0409-007 | DONE | — | Scanner · Backend/QA | Native “unknowns” quality: emit unknowns even when dependency list is empty; extract ELF `.dynsym` undefined symbols for unknown edges; add regression test. |
|
||||
| 8 | SCAN-NL-0409-008 | DONE | — | Scanner · Docs | Document OS analyzer evidence semantics (paths/digests/warnings) and caching behavior under `docs/modules/scanner/` (and link from sprint Decisions & Risks). |
|
||||
| 9 | SCAN-NL-0409-009 | DOING | Update Ruby analyzer determinism fixtures | Scanner · Backend/QA | Keep `src/Scanner/StellaOps.Scanner.sln` green: fix Ruby capability detection regression (`Open3.capture3`) and refresh Ruby golden fixtures (legacy/container/complex). |
|
||||
|
||||
## Execution Log
|
||||
| Date (UTC) | Update | Owner |
|
||||
@@ -39,6 +40,7 @@
|
||||
| 2025-12-12 | Optimized rpmdb sqlite reader (schema probe + targeted selection/query); added tests. | Scanner |
|
||||
| 2025-12-12 | Improved native “unknowns” (ELF `.dynsym` undefined symbols) and added regression test. | Scanner |
|
||||
| 2025-12-12 | Documented OS/non-language evidence contract and caching behavior. | Scanner |
|
||||
| 2025-12-13 | Follow-up QA: started SCAN-NL-0409-009 to keep Scanner solution tests green (Ruby analyzer determinism + capability regression). | Scanner |
|
||||
|
||||
## Decisions & Risks
|
||||
- **OS cache safety:** Only cache when the rootfs fingerprint is representative of analyzer inputs; otherwise bypass cache to avoid stale results.
|
||||
@@ -47,4 +49,5 @@
|
||||
- **Evidence contract:** `docs/modules/scanner/os-analyzers-evidence.md`.
|
||||
|
||||
## Next Checkpoints
|
||||
- 2025-12-12: Sprint completed; all tasks set to DONE.
|
||||
- 2025-12-12: Sprint completed; all OS/non-language tasks set to DONE.
|
||||
- 2025-12-13: Follow-up QA (SCAN-NL-0409-009) in progress.
|
||||
|
||||
@@ -0,0 +1,159 @@
|
||||
# Sprint 0410.0001.0001 - Entrypoint Detection Re-Engineering Program
|
||||
|
||||
## Topic & Scope
|
||||
- Window: 2025-12-16 -> 2026-02-28 (UTC); phased delivery across 5 child sprints.
|
||||
- **Vision:** Re-engineer entrypoint detection to be industry-leading with semantic understanding, temporal tracking, multi-container mesh analysis, speculative execution, binary intelligence, and predictive risk scoring.
|
||||
- **Strategic Goal:** Position StellaOps entrypoint detection as the foundation for context-aware vulnerability assessment - answering not just "what's installed" but "what's running, how it's invoked, and what can reach it."
|
||||
- **Working directory:** `docs/implplan` (coordination); implementation in `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/` and related modules.
|
||||
|
||||
## Program Architecture
|
||||
|
||||
### Current State
|
||||
The existing entrypoint detection has:
|
||||
- Container-level OCI config parsing (ENTRYPOINT/CMD)
|
||||
- ShellFlow static analyzer for shell scripts
|
||||
- Per-language analyzers (Python, Java, Node, .NET, Go, Ruby, Rust, Bun, Deno, PHP)
|
||||
- Evidence chains with `usedByEntrypoint` flags
|
||||
- Dual-mode (static image + running container)
|
||||
|
||||
### Target State: Entrypoint Knowledge Graph
|
||||
|
||||
```
┌────────────────────────────────────────────────────────────────────┐
│                     ENTRYPOINT KNOWLEDGE GRAPH                     │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐        │
│  │   Semantic   │────▶│   Temporal   │────▶│     Mesh     │        │
│  │    Engine    │     │    Graph     │     │   Analysis   │        │
│  └──────────────┘     └──────────────┘     └──────────────┘        │
│         │                    │                    │                │
│         ▼                    ▼                    ▼                │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐        │
│  │ Speculative  │────▶│    Binary    │────▶│  Predictive  │        │
│  │  Execution   │     │ Intelligence │     │     Risk     │        │
│  └──────────────┘     └──────────────┘     └──────────────┘        │
│                                                                    │
│  Query: "Which images have Django entrypoints reachable to         │
│          log4j 2.14.1?"                                            │
│  Answer: 847 images, 12 in production, 3 internet-facing           │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
## Child Sprints
|
||||
|
||||
| Sprint ID | Name | Focus | Window | Status |
|
||||
|-----------|------|-------|--------|--------|
|
||||
| 0411.0001.0001 | Semantic Entrypoint Engine | Semantic understanding, intent/capability inference | 2025-12-16 -> 2025-12-30 | TODO |
|
||||
| 0412.0001.0001 | Temporal & Mesh Entrypoint | Temporal tracking, multi-container mesh | 2026-01-02 -> 2026-01-17 | TODO |
|
||||
| 0413.0001.0001 | Speculative Execution Engine | Symbolic execution, path enumeration | 2026-01-20 -> 2026-02-03 | TODO |
|
||||
| 0414.0001.0001 | Binary Intelligence | Fingerprinting, symbol recovery | 2026-02-06 -> 2026-02-17 | TODO |
|
||||
| 0415.0001.0001 | Predictive Risk Scoring | Risk-aware scoring, business context | 2026-02-20 -> 2026-02-28 | TODO |
|
||||
|
||||
## Dependencies & Concurrency
|
||||
- Upstream: Sprint 0401 Reachability Evidence Chain (completed tasks for richgraph-v1, symbol_id, code_id).
|
||||
- Upstream: Sprint 0408 Scanner Language Detection Gaps Program (mature language analyzers).
|
||||
- Child sprints 0412-0413 can proceed in parallel once the semantic foundation (Sprint 0411) lands.
|
||||
- Sprints 0414-0415 depend on earlier sprints for data structures but can overlap.
|
||||
|
||||
## Documentation Prerequisites
|
||||
- docs/modules/scanner/architecture.md
|
||||
- docs/modules/scanner/operations/entrypoint-problem.md
|
||||
- docs/modules/scanner/operations/entrypoint-static-analysis.md
|
||||
- docs/modules/scanner/operations/entrypoint-shell-analysis.md
|
||||
- docs/modules/scanner/operations/entrypoint-runtime-overview.md
|
||||
- docs/reachability/function-level-evidence.md
|
||||
- docs/reachability/lattice.md
|
||||
- src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/AGENTS.md (to be created)
|
||||
|
||||
## Key Deliverables
|
||||
|
||||
### Phase 1: Semantic Foundation (Sprint 0411)
|
||||
1. **SemanticEntrypoint** record with intent, capabilities, attack surface
|
||||
2. **ApplicationIntent** enumeration (web-server, cli-tool, batch-job, worker, serverless, etc.)
|
||||
3. **CapabilityClass** enumeration (network-listen, file-write, exec-spawn, crypto, etc.)
|
||||
4. **ThreatVector** inference from entrypoint characteristics
|
||||
5. Cross-language semantic detection adapters
|
||||
|
||||
### Phase 2: Temporal & Mesh (Sprint 0412)
|
||||
1. **TemporalEntrypointGraph** for version-to-version tracking
|
||||
2. **EntrypointDrift** detection and alerting
|
||||
3. **MeshEntrypointGraph** for multi-container orchestration
|
||||
4. **CrossContainerPath** reachability across services
|
||||
5. Kubernetes/Compose manifest parsing
|
||||
|
||||
### Phase 3: Speculative Execution (Sprint 0413)
|
||||
1. **SymbolicExecutionEngine** for ShellFlow enhancement
|
||||
2. **PathEnumerator** for all terminal states
|
||||
3. **ConstraintSolver** for complex conditionals
|
||||
4. **BranchCoverage** metrics and confidence
|
||||
|
||||
### Phase 4: Binary Intelligence (Sprint 0414)
|
||||
1. **CodeFingerprint** index from OSS package corpus
|
||||
2. **SymbolRecovery** for stripped binaries
|
||||
3. **SourceCorrelation** service
|
||||
4. **FunctionSignatureInference** from binary analysis
|
||||
|
||||
### Phase 5: Predictive Risk (Sprint 0415)
|
||||
1. **RiskFactorExtractor** pipeline
|
||||
2. **EntrypointRiskScorer** with business context
|
||||
3. **AttackSurfaceQuantifier** per entrypoint
|
||||
4. **EntrypointAsCode** auto-generated specifications
|
||||
|
||||
## Competitive Differentiation
|
||||
|
||||
| Capability | StellaOps (Target) | Competition |
|
||||
|------------|-------------------|-------------|
|
||||
| Semantic understanding | Full intent + capability inference | Pattern matching only |
|
||||
| Temporal tracking | Version-to-version evolution | Snapshot only |
|
||||
| Multi-container | Full mesh with cross-container reachability | Single container |
|
||||
| Stripped binaries | Fingerprint + ML recovery | Limited/none |
|
||||
| Speculative execution | All paths enumerated symbolically | Best-effort heuristics |
|
||||
| Entrypoint-as-Code | Auto-generated, executable specs | Manual documentation |
|
||||
| Predictive risk | Business-context-aware scoring | Static CVSS only |
|
||||
|
||||
## Wave Coordination
|
||||
| Wave | Child Sprints | Shared Prerequisites | Status | Notes |
|
||||
|------|---------------|----------------------|--------|-------|
|
||||
| Foundation | 0411 | Sprint 0401 richgraph/symbol contracts | TODO | Must land before other phases |
|
||||
| Parallel | 0412, 0413 | 0411 semantic records | TODO | Can run concurrently |
|
||||
| Intelligence | 0414 | 0411-0413 data structures | TODO | Binary focus |
|
||||
| Risk | 0415 | 0411-0414 evidence chains | TODO | Final phase |
|
||||
|
||||
## Interlocks
|
||||
- Semantic record schema (Sprint 0411) must stabilize before Temporal/Mesh (0412) or Speculative (0413) start.
|
||||
- Binary fingerprint corpus (Sprint 0414) requires OSS package index integration.
|
||||
- Risk scoring (Sprint 0415) needs Policy Engine integration for gate enforcement.
|
||||
- All phases emit to richgraph-v1 with BLAKE3 hashing per CONTRACT-RICHGRAPH-V1-015.
|
||||
|
||||
## Upcoming Checkpoints
|
||||
- 2025-12-16 - Sprint 0411 kickoff; semantic schema draft review.
|
||||
- 2025-12-23 - Sprint 0411 midpoint; ApplicationIntent/CapabilityClass enums frozen.
|
||||
- 2025-12-30 - Sprint 0411 close; semantic foundation ready for 0412/0413.
|
||||
- 2026-01-02 - Sprints 0412/0413 kickoff (parallel).
|
||||
- 2026-02-28 - Program close; all phases delivered.
|
||||
|
||||
## Action Tracker
|
||||
| # | Action | Owner | Due (UTC) | Status | Notes |
|
||||
|---|--------|-------|-----------|--------|-------|
|
||||
| 1 | Create AGENTS.md for EntryTrace module | Scanner Guild | 2025-12-16 | TODO | Foundation for implementers |
|
||||
| 2 | Draft SemanticEntrypoint schema | Scanner Guild | 2025-12-18 | TODO | Phase 1 core deliverable |
|
||||
| 3 | Define ApplicationIntent enumeration | Scanner Guild | 2025-12-20 | TODO | Needs cross-language input |
|
||||
| 4 | Create temporal graph storage design | Platform Guild | 2026-01-02 | TODO | Phase 2 dependency |
|
||||
| 5 | Evaluate binary fingerprint corpus options | Scanner Guild | 2026-02-01 | TODO | Phase 4 dependency |
|
||||
|
||||
## Decisions & Risks
|
||||
|
||||
| ID | Risk | Impact | Mitigation / Owner |
|
||||
|----|------|--------|-------------------|
|
||||
| R1 | Semantic schema changes mid-program | Rework in dependent phases | Freeze schema by Sprint 0411 close; Scanner Guild |
|
||||
| R2 | Binary fingerprint corpus size/latency | Slow startup, large storage | Use lazy loading, tiered caching; Platform Guild |
|
||||
| R3 | Multi-container mesh complexity | Detection gaps in complex K8s | Phased support; start with common patterns; Scanner Guild |
|
||||
| R4 | Speculative execution path explosion | Performance issues | Add depth limits, caching; Scanner Guild |
|
||||
| R5 | Risk scoring model accuracy | False confidence signals | Train on CVE exploitation data; validate with red team; Signals Guild |
|
||||
|
||||
## Execution Log
|
||||
| Date (UTC) | Update | Owner |
|
||||
|------------|--------|-------|
|
||||
| 2025-12-13 | Created program sprint from strategic analysis; outlined 5 child sprints with phased delivery; defined competitive differentiation matrix. | Planning |
|
||||
@@ -0,0 +1,163 @@
|
||||
# Sprint 0411.0001.0001 - Semantic Entrypoint Engine
|
||||
|
||||
## Topic & Scope
|
||||
- Window: 2025-12-16 -> 2025-12-30 (UTC); foundation phase for entrypoint re-engineering.
|
||||
- Build semantic understanding layer that infers intent, capabilities, and attack surface from entrypoints.
|
||||
- Enable downstream phases (temporal, mesh, speculative, binary, risk) with stable data structures.
|
||||
- **Working directory:** `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/Semantic/`
|
||||
|
||||
## Dependencies & Concurrency
|
||||
- Upstream: Sprint 0401 Reachability Evidence Chain (richgraph-v1, symbol_id, code_id contracts).
|
||||
- Upstream: Sprint 0408 Language Detection Gaps (mature Python/Java/Node analyzers).
|
||||
- Blocks: Sprints 0412-0415 depend on semantic records from this sprint.
|
||||
- Language-specific adapters can be developed in parallel once core schema lands.
|
||||
|
||||
## Documentation Prerequisites
|
||||
- docs/modules/scanner/operations/entrypoint-problem.md
|
||||
- docs/modules/scanner/operations/entrypoint-static-analysis.md
|
||||
- docs/modules/scanner/operations/entrypoint-lang-*.md (per-language guides)
|
||||
- docs/reachability/function-level-evidence.md
|
||||
- src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/AGENTS.md
|
||||
|
||||
## Delivery Tracker
|
||||
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
|
||||
|---|---------|--------|---------------------------|--------|-----------------|
|
||||
| 1 | ENTRY-SEM-411-001 | TODO | None; foundation task | Scanner Guild | Create `SemanticEntrypoint` record with Id, Specification, Intent, Capabilities, AttackSurface, DataBoundaries, Confidence fields. |
|
||||
| 2 | ENTRY-SEM-411-002 | TODO | Task 1 | Scanner Guild | Define `ApplicationIntent` enumeration: WebServer, CliTool, BatchJob, Worker, Serverless, Daemon, InitSystem, Supervisor, DatabaseServer, MessageBroker, CacheServer, ProxyGateway, Unknown. |
|
||||
| 3 | ENTRY-SEM-411-003 | TODO | Task 1 | Scanner Guild | Define `CapabilityClass` enumeration: NetworkListen, NetworkConnect, FileRead, FileWrite, ProcessSpawn, CryptoOperation, DatabaseAccess, MessageQueue, CacheAccess, ExternalApi, UserInput, ConfigLoad, SecretAccess, LogEmit. |
|
||||
| 4 | ENTRY-SEM-411-004 | TODO | Task 1 | Scanner Guild | Define `ThreatVector` record with VectorType (Ssrf, Sqli, Xss, Rce, PathTraversal, Deserialization, TemplateInjection, AuthBypass, InfoDisclosure, Dos), Confidence, Evidence, EntryPath. |
|
||||
| 5 | ENTRY-SEM-411-005 | TODO | Task 1 | Scanner Guild | Define `DataFlowBoundary` record with BoundaryType (HttpRequest, HttpResponse, FileInput, FileOutput, DatabaseQuery, MessageReceive, MessageSend, EnvironmentVar, CommandLineArg), Direction, Sensitivity. |
|
||||
| 6 | ENTRY-SEM-411-006 | TODO | Task 1 | Scanner Guild | Define `SemanticConfidence` record with Score (0.0-1.0), Tier (Definitive, High, Medium, Low, Unknown), ReasoningChain (list of evidence strings). |
|
||||
| 7 | ENTRY-SEM-411-007 | TODO | Tasks 1-6 | Scanner Guild | Create `ISemanticEntrypointAnalyzer` interface with `AnalyzeAsync(EntryTraceResult, LanguageAnalyzerResult, CancellationToken) -> SemanticEntrypoint`. |
|
||||
| 8 | ENTRY-SEM-411-008 | TODO | Task 7 | Scanner Guild | Implement `PythonSemanticAdapter` inferring intent from: Django (WebServer), Celery (Worker), Click/Typer (CliTool), Lambda (Serverless), Flask/FastAPI (WebServer). |
|
||||
| 9 | ENTRY-SEM-411-009 | TODO | Task 7 | Scanner Guild | Implement `JavaSemanticAdapter` inferring intent from: Spring Boot (WebServer), Quarkus (WebServer), Micronaut (WebServer), Kafka Streams (Worker), Main-Class patterns. |
|
||||
| 10 | ENTRY-SEM-411-010 | TODO | Task 7 | Scanner Guild | Implement `NodeSemanticAdapter` inferring intent from: Express/Koa/Fastify (WebServer), CLI bin entries (CliTool), worker threads, Lambda handlers (Serverless). |
|
||||
| 11 | ENTRY-SEM-411-011 | TODO | Task 7 | Scanner Guild | Implement `DotNetSemanticAdapter` inferring intent from: ASP.NET Core (WebServer), Console apps (CliTool), Worker services (Worker), Azure Functions (Serverless). |
|
||||
| 12 | ENTRY-SEM-411-012 | TODO | Task 7 | Scanner Guild | Implement `GoSemanticAdapter` inferring intent from: net/http patterns (WebServer), cobra/urfave CLI (CliTool), gRPC servers, main package analysis. |
|
||||
| 13 | ENTRY-SEM-411-013 | TODO | Tasks 8-12 | Scanner Guild | Create `CapabilityDetector` that analyzes imports/dependencies to infer capabilities (e.g., `import socket` -> NetworkConnect, `import os.path` -> FileRead). |
|
||||
| 14 | ENTRY-SEM-411-014 | TODO | Task 13 | Scanner Guild | Create `ThreatVectorInferrer` that maps capabilities and framework patterns to likely attack vectors (e.g., WebServer + DatabaseAccess + UserInput -> Sqli risk). |
|
||||
| 15 | ENTRY-SEM-411-015 | TODO | Task 13 | Scanner Guild | Create `DataBoundaryMapper` that traces data flow edges from entrypoint through framework handlers to I/O boundaries. |
|
||||
| 16 | ENTRY-SEM-411-016 | TODO | Tasks 7-15 | Scanner Guild | Create `SemanticEntrypointOrchestrator` that composes adapters, detectors, and inferrers into unified semantic analysis pipeline. |
|
||||
| 17 | ENTRY-SEM-411-017 | TODO | Task 16 | Scanner Guild | Integrate semantic analysis into `EntryTraceAnalyzer` post-processing, emit `SemanticEntrypoint` alongside `EntryTraceResult`. |
|
||||
| 18 | ENTRY-SEM-411-018 | TODO | Task 17 | Scanner Guild | Add semantic fields to `LanguageComponentRecord`: `intent`, `capabilities[]`, `threatVectors[]`. |
|
||||
| 19 | ENTRY-SEM-411-019 | TODO | Task 18 | Scanner Guild | Update richgraph-v1 schema to include semantic metadata on entrypoint nodes. |
|
||||
| 20 | ENTRY-SEM-411-020 | TODO | Task 19 | Scanner Guild | Add CycloneDX and SPDX property extensions for semantic entrypoint data. |
|
||||
| 21 | ENTRY-SEM-411-021 | TODO | Tasks 8-12 | QA Guild | Create test fixtures for each language semantic adapter with expected intent/capabilities. |
|
||||
| 22 | ENTRY-SEM-411-022 | TODO | Task 21 | QA Guild | Add golden test suite validating semantic analysis determinism. |
|
||||
| 23 | ENTRY-SEM-411-023 | TODO | Task 22 | Docs Guild | Document semantic entrypoint schema in `docs/modules/scanner/operations/entrypoint-semantic.md`. |
|
||||
| 24 | ENTRY-SEM-411-024 | TODO | Task 23 | Docs Guild | Update `docs/modules/scanner/architecture.md` with semantic analysis pipeline. |
|
||||
| 25 | ENTRY-SEM-411-025 | TODO | Task 24 | CLI Guild | Add `stella scan --semantic` flag and semantic output fields to JSON/table formats. |
|
||||
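Tasks 13 and 14 describe an import-to-capability lookup followed by a rule that combines intent and capabilities into likely threat vectors. The following is a minimal sketch of that idea, not the planned implementation. The `SemanticSketch` class, its method names, the trimmed enums, and the `sqlite3` and `flask` table entries are assumptions made for illustration; only the `socket` and `os.path` mappings and the WebServer + DatabaseAccess + UserInput -> Sqli rule come from the tracker.

```csharp
using System;
using System.Collections.Generic;

// Trimmed copies of the previewed enums; only the members used here.
public enum ApplicationIntent { Unknown = 0, WebServer = 1, CliTool = 2 }

[Flags]
public enum CapabilityClass : long
{
    None = 0,
    NetworkConnect = 1L << 1,
    FileRead = 1L << 2,
    DatabaseAccess = 1L << 6,
    UserInput = 1L << 10,
}

public enum ThreatVectorType { Sqli }

public static class SemanticSketch
{
    // Illustrative table. "socket" and "os.path" follow task 13;
    // "sqlite3" and "flask" are assumptions added for the example.
    private static readonly Dictionary<string, CapabilityClass> ImportMap = new(StringComparer.Ordinal)
    {
        ["socket"] = CapabilityClass.NetworkConnect,
        ["os.path"] = CapabilityClass.FileRead,
        ["sqlite3"] = CapabilityClass.DatabaseAccess,
        ["flask"] = CapabilityClass.UserInput,
    };

    // Unions the capabilities of every known import; unknown imports are ignored.
    public static CapabilityClass DetectCapabilities(IEnumerable<string> imports)
    {
        var result = CapabilityClass.None;
        foreach (var name in imports)
        {
            if (ImportMap.TryGetValue(name, out var capability))
            {
                result |= capability;
            }
        }
        return result;
    }

    // Applies the single rule quoted in task 14.
    public static IReadOnlyList<ThreatVectorType> InferThreats(ApplicationIntent intent, CapabilityClass capabilities)
    {
        const CapabilityClass sqliSignals = CapabilityClass.DatabaseAccess | CapabilityClass.UserInput;
        var threats = new List<ThreatVectorType>();
        if (intent == ApplicationIntent.WebServer && (capabilities & sqliSignals) == sqliSignals)
        {
            threats.Add(ThreatVectorType.Sqli);
        }
        return threats;
    }
}

public static class ThreatDemo
{
    public static void Main()
    {
        var capabilities = SemanticSketch.DetectCapabilities(new[] { "flask", "sqlite3" });
        var threats = SemanticSketch.InferThreats(ApplicationIntent.WebServer, capabilities);
        Console.WriteLine(string.Join(",", threats)); // Sqli
    }
}
```

A real detector would presumably attach confidence and evidence to each inference, as the `ThreatVector` record in task 4 requires; the sketch omits both to keep the rule visible.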
|
||||
## Wave Coordination
|
||||
| Wave | Tasks | Shared Prerequisites | Status | Notes |
|
||||
|------|-------|---------------------|--------|-------|
|
||||
| Schema Definition | 1-6 | None | TODO | Core data structures |
|
||||
| Adapter Interface | 7 | Schema frozen | TODO | Contract for language adapters |
|
||||
| Language Adapters | 8-12 | Interface defined | TODO | Can run in parallel |
|
||||
| Cross-Cutting Analysis | 13-15 | Adapters started | TODO | Capability/threat/boundary detection |
|
||||
| Integration | 16-20 | Adapters + analysis | TODO | Wire into scanner pipeline |
|
||||
| QA & Docs | 21-25 | Integration complete | TODO | Validation and documentation |
|
||||
|
||||
## Interlocks
|
||||
- Schema tasks (1-6) must complete before interface task (7).
|
||||
- Interface task (7) gates all language adapters (8-12).
|
||||
- Language adapters can proceed in parallel.
|
||||
- Cross-cutting analysis (13-15) can start once any adapter is in progress.
|
||||
- Integration tasks (16-20) require most adapters to be complete.
|
||||
- QA/Docs (21-25) can overlap with late integration.
|
||||
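Since the interface in task 7 gates every language adapter, it may help to see what the contract could look like in code. This is a hedged sketch only. The record shapes for `EntryTraceResult`, `LanguageAnalyzerResult`, and `SemanticEntrypoint` are placeholders, the `Task<SemanticEntrypoint>` return type is an assumption (the tracker states only `-> SemanticEntrypoint`), and `PythonSemanticAdapterSketch` with its framework-name matching and Id format is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Placeholder models standing in for the real scanner types (assumptions).
public sealed record EntryTraceResult(string ImageDigest);
public sealed record LanguageAnalyzerResult(string Language, string[] Frameworks);
public enum ApplicationIntent { Unknown = 0, WebServer = 1, Worker = 4 }
public sealed record SemanticEntrypoint(string Id, ApplicationIntent Intent);

public interface ISemanticEntrypointAnalyzer
{
    // Parameter order follows task 7; the Task<T> wrapper is assumed.
    Task<SemanticEntrypoint> AnalyzeAsync(
        EntryTraceResult trace,
        LanguageAnalyzerResult language,
        CancellationToken cancellationToken);
}

public sealed class PythonSemanticAdapterSketch : ISemanticEntrypointAnalyzer
{
    public Task<SemanticEntrypoint> AnalyzeAsync(
        EntryTraceResult trace,
        LanguageAnalyzerResult language,
        CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        // First matching framework wins; mappings mirror task 8.
        var intent = ApplicationIntent.Unknown;
        foreach (var framework in language.Frameworks)
        {
            if (framework is "django" or "flask" or "fastapi")
            {
                intent = ApplicationIntent.WebServer;
                break;
            }
            if (framework is "celery")
            {
                intent = ApplicationIntent.Worker;
                break;
            }
        }

        // Hypothetical Id format, chosen only so the example is deterministic.
        var id = $"{trace.ImageDigest}:python";
        return Task.FromResult(new SemanticEntrypoint(id, intent));
    }
}

public static class AdapterDemo
{
    public static async Task Main()
    {
        ISemanticEntrypointAnalyzer adapter = new PythonSemanticAdapterSketch();
        var result = await adapter.AnalyzeAsync(
            new EntryTraceResult("sha256:abc"),
            new LanguageAnalyzerResult("python", new[] { "celery" }),
            CancellationToken.None);
        Console.WriteLine(result.Intent); // Worker
    }
}
```

Keeping each adapter behind the same interface is what lets tasks 8-12 run in parallel once the schema is frozen.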
|
||||
## Upcoming Checkpoints
|
||||
- 2025-12-18 - Schema freeze (tasks 1-6 complete); interface draft (task 7).
|
||||
- 2025-12-23 - Language adapters midpoint (tasks 8-12 in progress); cross-cutting analysis started.
|
||||
- 2025-12-27 - Integration tasks started (tasks 16-20).
|
||||
- 2025-12-30 - Sprint close; semantic foundation ready.
|
||||
|
||||
## Action Tracker
|
||||
| # | Action | Owner | Due (UTC) | Status | Notes |
|
||||
|---|--------|-------|-----------|--------|-------|
|
||||
| 1 | Review existing entrypoint detection code | Scanner Guild | 2025-12-16 | TODO | Understand integration points |
|
||||
| 2 | Draft ApplicationIntent enum with cross-team input | Scanner Guild | 2025-12-17 | TODO | Need input from all language teams |
|
||||
| 3 | Create AGENTS.md for EntryTrace module | Scanner Guild | 2025-12-16 | TODO | Implementer guidance |
|
||||
| 4 | Validate semantic schema against richgraph-v1 | Platform Guild | 2025-12-18 | TODO | Ensure compatibility |
|
||||
|
||||
## Decisions & Risks
|
||||
|
||||
| ID | Risk | Impact | Mitigation / Owner |
|
||||
|----|------|--------|-------------------|
|
||||
| R1 | Intent enumeration incomplete | Missing application types | Start with common patterns; extend as needed; Scanner Guild |
|
||||
| R2 | Capability detection false positives | Noise in attack surface | Use confidence scoring; require multiple signals; Scanner Guild |
|
||||
| R3 | Schema changes after freeze | Rework in dependent sprints | Strict freeze enforcement after 2025-12-18; Planning |
|
||||
| R4 | Language adapter coverage gaps | Inconsistent semantic depth | Prioritize Python/Java/Node; others can be stubs; Scanner Guild |
|
||||
|
||||
## Schema Preview
|
||||
|
||||
### SemanticEntrypoint Record
|
||||
```csharp
public sealed record SemanticEntrypoint
{
    public required string Id { get; init; }
    public required EntrypointSpecification Specification { get; init; }
    public required ApplicationIntent Intent { get; init; }
    public required ImmutableArray<CapabilityClass> Capabilities { get; init; }
    public required ImmutableArray<ThreatVector> AttackSurface { get; init; }
    public required ImmutableArray<DataFlowBoundary> DataBoundaries { get; init; }
    public required SemanticConfidence Confidence { get; init; }
    public ImmutableDictionary<string, string>? Metadata { get; init; }
}
```
|
||||
|
||||
### ApplicationIntent Enumeration
|
||||
```csharp
public enum ApplicationIntent
{
    Unknown = 0,
    WebServer = 1,       // HTTP/HTTPS listener (Django, Express, ASP.NET)
    CliTool = 2,         // Command-line utility (Click, Cobra)
    BatchJob = 3,        // One-shot data processing
    Worker = 4,          // Background job processor (Celery, Sidekiq)
    Serverless = 5,      // FaaS handler (Lambda, Azure Functions)
    Daemon = 6,          // Long-running background service
    InitSystem = 7,      // Process manager (systemd, s6)
    Supervisor = 8,      // Child process supervisor
    DatabaseServer = 9,  // Database engine
    MessageBroker = 10,  // Message queue server
    CacheServer = 11,    // Cache/session store
    ProxyGateway = 12,   // Reverse proxy, API gateway
    TestRunner = 13,     // Test framework execution
    DevServer = 14,      // Development-only server
}
```
|
||||
|
||||
### CapabilityClass Enumeration
|
||||
```csharp
[Flags]
public enum CapabilityClass : long
{
    None = 0,
    NetworkListen = 1 << 0,     // Opens listening socket
    NetworkConnect = 1 << 1,    // Makes outbound connections
    FileRead = 1 << 2,          // Reads from filesystem
    FileWrite = 1 << 3,         // Writes to filesystem
    ProcessSpawn = 1 << 4,      // Spawns child processes
    CryptoOperation = 1 << 5,   // Encryption/signing operations
    DatabaseAccess = 1 << 6,    // Database client operations
    MessageQueue = 1 << 7,      // Message broker client
    CacheAccess = 1 << 8,       // Cache client operations
    ExternalApi = 1 << 9,       // External HTTP API calls
    UserInput = 1 << 10,        // Accepts user input
    ConfigLoad = 1 << 11,       // Loads configuration files
    SecretAccess = 1 << 12,     // Accesses secrets/credentials
    LogEmit = 1 << 13,          // Emits logs
    MetricsEmit = 1 << 14,      // Emits metrics/telemetry
    SystemCall = 1 << 15,       // Makes privileged syscalls
    ContainerEscape = 1 << 16,  // Capabilities enabling escape
    KernelModule = 1 << 17,     // Loads kernel modules
    Ptrace = 1 << 18,           // Process tracing
    RawSocket = 1 << 19,        // Raw network access
}
```
|
||||
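Because `CapabilityClass` is declared with `[Flags]`, capabilities can be combined and tested with bitwise operators. The sketch below illustrates this under stated assumptions: it redeclares only a subset of the members, and the `CapabilityExtensions.HasAll` helper is hypothetical rather than part of the sprint definition.

```csharp
using System;

[Flags]
public enum CapabilityClass : long
{
    None = 0,
    NetworkListen = 1L << 0,
    NetworkConnect = 1L << 1,
    FileRead = 1L << 2,
    DatabaseAccess = 1L << 6,
    UserInput = 1L << 10,
}

public static class CapabilityExtensions
{
    // Hypothetical helper: true when every bit in `required` is set in `actual`.
    public static bool HasAll(this CapabilityClass actual, CapabilityClass required)
        => (actual & required) == required;
}

public static class FlagsDemo
{
    public static void Main()
    {
        var capabilities = CapabilityClass.NetworkListen
            | CapabilityClass.DatabaseAccess
            | CapabilityClass.UserInput;

        Console.WriteLine(capabilities.HasAll(CapabilityClass.NetworkListen | CapabilityClass.UserInput)); // True
        Console.WriteLine(capabilities.HasAll(CapabilityClass.FileRead)); // False
    }
}
```

The sketch writes the shifts as `1L << n` so they are evaluated as 64-bit values. The preview's `1 << n` form yields the same values for bits 0 through 19, but would stop being correct if members at bit 31 or higher were added, because the shift would then be computed as a 32-bit `int` before conversion to `long`.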
|
||||
## Execution Log
|
||||
| Date (UTC) | Update | Owner |
|
||||
|------------|--------|-------|
|
||||
| 2025-12-13 | Created sprint from program sprint 0410; defined 25 tasks across schema, adapters, integration, QA/docs; included schema previews. | Planning |
|
||||
@@ -110,9 +110,9 @@
|
||||
| 1 | Finalize VEX decision schema with Excititor team | Platform Guild | 2025-12-02 | DONE |
|
||||
| 2 | Confirm attestation predicate types with Attestor team | API Guild | 2025-12-03 | DONE |
|
||||
| 3 | Review audit bundle format with Export Center team | API Guild | 2025-12-04 | DONE |
|
||||
| 4 | Accessibility review of VEX modal with Accessibility Guild | UI Guild | 2025-12-09 | TODO |
|
||||
| 4 | Accessibility review of VEX modal with Accessibility Guild | UI Guild | 2025-12-09 | DONE |
|
||||
| 5 | Align UI work to canonical workspace `src/Web/StellaOps.Web` | DevEx · UI Guild | 2025-12-06 | DONE |
|
||||
| 6 | Regenerate deterministic fixtures for triage/VEX components (tests/e2e/offline-kit) | DevEx · UI Guild | 2025-12-13 | TODO |
|
||||
| 6 | Regenerate deterministic fixtures for triage/VEX components (tests/e2e/offline-kit) | DevEx · UI Guild | 2025-12-13 | DONE |
|
||||
|
||||
## Decisions & Risks
|
||||
| Risk | Impact | Mitigation / Next Step |
|
||||
@@ -138,6 +138,7 @@
|
||||
| 2025-12-12 | Normalized prerequisites to archived advisory/sprint paths; aligned API endpoint paths and Wave A deliverables to `src/Web/StellaOps.Web`. | Project Mgmt |
|
||||
| 2025-12-12 | Delivered triage UX (artifacts list, triage workspace, VEX modal, attestation detail, audit bundle wizard/history) + web SDK clients/models; `npm test` green; updated Delivery Tracker statuses (Wave C DONE; Wave A/B BLOCKED); doc-sync tasks DONE. | Implementer |
|
||||
| 2025-12-12 | Synced sprint tracker to implementation: Wave A/B (SCHEMA-08-*, DTO-09-*, API-VEX-06-*, API-AUDIT-07-*) and TRIAGE-GAPS-215-042 / UI-PROOF-VEX-0215-010 / TTE-GAPS-0215-011 now DONE; Action Tracker #1-3 DONE; remaining Action Tracker #4 and #6. | Implementer |
|
||||
| 2025-12-13 | Completed Action Tracker #4/#6: Playwright Axe a11y smoke passes in strict mode for triage VEX modal (`src/Web/StellaOps.Web/tests/e2e/a11y-smoke.spec.ts`) and graph severity filter label is now associated (`src/Web/StellaOps.Web/src/app/features/graph/graph-explorer.component.html`); triage quickstart fixtures remain deterministic via mock clients (`src/Web/StellaOps.Web/src/app/core/api/vex-decisions.client.ts`, `src/Web/StellaOps.Web/src/app/core/api/audit-bundles.client.ts`). | Implementer |
|
||||
|
||||
---
|
||||
*Sprint created: 2025-11-28*
|
||||
@@ -4,7 +4,7 @@ Scanner analyses container images layer-by-layer, producing deterministic SBOM f
|
||||
|
||||
## Latest updates (2025-12-12)
|
||||
- Deterministic SBOM composition fixture published at `docs/modules/scanner/fixtures/deterministic-compose/` with DSSE, `_composition.json`, BOM, and hashes; doc `deterministic-sbom-compose.md` promoted to Ready v1.0 with offline verification steps.
|
||||
- Node analyzer now ingests npm/yarn/pnpm lockfiles, emitting `DeclaredOnly` components with lock provenance. The CLI companion command `stella node lock-validate` runs the collector offline, surfaces declared-only or missing-lock packages, and emits telemetry via `stellaops.cli.node.lock_validate.count`.
|
||||
- Node analyzer now ingests npm/yarn/pnpm lockfiles, emitting `DeclaredOnly` components with lock provenance. The CLI companion command `stella node lock-validate` runs the collector offline, surfaces declared-only or missing-lock packages, and emits telemetry via `stellaops.cli.node.lock_validate.count`. See `docs/modules/scanner/analyzers-node.md` and bench scenario `node_detection_gaps_fixture`.
|
||||
- Python analyzer picks up `requirements*.txt`, `Pipfile.lock`, and `poetry.lock`, tagging installed distributions with lock provenance and generating declared-only components for policy. Use `stella python lock-validate` to run the same checks locally before images are built.
|
||||
- Java analyzer now parses `gradle.lockfile`, `gradle/dependency-locks/**/*.lockfile`, and `pom.xml` dependencies via the new `JavaLockFileCollector`, merging lock metadata onto jar evidence and emitting declared-only components when jars are absent. The new CLI verb `stella java lock-validate` reuses that collector offline (table/JSON output) and records `stellaops.cli.java.lock_validate.count{outcome}` for observability.
|
||||
- Worker/WebService now resolve cache roots and feature flags via `StellaOps.Scanner.Surface.Env`; misconfiguration warnings are documented in `docs/modules/scanner/design/surface-env.md` and surfaced through startup validation.
|
||||
@@ -37,6 +37,7 @@ Scanner analyses container images layer-by-layer, producing deterministic SBOM f
|
||||
- ./operations/analyzers-grafana-dashboard.json
|
||||
- ./operations/rustfs-migration.md
|
||||
- ./operations/entrypoint.md
|
||||
- ./analyzers-node.md
|
||||
- ./operations/secret-leak-detection.md
|
||||
- ./operations/dsse-rekor-operator-guide.md
|
||||
- ./os-analyzers-evidence.md
|
||||
|
||||
81
docs/modules/scanner/analyzers-bun.md
Normal file
@@ -0,0 +1,81 @@
|
||||
# Bun Analyzer (Scanner)
|
||||
|
||||
## What it does
|
||||
- Inventories npm-ecosystem dependencies from Bun-managed projects without executing `bun`.
|
||||
- Supports installed inventory (`node_modules/**/package.json`), lockfile-only inventory (`bun.lock`), and declared-only fallback from `package.json`.
|
||||
- Enriches output with deterministic scope signals (`dev`, `optional`, `peer`, `scopeUnknown`), patch attribution, and bounded sha256 evidence.
|
||||
|
||||
## Inputs and precedence
|
||||
1. **Installed inventory** (`node_modules/` present): traverse installed packages and emit components from installed `package.json` (uses `bun.lock` for resolved/integrity + scope enrichment when present).
|
||||
2. **Lockfile-only** (`bun.lock` present, no install): parse `bun.lock` and emit components from lock entries.
|
||||
3. **Declared-only fallback** (project markers present but no `bun.lock`/install): emit explicit-key components from `package.json` dependency sections.
|
||||
4. **Unsupported** (`bun.lockb` only): emit a remediation record explaining how to produce `bun.lock`.
|
||||
|
||||
## Project discovery (including container roots)
|
||||
The analyzer discovers Bun project roots under:
|
||||
- The analysis root (`context.RootPath`)
|
||||
- Common OCI unpack layouts: `layers/*`, `.layers/*`, and `layer*` (direct children)
|
||||
|
||||
Discovery is bounded and deterministic:
|
||||
- Sorted directory enumeration
|
||||
- Explicit depth and root caps
|
||||
- Never recurses into `node_modules/`
|
||||
|
||||
## Identity rules (PURL vs explicit key)
|
||||
Concrete versions emit a PURL:
|
||||
- `purl = pkg:npm/<name>@<version>`
|
||||
- Concrete versions follow the Node-style guardrail (no ranges/tags/paths embedded as a "version"; see `Internal/BunVersionSpec.IsConcreteNpmVersion`).
|
||||
|
||||
Non-concrete versions emit an explicit key:
|
||||
- `componentKey = explicit::<analyzerId>::npm::<name>::sha256:<digest>`
|
||||
- `purl = null`, `version = null`
|
||||
- Used for declared-only dependencies and any lock/installed records whose `version` is not concrete (e.g., `workspace:*`, `link:../...`, `file:../...`).
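
The PURL-versus-explicit-key decision above can be sketched in Python as follows. This is a minimal illustration, not the analyzer's code: the regex is an assumed approximation of "concrete" (plain semver with optional prerelease/build suffix), the real rule lives in `Internal/BunVersionSpec.IsConcreteNpmVersion`, and the helper names are hypothetical. Scoped-name PURL encoding is not handled here.

```python
import re

# Assumed approximation of a concrete npm version: MAJOR.MINOR.PATCH with an
# optional prerelease and build suffix. Ranges, tags, and paths do not match.
_CONCRETE = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def is_concrete_npm_version(spec):
    """Return True only for specs that look like a single pinned version."""
    return bool(spec) and _CONCRETE.match(spec.strip()) is not None


def npm_purl_or_none(name, spec):
    """Return a versioned PURL for concrete specs, otherwise None.

    None signals that the caller should emit an explicit-key component
    (purl = null, version = null) instead of guessing a version.
    """
    if is_concrete_npm_version(spec):
        return f"pkg:npm/{name}@{spec.strip()}"
    return None
```

Specs such as `workspace:*`, `^1.2.3`, or `latest` fall through to `None`, which matches the "do not encode a non-version as a version" guardrail.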
|
||||
|
||||
Explicit-key digest input (canonical, UTF-8):
|
||||
```
npm\n<name>\n<spec>\n<originLocator>
```
|
||||
Generated via `LanguageExplicitKey.Create(...)` and aligned with `docs/modules/scanner/language-analyzers-contract.md`.
|
||||
|
||||
## Evidence and locators
|
||||
All evidence locators are relative and use `/` separators.
|
||||
|
||||
### File evidence
|
||||
- Installed packages: `node_modules/.../package.json`
|
||||
- Hashing: sha256 is computed for `package.json` only when size is within 1 MiB; when skipped, metadata includes:
|
||||
- `packageJson.hashSkipped=true`
|
||||
- `packageJson.hashSkipReason=<missing|unauthorized|io|size>...`
|
||||
|
||||
### Lockfile entry evidence
|
||||
- Locator format: `<lockfileRelativePath>:packages[<name>@<version>]`
|
||||
- Example: `bun.lock:packages[lodash@4.17.21]`
|
||||
- Hashing: sha256 is computed for `bun.lock` only when size is within 50 MiB; when skipped, metadata includes:
|
||||
- `bunLock.hashSkipped=true`
|
||||
- `bunLock.hashSkipReason=<missing|unauthorized|io|size>...`
|
||||
|
||||
## Scope semantics (dev/optional/peer)
|
||||
Scope is derived deterministically from the `bun.lock` dependency graph rooted at `package.json` declarations:
|
||||
- `dev=true` only when dev reachability is provable.
|
||||
- `optional=true` and `peer=true` are preserved when present in lock data or derived from declared scopes.
|
||||
- If the graph cannot disambiguate (multiple candidates/specifier mismatch), the record is marked:
|
||||
- `scopeUnknown=true`
|
||||
- `dev=false` (do not guess)
|
||||
|
||||
`includeDev=false` filters only packages proven to be dev-only; unknown-scope packages are kept but marked `scopeUnknown=true`.
|
||||
|
||||
## Patches and workspaces
|
||||
- Workspace patterns come from root `package.json` (`workspaces`).
|
||||
- Patch attribution supports Bun's `patchedDependencies` and patch directories.
|
||||
- Patch keys preserve version specificity (`name@version`) and patch paths are emitted as deterministic project-relative paths.
|
||||
- Patch matching precedence: `name@version` first, then name-only, and only when the name-only match is unambiguous.
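
One possible reading of the patch matching precedence, sketched in Python. The interpretation of "unambiguous" (exactly one patch key shares the package name) is an assumption, and `match_patch` and its arguments are hypothetical names, not the analyzer's API.

```python
def _patch_key_name(key):
    """Strip a trailing @version from a patch key, keeping scoped names intact."""
    head, sep, _tail = key.rpartition("@")
    return head if sep and head else key


def match_patch(name, version, patched_dependencies):
    """Return the patch path for a package, or None when no safe match exists."""
    # 1) An exact name@version key always wins.
    exact = patched_dependencies.get(f"{name}@{version}")
    if exact is not None:
        return exact
    # 2) Fall back to name-only, but only if a single key carries that name.
    candidates = sorted(
        path for key, path in patched_dependencies.items()
        if _patch_key_name(key) == name
    )
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` on ambiguity keeps the attribution conservative: a patch is never credited to a version it might not apply to.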
|
||||
|
||||
## Known limitations
|
||||
- `bun.lockb` (binary lockfile) is not parsed; a remediation record is emitted instead.
|
||||
- The analyzer does not execute `bun` and does not fetch registries; offline-only behavior is enforced.
|
||||
|
||||
## References
|
||||
- Sprint: `docs/implplan/SPRINT_0407_0001_0001_scanner_bun_detection_gaps.md`
|
||||
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
|
||||
- Design notes: `docs/modules/scanner/prep/bun-analyzer-design.md`
|
||||
- Gotchas: `docs/modules/scanner/bun-analyzer-gotchas.md`
|
||||
|
||||
65
docs/modules/scanner/analyzers-java.md
Normal file
@@ -0,0 +1,65 @@
|
||||
# Java Analyzer (Scanner)
|
||||
|
||||
## What it does
|
||||
- Inventories Maven coordinates from JVM archives (JAR/WAR/EAR/fat JAR) without executing build tools.
|
||||
- Prefers installed artifact metadata (`META-INF/maven/**/pom.properties`), with a `pom.xml` fallback when properties are missing.
|
||||
- Enriches output with bounded embedded-library scan metadata and JNI usage hints.
|
||||
|
||||
## Inputs and precedence
|
||||
1. **Installed archive inventory**: parse Maven coordinates from `META-INF/maven/**/pom.properties` in each discovered archive.
|
||||
2. **`pom.xml` fallback**: when the archive has no `pom.properties`, parse `META-INF/maven/**/pom.xml` and emit a Maven PURL only when `groupId`, `artifactId`, and `version` are concrete (no placeholders like `${...}`).
|
||||
3. **Lock augmentation (current)**: when a lock entry matches an installed artifact, merge lock metadata onto the component; unmatched lock entries still emit declared-only components.
|
||||
4. **Multi-module lock precedence (pending)**: deterministic precedence rules are tracked in `SCAN-JAVA-403-003` (blocked).
|
||||
5. **Runtime images (pending)**: runtime component identity is tracked in `SCAN-JAVA-403-004` (blocked).
|
||||
|
||||
## Embedded archives (fat JAR / WAR / EAR layouts)
|
||||
The analyzer scans embedded library jars without extracting them to disk:
|
||||
- `BOOT-INF/lib/*.jar`
|
||||
- `WEB-INF/lib/*.jar`
|
||||
- `APP-INF/lib/*.jar`
|
||||
- `lib/*.jar`
|
||||
|
||||
### Locator format
|
||||
Evidence locators are nested deterministically using `!` separators:
|
||||
- `outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties`
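
A small Python sketch of how such a nested locator could be composed and split. The helper names are hypothetical, and the `com.example/demo` coordinates in the usage note are invented for illustration.

```python
def nested_locator(*segments):
    """Join archive-relative paths with '!' after normalizing separators."""
    cleaned = [s.replace("\\", "/").strip("/") for s in segments]
    if not cleaned or any(not s or "!" in s for s in cleaned):
        raise ValueError("segments must be non-empty and must not contain '!'")
    return "!".join(cleaned)


def split_locator(locator):
    """Return the outer archive path followed by each nested entry path."""
    return locator.split("!")
```

For example, `nested_locator("outer.jar", "BOOT-INF/lib/inner.jar", "META-INF/maven/com.example/demo/pom.properties")` yields a three-level locator that `split_locator` breaks back into its parts.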
|
||||
|
||||
### Bounds and skip markers
|
||||
Embedded scanning is bounded and deterministic:
|
||||
- Max embedded jars per archive: `256`
|
||||
- Max embedded jar bytes: `25 MiB`
|
||||
|
||||
When embedded scanning is skipped or truncated, the outer component metadata includes deterministic markers:
|
||||
- `embeddedScan.candidateJars`, `embeddedScan.scannedJars`, `embeddedScan.emittedComponents`
|
||||
- `embeddedScanSkipped=true`, `embeddedScan.skippedJars`, `embeddedScanSkipReasons=<...>` (when applicable)
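
The bounds and markers above can be illustrated with a short Python sketch that only plans the scan (it selects which embedded jars would be read). It is an illustration under assumptions: the reason codes `maxJars` and `maxBytes` are hypothetical, `embeddedScan.emittedComponents` is omitted because no components are parsed here, and `plan_embedded_scan` is not the analyzer's API.

```python
import io
import zipfile

MAX_EMBEDDED_JARS = 256
MAX_EMBEDDED_JAR_BYTES = 25 * 1024 * 1024
EMBEDDED_PREFIXES = ("BOOT-INF/lib/", "WEB-INF/lib/", "APP-INF/lib/", "lib/")


def _is_embedded_jar(entry_name):
    """True for jars that are direct children of a known library directory."""
    if not entry_name.endswith(".jar"):
        return False
    return any(
        entry_name.startswith(prefix) and "/" not in entry_name[len(prefix):]
        for prefix in EMBEDDED_PREFIXES
    )


def plan_embedded_scan(archive_bytes, max_jars=MAX_EMBEDDED_JARS,
                       max_bytes=MAX_EMBEDDED_JAR_BYTES):
    """Pick embedded jars in sorted order and report deterministic skip markers."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as outer:
        candidates = sorted(
            (info for info in outer.infolist() if _is_embedded_jar(info.filename)),
            key=lambda info: info.filename,
        )
    scanned, reasons = [], set()
    for info in candidates:
        if len(scanned) >= max_jars:
            reasons.add("maxJars")      # hypothetical reason code
        elif info.file_size > max_bytes:
            reasons.add("maxBytes")     # hypothetical reason code
        else:
            scanned.append(info.filename)
    metadata = {
        "embeddedScan.candidateJars": len(candidates),
        "embeddedScan.scannedJars": len(scanned),
    }
    skipped = len(candidates) - len(scanned)
    if skipped:
        metadata["embeddedScanSkipped"] = "true"
        metadata["embeddedScan.skippedJars"] = skipped
        metadata["embeddedScanSkipReasons"] = ",".join(sorted(reasons))
    return scanned, metadata
```

Sorting candidates before applying the caps is what makes truncation reproducible: the same archive always skips the same jars.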
|
||||
|
||||
Embedded components include:
|
||||
- `embedded=true`
|
||||
- `embedded.containerJarPath=<outerRelativePath>`
|
||||
- `embedded.entryPath=<embeddedEntryPath>`
|
||||
|
||||
## Evidence and hashing
|
||||
- Evidence locators are project-relative, use `/` separators, and use `!` for nested artifact paths.
|
||||
- `sha256` for `pom.properties` and `pom.xml` evidence is computed over the raw entry bytes.
|
||||
|
||||
## `pom.xml` with incomplete coordinates
|
||||
When `pom.xml` is present but coordinates are incomplete (missing values or `${...}` placeholders), the analyzer emits an explicit-key component:
|
||||
- `purl=null`, `version=null`
|
||||
- `metadata.unresolvedCoordinates=true`
|
||||
- `componentKey` follows the cross-analyzer explicit-key scheme via `LanguageExplicitKey.Create("java", "maven", ...)`
|
||||
|
||||
## JNI metadata (bytecode-based)
|
||||
JNI hints are derived from parsed bytecode (native method flags and load call sites), not raw ASCII scanning.
|
||||
|
||||
When bytecode analysis finds JNI edges (`jni.edgeCount > 0`), components are annotated with bounded, deterministic metadata:
|
||||
- `jni.edgeCount`, `jni.nativeMethodCount`, `jni.loadCallCount`, optional `jni.warningCount`
|
||||
- `jni.reasons` (distinct reason codes)
|
||||
- `jni.targetLibraries` (top-N stable sample; currently 12)
|
||||
|
||||
## Known limitations
|
||||
- Shaded jars that strip Maven metadata remain best-effort; embedded libs without Maven metadata do not emit components.
|
||||
- Gradle multi-module lock precedence and runtime image component identity remain blocked until explicit decisions land.
|
||||
|
||||
## References
|
||||
- Sprint: `docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md`
|
||||
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
|
||||
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java/JavaLanguageAnalyzer.cs`
|
||||
79
docs/modules/scanner/analyzers-node.md
Normal file
@@ -0,0 +1,79 @@
|
||||
# Node Analyzer (npm/Yarn/pnpm)
|
||||
|
||||
This document captures the Node language analyzer’s deterministic behavior guarantees and safety constraints (what it emits, what it refuses to emit, and how it stays bounded/offline).
|
||||
|
||||
## Component identity & precedence
|
||||
|
||||
### Installed vs declared-only
|
||||
- The analyzer always emits **on-disk inventory** first (workspace member manifests + installed `node_modules`/PNPM/Yarn PnP cache packages).
|
||||
- It then emits **declared-only** components for lockfile / manifest declarations that are **not backed by on-disk inventory**:
|
||||
- If a declared entry has a **concrete resolved version** from a lockfile, it emits a versioned `pkg:npm/...@<version>` PURL.
|
||||
- If the version is **non-concrete** (ranges/tags/git/file/workspace/link/path), it emits an **explicit-key** component (`purl=null`, `version=null`).
|
||||
|
||||
### Identity safety (PURL vs explicit-key)
|
||||
- Concrete PURLs are emitted only when the analyzer can prove a **concrete version** from local evidence (installed `package.json` or a lockfile-resolved entry).
|
||||
- Declared-only/non-concrete dependencies use `LanguageExplicitKey` (see `docs/modules/scanner/language-analyzers-contract.md`).
|
||||
|
||||
### Lock metadata lookup precedence
|
||||
When attaching lock metadata to an installed package:
|
||||
1) `package-lock.json` path match (`packages["<relativePath>"]`),
|
||||
2) `(name, version)` match (Yarn/pnpm multi-version support),
|
||||
3) fallback to name-only (last-wins) for legacy locks.
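
The three-step lookup can be sketched in Python as below. The `LockIndex` class and the dictionary-shaped entries are hypothetical stand-ins used to show the precedence, not the analyzer's data model.

```python
class LockIndex:
    """Holds lock entries under three keys so lookups can fall back in order."""

    def __init__(self):
        self.by_path = {}
        self.by_name_version = {}
        self.by_name = {}

    def add(self, entry, path=None):
        """Register an entry; `path` is the package-lock `packages{}` key, if any."""
        if path is not None:
            self.by_path[path] = entry
        self.by_name_version[(entry["name"], entry["version"])] = entry
        self.by_name[entry["name"]] = entry  # last-wins for legacy name-only locks

    def lookup(self, relative_path, name, version):
        """Apply path match, then (name, version), then name-only."""
        if relative_path in self.by_path:
            return self.by_path[relative_path]
        if (name, version) in self.by_name_version:
            return self.by_name_version[(name, version)]
        return self.by_name.get(name)
```

The name-only fallback is deliberately last because it cannot distinguish between multiple installed versions of the same package.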
|
||||
|
||||
## Lockfile parsing guarantees (offline)
|
||||
|
||||
### `package-lock.json` (npm)
|
||||
- Supports v3+ `packages{}` layout and legacy `dependencies{}` traversal.
|
||||
- Correctly extracts nested names from `node_modules/.../node_modules/...` paths (including scoped packages).
|
||||
|
||||
### `yarn.lock` (Yarn v1 + Berry v2/v3)
|
||||
- Supports both Yarn v1 (`resolved "https://..."`) and Berry fields (`resolution:`, `checksum:`).
|
||||
- If `integrity` is absent but `checksum` is present, the analyzer records integrity-like evidence as `checksum:<value>`.
|
||||
- Ignores the `__metadata` section.
|
||||
|
||||
### `pnpm-lock.yaml` (pnpm)
|
||||
- Parses modern `packages:` and `snapshots:` sections.
|
||||
- Does not drop entries that lack `integrity` (workspace/link/file/git); instead it emits:
|
||||
- `lockIntegrityMissing=true`
|
||||
- `lockIntegrityMissingReason=<workspace|link|file|git|directory|missing>`
|
||||
|
||||
## Workspaces
|
||||
- Reads workspace members from the root `package.json` (`workspaces` array or `{ packages: [...] }` form).
|
||||
- Supports glob patterns:
|
||||
- `*` (single segment)
|
||||
- `**` (multi-segment)
|
||||
- Expansion is bounded and deterministic:
|
||||
- Skips `node_modules`
|
||||
- Caps traversal depth and total visited directories/members
|
||||
- Stable, sorted member output
|
||||
- Dependency scopes (`production|development|peer|optional`) are derived from both the root and workspace manifests, with deterministic precedence.
|
||||
|
||||
## Import scanning (bounded)
|
||||
- Import scanning runs only for the root package and workspace member packages (not `node_modules` packages).
|
||||
- File types: `.js/.jsx/.mjs/.cjs/.ts/.tsx/.mts/.cts`.
|
||||
- Parser behavior:
|
||||
- Attempts AST parsing as script/module; falls back to a bounded regex heuristic for TS when parsing fails.
|
||||
- Hard caps per package:
|
||||
- `maxFiles=500`, `maxBytes=5MiB`, `maxFileBytes=512KiB`, `maxDepth=20`
|
||||
- Skips `node_modules` and `.pnpm` directories during traversal
|
||||
- If capped, the analyzer marks the package metadata with:
|
||||
- `importScanSkipped=true`
|
||||
- `importScan.filesScanned=<n>`
|
||||
- `importScan.bytesScanned=<n>`
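
A rough Python sketch of a bounded file selection that respects these caps. It makes several assumptions that the text above does not confirm: oversized files count as "capped", skipped files are simply passed over rather than stopping the walk, and `select_import_scan_files` is a hypothetical helper rather than the analyzer's implementation.

```python
import os

SCRIPT_EXTENSIONS = (".js", ".jsx", ".mjs", ".cjs", ".ts", ".tsx", ".mts", ".cts")
SKIPPED_DIRS = {"node_modules", ".pnpm"}


def select_import_scan_files(root, max_files=500, max_bytes=5 * 1024 * 1024,
                             max_file_bytes=512 * 1024, max_depth=20):
    """Walk `root` in sorted order and pick files until a cap is reached."""
    selected, total_bytes, capped = [], 0, False
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        depth = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        # Prune heavy directories and sort in place for stable traversal order.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIPPED_DIRS)
        if depth >= max_depth:
            capped = capped or bool(dirnames)
            dirnames[:] = []
        for file_name in sorted(filenames):
            if not file_name.endswith(SCRIPT_EXTENSIONS):
                continue
            path = os.path.join(dirpath, file_name)
            size = os.path.getsize(path)
            if (size > max_file_bytes or len(selected) >= max_files
                    or total_bytes + size > max_bytes):
                capped = True
                continue
            selected.append(os.path.relpath(path, root).replace(os.sep, "/"))
            total_bytes += size
    metadata = {}
    if capped:
        metadata = {
            "importScanSkipped": "true",
            "importScan.filesScanned": len(selected),
            "importScan.bytesScanned": total_bytes,
        }
    return selected, metadata
```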
|
||||
|
||||
## Container layer layouts
|
||||
- Candidate layer roots under the analysis root:
|
||||
- `layers/*`, `.layers/*`, `layer*`
|
||||
- Each candidate root is scanned independently.
|
||||
- The analyzer also discovers `package.json` roots nested under layer roots (bounded depth) and includes their nested `node_modules` roots when present.
|
||||
|
||||
## Determinism & evidence hashing
|
||||
- On-disk `package.json` manifests are hashed (sha256) when ≤ 1 MiB and attached to the root evidence for deterministic provenance.
|
||||
- Output ordering is stable (componentKey ordering, sorted metadata/evidence).
|
||||
|
||||
## Benchmark
|
||||
- Scenario id: `node_detection_gaps_fixture` (config: `src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json`)
|
||||
- Fixture root: `samples/runtime/node-detection-gaps`
|
||||
- Run:
|
||||
- `dotnet run --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --config src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json --json out/bench/scanner-analyzers/latest.json --prom out/bench/scanner-analyzers/latest.prom`
|
||||
- Prometheus output includes additional metrics under `scanner_analyzer_bench_metric{scenario="...",name="node.importScan.*"}`.
|
||||
69
docs/modules/scanner/analyzers-python.md
Normal file
@@ -0,0 +1,69 @@
|
||||
# Python Analyzer (Scanner)
|
||||
|
||||
## What it does
|
||||
- Inventories Python distributions without executing `python`/`pip` (static inspection only).
|
||||
- Prefers installed distribution metadata (`*.dist-info/`) and validates `RECORD` when present (bounded, streaming IO).
|
||||
- Emits deterministic component metadata (`pkg.kind`, `pkg.confidence`, `pkg.location`) and evidence locators for replay/audit.
|
||||
|
||||
## Inputs and precedence
|
||||
1. **Installed inventory (preferred)**: detect site-packages roots and parse `*.dist-info/` / `*.egg-info/` metadata for concrete `pkg:pypi/<name>@<version>` components.
|
||||
2. **Archive inventory**: mount wheels (`*.whl`) and zipapps (`*.pyz`, `*.pyzw`) into the Python VFS and enrich any in-archive `*.dist-info/` metadata (including `RECORD` verification).
|
||||
3. **Lock augmentation (current)**: parse root-level `requirements*.txt` pinned entries (`==`/`===`), `Pipfile.lock` `default` section, and `poetry.lock`; when a lock entry matches an installed component, merge lock metadata.
|
||||
4. **Declared-only (current)**: lock entries not present in installed inventory still emit components:
|
||||
- concrete versions emit a versioned `pkg:pypi/...@<version>` PURL
|
||||
- non-concrete declarations (e.g., editable paths) emit explicit-key components (see Identity Rules)
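
As an illustration of the pinned-entry rule for `requirements*.txt`, the following Python sketch accepts only `==`/`===` pins and returns a versioned PURL. It is a simplified parser under assumptions (no line continuations, no hashes, no URL requirements, wildcard pins such as `==4.*` treated as non-concrete); the function names are hypothetical. Name normalization follows the PyPI convention of lowercasing and collapsing runs of `-`, `_`, `.` into `-`.

```python
import re

# name, optional [extras], then == or ===, then the version token.
_PIN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(===?)\s*([^\s;#]+)"
)


def normalize_name(name):
    """Normalize a distribution name the way PyPI compares names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pinned(line):
    """Return pkg:pypi/<name>@<version> for a pinned line, else None."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "-")):
        return None  # comments and options such as -e / -r are not pins
    match = _PIN.match(stripped)
    if match is None:
        return None
    name, _operator, version = match.groups()
    if "*" in version:
        return None  # wildcard pins do not identify one concrete version
    return f"pkg:pypi/{normalize_name(name)}@{version}"
```

Lines that return `None` here would need the explicit-key path (or be ignored), depending on what they declare.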
|
||||
|
||||
## Project discovery (including container roots)
|
||||
The analyzer is layout-aware and bounded:
|
||||
- Virtualenv layout roots are detected via `pyvenv.cfg` or `venv/`-style directories.
|
||||
- Site-packages roots include `lib/python*/site-packages` and `lib/python*/dist-packages`.
|
||||
- Container unpack layouts are supported as additional candidate roots:
|
||||
- `layers/*` (direct children)
|
||||
- `.layers/*` (direct children)
|
||||
- `layer*` (direct children of the analysis root)
|
||||
|
||||
## Virtual filesystem (VFS) and determinism
|
||||
- Inputs are normalized deterministically (dedupe + stable ordering); later/higher-confidence inputs override earlier ones in the VFS overlay.
|
||||
- Archive virtual roots are stable and collision-safe:
|
||||
- `archives/wheel/<file>`
|
||||
- `archives/zipapp/<file>`
|
||||
- `archives/sdist/<file>`
|
||||
- collisions use a deterministic `~N` suffix
|
||||
- Evidence locators are always analysis-root relative and use `/` separators.
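
A possible shape for the collision-safe virtual roots, sketched in Python. The placement of the `~N` suffix (appended to the file name) and the tie-break by sorted source path are assumptions; `assign_virtual_roots` is a hypothetical helper.

```python
import posixpath


def assign_virtual_roots(archives):
    """Map (kind, source_path) pairs to stable virtual roots.

    `kind` is one of "wheel", "zipapp", "sdist"; inputs are deduplicated and
    sorted first so the result does not depend on discovery order.
    """
    roots, used = {}, set()
    for kind, source_path in sorted(set(archives)):
        base = f"archives/{kind}/{posixpath.basename(source_path)}"
        candidate, suffix = base, 0
        while candidate in used:
            suffix += 1
            candidate = f"{base}~{suffix}"
        used.add(candidate)
        roots[(kind, source_path)] = candidate
    return roots
```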
|
||||
|
||||
## Identity rules (PURL vs explicit key)
|
||||
Concrete versions emit a PURL:
|
||||
- `purl = pkg:pypi/<normalizedName>@<version>`
|
||||
|
||||
Non-concrete declarations emit an explicit key:
|
||||
- `componentKey = explicit::<analyzerId>::pypi::<name>::sha256:<digest>`
|
||||
- `purl = null`, `version = null`
|
||||
- generated via `LanguageExplicitKey.Create(...)` and aligned with `docs/modules/scanner/language-analyzers-contract.md`
|
||||
|
||||
Editable declarations (from requirements `--editable` / `-e`) normalize the specifier:
|
||||
- project-relative paths stay relative (`editable-src`)
|
||||
- absolute/host paths are redacted and never appear in the digest input
|
||||
|
||||
## Evidence and metadata
|
||||
Installed and archive distributions emit evidence for (when present):
|
||||
- `METADATA`, `RECORD`, `WHEEL`, `INSTALLER`, `entry_points.txt`, `direct_url.json`
|
||||
|
||||
`RECORD` verification emits deterministic counters:
|
||||
- `record.totalEntries`, `record.hashedEntries`, `record.missingFiles`, `record.hashMismatches`, `record.ioErrors`
|
||||
- plus `record.unsupportedAlgorithms` when algorithms outside the supported set are present
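
To make the counters concrete, here is a Python sketch of `RECORD` verification. `RECORD` rows are CSV triples of path, `<algorithm>=<urlsafe-base64 digest without padding>`, and size. The counter semantics, the supported algorithm set, and the `read_file` callback are assumptions for illustration; the sketch also reads each file whole, whereas the analyzer is described as using bounded, streaming IO.

```python
import base64
import csv
import hashlib
import io

SUPPORTED_ALGORITHMS = {"sha256", "sha384", "sha512"}  # assumed supported set


def verify_record(record_text, read_file):
    """Check RECORD rows against file contents and return deterministic counters.

    `read_file(path)` is a caller-supplied function returning bytes; it should
    raise FileNotFoundError for missing files and OSError for other failures.
    """
    counters = {
        "record.totalEntries": 0,
        "record.hashedEntries": 0,
        "record.missingFiles": 0,
        "record.hashMismatches": 0,
        "record.ioErrors": 0,
        "record.unsupportedAlgorithms": 0,
    }
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue
        counters["record.totalEntries"] += 1
        hash_field = row[1] if len(row) > 1 else ""
        if not hash_field:
            continue  # e.g. the RECORD file's own row carries no hash
        algorithm, _, expected = hash_field.partition("=")
        if algorithm not in SUPPORTED_ALGORITHMS:
            counters["record.unsupportedAlgorithms"] += 1
            continue
        counters["record.hashedEntries"] += 1
        try:
            data = read_file(row[0])
        except FileNotFoundError:
            counters["record.missingFiles"] += 1
            continue
        except OSError:
            counters["record.ioErrors"] += 1
            continue
        digest = hashlib.new(algorithm, data).digest()
        actual = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        if actual != expected:
            counters["record.hashMismatches"] += 1
    return counters
```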
|
||||
|
||||
Declared-only/lock-only components include:
|
||||
- `declaredOnly=true`
|
||||
- `lockSource`, `lockLocator`, optional `lockResolved`, `lockIndex`, `lockExtras`, `lockEditablePath`
|
||||
|
||||
## Container overlay semantics (pending contract)
|
||||
When scanning raw OCI layer trees, correct overlay/whiteout handling is contract-driven. Until that contract lands, treat per-layer inventory as best-effort and do not rely on it as a merged-rootfs truth source.
|
||||
|
||||
## Vendored/bundled packages (pending contract)
|
||||
Vendored directory signals are detected but representation (separate components vs parent-only metadata) is contract-driven to avoid false vulnerability joins.
|
||||
|
||||
## References
|
||||
- Sprint: `docs/implplan/SPRINT_0405_0001_0001_scanner_python_detection_gaps.md`
|
||||
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
|
||||
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/PythonLanguageAnalyzer.cs`
|
||||
|
||||
@@ -42,9 +42,14 @@ src/
|
||||
└─ Tools/
|
||||
├─ StellaOps.Scanner.Sbomer.BuildXPlugin/ # BuildKit generator (image referrer SBOMs)
|
||||
└─ StellaOps.Scanner.Sbomer.DockerImage/ # CLI‑driven scanner container
|
||||
```
|
||||
|
||||
Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
|
||||
```
|
||||
|
||||
Per-analyzer notes (language analyzers):
|
||||
- `docs/modules/scanner/analyzers-java.md`
|
||||
- `docs/modules/scanner/analyzers-bun.md`
|
||||
- `docs/modules/scanner/analyzers-python.md`
|
||||
|
||||
Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
|
||||
|
||||
### 1.2 Native reachability upgrades (Nov 2026)
|
||||
|
||||
@@ -397,7 +402,9 @@ scanner:
|
||||
|
||||
---
|
||||
|
||||
## 12) Testing matrix
|
||||
## 12) Testing matrix
|
||||
|
||||
* **Analyzer contracts:** see `language-analyzers-contract.md` and per-analyzer docs (e.g., `analyzers-java.md`, Sprint 0403).
|
||||
|
||||
* **Determinism:** given same image + analyzers → byte‑identical **CDX Protobuf**; JSON normalized.
|
||||
* **OS packages:** ground‑truth images per distro; compare to package DB.
|
||||
|
||||
110
docs/modules/scanner/language-analyzers-contract.md
Normal file
@@ -0,0 +1,110 @@
|
||||
# Scanner Language Analyzer Contracts (Identity / Evidence / Container Layout)
|
||||
|
||||
This document freezes the cross-analyzer contracts that are shared by the language analyzers (Java, .NET, Python, Node, Bun). These rules exist to prevent false matches, keep outputs deterministic, and protect against host-path leakage.
|
||||
|
||||
## 1) Identity Safety Contract (PURL vs Explicit Key)
|
||||
|
||||
### 1.1 Goals
|
||||
- **No fake versions**: never encode version ranges, tags, local paths, or git URLs as a versioned PURL.
|
||||
- **No collisions**: explicit-key identities must not collide with concrete PURLs and must be deterministic across OS path separators.
|
||||
- **Proof-first**: emit concrete PURLs only when the analyzer has concrete, replayable evidence for the version.
|
||||
|
||||
### 1.2 When to emit a concrete PURL
|
||||
Emit a concrete (versioned) PURL only when **both** are true:
|
||||
1) The analyzer can determine a **concrete version** (ecosystem-specific) for the component.
|
||||
2) The version is backed by **replayable evidence** (e.g., installed artifact metadata or lockfile-resolved entry).
|
||||
|
||||
Typical sources that qualify:
|
||||
- **Installed inventory** (e.g., `node_modules/**/package.json`, Python `*.dist-info/METADATA`, .NET `deps.json` entries).
|
||||
- **Lockfile-resolved inventory** (e.g., `bun.lock` entry with `name@version` and integrity/resolved URL).
|
||||
|
||||
### 1.3 When to emit an explicit-key component (required)
|
||||
Emit an explicit-key component when the dependency is **declared-only** or otherwise **non-concrete**:
|
||||
- Version ranges / operators (`^`, `~`, `>=`, `<`, `*`, `x`, `latest`, etc.).
|
||||
- Workspace/link/file dependencies (`workspace:*`, `link:`, `file:`, local path refs, editable installs).
|
||||
- Git dependencies (git URL / commit / ref) when a concrete semantic version is not provable from local evidence.
|
||||
- Unknown / missing version.
|
||||
|
||||
**Rule:** If the analyzer cannot prove a concrete version from local evidence, it must not emit a versioned PURL for that dependency.
|
||||
|
||||
### 1.4 Explicit-key format (canonical)
|
||||
For declared-only / non-concrete identities, analyzers must emit:
|
||||
- `componentKey`: `explicit::<analyzerId>::<ecosystem>::<name>::sha256:<digest>`
|
||||
- `purl`: `null`
|
||||
- `version`: `null`
|
||||
|
||||
Where `<digest>` is `sha256` of the canonical UTF-8 string:
|
||||
```
<ecosystem>\n<normalizedName>\n<normalizedSpec>\n<originLocator>
```
|
||||
|
||||
Canonicalization rules:
|
||||
- `<normalizedName>` uses ecosystem naming rules (e.g., npm scoped names keep `@scope/name`).
|
||||
- `<normalizedSpec>` is the **original declared specifier** (range/tag/url/path), trimmed; for unknown, use `""`.
|
||||
- `<originLocator>` is project-relative with `/` separators (e.g., `package.json#dependencies`, `requirements.txt`, `Directory.Packages.props#PackageVersion:Foo`).
|
||||
- No absolute paths, drive letters, or host roots appear in any input to the digest.
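
The explicit-key construction can be sketched in Python as follows. This is not `LanguageExplicitKey.Create(...)` itself: lowercase hex encoding of the digest and the backslash-to-slash normalization of the locator are assumptions, and the caller is expected to pass an already project-relative locator.

```python
import hashlib


def explicit_key(analyzer_id, ecosystem, name, spec, origin_locator):
    """Build explicit::<analyzerId>::<ecosystem>::<name>::sha256:<digest>."""
    # Assumption: normalize separators so the key is identical on every OS.
    locator = origin_locator.replace("\\", "/")
    # Unknown specifiers are represented by the empty string.
    normalized_spec = (spec or "").strip()
    canonical = "\n".join([ecosystem, name, normalized_spec, locator])
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"explicit::{analyzer_id}::{ecosystem}::{name}::sha256:{digest}"
```

For example, `explicit_key("node", "npm", "@scope/ui", "workspace:*", "package.json#dependencies")` yields the same key on every machine, while changing the specifier or the locator changes the digest.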
|
||||
|
||||
### 1.5 Required metadata for explicit-key components
|
||||
Explicit-key components must include (at minimum) these metadata keys:
|
||||
- `declaredOnly=true`
|
||||
- `declared.source=<file>` (e.g., `package.json`, `Directory.Packages.props`)
|
||||
- `declared.locator=<originLocator>` (same string used in digest)
|
||||
- `declared.versionSpec=<normalizedSpec>` (original specifier or empty)
|
||||
- `declared.scope=<prod|dev|peer|optional|unknown>` when applicable
|
||||
- `declared.sourceType=<range|tag|git|tarball|file|link|workspace|path|editable|unknown>`
|
||||
|
||||
## 2) Evidence Locator Contract
|
||||
|
||||
### 2.1 General rules
|
||||
- Evidence locators are **external-facing** and must be stable and parseable.
|
||||
- Every locator is **project-relative** with `/` separators (never absolute).
|
||||
- Evidence content/hashing must be bounded; when bounds are exceeded, emit deterministic `skipped` markers in metadata instead of silently omitting.
|
||||
|
||||
### 2.2 Locator formats (canonical)
|
||||
**File evidence**
|
||||
- `locator`: `<relativePath>` (e.g., `packages/app/package.json`)
|
||||
- `source`: a stable discriminator (e.g., `package.json`, `pom.xml`, `METADATA`)
|
||||
|
||||
**Lockfile entry evidence**
|
||||
- `locator`: `<lockfileRelativePath>:<selector>`
|
||||
- Examples:
|
||||
- Node package-lock: `package-lock.json:packages/app/node_modules/foo`
|
||||
- Bun lock: `bun.lock:packages[foo@1.2.3]`
|
||||
- Maven/Gradle lock: `gradle.lockfile:com.example:foo:1.2.3`
|
||||
|
||||
**Nested artifact evidence**
|
||||
- `locator`: `<outer>!<inner>!<path>`
|
||||
- Example: `demo-jni.jar!META-INF/native-image/demo/jni-config.json`
|
||||
|
||||
**Derived evidence**
|
||||
- `locator`: a stable synthetic name (e.g., `phase22.ndjson`)
|
||||
- `source`: a stable synthetic source (e.g., `node.observation`)
|
||||
|
||||
### 2.3 Hashing rules (baseline)
|
||||
- Hash only bounded inputs (default: 1 MiB per evidence value/file; analyzers may choose a tighter cap).
|
||||
- Hash algorithm: `sha256` over UTF-8 bytes for textual evidence, raw bytes for file evidence.
|
||||
- If hashing is skipped due to bounds or errors, emit deterministic metadata markers (e.g., `hashSkipped=true`, `hashSkipped.reason=sizeCap`).
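
The baseline hashing rules can be sketched in Python like this. The marker names are taken from the example above; the function name and return shape are hypothetical, and only the size-cap skip path is shown (error handling is left out).

```python
import hashlib

DEFAULT_HASH_CAP_BYTES = 1024 * 1024  # 1 MiB baseline from this contract


def hash_evidence(value, cap=DEFAULT_HASH_CAP_BYTES):
    """Return (sha256_hex_or_None, metadata_markers) for one evidence value.

    Text is hashed as UTF-8 bytes; bytes are hashed as-is. Inputs over the cap
    are not hashed and produce deterministic skip markers instead.
    """
    data = value.encode("utf-8") if isinstance(value, str) else value
    if len(data) > cap:
        return None, {"hashSkipped": "true", "hashSkipped.reason": "sizeCap"}
    return hashlib.sha256(data).hexdigest(), {}
```

Returning markers alongside the digest means a consumer can always tell "not hashed" apart from "hash missing by mistake".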
|
||||
|
||||
## 3) Container Layout Discovery Contract
|
||||
|
||||
### 3.1 Layer root candidates
|
||||
Language analyzers that support container-root discovery must treat these as **candidate roots** under the analysis root:
|
||||
- `layers/*` (direct children)
|
||||
- `.layers/*` (direct children; **must not be skipped**)
|
||||
- `layer*` (direct children of the analysis root, e.g., `layer1/`, `layer2/`)
|
||||
|
||||
Each candidate root is scanned independently for projects.
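
A minimal Python sketch of candidate-root enumeration under these rules. Two points are assumptions: the output order (`layers/*`, then `.layers/*`, then `layer*`) and the choice to exclude the `layers` directory itself from the `layer*` match. The function name is hypothetical.

```python
import os


def candidate_layer_roots(analysis_root):
    """Return candidate layer roots as analysis-root-relative paths."""
    roots = []
    # Direct children of layers/ and .layers/ (the hidden one is not skipped).
    for parent in ("layers", ".layers"):
        parent_path = os.path.join(analysis_root, parent)
        if os.path.isdir(parent_path):
            for child in sorted(os.listdir(parent_path)):
                if os.path.isdir(os.path.join(parent_path, child)):
                    roots.append(f"{parent}/{child}")
    # Direct children of the analysis root named layer*, e.g. layer1/, layer2/.
    for child in sorted(os.listdir(analysis_root)):
        if (child.startswith("layer") and child != "layers"
                and os.path.isdir(os.path.join(analysis_root, child))):
            roots.append(child)
    return roots
```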
|
||||
|
||||
### 3.2 Bounds and traversal safety (required)
|
||||
- Deterministic traversal (sorted directory enumeration).
|
||||
- Depth caps per candidate root; hard cap on total discovered project roots.
|
||||
- Must never recurse into `node_modules/` (Node/Bun) or equivalent heavy dirs.
|
||||
- Hidden directories may be skipped **except** `.layers` which is treated as a top-level candidate root.
|
||||
- No symlink escape: if symlinks are followed, resolved targets must remain within the candidate root prefix and cycles must be prevented.
|
||||
|
||||
### 3.3 Overlay/whiteout semantics
|
||||
- If an analyzer implements overlay semantics (notably Python container adapters), whiteouts and precedence rules must be explicit, deterministic, and fixture-tested.
|
||||
- If an analyzer does **not** implement overlay semantics, it must still keep discovery bounded and must not silently drop projects; emit deterministic "skipped" markers when bounds prevent full traversal.
|
||||
|
||||
## Compliance
|
||||
Sprints `docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md` through `docs/implplan/SPRINT_0407_0001_0001_scanner_bun_detection_gaps.md` (and the program sprint `docs/implplan/SPRINT_0408_0001_0001_scanner_language_detection_gaps_program.md`) carry the per-analyzer implementation and test evidence required to enforce this contract.
|
||||
@@ -1,7 +1,7 @@
# Stella Policy DSL (`stella-dsl@1`)

> **Audience:** Policy authors, reviewers, and tooling engineers building lint/compile flows for the Policy Engine v2 rollout (Sprint 20).
> **Imposed rule:** Policies that alter reachability or trust weighting must run in shadow mode first with coverage fixtures; promotion to active is blocked until shadow + coverage gates pass.

This document specifies the `stella-dsl@1` grammar, semantics, and guardrails used by Stella Ops to transform SBOM facts, Concelier advisories, and Excititor VEX statements into effective findings. Use it with the [Policy Engine Overview](overview.md) for architectural context and the upcoming lifecycle/run guides for operational workflows.
@@ -9,13 +9,13 @@ This document specifies the `stella-dsl@1` grammar, semantics, and guardrails us
## 1 · Design Goals

- **Deterministic:** Same policy + same inputs ⇒ identical findings on every machine.
- **Declarative:** No arbitrary loops, network calls, or clock access.
- **Explainable:** Every decision records the rule, inputs, and rationale in the explain trace.
- **Lean authoring:** Common precedence, severity, and suppression patterns are first-class.
- **Offline-friendly:** Grammar and built-ins avoid cloud dependencies, run the same in sealed deployments.
- **Reachability-aware:** Policies can consume reachability lattice states (`ReachState`) and evidence scores to drive VEX gates (`not_affected`, `under_investigation`, `affected`).
- **Signal-first:** Trust, reachability, entropy, and uncertainty signals are first-class so explain traces stay reproducible.

---
@@ -42,26 +42,26 @@ policy "Default Org Policy" syntax "stella-dsl@1" {
    }
  }

  rule vex_precedence priority 10 {
    when vex.any(status in ["not_affected","fixed"])
      and vex.justification in ["component_not_present","vulnerable_code_not_present"]
    then status := vex.status
    because "Strong vendor justification prevails";
  }

  rule reachability_gate priority 20 {
    when telemetry.reachability.state == "reachable" and telemetry.reachability.score >= 0.6
    then status := "affected"
    because "Runtime/graph evidence shows reachable code path";
  }

  rule trust_penalty priority 30 {
    when signals.trust_score < 0.4 or signals.entropy_penalty > 0.2
    then severity := severity_band("critical")
    because "Low trust score or high entropy";
  }
}
```

High-level layout:
@@ -141,10 +141,10 @@ annotate = "annotate", identifier, ":=", expression, ";" ;
Notes:

- `helper` is reserved for shared calculations (not yet implemented in `@1`).
- `else` branch executes only if `when` predicates evaluate truthy **and** no prior rule earlier in priority handled the tuple.
- Semicolons inside rule bodies are optional when each clause is on its own line; the compiler emits canonical semicolons in IR.
- `settings.shadow = true` enables shadow-mode evaluation (findings recorded but not enforced). Promotion gates require at least one shadow run with coverage fixtures.

---
@@ -152,23 +152,23 @@ Notes:
Within predicates and actions you may reference the following namespaces:

| Namespace | Fields | Description |
|-----------|--------|-------------|
| `sbom` | `purl`, `name`, `version`, `licenses`, `layerDigest`, `tags`, `usedByEntrypoint` | Component metadata from Scanner. |
| `advisory` | `id`, `source`, `aliases`, `severity`, `cvss`, `publishedAt`, `modifiedAt`, `content.raw` | Canonical Concelier advisory view. |
| `vex` | `status`, `justification`, `statementId`, `timestamp`, `scope` | Current VEX statement when iterating; aggregator helpers available. |
| `vex.any(...)`, `vex.all(...)`, `vex.count(...)` | Functions operating over all matching statements. |
| `run` | `policyId`, `policyVersion`, `tenant`, `timestamp` | Metadata for explain annotations. |
| `env` | Arbitrary key/value pairs injected per run (e.g., `environment`, `runtime`). |
| `telemetry` | Optional reachability signals. Example fields: `telemetry.reachability.state`, `telemetry.reachability.score`, `telemetry.reachability.policyVersion`. Missing fields evaluate to `unknown`. |
| `signals` | Normalised signal dictionary: `trust_score` (0–1), `reachability.state` (`reachable\|unreachable\|unknown\|under_investigation`), `reachability.score` (0–1), `reachability.confidence` (0–1), `reachability.evidence_ref` (string), `entropy_penalty` (0–0.3), `uncertainty.level` (`U1`–`U3`), `runtime_hits` (bool). |
| `secret` | `findings`, `bundle`, helper predicates | Populated when the Secrets Analyzer runs. Exposes masked leak findings and bundle metadata for policy decisions. |
| `profile.<name>` | Values computed inside profile blocks (maps, scalars). |

> **Reachability evidence gate.** When `reachability.state == "unreachable"` but `reachability.evidence_ref` is missing (or confidence is below the high-confidence threshold), Policy Engine downgrades the state to `under_investigation` to avoid false "not affected" claims.
>
> **Secrets namespace.** When `StellaOps.Scanner.Analyzers.Secrets` is enabled the Policy Engine receives masked findings (`secret.findings[*]`) plus bundle metadata (`secret.bundle.id`, `secret.bundle.version`). Policies should rely on the helper predicates listed below rather than reading raw arrays to preserve determinism and future compatibility.

Missing fields evaluate to `null`, which is falsey in boolean context and propagates through comparisons unless explicitly checked.
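
A minimal Python sketch of the reachability evidence gate described above, with missing fields modelled as `None`. The `HIGH_CONFIDENCE` value and the `gate_reachability` helper are assumptions for illustration; the actual threshold is not stated in this document, and treating a missing confidence as "below threshold" is likewise an assumption.

```python
HIGH_CONFIDENCE = 0.8  # assumed threshold; the real value is not given here


def gate_reachability(signals):
    """Downgrade an unsupported 'unreachable' claim to 'under_investigation'."""
    reach = signals.get("reachability") or {}
    state = reach.get("state")              # a missing field evaluates to None
    if state != "unreachable":
        return state                        # only 'unreachable' needs evidence
    evidence_ref = reach.get("evidence_ref")
    confidence = reach.get("confidence")
    # Assumption: a missing confidence counts as below the threshold.
    if not evidence_ref or confidence is None or confidence < HIGH_CONFIDENCE:
        return "under_investigation"
    return "unreachable"
```

The gate only ever weakens a claim: `reachable`, `unknown`, and missing states pass through unchanged.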

---
@@ -180,50 +180,50 @@ Missing fields evaluate to `null`, which is falsey in boolean context and propag
| `normalize_cvss(advisory)` | `Advisory → SeverityScalar` | Parses `advisory.content.raw` for CVSS data; falls back to policy maps. |
| `cvss(score, vector)` | `double × string → SeverityScalar` | Constructs a severity object manually. |
| `severity_band(value)` | `string → SeverityBand` | Normalises strings like `"critical"`, `"medium"`. |
| `risk_score(base, modifiers...)` | Variadic | Multiplies numeric modifiers (severity × trust × reachability). |
| `reach_state(state)` | `string → ReachState` | Normalises reachability state strings (`reachable`, `unreachable`, `unknown`, `under_investigation`). |
| `vex.any(predicate)` | `(Statement → bool) → bool` | `true` if any statement satisfies predicate. |
| `vex.all(predicate)` | `(Statement → bool) → bool` | `true` if all statements satisfy predicate. |
| `vex.latest()` | `→ Statement` | Lexicographically newest statement. |
| `advisory.has_tag(tag)` | `string → bool` | Checks advisory metadata tags. |
| `advisory.matches(pattern)` | `string → bool` | Glob match against advisory identifiers. |
| `sbom.has_tag(tag)` | `string → bool` | Uses SBOM inventory tags (usage vs inventory). |
| `sbom.any_component(predicate)` | `(Component → bool) → bool` | Iterates SBOM components, exposing `component` plus language scopes (e.g., `ruby`). |
| `exists(expression)` | `→ bool` | `true` when value is non-null/empty. |
| `coalesce(a, b, ...)` | `→ value` | First non-null argument. |
| `days_between(dateA, dateB)` | `→ int` | Absolute day difference (UTC). |
| `percent_of(part, whole)` | `→ double` | Fractions for scoring adjustments. |
| `lowercase(text)` | `string → string` | Normalises casing deterministically (InvariantCulture). |
| `secret.hasFinding(ruleId?, severity?, confidence?)` | `→ bool` | True if any secret leak finding matches optional filters. |
| `secret.match.count(ruleId?)` | `→ int` | Count of findings, optionally scoped to a rule ID. |
| `secret.bundle.version(required)` | `string → bool` | Ensures the active secret rule bundle version ≥ required (semantic compare). |
| `secret.mask.applied` | `→ bool` | Indicates whether masking succeeded for all surfaced payloads. |
| `secret.path.allowlist(patterns)` | `list<string> → bool` | True when all findings fall within allowed path patterns (useful for waivers). |

All built-ins are pure; if inputs are null the result is null unless otherwise noted.
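
To make the null-propagation rule concrete, here is a hedged Python sketch of three of the built-ins listed above. These are illustrative re-implementations under the stated table descriptions, not the Policy Engine's code; `None` stands in for the DSL's `null`.

```python
from datetime import datetime, timezone


def coalesce(*values):
    """First non-null argument, or None when every argument is null."""
    return next((v for v in values if v is not None), None)


def risk_score(base, *modifiers):
    """Multiply base by each modifier; any null input makes the result null."""
    if base is None or any(m is None for m in modifiers):
        return None
    score = base
    for modifier in modifiers:
        score *= modifier
    return score


def days_between(date_a, date_b):
    """Absolute whole-day difference between two timezone-aware datetimes."""
    if date_a is None or date_b is None:
        return None
    return abs(date_a - date_b).days


utc = timezone.utc
print(risk_score(7.5, 0.5, 2))          # 7.5
print(risk_score(7.5, None))            # None: null propagates
print(coalesce(None, 0, 3))             # 0: zero is not null
print(days_between(datetime(2025, 12, 13, tzinfo=utc),
                   datetime(2025, 11, 26, tzinfo=utc)))  # 17
```

`coalesce` is the explicit escape hatch: wrapping an optional signal in it supplies a default before the value reaches a function that would otherwise return null.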

---

### 6.1 · Ruby Component Scope

Inside `sbom.any_component(...)`, Ruby gems surface a `ruby` scope with the following helpers:

| Helper | Signature | Description |
|--------|-----------|-------------|
| `ruby.group(name)` | `string → bool` | Matches Bundler group membership (`development`, `test`, etc.). |
| `ruby.groups()` | `→ set<string>` | Returns all groups for the active component. |
| `ruby.declared_only()` | `→ bool` | `true` when no vendor cache artefacts were observed for the gem. |
| `ruby.source(kind?)` | `string? → bool` | Returns the raw source when called without args, or matches provenance kinds (`registry`, `git`, `path`, `vendor-cache`). |
| `ruby.capability(name)` | `string → bool` | Checks capability flags emitted by the analyzer (`exec`, `net`, `scheduler`, `scheduler.activejob`, etc.). |
| `ruby.capability_any(names)` | `set<string> → bool` | `true` when any capability in the set is present. |

Scheduler capability sub-types use dot notation (`ruby.capability("scheduler.sidekiq")`) and inherit from the broad `scheduler` capability.
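
One plausible reading of the inheritance rule, sketched in Python: a component flagged with a dotted sub-type also satisfies a check for the broad parent capability, but not the reverse. The helper names and the prefix-matching interpretation are assumptions, not confirmed analyzer behaviour.

```python
def has_capability(flags, name):
    """True when `name` itself or a dotted sub-type of it is in `flags`."""
    prefix = name + "."
    return any(flag == name or flag.startswith(prefix) for flag in flags)


def capability_any(flags, names):
    """True when any capability in `names` is present (with inheritance)."""
    return any(has_capability(flags, name) for name in names)


gem_flags = {"scheduler.sidekiq", "net"}
print(has_capability(gem_flags, "scheduler"))            # True (inherited)
print(has_capability(gem_flags, "scheduler.activejob"))  # False
print(capability_any(gem_flags, {"exec", "net"}))        # True
```

Matching on the `name + "."` prefix rather than a bare `startswith(name)` avoids treating an unrelated flag such as `schedulerx` as a scheduler sub-type.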

---

## 7 · Rule Semantics

1. **Ordering:** Rules execute in ascending `priority`. When priorities tie, lexical order defines precedence.
2. **Short-circuit:** Once a rule sets `status`, subsequent rules only execute if they use `combine`. Use this sparingly to avoid ambiguity.
3. **Actions:**
   - `status := <string>` – Allowed values: `affected`, `not_affected`, `fixed`, `suppressed`, `under_investigation`, `escalated`.
@@ -271,30 +271,30 @@ rule vex_strong_claim priority 5 {
}
```

### 9.3 Environment-Specific Escalation

```dsl
rule internet_exposed_guard {
  when env.exposure == "internet"
    and severity.normalized >= "High"
  then escalate to severity_band("Critical")
  because "Internet-exposed assets require critical posture";
}
```

### 9.4 Shadow mode & coverage

- Enable `settings { shadow = true; }` for new policies or major changes. Findings are recorded but not enforced.
- Provide coverage fixtures under `tests/policy/<policyId>/cases/*.json`; run `stella policy test` locally and in CI. Coverage results must be attached on submission.
- Promotion to active is blocked until shadow runs + coverage gates pass (see lifecycle §3).

### 9.5 Authoring workflow (quick checklist)

1. Write/update policy with shadow enabled.
2. Add/refresh coverage fixtures; run `stella policy test`.
3. `stella policy lint` and `stella policy simulate --fixtures ...` with expected signals (trust_score, reachability, entropy_penalty) noted in comments.
4. Submit with attachments: lint, simulate diff, coverage results.
5. After approval, disable shadow and promote; retain fixtures for regression tests.

### 9.6 Anti-pattern (flagged by linter)
@@ -332,7 +332,42 @@ rule catch_all {
|
||||
|
||||
---
|
||||
|
||||
## 12 · Versioning & Compatibility
|
||||
## 12 · Uncertainty Gates (U1/U2/U3)

Uncertainty gates enforce evidence-quality thresholds before allowing high-confidence VEX decisions. When entropy is too high or evidence is missing, policies should downgrade to \ rather than risk false negatives.

### 12.1 Gate Types

| Gate | Tier Threshold | Blocks | Allows | Remediation |
|------|---------------|--------|--------|-------------|
| \ | T1 (\) | \ | \, \ | Upload symbols, resolve unknowns |
| \ | T2 (\) | \ (warns) | \ with review flag | Populate lockfiles, fix purl resolution |
| \ | T3 (\) | None (advisory only) | All with caveat | Corroborate advisory, add trusted source |

### 12.2 Uncertainty Gate Rules

### 12.3 Tier-Aware Compound Rules

Combine uncertainty tiers with reachability states for nuanced gating:

### 12.4 Remediation Actions

Policy rules should guide users toward reducing uncertainty:

| Uncertainty State | Remediation Action | Policy Annotation |
|-------------------|-------------------|-------------------|
| \ (MissingSymbolResolution) | Upload debug symbols, run \ | \ |
| \ (MissingPurl) | Generate lockfiles, verify package coordinates | \ |
| \ (UntrustedAdvisory) | Cross-reference trusted sources, wait for corroboration | \ |
| \ (Unknown) | Run initial analysis, enable probes | \ |

### 12.5 YAML Configuration for Gate Thresholds

The Policy Engine reads uncertainty gate thresholds from configuration:

---

## 13 · Versioning & Compatibility
|
||||
|
||||
- `syntax "stella-dsl@1"` is mandatory.
|
||||
- Future revisions (`@2`, …) will be additive; existing packs continue to compile with their declared version.
|
||||
@@ -340,7 +375,7 @@ rule catch_all {
|
||||
|
||||
---
|
||||
|
||||
## 13 · Compliance Checklist
|
||||
## 14 · Compliance Checklist
|
||||
|
||||
- [ ] **Grammar validated:** Policy compiles with `stella policy lint` and matches `syntax "stella-dsl@1"`.
|
||||
- [ ] **Deterministic constructs only:** No use of forbidden namespaces (`DateTime.Now`, `Guid.NewGuid`, external services).
|
||||
@@ -351,4 +386,4 @@ rule catch_all {
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-26 (Sprint 0401).*
|
||||
*Last updated: 2025-12-13 (Sprint 0401).*
|
||||
|
||||
461
docs/reachability/binary-reachability-schema.md
Normal file
@@ -0,0 +1,461 @@
# Binary Reachability Schema

_Last updated: 2025-12-13. Owner: Scanner Guild + Attestor Guild._

This document defines the binary reachability schema addressing gaps BR1-BR10 from the November 2025 product findings. It specifies DSSE predicate formats, edge hash recipes, binary evidence requirements, build-id handling, and Sigstore integration.

---

## 1. Overview

Binary reachability extends the function-level evidence chain to native executables (ELF, PE, Mach-O). Key challenges addressed:

- **Stripped binaries:** Symbol recovery using `code_id` + `code_block_hash`
- **Build variants:** Handling multiple builds from the same source
- **Large graphs:** Chunking and size limits for DSSE/Rekor
- **Offline verification:** Air-gapped attestation workflows

---

## 2. Gap Resolutions

### BR1: Canonical DSSE/Predicate Schemas

**Binary graph predicate:**

```
stella.ops/binaryGraph@v1
```

**Predicate schema:**

```json
{
  "_type": "https://stellaops.dev/predicates/binaryGraph/v1",
  "subject": [
    {
      "name": "graph",
      "digest": {"blake3": "a1b2c3d4e5f6..."}
    }
  ],
  "predicate": {
    "analyzer": {
      "name": "scanner.native",
      "version": "1.2.0",
      "toolchain": "ghidra-11.2"
    },
    "binary": {
      "format": "ELF",
      "arch": "x86_64",
      "file_hash": "sha256:...",
      "build_id": "gnu-build-id:5f0c7c3c..."
    },
    "graph_stats": {
      "node_count": 1247,
      "edge_count": 3891,
      "root_count": 5
    },
    "evidence": {
      "symbols_source": "DWARF",
      "stripped_symbols": 58,
      "heuristic_symbols": 12
    },
    "created_at": "2025-12-13T10:00:00Z"
  }
}
```

**Edge bundle predicate:**

```
stella.ops/binaryEdgeBundle@v1
```

```json
{
  "_type": "https://stellaops.dev/predicates/binaryEdgeBundle/v1",
  "subject": [
    {
      "name": "edges",
      "digest": {"sha256": "..."}
    }
  ],
  "predicate": {
    "graph_hash": "blake3:a1b2c3d4...",
    "bundle_id": "bundle:001",
    "bundle_reason": "init_array",
    "edge_count": 128,
    "edges": [
      {
        "from": "sym:binary:...",
        "to": "sym:binary:...",
        "reason": "init-array",
        "confidence": 0.95
      }
    ]
  }
}
```

### BR2: Edge Hash Recipe

**Binary edge hash computation:**

```
edge_id = "edge:" + sha256(
  canonical_json({
    "from": edge.from,
    "to": edge.to,
    "kind": edge.kind,
    "reason": edge.reason,
    "binary_hash": binary.file_hash  // Binary context included
  })
)
```

**Hash includes binary context:**

Unlike managed code edges, binary edges include `binary_hash` in the hash computation to distinguish edges from different binaries with identical symbol names.

**Canonicalization:**

1. Keys: `binary_hash`, `from`, `kind`, `reason`, `to` (alphabetical)
2. No whitespace, UTF-8 encoding
3. Lowercase hex for all hashes
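
The recipe and canonicalization rules above can be sketched in Python with the standard library. This is a minimal illustration, not the reference implementation: the `edge_id` function name and the dict-shaped `edge` argument are assumptions, and `json.dumps` with sorted keys and compact separators stands in for `canonical_json`.

```python
import hashlib
import json


def edge_id(edge, binary_hash):
    """Compute an edge identifier following the BR2 recipe (sketch)."""
    payload = {
        "binary_hash": binary_hash.lower(),   # lowercase hex for hashes
        "from": edge["from"],
        "kind": edge["kind"],
        "reason": edge["reason"],
        "to": edge["to"],
    }
    # sort_keys gives alphabetical key order; separators remove whitespace.
    canonical = json.dumps(payload, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "edge:" + digest                   # hexdigest() is lowercase


edge = {"from": "sym:binary:a", "to": "sym:binary:b",
        "kind": "call", "reason": "init-array"}
print(edge_id(edge, "sha256:ABC123"))
```

Because `binary_hash` is part of the hashed payload, the same `from`/`to` pair in two different binaries yields two different edge identifiers.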

### BR3: Required Binary Evidence with CAS Refs

**Required evidence per node:**

| Evidence Type | Required | CAS Storage |
|---------------|----------|-------------|
| File hash | Yes | N/A (inline) |
| Build ID | Conditional | N/A (inline) |
| Symbol source | Yes | N/A (inline) |
| Code block hash | For stripped | `cas://binary/blocks/{sha256}` |
| Disassembly | Optional | `cas://binary/disasm/{sha256}` |
| CFG | Optional | `cas://binary/cfg/{sha256}` |

**Evidence schema:**

```json
{
  "binary_evidence": {
    "file_hash": "sha256:...",
    "build_id": "gnu-build-id:5f0c7c3c...",
    "symbol_source": "DWARF",
    "symbol_confidence": 0.95,
    "code_block_hash": "sha256:deadbeef...",
    "code_block_uri": "cas://binary/blocks/sha256:deadbeef...",
    "disassembly_uri": "cas://binary/disasm/sha256:...",
    "cfg_uri": "cas://binary/cfg/sha256:..."
  }
}
```

**CAS layout:**

```
cas://binary/
  blocks/{sha256}/     # Code block bytes
  disasm/{sha256}/     # Disassembly JSON
  cfg/{sha256}/        # Control flow graph
  symbols/{sha256}/    # Symbol table extract
```

### BR4: Build-ID/Variant Rules

**Build-ID sources:**

| Format | Build-ID Source | Example |
|--------|-----------------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:5f0c7c3c...` |
| PE | Debug GUID | `pe-guid:12345678-1234-...` |
| Mach-O | `LC_UUID` | `macho-uuid:12345678...` |

**Fallback when build-ID absent:**

```json
{
  "build_id": null,
  "build_id_fallback": {
    "method": "file_hash",
    "value": "sha256:...",
    "confidence": 0.7
  }
}
```

**Variant handling:**

Multiple binaries from the same source (debug/release, different arch):

```json
{
  "variant_group": "sha256:source_hash...",
  "variants": [
    {"build_id": "gnu-build-id:aaa...", "variant_type": "release-x86_64"},
    {"build_id": "gnu-build-id:bbb...", "variant_type": "debug-x86_64"},
    {"build_id": "gnu-build-id:ccc...", "variant_type": "release-aarch64"}
  ]
}
```
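
As a hedged illustration of the build-ID fallback rule in this section, the following Python sketch prefers a native build ID and otherwise emits the `file_hash` fallback record. The `resolve_build_identity` helper is hypothetical; the `0.7` confidence simply mirrors the fallback example shown earlier in BR4.

```python
FALLBACK_CONFIDENCE = 0.7  # mirrors the BR4 fallback example


def resolve_build_identity(build_id, file_hash):
    """Return the identity record for a binary, falling back to its file hash."""
    if build_id:
        return {"build_id": build_id}
    return {
        "build_id": None,
        "build_id_fallback": {
            "method": "file_hash",
            "value": file_hash,
            "confidence": FALLBACK_CONFIDENCE,
        },
    }


print(resolve_build_identity("gnu-build-id:5f0c7c3c", "sha256:abc"))
print(resolve_build_identity(None, "sha256:abc"))
```

The lower confidence on the fallback path lets downstream consumers distinguish a linker-assigned identity from one inferred from file contents.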

### BR5: Policy Hash Governance

**Policy version binding:**

Binary reachability graphs are bound to a policy version:

```json
{
  "policy_binding": {
    "policy_digest": "sha256:...",
    "policy_version": "P-7:v4",
    "bound_at": "2025-12-13T10:00:00Z",
    "binding_mode": "strict"
  }
}
```

**Binding modes:**

| Mode | Behavior |
|------|----------|
| `strict` | Graph invalid if policy changes |
| `forward` | Graph valid with newer policy versions |
| `any` | Graph valid with any policy version |

**Governance rules:**

1. Production graphs use `strict` binding
2. Test graphs may use `forward`
3. Policy hash computed from canonical DSL
4. Binding stored in graph metadata
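
The binding modes above could be checked along the following lines. This Python sketch rests on assumptions: the version string is taken to have the form `<policy>:v<N>` (as in `P-7:v4`), "newer" is read as the same policy with an equal or higher revision, and `strict` is read as digest equality. The function names are illustrative.

```python
def parse_version(version):
    """Split 'P-7:v4' into ('P-7', 4); the format is an assumption."""
    policy_id, _, revision = version.rpartition(":v")
    return policy_id, int(revision)


def graph_is_valid(binding, current_digest, current_version):
    """Evaluate a policy_binding record against the currently active policy."""
    mode = binding["binding_mode"]
    if mode == "any":
        return True
    if mode == "strict":
        return binding["policy_digest"] == current_digest
    if mode == "forward":
        bound_id, bound_rev = parse_version(binding["policy_version"])
        current_id, current_rev = parse_version(current_version)
        return bound_id == current_id and current_rev >= bound_rev
    raise ValueError(f"unknown binding mode: {mode}")


binding = {"policy_digest": "sha256:aaa", "policy_version": "P-7:v4",
           "binding_mode": "forward"}
print(graph_is_valid(binding, "sha256:bbb", "P-7:v5"))  # True
print(graph_is_valid(binding, "sha256:bbb", "P-7:v3"))  # False
```

Raising on an unrecognised mode, rather than defaulting to valid, keeps a typo in graph metadata from silently weakening a production binding.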

### BR6: Sigstore Bundle/Log Routing

**Sigstore integration:**

```json
{
  "sigstore": {
    "bundle_type": "hashedrekord",
    "log_index": 12345678,
    "log_id": "rekor.sigstore.dev",
    "inclusion_proof": {
      "log_index": 12345678,
      "root_hash": "sha256:...",
      "tree_size": 98765432,
      "hashes": ["sha256:...", "sha256:..."]
    },
    "signed_entry_timestamp": "base64:..."
  }
}
```

**Log routing:**

| Evidence Type | Log | Notes |
|---------------|-----|-------|
| Graph DSSE | Rekor (public) | Always |
| Edge bundle DSSE | Rekor (capped) | Configurable limit |
| Code block | No log | CAS only |
| CFG/Disasm | No log | CAS only |

**Offline mode:**

When Rekor is unavailable:

```json
{
  "sigstore": {
    "mode": "offline",
    "checkpoint": {
      "origin": "rekor.sigstore.dev",
      "checkpoint_data": "base64:...",
      "captured_at": "2025-12-13T10:00:00Z"
    },
    "deferred_submission": true
  }
}
```

### BR7: Idempotent Submission Keys

**Submission key format:**

```
submit:{tenant}:{binary_hash}:{graph_hash}:{timestamp_hour}
```

**Idempotency rules:**

1. Same key returns existing entry (no duplicate)
2. Key includes hour-granularity timestamp for rate limiting
3. Different graphs from same binary produce different keys
4. Retry within 1 hour uses same key

**Implementation:**

```json
{
  "submission": {
    "key": "submit:acme:sha256:abc...:blake3:def...:2025121310",
    "status": "accepted",
    "existing_entry": false,
    "log_index": 12345678
  }
}
```
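
A minimal Python sketch of the key format and idempotency rules in this section. The in-memory `SubmissionLog` class is a hypothetical stand-in for the real log, and formatting the hour bucket in UTC is an assumption (it is consistent with the `2025121310` example but not stated explicitly).

```python
from datetime import datetime, timezone


def submission_key(tenant, binary_hash, graph_hash, when):
    """Build the idempotency key; `when` must be a timezone-aware datetime."""
    hour = when.astimezone(timezone.utc).strftime("%Y%m%d%H")  # UTC assumed
    return f"submit:{tenant}:{binary_hash}:{graph_hash}:{hour}"


class SubmissionLog:
    """Toy in-memory log: the same key always maps to the same entry."""

    def __init__(self):
        self._index_by_key = {}

    def submit(self, key):
        existing = key in self._index_by_key
        if not existing:
            self._index_by_key[key] = len(self._index_by_key)
        return {"key": key, "status": "accepted",
                "existing_entry": existing,
                "log_index": self._index_by_key[key]}


log = SubmissionLog()
key = submission_key("acme", "sha256:abc", "blake3:def",
                     datetime(2025, 12, 13, 10, 30, tzinfo=timezone.utc))
print(key)               # submit:acme:sha256:abc:blake3:def:2025121310
print(log.submit(key))   # existing_entry: False
print(log.submit(key))   # existing_entry: True, same log_index
```

Note that a retry reuses the key only while it falls in the same hour bucket; a retry that crosses the hour boundary produces a new key under this scheme.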
|
||||
|
||||
### BR8: Size/Chunking Limits
|
||||
|
||||
**Size limits:**
|
||||
|
||||
| Element | Limit | Action on Exceed |
|
||||
|---------|-------|------------------|
|
||||
| Graph JSON | 10 MB | Chunk nodes/edges |
|
||||
| Edge bundle | 512 edges | Split bundles |
|
||||
| DSSE payload | 1 MB | Compress/chunk |
|
||||
| Rekor entry | 100 KB | Reference CAS |
|
||||
|
||||
**Chunking strategy:**
|
||||
|
||||
For large graphs (>10MB):
|
||||
|
||||
```json
|
||||
{
|
||||
"chunked_graph": {
|
||||
"chunk_count": 5,
|
||||
"chunks": [
|
||||
{"chunk_id": "chunk:001", "uri": "cas://graphs/chunks/001", "hash": "blake3:..."},
|
||||
{"chunk_id": "chunk:002", "uri": "cas://graphs/chunks/002", "hash": "blake3:..."}
|
||||
],
|
||||
"assembly_order": ["chunk:001", "chunk:002", ...],
|
||||
"assembled_hash": "blake3:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Compression:**
|
||||
|
||||
- Graph JSON: gzip before DSSE
|
||||
- CAS storage: Raw JSON (indexed)
|
||||
- Rekor payload: DSSE references CAS
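
**Illustrative sketch (non-normative):**

The "Split bundles" action for the 512-edge limit in the size table could look like the sketch below. `split_edge_bundles` is a hypothetical helper, and preserving the input order within and across bundles is an assumption, since the ordering of split bundles is not specified here.

```python
from typing import Iterator, List, Sequence

MAX_EDGES_PER_BUNDLE = 512  # "Edge bundle" limit from the size table


def split_edge_bundles(edges: Sequence[dict],
                       limit: int = MAX_EDGES_PER_BUNDLE) -> Iterator[List[dict]]:
    """Yield consecutive bundles of at most `limit` edges, preserving order."""
    for start in range(0, len(edges), limit):
        yield list(edges[start:start + limit])


# 1300 placeholder edges split into bundles of 512, 512 and 276.
edges = [{"edge_id": f"edge:{i}"} for i in range(1300)]
bundles = list(split_edge_bundles(edges))
bundle_sizes = [len(bundle) for bundle in bundles]
```

Each resulting bundle would then be signed as its own edge-bundle DSSE; that step is omitted here.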
|
||||
|
||||
### BR9: API/CLI/UI Surfacing
|
||||
|
||||
**API endpoints:**
|
||||
|
||||
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/binary/graphs` | Submit binary graph |
| `GET` | `/api/binary/graphs/{hash}` | Get graph details |
| `GET` | `/api/binary/graphs/{hash}/edges` | List edges |
| `GET` | `/api/binary/symbols/{symbolId}` | Get symbol details |
| `POST` | `/api/binary/verify` | Verify graph attestation |
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# Submit binary graph
|
||||
stella binary submit --graph ./richgraph.json --binary ./app
|
||||
|
||||
# Get graph info
|
||||
stella binary info --hash blake3:a1b2c3d4...
|
||||
|
||||
# List symbols
|
||||
stella binary symbols --hash blake3:... --stripped-only
|
||||
|
||||
# Verify attestation
|
||||
stella binary verify --graph ./richgraph.json --dsse ./richgraph.dsse
|
||||
```
|
||||
|
||||
**UI components:**
|
||||
|
||||
- Binary graph visualization with zoom/pan
|
||||
- Symbol table with search/filter
|
||||
- Edge explorer with confidence highlighting
|
||||
- Attestation status badges
|
||||
- Build variant selector
|
||||
|
||||
### BR10: Binary Fixtures
|
||||
|
||||
**Fixture location:**
|
||||
|
||||
```
tests/Binary/
  fixtures/
    elf-x86_64-with-debug/
      binary.elf
      graph.json
      expected-hashes.txt
    elf-stripped/
      binary.elf
      graph.json
      expected-hashes.txt
    pe-x64-with-pdb/
      binary.exe
      graph.json
      expected-hashes.txt
  golden/
    elf-x86_64.golden.json
    pe-x64.golden.json

datasets/binary/
  schema/
    binary-graph.schema.json
    binary-edge.schema.json
  samples/
    openssl-1.1.1/
      libssl.so
      graph.json
      edges.ndjson
```
|
||||
|
||||
**Fixture requirements:**
|
||||
|
||||
1. Each binary format has at least one fixture
|
||||
2. Stripped and debug variants for each format
|
||||
3. Expected hashes verified by CI
|
||||
4. Golden outputs include DSSE envelopes
|
||||
5. Fixtures reproducible from source (where legal)
|
||||
|
||||
**Test categories:**
|
||||
|
||||
1. **Hash stability:** Same binary produces same graph hash
|
||||
2. **Build-ID extraction:** Correct build-ID parsing per format
|
||||
3. **Symbol recovery:** DWARF/PDB parsing accuracy
|
||||
4. **Stripped handling:** Code block hash computation
|
||||
5. **Chunking:** Large graph assembly/disassembly
|
||||
6. **DSSE signing:** Envelope creation and verification
|
||||
7. **Rekor integration:** Submission and verification
|
||||
|
||||
---
|
||||
|
||||
## 3. Implementation Status
|
||||
|
||||
| Component | Location | Status |
|-----------|----------|--------|
| ELF parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| PE parser | `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native` | Implemented |
| DSSE predicates | `src/Signer/StellaOps.Signer/PredicateTypes.cs` | Implemented |
| CAS storage | `src/Scanner/__Libraries/StellaOps.Scanner.Reachability` | Partial |
| Rekor integration | `src/Attestor/StellaOps.Attestor` | Implemented |
| CLI commands | `src/Cli/StellaOps.Cli` | Planned |
| UI components | `src/UI/StellaOps.UI` | Planned |
|
||||
|
||||
---
|
||||
|
||||
## 4. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Edge Explainability](./edge-explainability-schema.md) - Edge reason codes
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
|
||||
- [Native Analyzer Tests](../../src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests/Reachability/) - Test fixtures
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 BINARY-GAPS-401-066 for change history._
|
||||
@@ -1,45 +1,69 @@
|
||||
# Reachability Corpus Plan (QA-CORPUS-401-031)
|
||||
|
||||
Objective
|
||||
- Build a multi-runtime reachability corpus (Go/.NET/Python/Rust) with EXPECT.yaml ground truths and captured traces.
|
||||
- Make fixtures CI-consumable to validate reachability scoring and VEX proofs continuously.
|
||||
- Add public mini-dataset cases (PHP/JavaScript/C#) from advisory 23-Nov-2025 for ingestion/bench reuse.
|
||||
- Maintain deterministic, offline reachability fixtures that validate callgraph ingestion, reachability truth-path handling, and VEX proof workflows.
|
||||
- Keep the corpus small but multi-runtime (Go/.NET/Python/Rust), and keep a public-friendly mini dataset (PHP/JavaScript/C#) for docs/demos without external repos.
|
||||
|
||||
Scope & deliverables
|
||||
- Fixture layout: `tests/reachability/corpus/<language>/<case>/`
|
||||
- `expect.yaml` — states (`reachable|conditional|unreachable`), score, evidence refs.
|
||||
- `callgraph.*.json` — static graphs per language.
|
||||
- `runtime/*.ndjson` — traces/probes when available.
|
||||
- `sbom.*.json` — CycloneDX/SPDX slices.
|
||||
- `vex.openvex.json` — expected VEX statement.
|
||||
- CI integration: add corpus harness to `tests/reachability/StellaOps.Reachability.FixtureTests` to validate presence, schema, and determinism (hash manifest).
|
||||
- Offline posture: all artifacts deterministic, no external downloads; hashes recorded in manifest.
|
||||
- Public mini-dataset layout (PHP/JS/C#) to be mirrored under `tests/reachability/samples-public/`:
|
||||
```
|
||||
vuln-reach-dataset/
|
||||
schema/ground-truth.schema.json
|
||||
runners/run_all.sh
|
||||
samples/
|
||||
php/php-001-phar-deserialize/...
|
||||
js/js-002-yaml-unsafe-load/...
|
||||
csharp/cs-001-binaryformatter-deserialize/...
|
||||
```
|
||||
Each sample ships: minimal app, lockfile, SBOM (CycloneDX JSON), VEX, ground truth (EXPECT/JSON), repro script.
|
||||
## Corpus Map
|
||||
|
||||
MVP slice (proposed)
|
||||
### 1) Multi-runtime corpus (internal MVP)
|
||||
|
||||
Path: `tests/reachability/corpus/`
|
||||
|
||||
Per-case layout: `tests/reachability/corpus/<language>/<case>/`
|
||||
- `callgraph.static.json` — static call graph sample (stub for MVP).
|
||||
- `ground-truth.json` — expected reachability outcome and example path(s) (Reachbench truth schema v1; `schema_version=reachbench.reachgraph.truth/v1`).
|
||||
- `vex.openvex.json` — expected VEX slice for the case.
|
||||
- Optional (future): `runtime/*.ndjson`, `sbom.*.json`
|
||||
|
||||
`tests/reachability/corpus/manifest.json` records deterministic SHA-256 hashes for required files in each case directory.
|
||||
|
||||
### 2) Public mini dataset (PHP/JS/C#)
|
||||
|
||||
Path: `tests/reachability/samples-public/`
|
||||
|
||||
Layout:
|
||||
- `schema/ground-truth.schema.json` — JSON schema for `ground-truth.json` (Reachbench truth schema v1).
|
||||
- `manifest.json` — deterministic SHA-256 hashes for required files in each sample directory.
|
||||
- `samples/<lang>/<case-id>/` — per-sample artifacts: `callgraph.static.json`, `ground-truth.json`, `sbom.cdx.json`, `vex.openvex.json`, `repro.sh`.
|
||||
- `runners/run_all.{sh,ps1}` — deterministic manifest regeneration.
|
||||
|
||||
### 3) Reachbench fixture pack (expanded, dual variants)
|
||||
|
||||
Path: `tests/reachability/fixtures/reachbench-2025-expanded/`
|
||||
|
||||
Each case has two variants (reachable/unreachable) with per-variant `manifest.json` and `reachgraph.truth.json`. Fixture integrity is validated by `tests/reachability/StellaOps.Reachability.FixtureTests`.
|
||||
|
||||
## Ground Truth Conventions
|
||||
|
||||
- Corpus and public samples use the same truth schema (`reachbench.reachgraph.truth/v1`) but differ in file naming (`ground-truth.json` vs reachbench pack `reachgraph.truth.json`).
|
||||
- Legacy corpus `expect.yaml` has been retired; prior `state/score` values are preserved under `legacy_expect` in `ground-truth.json`.
|
||||
- Legacy `conditional` states are represented as `variant=unreachable` plus `legacy_expect.state=conditional` until the truth schema grows a dedicated conditional/contested variant.
|
||||
|
||||
## Determinism & Runners
|
||||
|
||||
Regenerate all reachability manifests (corpus + public samples + reachbench pack):
|
||||
- `tests/reachability/runners/run_all.sh`
|
||||
- `tests/reachability/runners/run_all.ps1`
|
||||
|
||||
Individual scripts:
|
||||
- `python tests/reachability/scripts/update_corpus_manifest.py`
|
||||
- `python tests/reachability/samples-public/scripts/update_manifest.py`
|
||||
- `python tests/reachability/fixtures/reachbench-2025-expanded/harness/update_variant_manifests.py`
|
||||
|
||||
## CI Gates
|
||||
|
||||
- `tests/reachability/StellaOps.Reachability.FixtureTests`
|
||||
- validates presence + hashes from manifests for corpus/public samples/reachbench fixtures
|
||||
- enforces minimum language-bucket coverage (Go/.NET/Python/Rust + PHP/JS/C#)
|
||||
|
||||
## MVP Slice (stub cases)
|
||||
- Go: `go-ssh-CVE-2020-9283-keyexchange`
|
||||
- .NET: `dotnet-kestrel-CVE-2023-44487-http2-rapid-reset`
|
||||
- Python: `python-django-CVE-2019-19844-sqli-like`
|
||||
- Rust: `rust-axum-header-parsing-TBD`
|
||||
|
||||
Work plan
|
||||
1) Define shared manifest schema + hash manifest (NDJSON) under `tests/reachability/corpus/manifest.json`.
|
||||
2) For each MVP case, add minimal static callgraph + EXPECT.yaml with score/state and evidence links. (DONE: stub versions committed)
|
||||
3) Extend reachability fixture tests to cover corpus folders (presence, hashes, EXPECT.yaml schema). (DONE)
|
||||
4) Wire CI job to run the extended tests in `tests/reachability/StellaOps.Reachability.FixtureTests`. (TODO)
|
||||
5) Replace stubs with real callgraphs/traces and expand corpus after MVP passes CI. (TODO)
|
||||
## Next Work (post-MVP)
|
||||
- Wire a CI job to run `tests/reachability/StellaOps.Reachability.FixtureTests`.
|
||||
- Replace stubs with real callgraphs/traces and expand the corpus once CI is stable.
|
||||
|
||||
Determinism rules
|
||||
- Sort JSON keys; round scores to 2dp; UTC times only if needed.
|
||||
- Stable ordering of files in manifests; hash with SHA-256.
|
||||
- No network calls during test or generation.
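
Illustrative sketch (non-normative): the determinism rules above (sorted keys, stable file ordering, SHA-256) can be shown with a small manifest builder. The manifest shape used here, a flat map of relative path to hex digest, is an assumption for illustration; the real `manifest.json` layout and the `update_*_manifest.py` scripts may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(case_dir: Path) -> str:
    """Return a deterministic JSON manifest of SHA-256 hashes for files under case_dir."""
    entries = {}
    # Sort paths so the output does not depend on filesystem enumeration order.
    for path in sorted(p for p in case_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(case_dir).as_posix()
        entries[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return json.dumps(entries, sort_keys=True, indent=2) + "\n"


# Build the manifest twice over the same placeholder case directory.
with tempfile.TemporaryDirectory() as tmp:
    case = Path(tmp)
    (case / "ground-truth.json").write_text('{"variant":"reachable"}', encoding="utf-8")
    (case / "callgraph.static.json").write_text("{}", encoding="utf-8")
    manifest_a = build_manifest(case)
    manifest_b = build_manifest(case)
```

Both runs yield byte-identical output, which is the property the fixture tests rely on when comparing recorded hashes.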
|
||||
|
||||
416
docs/reachability/edge-explainability-schema.md
Normal file
@@ -0,0 +1,416 @@
|
||||
# Edge Explainability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Scanner Guild + Policy Guild._
|
||||
|
||||
This document defines the edge explainability schema addressing gaps EG1-EG10 from the November 2025 product findings. It specifies the canonical format for call edge evidence, reason codes, confidence rubrics, and propagation into explanation graphs and VEX.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Edge explainability provides detailed rationale for each call edge in the reachability graph. Every edge includes:
|
||||
|
||||
- **Reason code:** Why this edge was detected (e.g., `bytecode-invoke`, `plt-stub`, `indirect-target`)
|
||||
- **Confidence score:** Certainty of the edge's existence
|
||||
- **Evidence sources:** Detectors and rules that contributed to edge discovery
|
||||
- **Provenance:** Analyzer version, detection timestamp, and input artifacts
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### EG1: Reason Enum Governance
|
||||
|
||||
**Standard reason codes:**
|
||||
|
||||
| Code | Category | Description | Example |
|------|----------|-------------|---------|
| `bytecode-invoke` | Static | Bytecode invocation instruction | Java `invokevirtual`, .NET `call` |
| `bytecode-field` | Static | Field access leading to call | Static initializer |
| `import-symbol` | Static | Import table reference | ELF `.dynsym`, PE imports |
| `plt-stub` | Static | PLT/GOT indirection | `printf@plt` |
| `reloc-target` | Static | Relocation target | `.rela.dyn` entries |
| `indirect-target` | Heuristic | Indirect call target analysis | CFG-based |
| `init-array` | Static | Constructor/initializer array | `.init_array`, `DT_INIT` |
| `fini-array` | Static | Destructor/finalizer array | `.fini_array`, `DT_FINI` |
| `vtable-slot` | Heuristic | Virtual method dispatch | C++ vtable |
| `reflection-invoke` | Heuristic | Reflective method invocation | `Method.invoke()` |
| `runtime-observed` | Runtime | Runtime probe observation | JFR, eBPF |
| `user-annotated` | Manual | User-provided edge | Policy override |
|
||||
|
||||
**Governance rules:**
|
||||
|
||||
1. New reason codes require RFC + review by Scanner Guild
|
||||
2. Deprecated codes remain valid for 2 major versions
|
||||
3. Custom codes use `custom:` prefix (e.g., `custom:my-analyzer`)
|
||||
4. Codes are case-insensitive, normalized to lowercase
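
**Illustrative sketch (non-normative):**

Rules 3 and 4 can be sketched as a small normalizer. `normalize_reason` is a hypothetical helper; rejecting unknown codes with an error is an assumption, since the rules above do not say how unregistered codes are handled.

```python
STANDARD_REASONS = frozenset({
    "bytecode-invoke", "bytecode-field", "import-symbol", "plt-stub",
    "reloc-target", "indirect-target", "init-array", "fini-array",
    "vtable-slot", "reflection-invoke", "runtime-observed", "user-annotated",
})
CUSTOM_PREFIX = "custom:"


def normalize_reason(code: str) -> str:
    """Lowercase a reason code and accept it if standard or custom-prefixed."""
    normalized = code.lower()
    if normalized in STANDARD_REASONS:
        return normalized
    # Custom codes need the prefix plus a non-empty name.
    if normalized.startswith(CUSTOM_PREFIX) and len(normalized) > len(CUSTOM_PREFIX):
        return normalized
    raise ValueError(f"unknown reason code: {code!r}")


standard = normalize_reason("PLT-Stub")
custom = normalize_reason("custom:My-Analyzer")
```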
|
||||
|
||||
**Code registry:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.edge.reason.registry@v1",
|
||||
"version": "2025-12-13",
|
||||
"reasons": [
|
||||
{
|
||||
"code": "bytecode-invoke",
|
||||
"category": "static",
|
||||
"description": "Bytecode invocation instruction",
|
||||
"languages": ["java", "dotnet"],
|
||||
"confidence_range": [0.9, 1.0],
|
||||
"deprecated": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### EG2: Canonical Edge Schema with Hash Rules
|
||||
|
||||
**Edge schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"edge_id": "edge:sha256:{hex}",
|
||||
"from": "sym:java:...",
|
||||
"to": "sym:java:...",
|
||||
"kind": "call",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95,
|
||||
"evidence": [
|
||||
{
|
||||
"source": "detector:java-bytecode-analyzer",
|
||||
"rule_id": "invoke-virtual",
|
||||
"rule_version": "1.0.0",
|
||||
"location": {
|
||||
"file": "com/example/Foo.class",
|
||||
"offset": 1234,
|
||||
"instruction": "invokevirtual #42"
|
||||
},
|
||||
"timestamp": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
],
|
||||
"attributes": {
|
||||
"virtual": true,
|
||||
"polymorphic_targets": 3
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Hash computation:**
|
||||
|
||||
```
|
||||
edge_id = "edge:" + sha256(
|
||||
canonical_json({
|
||||
"from": edge.from,
|
||||
"to": edge.to,
|
||||
"kind": edge.kind,
|
||||
"reason": edge.reason
|
||||
})
|
||||
)
|
||||
```
|
||||
|
||||
**Canonicalization:**
|
||||
|
||||
1. Use only `from`, `to`, `kind`, `reason` for hash (not confidence or evidence)
|
||||
2. Sort JSON keys alphabetically
|
||||
3. No whitespace, UTF-8 encoding
|
||||
4. Hash is lowercase hex with `sha256:` prefix
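
**Illustrative sketch (non-normative):**

The hash computation and canonicalization rules above can be sketched in Python. `compute_edge_id` is a hypothetical helper, and `json.dumps` with sorted keys and compact separators is used as an approximation of canonical JSON; the exact canonicalization scheme (for example, number and string escaping rules) is not specified here.

```python
import hashlib
import json


def compute_edge_id(edge: dict) -> str:
    """Derive edge_id from the four identity fields only (rule 1)."""
    identity = {key: edge[key] for key in ("from", "to", "kind", "reason")}
    # Sorted keys, no whitespace, UTF-8 bytes (rules 2 and 3).
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    # Lowercase hex digest with the sha256: prefix (rule 4).
    return "edge:sha256:" + hashlib.sha256(canonical).hexdigest()


# Placeholder symbol IDs; confidence and evidence must not affect the ID.
edge_a = {"from": "sym:java:a", "to": "sym:java:b", "kind": "call",
          "reason": "bytecode-invoke", "confidence": 0.95}
edge_b = dict(edge_a, confidence=0.50, evidence=[{"source": "detector:example"}])
edge_c = dict(edge_a, reason="reflection-invoke")

id_a, id_b, id_c = (compute_edge_id(e) for e in (edge_a, edge_b, edge_c))
```

`id_a` and `id_b` match because only `from`, `to`, `kind`, and `reason` feed the hash, while `id_c` differs because the reason changed.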
|
||||
|
||||
### EG3: Evidence Limits/Redaction
|
||||
|
||||
**Evidence limits:**
|
||||
|
||||
| Element | Default Limit | Configurable |
|---------|--------------|--------------|
| Evidence entries per edge | 10 | Yes |
| Location detail fields | 5 | Yes |
| Instruction preview length | 100 chars | Yes |
| File path depth | 10 segments | No |
|
||||
|
||||
**Redaction rules:**
|
||||
|
||||
| Category | Redaction | Example |
|----------|-----------|---------|
| File paths | Normalize | `/home/user/...` -> `{PROJECT}/...` |
| Bytecode offsets | Keep | Offsets are not PII |
| Instruction text | Truncate | First 100 chars |
| Source line content | Omit | Not included by default |
|
||||
|
||||
**Truncation behavior:**
|
||||
|
||||
```json
|
||||
{
|
||||
"evidence_truncated": true,
|
||||
"evidence_count": 15,
|
||||
"evidence_shown": 10,
|
||||
"full_evidence_uri": "cas://edges/evidence/sha256:..."
|
||||
}
|
||||
```
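
**Illustrative sketch (non-normative):**

The truncation metadata above can be produced as in the sketch below. `truncate_evidence` is a hypothetical helper, and keeping the first entries (rather than, say, the highest-ranked ones) is an assumption; the `full_evidence_uri` value is a placeholder.

```python
MAX_EVIDENCE_PER_EDGE = 10  # default "Evidence entries per edge" limit


def truncate_evidence(evidence: list, full_evidence_uri: str,
                      limit: int = MAX_EVIDENCE_PER_EDGE) -> dict:
    """Cap the evidence list and attach truncation metadata when entries are dropped."""
    shown = evidence[:limit]
    result = {"evidence": shown}
    if len(evidence) > limit:
        result.update({
            "evidence_truncated": True,
            "evidence_count": len(evidence),
            "evidence_shown": len(shown),
            "full_evidence_uri": full_evidence_uri,
        })
    return result


# 15 placeholder entries reproduce the counts in the example above.
entries = [{"source": f"detector:example-{i}"} for i in range(15)]
truncated = truncate_evidence(entries, "cas://edges/evidence/sha256:...")
untouched = truncate_evidence(entries[:3], "cas://edges/evidence/sha256:...")
```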
|
||||
|
||||
### EG4: Confidence Rubric
|
||||
|
||||
**Confidence scale:**
|
||||
|
||||
| Level | Range | Description | Typical Sources |
|-------|-------|-------------|-----------------|
| `certain` | 1.0 | Definite edge | Direct bytecode invoke |
| `high` | 0.85-0.99 | Very likely | Import table, PLT |
| `medium` | 0.5-0.84 | Probable | Indirect analysis, vtable |
| `low` | 0.2-0.49 | Possible | Heuristic carving |
| `unknown` | 0.0-0.19 | Speculative | User annotation, fallback |
|
||||
|
||||
**Confidence computation:**
|
||||
|
||||
```
|
||||
edge.confidence = base_confidence(reason) * evidence_boost(evidence_count) * target_resolution_factor
|
||||
```
|
||||
|
||||
**Base confidence by reason:**
|
||||
|
||||
| Reason | Base Confidence |
|
||||
|--------|-----------------|
|
||||
| `bytecode-invoke` | 0.98 |
|
||||
| `import-symbol` | 0.95 |
|
||||
| `plt-stub` | 0.92 |
|
||||
| `reloc-target` | 0.90 |
|
||||
| `init-array` | 0.95 |
|
||||
| `vtable-slot` | 0.75 |
|
||||
| `indirect-target` | 0.60 |
|
||||
| `reflection-invoke` | 0.50 |
|
||||
| `runtime-observed` | 0.99 |
|
||||
| `user-annotated` | 0.80 |
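
**Illustrative sketch (non-normative):**

The mapping from a numeric score to a confidence level can be sketched as below, using the base confidences from the table above as sample inputs. `confidence_level` is a hypothetical helper. Two assumptions are made: scores between 0.99 and 1.0 are treated as `high`, and the `evidence_boost` and `target_resolution_factor` terms are left out because their definitions are not given in this section.

```python
BASE_CONFIDENCE = {
    "bytecode-invoke": 0.98, "import-symbol": 0.95, "plt-stub": 0.92,
    "reloc-target": 0.90, "init-array": 0.95, "vtable-slot": 0.75,
    "indirect-target": 0.60, "reflection-invoke": 0.50,
    "runtime-observed": 0.99, "user-annotated": 0.80,
}


def confidence_level(score: float) -> str:
    """Map a score in [0.0, 1.0] to the level names of the confidence scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    if score >= 1.0:
        return "certain"
    if score >= 0.85:
        return "high"
    if score >= 0.5:
        return "medium"
    if score >= 0.2:
        return "low"
    return "unknown"


# Level of each reason when only its base confidence is considered.
levels = {reason: confidence_level(score) for reason, score in BASE_CONFIDENCE.items()}
```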
|
||||
|
||||
### EG5: Detector/Rule Provenance
|
||||
|
||||
**Provenance schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"provenance": {
|
||||
"analyzer": {
|
||||
"name": "scanner.java",
|
||||
"version": "1.2.0",
|
||||
"digest": "sha256:..."
|
||||
},
|
||||
"detector": {
|
||||
"name": "java-bytecode-analyzer",
|
||||
"version": "2.0.0",
|
||||
"rule_set": "default"
|
||||
},
|
||||
"rule": {
|
||||
"id": "invoke-virtual",
|
||||
"version": "1.0.0",
|
||||
"description": "Detect invokevirtual bytecode instructions"
|
||||
},
|
||||
"input_artifacts": [
|
||||
{"type": "jar", "digest": "sha256:...", "path": "lib/app.jar"}
|
||||
],
|
||||
"detected_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Provenance requirements:**
|
||||
|
||||
1. All edges must include analyzer provenance
|
||||
2. Detector/rule provenance required for non-runtime edges
|
||||
3. Input artifact digests enable reproducibility
|
||||
4. Detection timestamp uses UTC ISO-8601
|
||||
|
||||
### EG6: API/CLI Parity
|
||||
|
||||
**API endpoints:**
|
||||
|
||||
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/edges/{edgeId}` | Get edge details |
| `GET` | `/api/edges?graph_hash=...` | List edges for graph |
| `GET` | `/api/edges/{edgeId}/evidence` | Get full evidence |
| `POST` | `/api/edges/search` | Search edges by criteria |
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# List edges for a graph
|
||||
stella edge list --graph blake3:a1b2c3d4...
|
||||
|
||||
# Get edge details
|
||||
stella edge show --id edge:sha256:...
|
||||
|
||||
# Search edges
|
||||
stella edge search --from "sym:java:..." --reason bytecode-invoke
|
||||
|
||||
# Export edges
|
||||
stella edge export --graph blake3:... --output ./edges.ndjson
|
||||
```
|
||||
|
||||
**Output parity:**
|
||||
|
||||
- API and CLI return identical JSON structure
|
||||
- CLI supports `--json` for machine-readable output
|
||||
- Both support filtering by reason, confidence, from/to
|
||||
|
||||
### EG7: Deterministic Fixtures
|
||||
|
||||
**Fixture location:**
|
||||
|
||||
```
tests/Edge/
  fixtures/
    bytecode-invoke.json
    plt-stub.json
    vtable-dispatch.json
    init-array-constructor.json
    runtime-observed.json
  golden/
    bytecode-invoke.golden.json
    graph-with-edges.golden.json

datasets/edges/
  schema/
    edge.schema.json
    reason-registry.json
  samples/
    java-spring-boot/
      edges.ndjson
      expected-hashes.txt
```
|
||||
|
||||
**Fixture requirements:**
|
||||
|
||||
1. Each reason code has at least one fixture
|
||||
2. Fixtures include expected `edge_id` hash
|
||||
3. Golden outputs frozen after review
|
||||
4. CI verifies hash stability
|
||||
|
||||
### EG8: Propagation into Explanation Graphs/VEX
|
||||
|
||||
**Explanation graph inclusion:**
|
||||
|
||||
```json
|
||||
{
|
||||
"explanation": {
|
||||
"path": [
|
||||
{
|
||||
"node": "sym:java:main...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:handler...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.98
|
||||
}
|
||||
},
|
||||
{
|
||||
"node": "sym:java:handler...",
|
||||
"outgoing_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"to": "sym:java:log4j...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
}
|
||||
}
|
||||
],
|
||||
"aggregate_path_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**VEX evidence format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"stellaops:reachability": {
|
||||
"path_edges": [
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.98},
|
||||
{"edge_id": "edge:sha256:...", "reason": "bytecode-invoke", "confidence": 0.95}
|
||||
],
|
||||
"weakest_edge": {
|
||||
"edge_id": "edge:sha256:...",
|
||||
"reason": "bytecode-invoke",
|
||||
"confidence": 0.95
|
||||
},
|
||||
"aggregate_confidence": 0.93
|
||||
}
|
||||
}
|
||||
```
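
**Illustrative sketch (non-normative):**

The `weakest_edge` and `aggregate_confidence` fields above can be derived from `path_edges` as sketched below. The aggregation rule is an assumption: the example values are consistent with multiplying the edge confidences and rounding to two decimals (0.98 × 0.95 ≈ 0.93), but the rule is not stated in this section. `summarize_path` is a hypothetical helper.

```python
import math


def summarize_path(path_edges: list) -> dict:
    """Build the reachability evidence block for a non-empty list of path edges."""
    weakest = min(path_edges, key=lambda edge: edge["confidence"])
    # Assumed rule: product of per-edge confidences, rounded to 2 decimals.
    aggregate = round(math.prod(edge["confidence"] for edge in path_edges), 2)
    return {
        "path_edges": path_edges,
        "weakest_edge": weakest,
        "aggregate_confidence": aggregate,
    }


# Placeholder edge IDs with the confidences from the example above.
summary = summarize_path([
    {"edge_id": "edge:sha256:aaa", "reason": "bytecode-invoke", "confidence": 0.98},
    {"edge_id": "edge:sha256:bbb", "reason": "bytecode-invoke", "confidence": 0.95},
])
```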
|
||||
|
||||
### EG9: Localization Guidance
|
||||
|
||||
**Localizable elements:**
|
||||
|
||||
| Element | Localization | Example |
|---------|--------------|---------|
| Reason code display | Message catalog | `bytecode-invoke` -> "Bytecode method call" |
| Confidence level | Message catalog | `high` -> "High confidence" |
| Evidence descriptions | Template | "Detected at offset {offset} in {file}" |
| Error messages | Message catalog | Standard error codes |
|
||||
|
||||
**Message catalog structure:**
|
||||
|
||||
```json
|
||||
{
|
||||
"locale": "en-US",
|
||||
"messages": {
|
||||
"edge.reason.bytecode-invoke": "Bytecode method call",
|
||||
"edge.reason.plt-stub": "PLT/GOT library call",
|
||||
"edge.confidence.high": "High confidence ({0:P0})",
|
||||
"edge.evidence.location": "Detected at offset {offset} in {file}"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Supported locales:**
|
||||
|
||||
- `en-US` (default)
|
||||
- Additional locales via contribution
|
||||
|
||||
### EG10: Backfill Plan
|
||||
|
||||
**Backfill strategy:**
|
||||
|
||||
1. **Phase 1:** Add reason codes to new edges (no backfill needed)
|
||||
2. **Phase 2:** Run detector upgrade on graphs without reason codes
|
||||
3. **Phase 3:** Mark old graphs as `requires_reanalysis` in metadata
|
||||
|
||||
**Migration script:**
|
||||
|
||||
```bash
stella edge backfill --graph blake3:... --dry-run

# Output:
# Graph: blake3:a1b2c3d4...
# Edges without reason: 1234
# Edges to update: 1234
#
# Dry run - no changes made.

# Execute:
stella edge backfill --graph blake3:... --execute
```
|
||||
|
||||
**Backfill metadata:**
|
||||
|
||||
```json
|
||||
{
|
||||
"backfill": {
|
||||
"status": "complete",
|
||||
"original_analyzer_version": "1.0.0",
|
||||
"backfill_analyzer_version": "1.2.0",
|
||||
"backfilled_at": "2025-12-13T10:00:00Z",
|
||||
"edges_updated": 1234
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [Explainability Schema](./explainability-schema.md) - Explanation format
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Edge bundle DSSE
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 EDGE-GAPS-401-065 for change history._
|
||||
454
docs/reachability/explainability-schema.md
Normal file
@@ -0,0 +1,454 @@
|
||||
# Explainability Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Policy Guild + Docs Guild._
|
||||
|
||||
This document defines the explainability schema addressing gaps EX1-EX10 from the November 2025 product findings. It specifies the canonical format for vulnerability verdict explanations, DSSE signing policy, CAS storage rules, and export/replay formats.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Explainability provides auditable, machine-readable rationale for every vulnerability verdict. Each explanation includes:
|
||||
|
||||
- **Decision chain:** Ordered list of rules/policies that contributed to the verdict
|
||||
- **Evidence links:** References to graphs, runtime facts, VEX statements, and SBOM components
|
||||
- **Confidence scores:** Per-rule and aggregate confidence values
|
||||
- **Redaction metadata:** PII handling and data classification
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### EX1: Schema/Canonicalization + Hashes
|
||||
|
||||
**Explanation schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.explanation@v1",
|
||||
"explanation_id": "explain:sha256:{hex}",
|
||||
"finding_id": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
|
||||
"verdict": {
|
||||
"status": "affected",
|
||||
"severity": {"normalized": "Critical", "score": 10.0},
|
||||
"confidence": 0.92
|
||||
},
|
||||
"decision_chain": [
|
||||
{
|
||||
"rule_id": "rule:reachability_gate",
|
||||
"rule_version": "1.0.0",
|
||||
"inputs": {
|
||||
"reachability.state": "CR",
|
||||
"reachability.confidence": 0.92
|
||||
},
|
||||
"output": {"allowed": true, "contribution": 0.4},
|
||||
"evidence_refs": ["cas://reachability/graphs/blake3:..."]
|
||||
},
|
||||
{
|
||||
"rule_id": "rule:severity_baseline",
|
||||
"rule_version": "1.0.0",
|
||||
"inputs": {
|
||||
"cvss_base": 10.0,
|
||||
"epss_percentile": 0.95
|
||||
},
|
||||
"output": {"severity": "Critical", "contribution": 0.6},
|
||||
"evidence_refs": ["cas://advisories/CVE-2021-44228.json"]
|
||||
}
|
||||
],
|
||||
"aggregate_confidence": 0.88,
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"policy_version": "sha256:...",
|
||||
"graph_revision_id": "rev:blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
**Canonicalization rules:**
|
||||
|
||||
1. JSON keys sorted alphabetically at all levels
|
||||
2. Arrays in `decision_chain` ordered by rule execution sequence
|
||||
3. `evidence_refs` arrays sorted alphabetically
|
||||
4. No whitespace, UTF-8 encoding
|
||||
5. Hash computed over canonical JSON: `sha256(canonical_json)`
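
**Illustrative sketch (non-normative):**

The canonicalization rules above can be sketched in Python. `compute_explanation_id` is a hypothetical helper. Two assumptions are made: the `explanation_id` field is excluded from its own hash input, and `json.dumps` with sorted keys and compact separators stands in for the canonical JSON encoder, whose exact scheme is not specified here.

```python
import copy
import hashlib
import json


def compute_explanation_id(explanation: dict) -> str:
    """Hash an explanation after applying the canonicalization rules."""
    doc = copy.deepcopy(explanation)
    doc.pop("explanation_id", None)  # assumption: the ID is not part of its own hash
    # Rule 2: decision_chain order is kept. Rule 3: evidence_refs are sorted.
    for step in doc.get("decision_chain", []):
        if "evidence_refs" in step:
            step["evidence_refs"] = sorted(step["evidence_refs"])
    # Rules 1 and 4: sorted keys at all levels, no whitespace, UTF-8.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return "explain:sha256:" + hashlib.sha256(canonical).hexdigest()


# Placeholder explanation with hypothetical evidence URIs.
base = {
    "schema": "stellaops.explanation@v1",
    "finding_id": "example-finding",
    "decision_chain": [
        {"rule_id": "rule:reachability_gate", "evidence_refs": ["cas://b", "cas://a"]},
        {"rule_id": "rule:severity_baseline", "evidence_refs": ["cas://c"]},
    ],
}
reordered_refs = copy.deepcopy(base)
reordered_refs["decision_chain"][0]["evidence_refs"].reverse()
reordered_rules = copy.deepcopy(base)
reordered_rules["decision_chain"].reverse()
```

Reordering `evidence_refs` leaves the ID unchanged, while reordering `decision_chain` changes it, because rule execution order is part of the explanation.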
|
||||
|
||||
### EX2: DSSE Predicate/Signing Policy
|
||||
|
||||
**DSSE predicate type:**
|
||||
|
||||
```
|
||||
stella.ops/explanation@v1
|
||||
```
|
||||
|
||||
**Signing policy:**
|
||||
|
||||
| Element | Required | Signer |
|---------|----------|--------|
| Explanation body | Yes | Policy Engine key |
| Graph DSSE reference | Yes (if reachability cited) | Scanner key |
| VEX DSSE reference | Yes (if VEX cited) | Policy Engine key |
|
||||
|
||||
**DSSE envelope structure:**
|
||||
|
||||
```json
|
||||
{
|
||||
"payloadType": "application/vnd.stellaops.explanation+json",
|
||||
"payload": "<base64(canonical_explanation_json)>",
|
||||
"signatures": [
|
||||
{
|
||||
"keyid": "policy-engine-signing-2025",
|
||||
"sig": "base64:..."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Signing requirements:**
|
||||
|
||||
- All explanations must be signed before CAS storage
|
||||
- Signing key must be registered in Authority key store
|
||||
- Key rotation triggers re-signing of active explanations (configurable)
|
||||
|
||||
### EX3: CAS Storage Rules for Evidence
|
||||
|
||||
**Storage layout:**
|
||||
|
||||
```
|
||||
cas://explanations/
|
||||
{sha256}/ # Explanation body
|
||||
{sha256}.dsse # DSSE envelope
|
||||
by-finding/{finding_id}/ # Index by finding
|
||||
by-policy/{policy_digest}/ # Index by policy version
|
||||
by-graph/{graph_revision_id}/ # Index by graph revision
|
||||
```
|
||||
|
||||
**Storage rules:**
|
||||
|
||||
1. Explanations are immutable after signing
|
||||
2. New verdicts create new explanation documents (no updates)
|
||||
3. Previous explanations are retained per retention policy
|
||||
4. Cross-references validated at write time (graphs, VEX must exist)
|
||||
|
||||
**Deduplication:**
|
||||
|
||||
- Identical canonical JSON produces identical hash
|
||||
- CAS returns existing reference if content matches
|
||||
|
||||
### EX4: Link to Decision/Policy and graph_revision_id
|
||||
|
||||
**Required links:**
|
||||
|
||||
```json
|
||||
{
|
||||
"links": {
|
||||
"policy_version": "sha256:7e1d...",
|
||||
"policy_uri": "cas://policy/versions/sha256:7e1d...",
|
||||
"graph_revision_id": "rev:blake3:a1b2...",
|
||||
"graph_uri": "cas://reachability/revisions/blake3:a1b2...",
|
||||
"sbom_digest": "sha256:def4...",
|
||||
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"vex_digest": "sha256:e5f6...",
|
||||
"vex_uri": "cas://excititor/vex/openvex.json"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Validation:**
|
||||
|
||||
- All linked artifacts must exist at explanation creation time
|
||||
- Links are verified during replay/audit
|
||||
- Broken links cause replay verification failure
|
||||
|
||||
### EX5: Export/Replay Bundle Format
|
||||
|
||||
**Export bundle manifest:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.explanation.bundle@v1",
|
||||
"bundle_id": "bundle:explain:2025-12-13",
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"explanations": [
|
||||
{
|
||||
"explanation_id": "explain:sha256:...",
|
||||
"finding_id": "...",
|
||||
"explanation_uri": "explanations/sha256:....json",
|
||||
"dsse_uri": "explanations/sha256:....dsse"
|
||||
}
|
||||
],
|
||||
"dependencies": {
|
||||
"graphs": [
|
||||
{"revision_id": "rev:blake3:...", "uri": "graphs/blake3:....json"}
|
||||
],
|
||||
"policies": [
|
||||
{"digest": "sha256:...", "uri": "policies/sha256:....json"}
|
||||
],
|
||||
"vex_statements": [
|
||||
{"digest": "sha256:...", "uri": "vex/sha256:....json"}
|
||||
]
|
||||
},
|
||||
"verification": {
|
||||
"bundle_hash": "sha256:...",
|
||||
"signature": "base64:...",
|
||||
"signed_by": "policy-engine-signing-2025"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Replay verification:**
|
||||
|
||||
```bash
stella explain verify --bundle ./explanation-bundle.tgz

# Output:
# Bundle: bundle:explain:2025-12-13
# Explanations: 42
# Dependencies: 5 graphs, 2 policies, 12 VEX
#
# Verifying explanations...
# Canonical hashes: 42/42 MATCH
# DSSE signatures: 42/42 VALID
# Dependency links: 42/42 RESOLVED
#
# Replay verification PASSED.
```
|
||||
|
||||
### EX6: PII/Redaction Rules
|
||||
|
||||
**Redaction categories:**
|
||||
|
||||
| Category | Redaction | Example |
|----------|-----------|---------|
| User identifiers | Hash | `user:alice` -> `user:sha256:a1b2...` |
| IP addresses | Mask | `192.168.1.100` -> `192.168.x.x` |
| File paths | Normalize | `/home/alice/code/...` -> `{HOME}/code/...` |
| Email addresses | Hash | `alice@example.com` -> `email:sha256:...` |
| API keys/tokens | Omit | `Authorization: Bearer xxx` -> `[REDACTED]` |
|
||||
|
||||
**Redaction metadata:**
|
||||
|
||||
```json
|
||||
{
|
||||
"redaction": {
|
||||
"applied": true,
|
||||
"level": "standard",
|
||||
"fields_redacted": ["actor.email", "evidence.file_path"],
|
||||
"redaction_policy": "stellaops.redaction.standard@v1"
|
||||
}
|
||||
}
|
||||
```
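
**Illustrative sketch (non-normative):**

The "Hash" and "Mask" redactions from the categories table can be sketched as below. `hash_identifier` and `mask_ipv4` are hypothetical helpers. Hashing only the value after the `user:` prefix, and using an unsalted SHA-256, are assumptions; the redaction policy may prescribe a salted or keyed hash instead.

```python
import hashlib


def hash_identifier(kind: str, value: str) -> str:
    """Replace an identifier with `<kind>:sha256:<hex>` (unsalted, by assumption)."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"{kind}:sha256:{digest}"


def mask_ipv4(address: str) -> str:
    """Keep the first two octets of an IPv4 address and mask the rest."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {address!r}")
    return ".".join(octets[:2] + ["x", "x"])


redacted_user = hash_identifier("user", "alice")
redacted_email = hash_identifier("email", "alice@example.com")
masked_ip = mask_ipv4("192.168.1.100")
```

Because identical inputs map to identical hashes, redacted identifiers remain correlatable across explanations without exposing the original value.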
|
||||
|
||||
**Export modes:**
|
||||
|
||||
- `--redacted` (default): Apply standard redaction
- `--full`: Include all data (requires `explain:export:full` scope)
- `--audit`: Include redaction audit trail
|
||||
|
||||
### EX7: Size Budgets
|
||||
|
||||
**Limits:**
|
||||
|
||||
| Element | Default Limit | Configurable |
|---------|---------------|--------------|
| Explanation body | 256 KB | Yes |
| Decision chain entries | 100 | Yes |
| Evidence refs per rule | 20 | Yes |
| Total evidence refs | 200 | Yes |
| Path entries | 50 | No |
|
||||
|
||||
**Truncation behavior:**
|
||||
|
||||
When limits are exceeded:
1. Log warning with truncation details
2. Add `truncation` metadata to explanation
3. Store full evidence in separate CAS object
4. Include `full_evidence_uri` reference
|
||||
|
||||
```json
{
  "truncation": {
    "applied": true,
    "elements_truncated": ["decision_chain", "evidence_refs"],
    "full_evidence_uri": "cas://explanations/full/sha256:..."
  }
}
```
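
The truncation steps above could look roughly like this. It is a sketch under stated assumptions: `apply_budgets` is a hypothetical helper, `decision_chain` and `evidence_refs` are assumed to be top-level arrays of the explanation, and the logging and CAS upload steps are left to the caller, who passes in the resulting `full_evidence_uri`.

```python
# Default limits taken from the table above.
MAX_DECISION_CHAIN = 100   # "Decision chain entries"
MAX_TOTAL_EVIDENCE = 200   # "Total evidence refs"


def apply_budgets(explanation: dict, full_evidence_uri: str) -> dict:
    """Return a trimmed shallow copy; the input explanation is not modified."""
    result = dict(explanation)
    truncated = []

    chain = explanation.get("decision_chain", [])
    if len(chain) > MAX_DECISION_CHAIN:
        result["decision_chain"] = chain[:MAX_DECISION_CHAIN]
        truncated.append("decision_chain")

    refs = explanation.get("evidence_refs", [])
    if len(refs) > MAX_TOTAL_EVIDENCE:
        result["evidence_refs"] = refs[:MAX_TOTAL_EVIDENCE]
        truncated.append("evidence_refs")

    # Only attach metadata when something was actually cut.
    if truncated:
        result["truncation"] = {
            "applied": True,
            "elements_truncated": truncated,
            "full_evidence_uri": full_evidence_uri,
        }
    return result
```

Keeping the untrimmed input intact lets the caller write it to CAS first and pass the resulting URI in, which matches the order of steps 3 and 4.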
|
||||
|
||||
### EX8: Versioning
|
||||
|
||||
**Schema versioning:**
|
||||
|
||||
- Schema version in `schema` field: `stellaops.explanation@v1`
- Breaking changes increment major version
- Minor changes (additive fields) use v1.x
- Backward compatibility maintained for 2 major versions
|
||||
|
||||
**Migration support:**
|
||||
|
||||
```bash
stella explain migrate --from v1 --to v2 --input ./explanations/

# Output:
Migrating 1000 explanations from v1 to v2...
Migrated: 998
Skipped (already v2): 2

Migration complete.
```
|
||||
|
||||
**Version compatibility matrix:**
|
||||
|
||||
| API Version | Schema v1 | Schema v2 |
|-------------|-----------|-----------|
| 1.0.x | Full | N/A |
| 1.1.x | Full | Full |
| 2.0.x | Read-only | Full |
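
As an illustration only, a client could encode the matrix above as a lookup table before deciding whether to write an explanation in a given schema. The names `COMPATIBILITY`, `support_level`, and `can_write` are hypothetical and not part of any StellaOps SDK.

```python
# Compatibility matrix from the table above; None stands for "N/A".
COMPATIBILITY = {
    "1.0": {"v1": "full", "v2": None},
    "1.1": {"v1": "full", "v2": "full"},
    "2.0": {"v1": "read-only", "v2": "full"},
}


def support_level(api_version: str, schema_version: str):
    """Return "full", "read-only", or None for an API version such as "2.0.3"."""
    major_minor = ".".join(api_version.split(".")[:2])
    return COMPATIBILITY.get(major_minor, {}).get(schema_version)


def can_write(api_version: str, schema_version: str) -> bool:
    # Only "full" support permits writing; "read-only" and unknown do not.
    return support_level(api_version, schema_version) == "full"
```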
|
||||
|
||||
### EX9: Golden Fixtures/Tests
|
||||
|
||||
**Test fixture location:**
|
||||
|
||||
```
tests/Explanation/
  fixtures/
    simple-affected.json
    simple-not-affected.json
    with-reachability-evidence.json
    multi-rule-chain.json
    truncated-evidence.json
    redacted-pii.json
  golden/
    simple-affected.golden.json
    simple-affected.golden.dsse

datasets/explanations/
  schema/
    explanation.schema.json
  samples/
    log4j-affected/
      explanation.json
      expected-hash.txt
```
|
||||
|
||||
**Test categories:**
|
||||
|
||||
1. **Canonicalization tests:** Verify hash stability across JSON reordering
2. **DSSE signing tests:** Verify signature creation and verification
3. **Redaction tests:** Verify PII handling
4. **Truncation tests:** Verify size budget enforcement
5. **Replay tests:** Verify bundle export/import cycle
6. **Migration tests:** Verify version upgrade paths
|
||||
|
||||
**CI integration:**
|
||||
|
||||
```yaml
# .gitea/workflows/explanation-tests.yml
explanation-tests:
  runs-on: ubuntu-latest
  steps:
    - name: Run explanation tests
      run: dotnet test src/Policy/__Tests/StellaOps.Policy.Explanation.Tests
    - name: Verify golden fixtures
      run: scripts/verify-golden-fixtures.sh tests/Explanation/golden/
```
|
||||
|
||||
### EX10: Determinism Guarantees
|
||||
|
||||
**Determinism requirements:**
|
||||
|
||||
1. Same inputs produce identical `explanation_id` hash
2. Decision chain ordering is stable (execution order)
3. Evidence refs sorted alphabetically
4. Timestamps use UTC ISO-8601 with millisecond precision
5. Floating-point values rounded to 6 decimal places
|
||||
|
||||
**Verification:**
|
||||
|
||||
```bash
# Run twice with same inputs, verify identical hashes
stella explain generate --finding "..." --output a.json
stella explain generate --finding "..." --output b.json
diff a.json b.json  # Should be empty

# Or use built-in verify
stella explain verify-determinism --finding "..." --iterations 3
```
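
To make the determinism requirements concrete, here is a minimal sketch of a canonicalization step that applies them before hashing. It is not the actual StellaOps canonicalizer: the helper names are hypothetical, and the canonical JSON form (sorted keys, compact separators) and the SHA-256 digest are assumptions for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_timestamp(dt: datetime) -> str:
    # Requirement 4: UTC ISO-8601 with millisecond precision (expects an aware datetime).
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def normalize(value):
    # Requirement 5: round every float to 6 decimal places, recursively.
    if isinstance(value, float):
        return round(value, 6)
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value


def explanation_hash(explanation: dict) -> str:
    doc = normalize(explanation)
    # Requirement 3: evidence refs sorted alphabetically.
    # Requirement 2: the decision chain keeps its execution order, so it is not sorted.
    if "evidence_refs" in doc:
        doc["evidence_refs"] = sorted(doc["evidence_refs"])
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    payload = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, two documents that differ only in key order, evidence ordering, or float noise beyond the sixth decimal hash identically, while reordering the decision chain changes the hash.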
|
||||
|
||||
---
|
||||
|
||||
## 3. API Reference
|
||||
|
||||
### 3.1 Generate Explanation
|
||||
|
||||
```http
POST /api/policy/findings/{findingId}/explain
Authorization: Bearer <token>
Content-Type: application/json

{
  "mode": "full",
  "include_evidence": true,
  "redaction_level": "standard"
}
```
|
||||
|
||||
### 3.2 Get Explanation
|
||||
|
||||
```http
|
||||
GET /api/explanations/{explanationId}
|
||||
Authorization: Bearer <token>
|
||||
Accept: application/json
|
||||
```
|
||||
|
||||
### 3.3 Export Explanation Bundle
|
||||
|
||||
```http
POST /api/explanations/export
Authorization: Bearer <token>
Content-Type: application/json

{
  "finding_ids": ["...", "..."],
  "include_dependencies": true,
  "redaction_level": "standard"
}
```
|
||||
|
||||
### 3.4 Verify Explanation
|
||||
|
||||
```http
|
||||
POST /api/explanations/{explanationId}/verify
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. CLI Reference
|
||||
|
||||
```bash
# Generate explanation for a finding
stella explain generate --finding "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228"

# Export explanation bundle
stella explain export --findings ./finding-ids.txt --output ./bundle.tgz

# Verify explanation
stella explain verify --explanation ./explanation.json --dsse ./explanation.dsse

# Verify bundle
stella explain verify --bundle ./bundle.tgz

# Check determinism
stella explain verify-determinism --finding "..." --iterations 5
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Related Documentation
|
||||
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
- [Graph Revision Schema](./graph-revision-schema.md) - Graph versioning
- [Policy API](../api/policy.md) - Policy Engine REST API
- [DSSE Predicates](../modules/attestor/architecture.md) - Signing specifications
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 EXPLAIN-GAPS-401-064 for change history._
|
||||
@@ -1,175 +1,535 @@
|
||||
# Function-Level Evidence Readiness (Nov 2025 Advisory)
|
||||
# Function-Level Evidence Guide
|
||||
|
||||
_Last updated: 2025-11-12. Owner: Business Analysis Guild._
|
||||
_Last updated: 2025-12-13. Owner: Docs Guild._
|
||||
|
||||
This memo captures the outstanding work required to make Stella Ops scanners emit stable, function-level evidence that matches the November 2025 advisory. It does **not** implement any code; instead it enumerates requirements, links them to sprint tasks, and spells out the schema/API updates that the next agent must land.
|
||||
This guide documents the cross-module function-level evidence chain that enables provable reachability claims. It covers the schema, identifiers, API usage, CLI commands, and integration patterns for Scanner, Signals, Policy, and Replay.
|
||||
|
||||
---
|
||||
|
||||
## 1. Goal & Scope
|
||||
## 1. Overview
|
||||
|
||||
**Goal.** Anchor every vulnerability finding to an immutable `{artifact_digest, code_id}` tuple plus optional symbol hints so replayers can prove reachability against stripped binaries.
|
||||
StellaOps implements a **function-level evidence chain** that anchors every vulnerability finding to immutable identifiers (`code_id`, `symbol_id`, `graph_hash`) enabling:
|
||||
|
||||
**Scope.** Scanner analyzers, runtime ingestion, Signals scoring, Replay manifests, Policy/VEX emission, CLI/UI explainers, and documentation/runbooks needed to operationalise the advisory.
|
||||
- **Provable reachability:** Deterministic call-path evidence from entry points to vulnerable functions.
|
||||
- **Stripped binary support:** `code_id` + `code_block_hash` provides identity when symbols are absent.
|
||||
- **Evidence replay:** Sealed artifacts with DSSE attestation allow offline verification.
|
||||
- **Cross-module linking:** Scanner -> Signals -> Policy -> VEX -> UI/CLI evidence chain.
|
||||
|
||||
Out of scope: implementing disassemblers or symbol servers; those will be handled inside the module-specific backlog tasks referenced below.
|
||||
### 1.1 Core Identifiers
|
||||
|
||||
| Identifier | Format | Purpose | Example |
|------------|--------|---------|---------|
| `symbol_id` | `sym:{lang}:{base64url}` | Canonical function identity | `sym:java:R3JlZXRpbmc...` |
| `code_id` | `code:{lang}:{base64url}` | Identity for name-less code blocks | `code:binary:YWJjZGVm...` |
| `graph_hash` | `blake3:{hex}` | Content-addressable graph identity | `blake3:a1b2c3d4e5f6...` |
| `symbol_digest` | `sha256:{hex}` | Hash of symbol_id for edge linking | `sha256:e5f6a7b8c9d0...` |
| `build_id` | `gnu-build-id:{hex}` | ELF/PE debug identifier | `gnu-build-id:5f0c7c3c...` |
|
||||
|
||||
### 1.2 Evidence Chain Flow
|
||||
|
||||
```
Scanner -> richgraph-v1 -> Signals -> Scoring -> Policy -> VEX -> UI/CLI
   |            |             |          |          |       |       |
   |            |             |          |          |       |       +-- stella graph explain
   |            |             |          |          |       +-- OpenVEX with call-path proofs
   |            |             |          |          +-- Policy gates + reachability.state
   |            |             |          +-- Lattice state + confidence + riskScore
   |            |             +-- Runtime facts + static paths
   |            +-- BLAKE3 graph_hash + DSSE attestation
   +-- code_id, symbol_id, build_id per node
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Advisory Requirements vs. System Gaps
|
||||
## 2. Schema Reference
|
||||
|
||||
| Requirement | Current gap | Task references | Notes |
|
||||
|-------------|-------------|-----------------|-------|
|
||||
| Immutable code identity (`code_id` = `{format, build_id, start, length}` + optional `code_block_hash`) | Callgraph nodes are opaque strings with no address metadata. | Sprint 401 `GRAPH-CAS-401-001`, `GAP-SCAN-001`, `GAP-SYM-007` | `code_id` should live alongside existing `SymbolID` helpers so analyzers can emit it without duplicating logic. |
|
||||
| Symbol hints (demangled name, source, confidence) | No schema fields for symbol metadata; demangling is ad-hoc per analyzer. | `GAP-SYM-007` | Require deterministic casing + `symbol.source ∈ {DWARF,PDB,SYM,none}`. |
|
||||
| Runtime facts mapped to code anchors | `/signals/runtime-facts` now accepts JSON and NDJSON (gzip) streams, stores symbol/code/process/container metadata. | Sprint 400 `ZASTAVA-REACH-201-001`, Sprint 401 `SIGNALS-RUNTIME-401-002`, `GAP-ZAS-002`, `GAP-SIG-003` | Provenance enrichment (process/socket/container) persisted; next step is exposing CAS URIs + context facts and emitting events for Policy/Replay. |
|
||||
| Replay/DSSE coverage | Replay manifests don’t enforce hash/CAS registration for graphs/traces. | Sprint 400 `REPLAY-REACH-201-005`, Sprint 401 `REPLAY-401-004`, `GAP-REP-004` | Extend manifest v2 with analyzer versions + BLAKE3 digests; add DSSE predicate types. |
|
||||
| Policy/VEX/UI explainability | Policy uses coarse `reachability:*` tags; UI/CLI cannot show call paths or evidence hashes. | Sprint 401 `POLICY-VEX-401-006`, `UI-CLI-401-007`, `GAP-POL-005`, `GAP-VEX-006`, `EXPERIENCE-GAP-401-012` | Evidence blocks must cite `code_id`, graph hash, runtime CAS URI, analyzer version. |
|
||||
| Operator documentation & samples | No guide shows how to replay `{build_id,start,len}` across CLI/API. | Sprint 401 `QA-DOCS-401-008`, `GAP-DOC-008` | Produce samples under `samples/reachability/**` plus CLI walkthroughs. |
|
||||
| Build-id propagation | Build-id not consistently captured or threaded into `SymbolID`/`code_id`; SBOM/runtime joins are brittle. | Sprint 401 `SCANNER-BUILDID-401-035` | Capture `.note.gnu.build-id`, include in code identity, expose in SBOM exports and runtime events. |
|
||||
| Load-time constructors as roots | Graph roots omit `.preinit_array`/`.init_array`/`_init`, missing load-time edges. | Sprint 401 `SCANNER-INITROOT-401-036` | Add synthetic roots with `phase=load`; include `DT_NEEDED` deps’ constructors. |
|
||||
| PURL-resolved edges | Call edges do not carry `purl` or `symbol_digest`, slowing SBOM joins. | Sprint 401 `GRAPH-PURL-401-034` | Annotate edges per `docs/reachability/purl-resolved-edges.md`; keep deterministic graph hash. |
|
||||
| Unknowns handling | Unresolved symbols/edges disappear silently. | Sprint 0400 `SIGNALS-UNKNOWN-201-008` | Emit Unknowns records (see `docs/signals/unknowns-registry.md`) and feed `unknowns_pressure` into scoring. |
|
||||
| Patch-oracle QA | No guard-rail tests proving binary analyzers see real patch deltas. | Sprint 401 `QA-PORACLE-401-037` | Add paired vuln/fixed fixtures and expectations; wire to CI using `docs/reachability/patch-oracles.md`. |
|
||||
### 2.1 SymbolID Construction
|
||||
|
||||
---
|
||||
Per-language canonical tuple format (NUL-separated, then SHA-256 -> base64url):
|
||||
|
||||
## 3. Workstreams & Expectations
|
||||
| Language | Tuple Components | Example |
|----------|------------------|---------|
| Java | `{package}\0{class}\0{method}\0{descriptor}` | `com.example\0Foo\0bar\0(Ljava/lang/String;)V` |
| .NET | `{assembly}\0{namespace}\0{type}\0{member_signature}` | `MyApp\0Controllers\0UserController\0GetById(int)` |
| Go | `{module}\0{package}\0{receiver}\0{func}` | `github.com/user/repo\0handler\0*Server\0Handle` |
| Node | `{pkg_or_path}\0{export_path}\0{kind}` | `lodash\0get\0function` |
| Binary | `{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}` | `sha256:abc...\0.text\00x401000\0ssl3_read\0global\0` |
| Python | `{pkg_or_path}\0{module}\0{qualified_name}` | `requests\0api\0get` |
| Ruby | `{gem_or_path}\0{module}\0{method}` | `rails\0ActionController::Base\0render` |
| PHP | `{composer_pkg}\0{namespace}\0{qualified_name}` | `symfony/http-kernel\0Kernel\0handle` |
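
The tuple-to-identifier step described above (NUL-separated components, then SHA-256, then base64url) can be sketched as follows. This is an illustration rather than the `SymbolId` helper itself: the function names are hypothetical, UTF-8 encoding and stripping of base64 padding are assumptions, and `symbol_digest` follows the "hash of symbol_id" description from the Core Identifiers table.

```python
import base64
import hashlib


def symbol_id(lang: str, *components: str) -> str:
    """Build `sym:{lang}:{base64url}` from the per-language tuple components."""
    # Join the components with NUL bytes, as in the tuple formats above.
    tuple_bytes = "\0".join(components).encode("utf-8")
    digest = hashlib.sha256(tuple_bytes).digest()
    # Assumption: URL-safe base64 with trailing "=" padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sym:{lang}:{encoded}"


def symbol_digest(sym_id: str) -> str:
    """Build `sha256:{hex}` over the symbol_id string, used for edge linking."""
    return "sha256:" + hashlib.sha256(sym_id.encode("utf-8")).hexdigest()


# Java example tuple from the table: package, class, method, descriptor.
java_id = symbol_id("java", "com.example", "Foo", "bar", "(Ljava/lang/String;)V")
```

Because the identifier depends only on the tuple contents, the same function in the same artifact yields the same `symbol_id` across scans, which is what allows edges and runtime facts to join on it.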
|
||||
|
||||
### 3.1 Scanner Symbolization (GAP-SCAN-001 / GAP-SYM-007)
|
||||
### 2.2 CodeID Construction
|
||||
|
||||
* Define `SymbolID` helpers that glue together `{artifact_digest, file`, optional `section`, `addr`, `length`, `code_block_hash`}.
|
||||
* Update analyzer contracts so every analyzer returns both `symbol_id` and `code_id`, with demangled names stored under the new `symbol` block.
|
||||
* Persist the data into `richgraph-v1` payloads and attach CAS URIs via `StellaOps.Scanner.Reachability`.
|
||||
* Deliver fixtures in `tests/reachability/StellaOps.ScannerSignals.IntegrationTests` that prove determinism (same hash when analyzer flags reorder).
|
||||
* **Helper status (2025-12-02):** `SymbolId.ForBinaryAddressed` + `CodeId.ForBinarySegment` now encode `{file_hash, section, addr, name, linkage, length, code_block_hash}` with normalized hex addresses. Analyzers should start emitting these tuples instead of ad-hoc hashes.
|
||||
* **Binary lifter (2025-12-03):** `BinaryReachabilityLifter` emits richgraph nodes for ELF/PE/Mach-O using file SHA-256 + section/address tuples, attaches `code_id` anchors, and turns imports/load commands into `import` edges.
|
||||
* **Schema wiring (2025-12-12):** `reachability-union` + `richgraph-v1` serializers now emit `symbol {mangled,demangled,source,confidence}` and optional `code_block_hash` for stripped blocks; confidence is clamped to `[0,1]` and `source` normalized to uppercase (`DWARF|PDB|SYM|NONE`).
|
||||
For stripped binaries or name-less code blocks:
|
||||
|
||||
### 3.2 Runtime + Signals (GAP-ZAS-002 / GAP-SIG-003)
|
||||
```
|
||||
code:{lang}:{base64url_sha256(format + file_hash + addr + length + section + code_block_hash)}
|
||||
```
|
||||
|
||||
* Extend Zastava Observer NDJSON schema to emit: `symbol_id`, `code_id`, `hit_count`, `observed_at`, `loader_base`, `process.buildId`.
|
||||
* Implement `/signals/runtime-facts` ingestion (gzip + NDJSON) with CAS-backed storage under `cas://reachability/runtime/{sha256}`.
|
||||
* Update `ReachabilityScoringService` to lattice states and include runtime evidence references plus CAS URIs in `ReachabilityFactDocument.Metadata`.
|
||||
Example for stripped ELF:
|
||||
```
|
||||
code:binary:YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo
|
||||
```
|
||||
|
||||
### 3.3 Replay & Evidence (GAP-REP-004)
|
||||
### 2.3 Graph Node Schema
|
||||
|
||||
* Enforce CAS registration + BLAKE3 hashing before manifest writes (graphs and traces).
|
||||
* Teach `ReachabilityReplayWriter` to require analyzer name/version, graph kind, `code_id` coverage summary.
|
||||
* Update `docs/replay/DETERMINISTIC_REPLAY.md` once schema v2 is finalized.
|
||||
|
||||
### 3.4 Policy, VEX, CLI/UI (GAP-POL-005 / GAP-VEX-006)
|
||||
|
||||
* Policy Engine: ingest new reachability facts, expose `reachability.state`, `max_path_conf`, and `evidence.graph_hash` via SPL + API.
|
||||
* CLI/UI: add `stella graph explain` and explain drawer showing call path (`SymbolID` list), code anchors, runtime hits, DSSE references.
|
||||
* Notify templates: include short evidence summary (first hop + truncated `code_id`).
|
||||
|
||||
### 3.5 Documentation & Samples (GAP-DOC-008)
|
||||
|
||||
* Publish schema diffs in `docs/data/evidence-schema.md` (new file) covering SBOM evidence nodes, runtime NDJSON, and API responses.
|
||||
* Write CLI/API walkthroughs in `docs/09_API_CLI_REFERENCE.md` and `docs/api/policy.md` showing how to request reachability evidence and verify DSSE chains.
|
||||
* Produce OpenVEX + replay samples under `samples/reachability/` showing `facts.type = "stella.reachability"` with `graph_hash` and `code_id` arrays.
|
||||
|
||||
### 3.6 Native lifter & Reachability Store (SCANNER-NATIVE-401-015 / SIG-STORE-401-016)
|
||||
|
||||
* Stand up `Scanner.Symbols.Native` + `Scanner.CallGraph.Native` libraries that:
|
||||
* parse ELF (DWARF + `.symtab`/`.dynsym`), PE/COFF (CodeView/PDB), and stripped binaries via probabilistic carving;
|
||||
* emit deterministic `FuncNode` + `CallEdge` records with demangled names, language hints, and `{confidence,evidence}` arrays; and
|
||||
* attach analyzer + toolchain identifiers consumed by `richgraph-v1`.
|
||||
* Introduce `Reachability.Store` collections in Mongo:
|
||||
* `func_nodes` – keyed by `func:<format>:<sha256>:<va>` with `{binDigest,name,addr,size,lang,confidence,sym}`.
|
||||
* `call_edges` – `{from,to,kind,confidence,evidence[]}` linking internal/external nodes.
|
||||
* `cve_func_hits` – `{cve,purl,func_id,match_kind,confidence,source}` for advisory alignment.
|
||||
* Build indexes (`binDigest+name`, `from→to`, `cve+func_id`) and expose repository interfaces so Scanner, Signals, and Policy can reuse the same canonical data without duplicating queries.
|
||||
|
||||
---
|
||||
|
||||
## 4. Schema & API Touchpoints
|
||||
|
||||
Authoritative field list lives in `docs/reachability/evidence-schema.md`; use it for DTOs and CAS writers.
|
||||
|
||||
The next implementation pass must cover the following documents/files (create them if missing):
|
||||
|
||||
1. `docs/data/evidence-schema.md` – authoritative schema for `{code_id, symbol, tool}` blocks.
|
||||
2. `docs/runbooks/reachability-runtime.md` – operator steps for staging runtime ingestion bundles, retention, and troubleshooting.
|
||||
3. `docs/runbooks/replay_ops.md` – add section detailing replay verification using the new graph/runtime CAS entries.
|
||||
|
||||
API contracts to amend:
|
||||
|
||||
- `POST /signals/callgraphs` response includes `graphHash` (sha256) for the normalized callgraph; richgraph-v1 uses BLAKE3 for graph CAS hashes.
|
||||
- `POST /signals/runtime-facts` request body schema (NDJSON) with `symbol_id`, `code_id`, `hit_count`, `loader_base`.
|
||||
- `GET /policy/findings` payload must surface `reachability.evidence[]` objects.
|
||||
|
||||
### 4.1 Signals runtime ingestion snapshot (Nov 2025)
|
||||
|
||||
- `/signals/runtime-facts` (JSON) and `/signals/runtime-facts/ndjson` (streaming, optional gzip) accept the following event fields:
|
||||
- `symbolId` (required), `codeId`, `loaderBase`, `hitCount`, `processId`, `processName`, `socketAddress`, `containerId`, `evidenceUri`, `metadata`.
|
||||
- Subject context (`scanId` / `imageDigest` / `component` / `version`) plus `callgraphId` is supplied either in the JSON body or as query params for the NDJSON endpoint.
|
||||
- Signals dedupes events, merges metadata, and persists the aggregated `RuntimeFacts` onto `ReachabilityFactDocument`. These facts now feed reachability scoring (SIGNALS-24-004/005) as part of the runtime bonus lattice.
|
||||
- Outstanding work: record CAS URIs for runtime traces, emit provenance events, and expose the enriched context to Policy/Replay consumers.
|
||||
|
||||
### 4.2 Reachability store layout (SIG-STORE-401-016)
|
||||
|
||||
All producers **must** persist native function evidence using the shared collections below (names are advisory; exact names live in Mongo options):
|
||||
Each node in a richgraph-v1 document includes:
|
||||
|
||||
```json
|
||||
// func_nodes
|
||||
{
|
||||
"_id": "func:ELF:sha256:4012a0",
|
||||
"binDigest": "sha256:deadbeef...",
|
||||
"name": "ssl3_read_bytes",
|
||||
"addr": "0x4012a0",
|
||||
"size": 312,
|
||||
"lang": "c",
|
||||
"confidence": 0.92,
|
||||
"symbol": { "mangled": "_Z15ssl3_read_bytes", "demangled": "ssl3_read_bytes", "source": "DWARF" },
|
||||
"sym": "present"
|
||||
}
|
||||
|
||||
// call_edges
|
||||
{
|
||||
"from": "func:ELF:sha256:4012a0",
|
||||
"to": "func:ELF:sha256:40f0ff",
|
||||
"kind": "static",
|
||||
"confidence": 0.88,
|
||||
"evidence": ["reloc:.plt.got", "bb-target:0x40f0ff"]
|
||||
}
|
||||
|
||||
// cve_func_hits
|
||||
{
|
||||
"cve": "CVE-2023-XXXX",
|
||||
"purl": "pkg:generic/openssl@1.1.1u",
|
||||
"func_id": "func:ELF:sha256:4012a0",
|
||||
"match": "name+version",
|
||||
"confidence": 0.77,
|
||||
"source": "concelier:openssl-advisory"
|
||||
"id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
|
||||
"symbol_id": "sym:java:R3JlZXRpbmdTZXJ2aWNl...",
|
||||
"code_id": "code:java:...",
|
||||
"lang": "java",
|
||||
"kind": "method",
|
||||
"display": "com.example.GreetingService.greet(String)",
|
||||
"purl": "pkg:maven/com.example/greeting-service@1.0.0",
|
||||
"build_id": "gnu-build-id:5f0c7c3c...",
|
||||
"symbol_digest": "sha256:e5f6a7b8...",
|
||||
"code_block_hash": "sha256:deadbeef...",
|
||||
"symbol": {
|
||||
"mangled": null,
|
||||
"demangled": "com.example.GreetingService.greet(String)",
|
||||
"source": "DWARF",
|
||||
"confidence": 0.98
|
||||
},
|
||||
"evidence": ["import", "bytecode"],
|
||||
"attributes": {}
|
||||
}
|
||||
```
|
||||
|
||||
Writers **must**:
|
||||
### 2.4 Graph Edge Schema
|
||||
|
||||
1. Upsert `func_nodes` before emitting edges/hits to ensure `_id` lookups remain stable.
|
||||
2. Serialize evidence arrays in deterministic order (`reloc`, `bb-target`, `import`, …) and normalise hex casing.
|
||||
3. Attach analyzer fingerprints (`scanner.native@sha256:...`) so Replay/Policy can enforce provenance.
|
||||
Edges carry callee `purl` and `symbol_digest` for SBOM correlation:
|
||||
|
||||
```json
{
  "from": "sym:java:caller...",
  "to": "sym:java:callee...",
  "kind": "call",
  "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
  "symbol_digest": "sha256:f1e2d3c4...",
  "confidence": 0.92,
  "evidence": ["bytecode", "import"],
  "candidates": []
}
```
|
||||
|
||||
### 2.5 Evidence Block Schema
|
||||
|
||||
Evidence blocks in Policy/VEX responses cite all relevant identifiers:
|
||||
|
||||
```json
{
  "evidence": {
    "graph_hash": "blake3:a1b2c3d4e5f6...",
    "graph_cas_uri": "cas://reachability/graphs/a1b2c3d4e5f6...",
    "dsse_uri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
    "path": [
      {"symbol_id": "sym:java:...", "display": "main()"},
      {"symbol_id": "sym:java:...", "display": "processRequest()"},
      {"symbol_id": "sym:java:...", "display": "log4j.error()"}
    ],
    "path_length": 3,
    "confidence": 0.85,
    "runtime_hits": ["probe:jfr:1234"],
    "analyzer": {
      "name": "scanner.java",
      "version": "1.2.0",
      "toolchain_digest": "sha256:..."
    }
  }
}
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Test & Fixture Expectations
|
||||
## 3. API Usage
|
||||
|
||||
- **Reachbench fixtures**: update golden cases with `code_id` + `symbol` metadata. Ensure both reachable/unreachable variants still pass once graphs contain the richer IDs.
|
||||
- **Signals unit tests**: add deterministic tests for lattice scoring + runtime evidence linking (`tests/reachability/StellaOps.Signals.Reachability.Tests`).
|
||||
- **Replay tests**: extend `tests/reachability/StellaOps.Replay.Core.Tests` to assert manifest v2 serialization and hash enforcement.
|
||||
### 3.1 Signals Callgraph Ingestion
|
||||
|
||||
All fixtures must remain deterministic: sort nodes/edges, normalise casing, and freeze timestamps in test data.
|
||||
Submit a callgraph and receive a deterministic `graph_hash`:
|
||||
|
||||
```http
|
||||
POST /signals/callgraphs
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"schema": "richgraph-v1",
|
||||
"analyzer": {"name": "scanner.java", "version": "1.2.0"},
|
||||
"nodes": [...],
|
||||
"edges": [...],
|
||||
"roots": [...]
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"graphHash": "blake3:a1b2c3d4e5f6...",
|
||||
"casUri": "cas://reachability/graphs/a1b2c3d4e5f6...",
|
||||
"dsseUri": "cas://reachability/graphs/a1b2c3d4e5f6....dsse",
|
||||
"nodeCount": 1247,
|
||||
"edgeCount": 3891
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Signals Runtime Facts
|
||||
|
||||
Submit runtime observations with `code_id` anchors:
|
||||
|
||||
```http
POST /signals/runtime-facts/ndjson?scanId=scan-123&imageDigest=sha256:abc123
Authorization: Bearer <token>
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":47,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:00Z"}
{"symbolId":"sym:java:...","codeId":"code:java:...","hitCount":12,"loaderBase":"0x7f...","processId":1234,"observedAt":"2025-12-13T10:00:01Z"}
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
{
  "accepted": 128,
  "duplicates": 2,
  "evidenceUri": "cas://reachability/runtime/sha256:xyz789..."
}
```
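
A client preparing the request body for the NDJSON endpoint above might serialize and compress events along these lines. This is a sketch using only the Python standard library: `build_ndjson_body` is a hypothetical helper, the event fields mirror the request example, and sending the bytes (with the `Content-Type` and `Content-Encoding` headers shown) is left to whichever HTTP client is in use.

```python
import gzip
import json


def build_ndjson_body(events: list) -> bytes:
    """Serialize events as one compact JSON object per line, then gzip the result."""
    lines = [json.dumps(event, separators=(",", ":"), sort_keys=True) for event in events]
    raw = ("\n".join(lines) + "\n").encode("utf-8")
    # mtime=0 keeps the gzip header stable, so identical input gives identical bytes.
    return gzip.compress(raw, mtime=0)


events = [
    {"symbolId": "sym:java:...", "codeId": "code:java:...", "hitCount": 47,
     "observedAt": "2025-12-13T10:00:00Z"},
    {"symbolId": "sym:java:...", "codeId": "code:java:...", "hitCount": 12,
     "observedAt": "2025-12-13T10:00:01Z"},
]
body = build_ndjson_body(events)
```

Fixing the gzip timestamp is optional for ingestion, but it makes the uploaded bytes reproducible, which is convenient when a content hash of the payload is compared across runs.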
|
||||
|
||||
### 3.3 Fetch Reachability Facts
|
||||
|
||||
Query reachability state for a subject:
|
||||
|
||||
```http
|
||||
GET /signals/facts/{subjectKey}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
**Response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"subjectKey": "scan:123:pkg:maven/log4j:2.14.1:CVE-2021-44228",
|
||||
"metadata": {
|
||||
"fact": {
|
||||
"digest": "sha256:abc123...",
|
||||
"version": 3
|
||||
}
|
||||
},
|
||||
"states": [
|
||||
{
|
||||
"symbol": "sym:java:...",
|
||||
"latticeState": "CR",
|
||||
"bucket": "runtime",
|
||||
"confidence": 0.92,
|
||||
"score": 0.78,
|
||||
"path": ["sym:java:main...", "sym:java:process...", "sym:java:log4j..."],
|
||||
"evidence": {
|
||||
"static": {"graphHash": "blake3:...", "pathLength": 3, "confidence": 0.85},
|
||||
"runtime": {"probeId": "probe:jfr:1234", "hitCount": 47, "observedAt": "2025-12-13T10:00:00Z"}
|
||||
}
|
||||
}
|
||||
],
|
||||
"score": 0.78,
|
||||
"aggregateTier": "T2",
|
||||
"riskScore": 0.65
|
||||
}
|
||||
```
|
||||
|
||||
### 3.4 Policy Findings with Reachability Evidence
|
||||
|
||||
```http
|
||||
GET /api/policy/findings/{policyId}/{findingId}/explain?mode=verbose
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
**Response (excerpt):**
|
||||
|
||||
```json
|
||||
{
|
||||
"findingId": "P-7:S-42:pkg:maven/log4j@2.14.1:CVE-2021-44228",
|
||||
"reachability": {
|
||||
"state": "CR",
|
||||
"confidence": 0.92,
|
||||
"evidence": {
|
||||
"graph_hash": "blake3:a1b2c3d4...",
|
||||
"path": [
|
||||
{"symbol_id": "sym:java:...", "display": "main()"},
|
||||
{"symbol_id": "sym:java:...", "display": "Logger.error()"}
|
||||
],
|
||||
"runtime_hits": 47,
|
||||
"fact_digest": "sha256:abc123..."
|
||||
}
|
||||
},
|
||||
"steps": [
|
||||
{"rule": "reachability_gate", "state": "CR", "allowed": true},
|
||||
{"rule": "severity_baseline", "severity": {"normalized": "Critical", "score": 10.0}}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Handoff Checklist for the Next Agent
|
||||
## 4. CLI Usage
|
||||
|
||||
1. Confirm sprint entries (`SPRINT_400` and `SPRINT_401`) remain in sync when moving `GAP-*` tasks to DOING/DONE.
|
||||
2. Start with `GAP-SYM-007` (schema/helper implementation) because downstream work depends on the new `code_id` payload shape.
|
||||
3. Once schema PR merges, coordinate with Signals + Policy guilds to align on CAS naming and DSSE predicates before wiring APIs.
|
||||
4. Update the docs listed in §4 as each component lands; keep this file current with statuses and links to PRs/ADRs.
|
||||
5. Before shipping, run the reachbench fixtures end-to-end and capture hashes for inclusion in replay docs.
|
||||
### 4.1 Graph Explain Command
|
||||
|
||||
Keep this document updated as tasks change state; it is the authoritative hand-off note for the advisory.
|
||||
View the call path and evidence for a finding:
|
||||
|
||||
```bash
|
||||
stella graph explain --finding "pkg:maven/log4j@2.14.1:CVE-2021-44228" --scan-id scan-123
|
||||
|
||||
# Output:
|
||||
Finding: CVE-2021-44228 in pkg:maven/log4j@2.14.1
|
||||
Reachability: CONFIRMED_REACHABLE (CR)
|
||||
Confidence: 0.92
|
||||
Graph Hash: blake3:a1b2c3d4e5f6...
|
||||
|
||||
Call Path (3 hops):
|
||||
1. main() [sym:java:R3JlZXRpbmcuLi4=]
|
||||
-> processRequest() [direct call]
|
||||
2. processRequest() [sym:java:cHJvY2Vzcy4uLg==]
|
||||
-> Logger.error() [virtual call]
|
||||
3. Logger.error() [sym:java:bG9nNGouLi4=]
|
||||
[VULNERABLE: CVE-2021-44228]
|
||||
|
||||
Runtime Evidence:
|
||||
- JFR probe hit: 47 times
|
||||
- Last observed: 2025-12-13T10:00:00Z
|
||||
|
||||
DSSE Attestation: cas://reachability/graphs/a1b2c3d4....dsse
|
||||
```
|
||||
|
||||
### 4.2 Graph Export Command
|
||||
|
||||
Export a reachability graph for offline analysis:
|
||||
|
||||
```bash
|
||||
stella graph export --scan-id scan-123 --output ./evidence-bundle/
|
||||
|
||||
# Creates:
|
||||
# ./evidence-bundle/richgraph-v1.json # Canonical graph
|
||||
# ./evidence-bundle/richgraph-v1.json.dsse # DSSE envelope
|
||||
# ./evidence-bundle/meta.json # Metadata
|
||||
# ./evidence-bundle/runtime-facts.ndjson # Runtime observations
|
||||
```
|
||||
|
||||
### 4.3 Graph Verify Command
|
||||
|
||||
Verify a graph's DSSE signature and Rekor inclusion:
|
||||
|
||||
```bash
stella graph verify --graph ./evidence-bundle/richgraph-v1.json \
  --dsse ./evidence-bundle/richgraph-v1.json.dsse \
  --rekor-log

# Output:
Graph Hash: blake3:a1b2c3d4e5f6...
DSSE Signature: VALID (key: scanner-signing-2025)
Rekor Entry: 12345678 (verified)
Timestamp: 2025-12-13T09:30:00Z
```
|
||||
|
||||
---
|
||||
|
||||
## 5. OpenVEX Integration
|
||||
|
||||
### 5.1 OpenVEX with Reachability Evidence
|
||||
|
||||
When Policy emits VEX decisions, reachability evidence is included:
|
||||
|
||||
```json
|
||||
{
|
||||
"@context": "https://openvex.dev/ns/v0.2.0",
|
||||
"@id": "https://stellaops.example/vex/2025-12-13/001",
|
||||
"author": "StellaOps Policy Engine",
|
||||
"timestamp": "2025-12-13T10:00:00Z",
|
||||
"version": 1,
|
||||
"statements": [
|
||||
{
|
||||
"vulnerability": {"@id": "CVE-2021-44228"},
|
||||
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
|
||||
"status": "affected",
|
||||
"justification": "vulnerable_code_in_container",
|
||||
"impact_statement": "Vulnerable Log4j method reachable from main entry point.",
|
||||
"action_statement": "Upgrade to log4j 2.17.1 or later.",
|
||||
"stellaops:reachability": {
|
||||
"state": "CR",
|
||||
"confidence": 0.92,
|
||||
"graph_hash": "blake3:a1b2c3d4e5f6...",
|
||||
"path_length": 3,
|
||||
"evidence_uri": "cas://reachability/graphs/a1b2c3d4..."
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### 5.2 VEX "not_affected" with Unreachability Evidence
|
||||
|
||||
When code is provably unreachable:
|
||||
|
||||
```json
|
||||
{
|
||||
"statements": [
|
||||
{
|
||||
"vulnerability": {"@id": "CVE-2023-XXXXX"},
|
||||
"products": [{"@id": "pkg:oci/myapp@sha256:abc123..."}],
|
||||
"status": "not_affected",
|
||||
"justification": "vulnerable_code_not_in_execute_path",
|
||||
"impact_statement": "Vulnerable function not reachable from any entry point.",
|
||||
"stellaops:reachability": {
|
||||
"state": "CU",
|
||||
"confidence": 0.88,
|
||||
"graph_hash": "blake3:d4e5f6a7b8c9...",
|
||||
"evidence_uri": "cas://reachability/graphs/d4e5f6a7b8c9...",
|
||||
"runtime_observation_window": "72h",
|
||||
"runtime_hits": 0
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Replay Manifest v2
|
||||
|
||||
### 6.1 Manifest Structure
|
||||
|
||||
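Replay manifests now enforce BLAKE3 hashing for reachability graphs and CAS registration for referenced artifacts (runtime facts and SBOMs may still carry SHA-256 digests, as below):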
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.replay.manifest@v2",
|
||||
"subject": "scan:123",
|
||||
"generatedAt": "2025-12-13T10:00:00Z",
|
||||
"hashAlg": "blake3",
|
||||
"artifacts": [
|
||||
{
|
||||
"kind": "richgraph",
|
||||
"uri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6...",
|
||||
"hash": "blake3:a1b2c3d4e5f6...",
|
||||
"dsseUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6....dsse"
|
||||
},
|
||||
{
|
||||
"kind": "runtime-facts",
|
||||
"uri": "cas://reachability/runtime/sha256:xyz789...",
|
||||
"hash": "sha256:xyz789..."
|
||||
},
|
||||
{
|
||||
"kind": "sbom",
|
||||
"uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"hash": "sha256:def456..."
|
||||
}
|
||||
],
|
||||
"analyzer": {
|
||||
"name": "scanner.java",
|
||||
"version": "1.2.0",
|
||||
"toolchain_digest": "sha256:..."
|
||||
},
|
||||
"code_id_coverage": {
|
||||
"total_symbols": 1247,
|
||||
"with_code_id": 1189,
|
||||
"coverage_pct": 95.3
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 Determinism Verification
|
||||
|
||||
Replay a manifest to verify determinism:
|
||||
|
||||
```bash
|
||||
stella replay verify --manifest ./manifest.json --sealed
|
||||
|
||||
# Output:
|
||||
Manifest: stellaops.replay.manifest@v2
|
||||
Subject: scan:123
|
||||
Artifacts: 3
|
||||
|
||||
Verifying richgraph...
|
||||
Computed: blake3:a1b2c3d4e5f6...
|
||||
Expected: blake3:a1b2c3d4e5f6...
|
||||
Status: MATCH
|
||||
|
||||
Verifying runtime-facts...
|
||||
Computed: sha256:xyz789...
|
||||
Expected: sha256:xyz789...
|
||||
Status: MATCH
|
||||
|
||||
Verifying sbom...
|
||||
Computed: sha256:def456...
|
||||
Expected: sha256:def456...
|
||||
Status: MATCH
|
||||
|
||||
All artifacts verified. Determinism check PASSED.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Module Integration Guide
|
||||
|
||||
### 7.1 Scanner -> Signals
|
||||
|
||||
Scanner emits richgraph-v1 with `code_id` and `symbol_id`:
|
||||
|
||||
1. Scanner analyzes container/artifact
|
||||
2. Callgraph generators emit nodes with `symbol_id`, `code_id`, `build_id`
|
||||
3. RichGraphWriter canonicalizes (sorted arrays/keys) and computes `graph_hash` (BLAKE3)
|
||||
4. DSSE signer wraps canonical JSON
|
||||
5. CAS store persists body + envelope
|
||||
6. Signals ingestion API receives URI reference
|
||||
|
||||
### 7.2 Signals -> Policy
|
||||
|
||||
Signals provides reachability facts to Policy:
|
||||
|
||||
1. Policy queries `/signals/facts/{subjectKey}`
|
||||
2. Response includes `metadata.fact.digest`, `states[]`, `score`
|
||||
3. Policy gates check `latticeState` (U, SR, SU, RO, RU, CR, CU, X)
|
||||
4. Evidence blocks in findings reference `graph_hash`, `path[]`, `runtime_hits[]`
|
||||
|
||||
### 7.3 Policy -> VEX/UI
|
||||
|
||||
Policy emits OpenVEX with evidence:
|
||||
|
||||
1. VexDecisionEmitter serializes OpenVEX with `stellaops:reachability` extension
|
||||
2. UI explain drawer fetches evidence via `/api/policy/findings/{id}/explain`
|
||||
3. CLI `stella graph explain` renders call path and attestation refs
|
||||
|
||||
---
|
||||
|
||||
## 8. CAS Layout Reference
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
graphs/
|
||||
{blake3}/ # Graph body (canonical JSON)
|
||||
{blake3}.dsse # DSSE envelope
|
||||
edges/
|
||||
{graph_hash}/{bundle_id} # Edge bundle body (optional)
|
||||
{graph_hash}/{bundle_id}.dsse
|
||||
runtime/
|
||||
{sha256}/ # Runtime facts NDJSON
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9. Related Documentation
|
||||
|
||||
- [Reachability Lattice Model](./lattice.md) - State definitions and join rules
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Schema specification
|
||||
- [Evidence Schema](./evidence-schema.md) - Detailed field definitions
|
||||
- [Signals API Contract](../api/signals/reachability-contract.md) - API reference
|
||||
- [Policy Gates](./policy-gate.md) - Gate configuration
|
||||
- [Hybrid Attestation](./hybrid-attestation.md) - Graph and edge-bundle DSSE
|
||||
- [Ground Truth Schema](./ground-truth-schema.md) - Test fixture format
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 GAP-DOC-008 for change history._
|
||||
|
||||
377
docs/reachability/graph-revision-schema.md
Normal file
@@ -0,0 +1,377 @@
|
||||
# Graph Revision Schema
|
||||
|
||||
_Last updated: 2025-12-13. Owner: Platform Guild._
|
||||
|
||||
This document defines the graph revision schema addressing gaps GR1-GR10 from the November 2025 product findings. It specifies manifest structure, hash algorithms, storage layout, lineage tracking, and governance rules for deterministic, auditable reachability graphs.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Graph revisions provide content-addressable, append-only versioning for `richgraph-v1` documents. Every graph mutation produces a new immutable revision with:
|
||||
|
||||
- **Deterministic hash:** BLAKE3-256 of canonical JSON
|
||||
- **Lineage metadata:** Parent revision + diff summary
|
||||
- **Cross-artifact digests:** Links to SBOM, VEX, policy, and tool versions
|
||||
- **Audit trail:** Timestamp, actor, tenant, and operation type
|
||||
|
||||
---
|
||||
|
||||
## 2. Gap Resolutions
|
||||
|
||||
### GR1: Manifest Schema + Canonical Hash Rules
|
||||
|
||||
**Manifest schema:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:a1b2c3d4e5f6...",
|
||||
"graph_hash": "blake3:a1b2c3d4e5f6...",
|
||||
"parent_revision_id": "rev:blake3:9f8e7d6c5b4a...",
|
||||
"created_at": "2025-12-13T10:00:00Z",
|
||||
"created_by": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"operation": "create",
|
||||
"lineage": {
|
||||
"depth": 3,
|
||||
"root_revision_id": "rev:blake3:1a2b3c4d5e6f..."
|
||||
},
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"vex_digest": "sha256:...",
|
||||
"policy_digest": "sha256:...",
|
||||
"analyzer_digest": "sha256:..."
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"roots_changed": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Canonical hash rules:**
|
||||
|
||||
1. JSON keys sorted alphabetically at all nesting levels
|
||||
2. No whitespace/indentation (compact JSON)
|
||||
3. UTF-8 encoding, no BOM
|
||||
4. Arrays sorted by deterministic key (nodes by `id`, edges by `from,to,kind`)
|
||||
5. Null/empty values omitted
|
||||
6. Numeric values without trailing zeros
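The six rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RichGraphWriter implementation: `canonicalize` and its helpers are hypothetical names, "empty" is assumed to mean null, `""`, `[]`, or `{}`, and key ordering is assumed to be plain code-point order. The result is the byte string that would then be fed to BLAKE3-256 (not shown, since BLAKE3 is not in the Python standard library).

```python
import json


def _is_empty(value):
    # Assumption: "empty" means null, "", [] or {}; False and 0 are kept.
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def _prune(value):
    """Recursively drop null/empty members and normalise whole-number floats."""
    if isinstance(value, dict):
        pruned = {key: _prune(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if not _is_empty(item)}
    if isinstance(value, list):
        return [item for item in (_prune(v) for v in value) if not _is_empty(item)]
    # Rule 6: serialise 1.0 as 1 so no trailing zero appears.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def canonicalize(graph):
    """Return canonical UTF-8 bytes for a richgraph-like dict."""
    doc = _prune(graph)  # rule 5
    # Rule 4: sort arrays by their deterministic keys.
    if "nodes" in doc:
        doc["nodes"] = sorted(doc["nodes"], key=lambda n: n["id"])
    if "edges" in doc:
        doc["edges"] = sorted(doc["edges"], key=lambda e: (e["from"], e["to"], e["kind"]))
    # Rules 1-3: sorted keys, compact separators, UTF-8 without BOM.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")
```

Two inputs that differ only in key order, array order, or null members should yield identical bytes, which is the property the graph hash relies on.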
|
||||
|
||||
### GR2: Mandated BLAKE3-256 Encoding
|
||||
|
||||
All graph-level hashes use BLAKE3-256 with the following format:
|
||||
|
||||
```
|
||||
blake3:{64_hex_chars}
|
||||
```
|
||||
|
||||
Example:
|
||||
```
|
||||
blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
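A format check for this encoding might look like the sketch below. The helper name `is_graph_hash` is hypothetical, and the pattern assumes lowercase hex digits only, since the document does not say whether uppercase is accepted.

```python
import re

# Assumption: lowercase hex only; uppercase handling is not specified.
GRAPH_HASH_RE = re.compile(r"blake3:[0-9a-f]{64}")


def is_graph_hash(value):
    """True when value has the form blake3:{64_hex_chars}."""
    return GRAPH_HASH_RE.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` rejects strings with trailing characters, such as a hash with a `.dsse` suffix appended.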
|
||||
|
||||
**Rationale:**
|
||||
- BLAKE3 is 3x+ faster than SHA-256 on modern CPUs
|
||||
- Parallelizable for large graphs (>100K nodes)
|
||||
- Cryptographically secure (256-bit digest; BLAKE3 targets a 128-bit security level)
|
||||
- Algorithm prefix enables future migration
|
||||
|
||||
### GR3: Append-Only Storage
|
||||
|
||||
Graph revisions are immutable. Operations:
|
||||
|
||||
| Operation | Creates New Revision | Modifies Existing |
|-----------|---------------------|-------------------|
| `create` | Yes | No |
| `update` | Yes | No |
| `merge` | Yes | No |
| `tombstone` | Yes | No |
| `read` | No | No |
|
||||
|
||||
**Storage layout:**
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
revisions/
|
||||
{blake3}/ # Revision manifest
|
||||
{blake3}.graph # Graph body
|
||||
{blake3}.dsse # DSSE envelope
|
||||
indices/
|
||||
by-tenant/{tenant_id}/ # Tenant index
|
||||
by-sbom/{sbom_digest}/ # SBOM correlation
|
||||
by-root/{root_revision_id}/ # Lineage tree
|
||||
```
|
||||
|
||||
### GR4: Lineage/Diff Metadata
|
||||
|
||||
Every revision tracks its lineage:
|
||||
|
||||
```json
|
||||
{
|
||||
"lineage": {
|
||||
"depth": 5,
|
||||
"root_revision_id": "rev:blake3:...",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"merge_parents": []
|
||||
},
|
||||
"diff_summary": {
|
||||
"nodes_added": 12,
|
||||
"nodes_removed": 3,
|
||||
"nodes_modified": 0,
|
||||
"edges_added": 24,
|
||||
"edges_removed": 8,
|
||||
"edges_modified": 0,
|
||||
"roots_added": 0,
|
||||
"roots_removed": 0
|
||||
},
|
||||
"diff_detail_uri": "cas://reachability/diffs/{parent_hash}_{child_hash}.ndjson"
|
||||
}
|
||||
```
|
||||
|
||||
**Diff detail format (NDJSON):**
|
||||
|
||||
```ndjson
|
||||
{"op":"add","path":"nodes","value":{"id":"sym:java:...","display":"..."}}
|
||||
{"op":"remove","path":"edges","from":"sym:java:a","to":"sym:java:b"}
|
||||
```
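As a rough illustration of how the NDJSON detail relates to `diff_summary`, the sketch below tallies operations per collection. The function name `summarize_diff` is hypothetical, and only the `add` and `remove` operations shown in the sample are handled; how modifications are encoded is not stated here, so they are skipped.

```python
import json
from collections import Counter


def summarize_diff(ndjson_text):
    """Count add/remove operations per collection in an NDJSON diff."""
    suffixes = {"add": "added", "remove": "removed"}
    counts = Counter()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        suffix = suffixes.get(record["op"])
        if suffix is None:
            continue  # ops other than add/remove are not covered by the sample
        counts[f'{record["path"]}_{suffix}'] += 1
    return dict(counts)
```

Applied to the two sample records above, this would report one added node and one removed edge.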
|
||||
|
||||
### GR5: Cross-Artifact Digests (SBOM/VEX/Policy/Tool)
|
||||
|
||||
Every revision links to related artifacts:
|
||||
|
||||
```json
|
||||
{
|
||||
"cross_artifacts": {
|
||||
"sbom_digest": "sha256:...",
|
||||
"sbom_uri": "cas://scanner-artifacts/sbom.cdx.json",
|
||||
"sbom_format": "cyclonedx-1.6",
|
||||
"vex_digest": "sha256:...",
|
||||
"vex_uri": "cas://excititor/vex/openvex.json",
|
||||
"policy_digest": "sha256:...",
|
||||
"policy_version": "P-7:v4",
|
||||
"analyzer_digest": "sha256:...",
|
||||
"analyzer_name": "scanner.java",
|
||||
"analyzer_version": "1.2.0"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR6: UI/CLI Surfacing of Full/Short IDs
|
||||
|
||||
**Full ID format:**
|
||||
```
|
||||
rev:blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2
|
||||
```
|
||||
|
||||
**Short ID format (for display):**
|
||||
```
|
||||
rev:a1b2c3d4
|
||||
```
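Judging from the two examples, the short ID drops the algorithm prefix and keeps the first eight hex characters. A minimal sketch, with `short_revision_id` as a hypothetical helper name:

```python
def short_revision_id(full_id, length=8):
    """Turn rev:blake3:<hex> into rev:<first `length` hex chars>."""
    prefix, algorithm, digest = full_id.split(":", 2)
    if prefix != "rev" or algorithm != "blake3":
        raise ValueError(f"not a blake3 revision id: {full_id!r}")
    return f"rev:{digest[:length]}"
```

Short IDs are a display convenience only; eight hex characters can collide, so lookups should resolve them back to the full ID.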
|
||||
|
||||
**CLI commands:**
|
||||
|
||||
```bash
|
||||
# List revisions
|
||||
stella graph revisions --scan-id scan-123
|
||||
|
||||
# Show full ID
|
||||
stella graph revisions --scan-id scan-123 --full
|
||||
|
||||
# Output:
|
||||
REVISION CREATED NODES EDGES PARENT
|
||||
rev:a1b2c3d4 2025-12-13T10:00:00 1247 3891 rev:9f8e7d6c
|
||||
rev:9f8e7d6c 2025-12-12T15:30:00 1235 3867 rev:1a2b3c4d
|
||||
```
|
||||
|
||||
**UI display:**
|
||||
|
||||
- Revision chips show short ID with copy-to-clipboard for full ID
|
||||
- Hover tooltip shows full ID and creation timestamp
|
||||
- Lineage tree visualization available in "Revision History" drawer
|
||||
|
||||
### GR7: Shard/Tenant Context
|
||||
|
||||
Every revision includes partition context:
|
||||
|
||||
```json
|
||||
{
|
||||
"tenant_id": "tenant:acme",
|
||||
"shard_id": "shard:01",
|
||||
"namespace": "prod",
|
||||
"workspace_id": "ws:default"
|
||||
}
|
||||
```
|
||||
|
||||
**Tenant isolation:**
|
||||
|
||||
- Revisions are tenant-scoped; cross-tenant access requires explicit grants
|
||||
- Shard ID enables horizontal scaling and data locality
|
||||
- Namespace supports multi-environment deployments
|
||||
|
||||
### GR8: Pin/Audit Governance
|
||||
|
||||
**Pinned revisions:**
|
||||
|
||||
Revisions can be pinned to prevent automatic retention cleanup:
|
||||
|
||||
```json
|
||||
{
|
||||
"pinned": true,
|
||||
"pinned_at": "2025-12-13T10:00:00Z",
|
||||
"pinned_by": "user:alice",
|
||||
"pin_reason": "Audit retention for CVE-2021-44228 investigation",
|
||||
"pin_expires_at": "2026-12-13T10:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
**Audit events:**
|
||||
|
||||
All revision operations emit audit events:
|
||||
|
||||
```json
|
||||
{
|
||||
"event_type": "graph.revision.created",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"actor": "service:scanner",
|
||||
"tenant_id": "tenant:acme",
|
||||
"timestamp": "2025-12-13T10:00:00Z",
|
||||
"metadata": {
|
||||
"operation": "create",
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### GR9: Retention/Tombstones
|
||||
|
||||
**Retention policy:**
|
||||
|
||||
| Category | Default Retention | Configurable |
|----------|-------------------|--------------|
| Latest revision | Forever | No |
| Intermediate revisions | 90 days | Yes |
| Tombstoned revisions | 30 days | Yes |
| Pinned revisions | Until unpin + 7 days | No |
|
||||
|
||||
**Tombstone format:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.graph.revision@v1",
|
||||
"revision_id": "rev:blake3:...",
|
||||
"tombstone": true,
|
||||
"tombstoned_at": "2025-12-13T10:00:00Z",
|
||||
"tombstoned_by": "service:retention-worker",
|
||||
"tombstone_reason": "retention_policy",
|
||||
"successor_revision_id": "rev:blake3:..."
|
||||
}
|
||||
```
|
||||
|
||||
### GR10: Inclusion in Offline Kits
|
||||
|
||||
Offline kits include graph revisions for air-gapped deployments:
|
||||
|
||||
**Offline bundle manifest:**
|
||||
|
||||
```json
|
||||
{
|
||||
"schema": "stellaops.offline.bundle@v1",
|
||||
"bundle_id": "bundle:2025-12-13",
|
||||
"graph_revisions": [
|
||||
{
|
||||
"revision_id": "rev:blake3:...",
|
||||
"graph_hash": "blake3:...",
|
||||
"included_artifacts": ["graph", "dsse", "diff"]
|
||||
}
|
||||
],
|
||||
"rekor_checkpoints": [
|
||||
{
|
||||
"log_id": "rekor.sigstore.dev",
|
||||
"checkpoint": "...",
|
||||
"verified_at": "2025-12-13T10:00:00Z"
|
||||
}
|
||||
],
|
||||
"signature": {
|
||||
"algorithm": "ecdsa-p256",
|
||||
"value": "base64:...",
|
||||
"public_key_id": "key:offline-signing-2025"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Import verification:**
|
||||
|
||||
```bash
|
||||
stella offline import --bundle ./offline-bundle.tgz --verify
|
||||
|
||||
# Output:
|
||||
Bundle: bundle:2025-12-13
|
||||
Graph Revisions: 5
|
||||
Rekor Checkpoints: 2
|
||||
|
||||
Verifying signatures...
|
||||
Bundle signature: VALID
|
||||
DSSE envelopes: 5/5 VALID
|
||||
Rekor checkpoints: 2/2 VERIFIED
|
||||
|
||||
Import complete.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. API Reference
|
||||
|
||||
### 3.1 Create Revision
|
||||
|
||||
```http
|
||||
POST /api/graph/revisions
|
||||
Authorization: Bearer <token>
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"graph": { ... richgraph-v1 ... },
|
||||
"parent_revision_id": "rev:blake3:...",
|
||||
"cross_artifacts": { ... }
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Get Revision
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/{revision_id}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.3 List Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions?tenant_id=acme&sbom_digest=sha256:...&limit=20
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
### 3.4 Diff Revisions
|
||||
|
||||
```http
|
||||
GET /api/graph/revisions/diff?from={rev_a}&to={rev_b}
|
||||
Authorization: Bearer <token>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Related Documentation
|
||||
|
||||
- [richgraph-v1 Contract](../contracts/richgraph-v1.md) - Graph schema specification
|
||||
- [Function-Level Evidence](./function-level-evidence.md) - Evidence chain guide
|
||||
- [CAS Infrastructure](../contracts/cas-infrastructure.md) - Content-addressable storage
|
||||
- [Offline Kit](../24_OFFLINE_KIT.md) - Air-gap deployment
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 GRAPHREV-GAPS-401-063 for change history._
|
||||
@@ -84,7 +84,93 @@ Stella Ops provides **true hybrid reachability** by combining:
|
||||
|
||||
**Evidence linking:** Each edge in the graph or bundle includes `evidenceRefs` pointing to the underlying proof artifacts (static analysis artifacts, runtime traces), enabling **evidence-linked VEX decisions**.
|
||||
|
||||
## 8. Open decisions (tracked in Sprint 0401 tasks 53–56)
|
||||
- Rekor publish defaults per deployment tier (regulated vs standard).
|
||||
- CLI UX for selective bundle verification.
|
||||
- Bench coverage for edge-bundle verification time/size.
|
||||
## 8. Decisions (Frozen 2025-12-13)
|
||||
|
||||
### 8.1 DSSE/Rekor Budget by Deployment Tier
|
||||
|
||||
| Tier | Graph DSSE | Edge-Bundle DSSE | Rekor Publish | Max Bundles/Graph |
|------|------------|------------------|---------------|-------------------|
| **Regulated** (SOC2, FedRAMP, PCI) | Required | Required for runtime/contested | Required | 10 |
| **Standard** | Required | Optional (criteria-based) | Graph only | 5 |
| **Air-gapped** | Required | Optional | Offline checkpoint | 5 |
| **Dev/Test** | Optional | Optional | Disabled | Unlimited |
|
||||
|
||||
**Budget enforcement:**
|
||||
- Graph DSSE: Always submit digest to Rekor (or offline checkpoint for air-gapped)
|
||||
- Edge-bundle DSSE: Submit to Rekor only when `bundle_reason` is `disputed`, `runtime-hit`, or `security-critical`
|
||||
- Cap enforced by `reachability.edgeBundles.maxRekorPublishes` config (per tier defaults above)
|
||||
|
||||
### 8.2 Signing Layout and CAS Paths
|
||||
|
||||
```
|
||||
cas://reachability/
|
||||
graphs/
|
||||
{blake3}/ # richgraph-v1 body (JSON)
|
||||
{blake3}.dsse # Graph DSSE envelope
|
||||
{blake3}.rekor # Rekor inclusion proof (optional)
|
||||
edges/
|
||||
{graph_hash}/
|
||||
{bundle_id}.json # Edge bundle body
|
||||
{bundle_id}.dsse # Edge bundle DSSE envelope
|
||||
{bundle_id}.rekor # Rekor inclusion proof (if published)
|
||||
revisions/
|
||||
{revision_id}/ # Revision manifest + lineage
|
||||
```
|
||||
|
||||
**Signing workflow:**
|
||||
1. Canonicalize richgraph-v1 JSON (sorted keys, arrays by deterministic key)
|
||||
2. Compute BLAKE3-256 hash -> `graph_hash`
|
||||
3. Create DSSE envelope with `stella.ops/graph@v1` predicate
|
||||
4. Submit digest to Rekor (online) or cache checkpoint (offline)
|
||||
5. Store graph body + envelope + proof in CAS
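Step 3 of the workflow can be illustrated with the DSSE pre-authentication encoding (PAE), which is what actually gets signed. The sketch below is an assumption-laden illustration rather than the Attestor module's code: `make_envelope` is a hypothetical helper, HMAC-SHA256 stands in for the real signing key, and the payload type is left as a parameter because the excerpt does not state which value is used.

```python
import base64
import hashlib
import hmac


def pae(payload_type, payload):
    """DSSE pre-authentication encoding of (payload_type, payload)."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def make_envelope(payload, payload_type, key, key_id):
    """Wrap payload bytes in a DSSE envelope.

    HMAC-SHA256 is a stand-in signer for illustration only; a real
    deployment would sign the PAE bytes with an asymmetric key.
    """
    signature = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": payload_type,
        "signatures": [
            {"keyid": key_id, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }
```

Signing the PAE bytes instead of the raw payload binds the payload type into the signature, so an envelope cannot be reinterpreted under a different type.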
|
||||
|
||||
### 8.3 CLI UX for Selective Bundle Verification
|
||||
|
||||
```bash
|
||||
# Verify graph DSSE only (default)
|
||||
stella graph verify --hash blake3:a1b2c3d4...
|
||||
|
||||
# Verify graph + all edge bundles
|
||||
stella graph verify --hash blake3:a1b2c3d4... --include-bundles
|
||||
|
||||
# Verify specific edge bundle
|
||||
stella graph verify --hash blake3:a1b2c3d4... --bundle bundle:001
|
||||
|
||||
# Offline verification with local CAS
|
||||
stella graph verify --hash blake3:a1b2c3d4... --cas-root ./offline-cas/
|
||||
|
||||
# Verify Rekor inclusion
|
||||
stella graph verify --hash blake3:a1b2c3d4... --rekor-proof
|
||||
|
||||
# Output formats
|
||||
stella graph verify --hash blake3:a1b2c3d4... --format json   # or: table, summary
|
||||
```
|
||||
|
||||
### 8.4 Golden Fixture Plan
|
||||
|
||||
**Fixture location:** `tests/Reachability/Hybrid/`
|
||||
|
||||
**Required fixtures:**
|
||||
| Fixture | Description | Expected Verification Time |
|---------|-------------|---------------------------|
| `graph-only.golden.json` | Minimal richgraph-v1 with DSSE | < 100ms |
| `graph-with-runtime.golden.json` | Graph + 1 runtime edge bundle | < 200ms |
| `graph-with-contested.golden.json` | Graph + 1 contested/revoked edge bundle | < 200ms |
| `large-graph.golden.json` | 10K nodes, 50K edges, 5 bundles | < 2s |
| `offline-bundle.golden.tgz` | Complete offline replay pack | < 5s |
|
||||
|
||||
**CI integration:**
|
||||
- `.gitea/workflows/hybrid-attestation.yml` runs verification fixtures
|
||||
- Size gate: Graph body < 10MB, individual bundle < 1MB
|
||||
- Time gate: Full verification < 5s for standard tier
|
||||
|
||||
### 8.5 Implementation Status
|
||||
|
||||
| Component | Status | Notes |
|-----------|--------|-------|
| Graph DSSE predicate | Done | `stella.ops/graph@v1` in PredicateTypes.cs |
| Edge-bundle DSSE predicate | Planned | `stella.ops/edgeBundle@v1` |
| CAS layout | Done | Per section 8.2 |
| CLI verify command | Planned | Per section 8.3 |
| Golden fixtures | Planned | Per section 8.4 |
| Rekor integration | Done | Via Attestor module |
|
||||
|
||||
311
docs/replay/replay-manifest-v2-acceptance.md
Normal file
@@ -0,0 +1,311 @@
|
||||
# Replay Manifest v2 Acceptance Contract
|
||||
|
||||
_Last updated: 2025-12-13. Owner: BE-Base Platform Guild._
|
||||
|
||||
This document defines the acceptance criteria and test vectors for replay manifest v2, enabling Task 19 (GAP-REP-004) to proceed with implementation.
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Replay manifest v2 introduces:
|
||||
|
||||
- **BLAKE3 graph hashes:** Primary hash algorithm for reachability graphs
|
||||
- **Sorted CAS entries:** Deterministic ordering of all CAS references
|
||||
- **hashAlg fields:** Explicit algorithm declarations for forward compatibility
|
||||
- **code_id coverage:** Coverage metrics for stripped binary handling
|
||||
|
||||
---
|
||||
|
||||
## 2. Schema Changes (v1 → v2)
|
||||
|
||||
### 2.1 Version Field
|
||||
|
||||
```json
|
||||
{
|
||||
"schemaVersion": "2.0",
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
### 2.2 Hash Algorithm Declaration
|
||||
|
||||
All hash fields now include an explicit algorithm:
|
||||
|
||||
```json
|
||||
{
|
||||
"reachability": {
|
||||
"graphs": [
|
||||
{
|
||||
"hash": "blake3:a1b2c3d4e5f6...",
|
||||
"hashAlg": "blake3-256",
|
||||
"casUri": "cas://reachability/graphs/blake3:a1b2c3d4..."
|
||||
}
|
||||
],
|
||||
"runtimeTraces": [
|
||||
{
|
||||
"hash": "sha256:feedface...",
|
||||
"hashAlg": "sha256",
|
||||
"casUri": "cas://reachability/runtime/sha256:feedface..."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 2.3 Sorted CAS Entries
|
||||
|
||||
All arrays must be sorted by deterministic key:
|
||||
|
||||
| Array | Sort Key |
|-------|----------|
| `reachability.graphs[]` | `casUri` (lexicographic) |
| `reachability.runtimeTraces[]` | `casUri` (lexicographic) |
| `inputs.feeds[]` | `name` (lexicographic) |
| `inputs.tools[]` | `name` (lexicographic) |
|
||||
|
||||
### 2.4 Code ID Coverage
|
||||
|
||||
New field for stripped binary support:
|
||||
|
||||
```json
|
||||
{
|
||||
"reachability": {
|
||||
"code_id_coverage": {
|
||||
"total_nodes": 1247,
|
||||
"nodes_with_symbol_id": 1189,
|
||||
"nodes_with_code_id": 58,
|
||||
"coverage_percent": 100.0
|
||||
}
|
||||
}
|
||||
}
|
||||
```
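The sample numbers suggest that `coverage_percent` counts a node as covered when it has either identifier: 1189 + 58 = 1247, hence 100.0. The sketch below encodes that reading. It is an inference from the example, not a documented formula; it assumes the two counts are disjoint, and the zero-node behaviour is a guess.

```python
def coverage_percent(total_nodes, nodes_with_symbol_id, nodes_with_code_id):
    """Share of nodes carrying a symbol_id or, failing that, a code_id."""
    if total_nodes == 0:
        return 0.0  # assumption: an empty graph reports zero coverage
    # Assumption: the two counts do not overlap (code_id-only nodes are
    # the stripped-binary fallback).
    identified = nodes_with_symbol_id + nodes_with_code_id
    return round(100.0 * identified / total_nodes, 1)
```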
|
||||
|
||||
---
|
||||
|
||||
## 3. CAS Registration Gates
|
||||
|
||||
### 3.1 Required Registration
|
||||
|
||||
All referenced artifacts must be registered in CAS before manifest finalization:
|
||||
|
||||
| Artifact Type | CAS Path Pattern | Required |
|---------------|------------------|----------|
| Graph body | `cas://reachability/graphs/{hash}` | Yes |
| Graph DSSE | `cas://reachability/graphs/{hash}.dsse` | Yes |
| Runtime trace | `cas://reachability/runtime/{hash}` | Conditional |
| Edge bundle | `cas://reachability/edges/{graph_hash}/{bundle_id}` | Conditional |
|
||||
|
||||
### 3.2 Registration Validation
|
||||
|
||||
Before signing a replay manifest:
|
||||
|
||||
1. Verify all `casUri` references resolve to existing CAS objects
|
||||
2. Verify hash matches CAS content
|
||||
3. Verify DSSE envelope exists for all graph references
|
||||
4. Fail manifest creation if any reference is missing
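The four checks above could be sketched as follows, using an in-memory dict as a stand-in for the CAS. Function names are hypothetical, the error codes are borrowed from the invalid-manifest vectors in section 4.4, and reusing `REPLAY_MANIFEST_CAS_NOT_FOUND` for a missing DSSE envelope is an assumption. Only SHA-256 is wired up because BLAKE3 is not in the Python standard library.

```python
import hashlib

# Hashers keyed by the prefix used in "algo:hex" strings. A BLAKE3 binding
# would be registered here as well; it is omitted to keep the sketch stdlib-only.
HASHERS = {"sha256": lambda data: hashlib.sha256(data).hexdigest()}


def validate_reference(cas, cas_uri, expected_hash, require_dsse=False):
    """Return None when the reference passes, otherwise an error code."""
    body = cas.get(cas_uri)
    if body is None:  # step 1: the URI must resolve
        return "REPLAY_MANIFEST_CAS_NOT_FOUND"
    algorithm, _, expected_hex = expected_hash.partition(":")
    if algorithm not in HASHERS:
        raise ValueError(f"no hasher registered for {algorithm!r}")
    if HASHERS[algorithm](body) != expected_hex:  # step 2: content must match
        return "REPLAY_MANIFEST_HASH_MISMATCH"
    if require_dsse and f"{cas_uri}.dsse" not in cas:  # step 3: envelope exists
        return "REPLAY_MANIFEST_CAS_NOT_FOUND"
    return None


def first_failure(cas, references):
    """Step 4: return (casUri, error) for the first bad reference, else None."""
    for ref in references:
        error = validate_reference(
            cas, ref["casUri"], ref["hash"], ref.get("requireDsse", False)
        )
        if error:
            return ref["casUri"], error
    return None
```

A manifest writer would refuse to sign whenever `first_failure` returns a value.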
|
||||
|
||||
### 3.3 Validation API
|
||||
|
||||
```csharp
|
||||
public interface ICasValidator
|
||||
{
|
||||
Task<CasValidationResult> ValidateAsync(string casUri, string expectedHash);
|
||||
Task<CasValidationResult> ValidateBatchAsync(IEnumerable<CasReference> refs);
|
||||
}
|
||||
|
||||
public record CasValidationResult(
|
||||
bool IsValid,
|
||||
string? ActualHash,
|
||||
string? Error
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Acceptance Test Vectors
|
||||
|
||||
### 4.1 Minimal Valid Manifest v2
|
||||
|
||||
```json
|
||||
{
|
||||
"schemaVersion": "2.0",
|
||||
"scan": {
|
||||
"id": "scan-test-001",
|
||||
"time": "2025-12-13T10:00:00Z",
|
||||
"mode": "record",
|
||||
"scannerVersion": "10.2.0"
|
||||
},
|
||||
"subject": {
|
||||
"ociDigest": "sha256:abc123..."
|
||||
},
|
||||
"inputs": {
|
||||
"feeds": [],
|
||||
"tools": []
|
||||
},
|
||||
"reachability": {
|
||||
"graphs": [
|
||||
{
|
||||
"kind": "static",
|
||||
"analyzer": "scanner.java@10.2.0",
|
||||
"hash": "blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2",
|
||||
"hashAlg": "blake3-256",
|
||||
"casUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2"
|
||||
}
|
||||
],
|
||||
"runtimeTraces": [],
|
||||
"code_id_coverage": {
|
||||
"total_nodes": 100,
|
||||
"nodes_with_symbol_id": 100,
|
||||
"nodes_with_code_id": 0,
|
||||
"coverage_percent": 100.0
|
||||
}
|
||||
},
|
||||
"outputs": {},
|
||||
"provenance": {}
|
||||
}
|
||||
```
|
||||
|
||||
**Expected canonical hash:** `sha256:e7f8a9b0...` (computed from canonical JSON)
|
||||
|
||||
### 4.2 Manifest with Runtime Traces
|
||||
|
||||
```json
|
||||
{
|
||||
"schemaVersion": "2.0",
|
||||
"scan": {
|
||||
"id": "scan-test-002",
|
||||
"time": "2025-12-13T11:00:00Z",
|
||||
"mode": "record",
|
||||
"scannerVersion": "10.2.0"
|
||||
},
|
||||
"reachability": {
|
||||
"graphs": [
|
||||
{
|
||||
"kind": "static",
|
||||
"analyzer": "scanner.java@10.2.0",
|
||||
"hash": "blake3:1111111111111111111111111111111111111111111111111111111111111111",
|
||||
"hashAlg": "blake3-256",
|
||||
"casUri": "cas://reachability/graphs/blake3:1111111111111111111111111111111111111111111111111111111111111111"
|
||||
}
|
||||
],
|
||||
"runtimeTraces": [
|
||||
{
|
||||
"source": "eventpipe",
|
||||
"hash": "sha256:2222222222222222222222222222222222222222222222222222222222222222",
|
||||
"hashAlg": "sha256",
|
||||
"casUri": "cas://reachability/runtime/sha256:2222222222222222222222222222222222222222222222222222222222222222",
|
||||
"recordedAt": "2025-12-13T10:30:00Z"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4.3 Sorting Validation Vector
|
||||
|
||||
Input (unsorted):
|
||||
|
||||
```json
|
||||
{
|
||||
"reachability": {
|
||||
"graphs": [
|
||||
{"casUri": "cas://reachability/graphs/blake3:zzzz...", "kind": "framework"},
|
||||
{"casUri": "cas://reachability/graphs/blake3:aaaa...", "kind": "static"}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Expected output (sorted):
|
||||
|
||||
```json
|
||||
{
|
||||
"reachability": {
|
||||
"graphs": [
|
||||
{"casUri": "cas://reachability/graphs/blake3:aaaa...", "kind": "static"},
|
||||
{"casUri": "cas://reachability/graphs/blake3:zzzz...", "kind": "framework"}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
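The vector above can be reproduced with a plain sort on `casUri`. In this sketch the helper names are hypothetical, and "lexicographic" is assumed to mean ordinal code-point comparison with no locale-aware collation.

```python
def sort_cas_entries(entries):
    """Return entries ordered by casUri using plain code-point comparison."""
    return sorted(entries, key=lambda entry: entry["casUri"])


def is_sorted_by_cas_uri(entries):
    """True when entries already appear in casUri order."""
    uris = [entry["casUri"] for entry in entries]
    return uris == sorted(uris)
```

A validator would call `is_sorted_by_cas_uri` and raise `REPLAY_MANIFEST_UNSORTED_ENTRIES` on failure, while a writer would call `sort_cas_entries` before serialising.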
|
||||
|
||||
### 4.4 Invalid Manifest Vectors
|
||||
|
||||
| Test Case | Input | Expected Error |
|-----------|-------|----------------|
| Missing schemaVersion | `{}` | `REPLAY_MANIFEST_MISSING_VERSION` |
| Invalid version | `{"schemaVersion": "1.0"}` | `REPLAY_MANIFEST_VERSION_MISMATCH` (when v2 required) |
| Missing hashAlg | `{"hash": "blake3:..."}` | `REPLAY_MANIFEST_MISSING_HASH_ALG` |
| Unsorted graphs | See 4.3 input | `REPLAY_MANIFEST_UNSORTED_ENTRIES` |
| Missing CAS reference | `{"casUri": "cas://missing/..."}` | `REPLAY_MANIFEST_CAS_NOT_FOUND` |
| Hash mismatch | CAS content differs | `REPLAY_MANIFEST_HASH_MISMATCH` |
|
||||
|
||||
---
|
||||
|
||||
## 5. Migration Path
|
||||
|
||||
### 5.1 v1 → v2 Upgrade
|
||||
|
||||
```csharp
|
||||
public static ReplayManifest UpgradeToV2(ReplayManifest v1)
|
||||
{
|
||||
return v1 with
|
||||
{
|
||||
SchemaVersion = "2.0",
|
||||
Reachability = v1.Reachability with
|
||||
{
|
||||
Graphs = v1.Reachability.Graphs
|
||||
.Select(g => g with { HashAlg = InferHashAlg(g.Hash) })
|
||||
.OrderBy(g => g.CasUri)
|
||||
.ToList(),
|
||||
RuntimeTraces = v1.Reachability.RuntimeTraces
|
||||
.Select(t => t with { HashAlg = InferHashAlg(t.Hash) })
|
||||
.OrderBy(t => t.CasUri)
|
||||
.ToList()
|
||||
}
|
||||
};
|
||||
}
|
||||
```
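The upgrade routine calls `InferHashAlg`, which is not defined in this excerpt. One plausible reading, rendered in Python for illustration, maps the hash prefix to the `hashAlg` labels shown in section 2.2. The function and table names here are hypothetical, and rejecting unknown prefixes instead of guessing is a design assumption.

```python
# Prefix -> hashAlg label, taken from the section 2.2 examples.
HASH_ALG_BY_PREFIX = {"blake3": "blake3-256", "sha256": "sha256"}


def infer_hash_alg(hash_value):
    """Derive the hashAlg label from an "algo:hex" hash string."""
    prefix, separator, digest = hash_value.partition(":")
    if not separator or not digest or prefix not in HASH_ALG_BY_PREFIX:
        raise ValueError(f"cannot infer hashAlg from {hash_value!r}")
    return HASH_ALG_BY_PREFIX[prefix]
```

Failing loudly on an unprefixed or unknown hash keeps a v1 manifest with ambiguous digests from being silently upgraded with the wrong algorithm label.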
|
||||
|
||||
### 5.2 Backward Compatibility
|
||||
|
||||
- v2 readers MUST accept v1 manifests and emit a warning
|
||||
- v2 writers MUST always emit v2 format
|
||||
- v1 writers are deprecated after 2026-03-01
|
||||
|
||||
---
|
||||
|
||||
## 6. Test Fixture Locations
|
||||
|
||||
```
|
||||
tests/Replay/
|
||||
fixtures/
|
||||
manifest-v2-minimal.json
|
||||
manifest-v2-with-runtime.json
|
||||
manifest-v2-sorted.json
|
||||
manifest-v2-code-id-coverage.json
|
||||
invalid/
|
||||
manifest-missing-version.json
|
||||
manifest-unsorted.json
|
||||
manifest-missing-hashalg.json
|
||||
golden/
|
||||
manifest-v2-canonical.golden.json
|
||||
manifest-v2-hash.golden.txt
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Implementation Checklist
|
||||
|
||||
- [ ] Update `ReplayManifest` record with v2 fields
|
||||
- [ ] Add `hashAlg` to all hash-bearing types
|
||||
- [ ] Implement sorting in `ReachabilityReplayWriter`
|
||||
- [ ] Add CAS registration validation
|
||||
- [ ] Create test fixtures
|
||||
- [ ] Update `DETERMINISTIC_REPLAY.md` section 3
|
||||
- [ ] Wire into RecordModeService
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2025-12-13. See Sprint 0401 GAP-REP-004 for implementation._
|
||||
@@ -224,5 +224,103 @@ Extended schema with tier information:
|
||||
|
||||
- **Tier calculation:** `UncertaintyTierCalculator` in `src/Signals/StellaOps.Signals/Services/`
|
||||
- **Risk score math:** `ReachabilityScoringService.ComputeRiskScore()` (extend existing)
|
||||
- **Policy integration:** `docs/reachability/policy-gate.md` for gate rules
|
||||
- **Policy integration:** `docs/policy/dsl.md` §12 for uncertainty gates
|
||||
- **Lattice integration:** `docs/reachability/lattice.md` §9 for v1 lattice states
|
||||
|
||||
---
|
||||
|
||||
## 8. Policy Guidance (v1 — Sprint 0401)
|
||||
|
||||
Uncertainty gates enforce evidence-quality thresholds in the Policy Engine. When entropy is too high or evidence is missing, policies block or downgrade VEX decisions.
|
||||
|
||||
### 8.1 Gate Mapping
|
||||
|
||||
| Gate | Uncertainty State | Tier | Policy Action |
|------|------------------|------|---------------|
| `U1` | `MissingSymbolResolution` | T1/T2 | Block `not_affected`, require review |
| `U2` | `MissingPurl` | T2/T3 | Warn on `not_affected`, add review flag |
| `U3` | `UntrustedAdvisory` | T3/T4 | Advisory caveat, no blocking |
|
||||
|
||||
### 8.2 Sample Policy Rules
|
||||
|
||||
```dsl
// Block not_affected when symbol resolution has high entropy
rule u1_gate_high_entropy priority 5 {
  when signals.uncertainty.level == "U1"
    and signals.uncertainty.entropy >= 0.7
  then status := "under_investigation"
    annotate gate := "U1"
    annotate remediation := "Upload symbols or close unknowns registry"
  because "High symbol entropy blocks strong VEX claims";
}

// Tier-based compound gate
rule tier1_block_not_affected priority 3 {
  when signals.uncertainty.aggregateTier == "T1"
    and vex.any(status == "not_affected")
  then status := "under_investigation"
    annotate blocked_reason := "T1 uncertainty requires evidence"
  because "Maximum uncertainty tier blocks all exclusion claims";
}
```

### 8.3 YAML Configuration

```yaml
uncertainty_gates:
  u1_gate:
    entropy_threshold: 0.7
    blocked_statuses: [not_affected]
    fallback_status: under_investigation
    remediation_hint: "Upload symbols or resolve unknowns"
  u2_gate:
    entropy_threshold: 0.4
    blocked_statuses: [not_affected]
    warn_on_block: true
  u3_gate:
    entropy_threshold: 0.1
    annotate_caveat: true
```

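As a rough illustration of how the gate configuration above might be evaluated, the following Python sketch mirrors the YAML as a plain dictionary and applies one gate to a proposed VEX status. It is not the Policy Engine's implementation. The names `UNCERTAINTY_GATES`, `GateResult`, and `apply_gate` are hypothetical, and the sketch assumes that `warn_on_block` keeps the proposed status while attaching a warning, and that a gate only fires once entropy reaches `entropy_threshold`.

```python
from dataclasses import dataclass, field

# Plain-dict mirror of the YAML above, keyed by gate level (assumed mapping
# of u1_gate -> "U1" and so on), so the sketch needs no YAML parser.
UNCERTAINTY_GATES = {
    "U1": {
        "entropy_threshold": 0.7,
        "blocked_statuses": ["not_affected"],
        "fallback_status": "under_investigation",
        "remediation_hint": "Upload symbols or resolve unknowns",
    },
    "U2": {
        "entropy_threshold": 0.4,
        "blocked_statuses": ["not_affected"],
        "warn_on_block": True,
    },
    "U3": {"entropy_threshold": 0.1, "annotate_caveat": True},
}


@dataclass
class GateResult:
    """Hypothetical result type: the effective status plus any annotations."""
    status: str
    annotations: dict = field(default_factory=dict)


def apply_gate(level, entropy, proposed_status, gates=UNCERTAINTY_GATES):
    """Apply a single uncertainty gate to a proposed VEX status."""
    result = GateResult(status=proposed_status)
    gate = gates.get(level)
    # No configured gate, or entropy below the threshold: pass through.
    if gate is None or entropy < gate["entropy_threshold"]:
        return result

    result.annotations["gate"] = level
    if gate.get("annotate_caveat"):
        result.annotations["caveat"] = True

    if proposed_status in gate.get("blocked_statuses", []):
        if gate.get("warn_on_block"):
            # Assumption: warn-only gates keep the status and flag it.
            result.annotations["warning"] = True
        elif "fallback_status" in gate:
            result.status = gate["fallback_status"]
        if "remediation_hint" in gate:
            result.annotations["remediation"] = gate["remediation_hint"]
    return result
```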
See `docs/policy/dsl.md` §12 for complete gate rules and tier-aware compound patterns.

---

## 9. Remediation Actions

Each uncertainty state has recommended remediation steps:

| State | Code | Remediation | CLI Command |
|-------|------|-------------|-------------|
| MissingSymbolResolution | `U1` | Upload debug symbols, resolve unknowns | `stella symbols ingest --path <symbols>` |
| MissingPurl | `U2` | Generate lockfile, verify package coordinates | `stella sbom refresh --resolve` |
| UntrustedAdvisory | `U3` | Cross-reference trusted sources | `stella advisory verify --source NVD,GHSA` |
| Unknown | `U4` | Run initial analysis | `stella scan --full` |

### 9.1 Automated Remediation Flow

```
1. Policy blocks decision with U1/U2 gate
       ↓
2. Console/CLI shows remediation hint
       ↓
3. User runs remediation command (e.g., stella symbols ingest)
       ↓
4. Signals recomputes uncertainty states
       ↓
5. Risk score updates, tier may drop
       ↓
6. Policy re-evaluates, decision may proceed
```

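Step 2 of the flow above (showing a remediation hint) could be backed by a simple lookup from uncertainty code to the CLI command listed in the section 9 table. The Python sketch below is only an illustration: `REMEDIATIONS` and `remediation_hints` are hypothetical names, and the hint wording is an assumption. Only the state names and commands are taken from the table.

```python
# State names and CLI commands copied from the remediation table in section 9.
REMEDIATIONS = {
    "U1": ("MissingSymbolResolution", "stella symbols ingest --path <symbols>"),
    "U2": ("MissingPurl", "stella sbom refresh --resolve"),
    "U3": ("UntrustedAdvisory", "stella advisory verify --source NVD,GHSA"),
    "U4": ("Unknown", "stella scan --full"),
}


def remediation_hints(active_codes):
    """Return one hint line per active uncertainty code.

    Duplicates are collapsed, codes are emitted in sorted order, and codes
    that are not in the table are skipped rather than guessed at.
    """
    hints = []
    for code in sorted(set(active_codes)):
        entry = REMEDIATIONS.get(code)
        if entry is None:
            continue
        state, command = entry
        hints.append(f"{code} ({state}): run `{command}`")
    return hints
```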
### 9.2 Remediation Priority

When multiple uncertainty states exist, prioritize by tier:

1. **T1 states first:** block all exclusions until resolved
2. **T2 states:** may proceed with warnings if T1 cleared
3. **T3/T4 states:** normal flow with caveats

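The ordering above amounts to a stable sort by tier, with a separate check for whether any T1 state still blocks exclusions. The following Python sketch shows one way to express that. The `(code, tier)` tuple shape and the names `TIER_RANK`, `order_for_remediation`, and `exclusions_blocked` are assumptions made for illustration, not part of the Signals API.

```python
# Lower rank means "remediate first"; T1 is the highest-uncertainty tier.
TIER_RANK = {"T1": 0, "T2": 1, "T3": 2, "T4": 3}


def order_for_remediation(states):
    """Sort (code, tier) pairs so T1 comes first.

    sorted() is stable, so states in the same tier keep their input order.
    Unrecognised tiers sort last.
    """
    return sorted(states, key=lambda s: TIER_RANK.get(s[1], len(TIER_RANK)))


def exclusions_blocked(states):
    """True while any T1 state remains unresolved."""
    return any(tier == "T1" for _, tier in states)
```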
---

*Last updated: 2025-12-13 (Sprint 0401).*