feat: Add Promotion-Time Attestations for Stella Ops

- Introduced a new document for promotion-time attestations, detailing the purpose, predicate schema, producer workflow, verification flow, APIs, and security considerations.
- Implemented the `stella.ops/promotion@v1` predicate schema to capture promotion evidence, including the image digest, SBOM/VEX artifacts, and Rekor proof.
- Defined producer workflows for CLI orchestration, signer responsibilities, and Export Center integration.
- Added verification steps that let auditors validate promotion attestations offline.

feat: Create Symbol Manifest v1 Specification

- Developed a specification for Symbol Manifest v1 that provides a deterministic format for publishing debug symbols and source maps.
- Defined the manifest structure, including schema, entries, source maps, toolchain, and provenance.
- Outlined upload and verification processes, resolve APIs, runtime proxy, caching, and offline bundle generation.
- Included security considerations and related implementation tasks.

chore: Add Ruby Analyzer with Git Sources

- Created a Gemfile and Gemfile.lock for the Ruby analyzer with dependencies on git-gem, httparty, and path-gem.
- Implemented main application logic that uses the defined gems and outputs their versions.
- Added expected JSON output for the Ruby analyzer to validate the integration of the new gems and their functionality.
- Developed internal observation classes for Ruby packages, runtime edges, and capabilities, including serialization logic for observations.

test: Add tests for Ruby Analyzer

- Created test fixtures for the Ruby analyzer, including Gemfile, Gemfile.lock, main application, and expected JSON output.
- Ensured that the tests validate the correct integration and functionality of the Ruby analyzer with the specified gems.
2025-11-11 15:30:22 +02:00
parent 56c687253f
commit c2c6b58b41
56 changed files with 2305 additions and 198 deletions
--- a/docs/modules/policy/README.md
+++ b/docs/modules/policy/README.md
@@ -23,6 +23,7 @@ Policy Engine compiles and evaluates Stella DSL policies deterministically, prod
 - Governance and scope mapping in ../../security/policy-governance.md.
 - Readiness briefs: ../policy/secret-leak-detection-readiness.md, ../policy/windows-package-readiness.md.
 - Readiness briefs: ../scanner/design/macos-analyzer.md, ../scanner/design/windows-analyzer.md, ../policy/secret-leak-detection-readiness.md, ../policy/windows-package-readiness.md.
+- Ruby capability predicates design: ./design/ruby-capability-predicates.md.

 ## Backlog references
 - DOCS-POLICY-20-001 … DOCS-POLICY-20-012 (completed baseline).
--- a/docs/modules/policy/TASKS.md
+++ b/docs/modules/policy/TASKS.md
@@ -0,0 +1,5 @@
+# Policy Engine Guild — Active Tasks
+
+| Task ID | State | Notes |
+| --- | --- | --- |
+| `SCANNER-POLICY-0001` | DONE (2025-11-10) | Ruby component predicates implemented in the engine with tests; DSL docs updated; Offline Kit verifies `seed-data/analyzers/ruby/git-sources`. |
--- a/docs/modules/policy/design/ruby-capability-predicates.md
+++ b/docs/modules/policy/design/ruby-capability-predicates.md
@@ -0,0 +1,82 @@
+# Ruby Capability & Source Predicates (SCANNER-POLICY-0001)
+
+**Status:** Implemented · Owner: Policy Guild · Updated: 2025-11-10  
+**Scope:** Extend the Policy Engine DSL to consume Ruby analyzer metadata (`groups`, `declaredOnly`, capabilities, git/path provenance) emitted in Sprint 138.
+
+---
+
+## 1. Goals
+
+1. Allow policies to express intent around Bundler groups (e.g., blocking `development` gems in production promotions).
+2. Expose Ruby capability evidence (exec/net/serialization/job schedulers) as first-class predicates.
+3. Differentiate package provenance: registry, git, path/vendor cache.
+4. Ensure new predicates work in offline/air-gapped evaluation and export deterministically.
+
+Non-goals: UI wiring (handled by the Policy Studio team) and the policy templates rollout (tracked separately in the DOCS-POLICY backlog).
+
+## 2. Source Metadata
+
+Scanner now emits the following fields per Ruby component:
+
+| Field | Type | Example | Notes |
+|-------|------|---------|-------|
+| `groups` | `string` (semicolon-separated list) | `development;test` | Aggregated from manifest + lockfile. |
+| `declaredOnly` | `bool` (string `"true"/"false"`) | `"false"` | `"false"` indicates that vendor-cache evidence is present. |
+| `source` | `string` | `git:https://github.com/example/git-gem.git@<rev>` | Registry (`https://`), `git:`, `path:`, `vendor-cache`. |
+| `artifact` | `string?` | `vendor/cache/path-gem-2.1.3.gem` | Present only when a cached artefact is observed. |
+| Capability flags | `string -> bool` | `capability.exec = "true"` etc. | Includes scheduler sub-keys. |
+
+## 3. Proposed Predicates
+
+| Predicate | Signature | Description |
+|-----------|-----------|-------------|
+| `ruby.group(name: string)` | `bool` | True if component belongs to Bundler group `name`. |
+| `ruby.groups()` | `set<string>` | Returns all groups for aggregations. |
+| `ruby.declared_only()` | `bool` | True when component has no vendor/installed evidence. |
+| `ruby.source(kind?: string)` | `bool` | True when the component's source matches the given kind prefix (`registry`, `git`, `path`, `vendor-cache`). |
+| `ruby.capability(name: string)` | `bool` | True when the named capability is present. Supported names: `exec`, `net`, `serialization`, `scheduler`, and scheduler subtypes (`scheduler.activejob`, etc.). |
+| `ruby.capability_any(names: set<string>)` | `bool` | True when the component has at least one of the listed capabilities. |
+
+Implementation detail: compile-time validation rejects these predicates outside a Ruby component scope (mirroring the `node.group` pattern).
+
+## 4. DSL & Engine Changes
+
+1. **Schema mapping:** Update `ComponentFacts` model to surface new Ruby metadata in evaluation context.
+2. **Predicate registry:** Add Ruby-specific predicate handlers to `PolicyPredicateRegistry` with deterministic ordering.
+3. **Explain traces:** Include matched predicates + metadata in explain output.
+4. **Exports:** Ensure Offline Kit bundles include updated predicate metadata (no runtime fetch).
+
+## 5. Policy Templates (follow-up)
+
+Create sample rules under `policy/templates/ruby`:
+
+- Block `ruby.group("development")` when `promotion.target == "prod"`.
+- Flag `ruby.capability("exec")` components unless allowlisted.
+- Require `ruby.source("git")` packages to be pinned to allowlisted commit hashes.
+
+Tracking: DOCS-POLICY follow-up (not part of SCANNER-POLICY-0001 initial kick-off).
+
+## 6. Testing Strategy
+
+- Unit tests for each predicate (true/false cases, unsupported values).
+- Integration test tying sample Scanner payload to simulated policy evaluation.
+- Determinism run: repeated evaluation with same snapshot must yield identical explain trace hash.
+- Offline regression: ensure the `seed-data/analyzers/ruby/git-sources` fixture flows through the Offline Kit policy evaluation script.
+
+## 7. Timeline & Dependencies
+
+| Step | Owner | Target |
+|------|-------|--------|
+| Predicate implementation + tests | Policy Engine Guild | Sprint 138 (in progress) |
+| Offline kit regression update | Policy + Ops | Sprint 138 |
+| Policy templates & docs | Docs Guild | Sprint 139 |
+
+Dependencies: Scanner metadata in place (SCANNER-ENG-0016 DONE); no additional service contracts required.
+
+## 8. Open Questions
+
+1. Should `declaredOnly` interact with existing waiver semantics (e.g., treat as lower severity)? → Needs risk review.
+2. Do we expose scheduler sub-types individually or aggregate under `ruby.capability("scheduler")` only? → Proposed to expose both for flexibility.
+3. Is git URL normalization required (strip credentials, hash fragments)? → Ensure sanitization before evaluation.
+
+Please comment in `docs/modules/policy/design/ruby-capability-predicates.md` or on the SCANNER-POLICY-0001 sprint entry.
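The predicate table in `ruby-capability-predicates.md` can be illustrated with a minimal Python sketch. It is not the Policy Engine code behind `PolicyPredicateRegistry`. The following are all assumptions:

- the `RubyComponent` record and the function names;
- the mapping from prefix to source kind;
- the rule that a parent name such as `scheduler` also matches any set `scheduler.*` sub-key.

`sanitize_git_source` is likewise hypothetical. It shows one way to strip credentials and URL fragments from `git:` sources before evaluation, as open question 3 requires, assuming `https`-style URLs.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class RubyComponent:
    """Hypothetical view of the per-component metadata emitted by Scanner."""
    groups: str = ""              # semicolon-separated, e.g. "development;test"
    declared_only: str = "true"   # string-encoded bool, as emitted
    source: str = ""              # e.g. "git:https://host/repo.git@<rev>"
    capabilities: Dict[str, str] = field(default_factory=dict)  # {"capability.exec": "true"}


def ruby_groups(c: RubyComponent) -> FrozenSet[str]:
    """ruby.groups(): split the semicolon list and drop empty entries."""
    return frozenset(g.strip() for g in c.groups.split(";") if g.strip())


def ruby_group(c: RubyComponent, name: str) -> bool:
    """ruby.group(name)."""
    return name in ruby_groups(c)


def ruby_declared_only(c: RubyComponent) -> bool:
    """ruby.declared_only(): parse the string-encoded bool."""
    return c.declared_only.strip().lower() == "true"


def source_kind(src: str) -> str:
    """Map a source string to a kind by prefix (assumed mapping)."""
    for prefix, kind in (("git:", "git"), ("path:", "path"),
                         ("vendor-cache", "vendor-cache"), ("https://", "registry")):
        if src.startswith(prefix):
            return kind
    return "unknown"


def ruby_source(c: RubyComponent, kind: str) -> bool:
    """ruby.source(kind); the argument-less form is not modelled here."""
    return source_kind(c.source) == kind


def ruby_capability(c: RubyComponent, name: str) -> bool:
    """ruby.capability(name); a parent name also matches any set sub-key (assumption)."""
    flags = c.capabilities
    if flags.get(f"capability.{name}", "false").lower() == "true":
        return True
    prefix = f"capability.{name}."
    return any(k.startswith(prefix) and v.lower() == "true" for k, v in flags.items())


def ruby_capability_any(c: RubyComponent, names: Iterable[str]) -> bool:
    """ruby.capability_any(names); sorted so evaluation order is deterministic."""
    return any(ruby_capability(c, n) for n in sorted(names))


def sanitize_git_source(src: str) -> str:
    """Drop userinfo and #fragment from a git: source, keeping any trailing @<rev>."""
    if not src.startswith("git:"):
        return src
    body = src[len("git:"):]
    url, sep, rev = body.rpartition("@")
    if not sep or "/" in rev:  # no @<rev> suffix: the last '@' (if any) was userinfo
        url, rev = body, ""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, ""))
    return f"git:{clean}" + (f"@{rev}" if rev else "")
```

With these helpers, the first template rule reduces to `ruby_group(c, "development")` combined with a check on the promotion target. Splitting `groups` into a set once keeps group lookups independent of the order Bundler reported them in.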
--- a/docs/modules/scanner/design/ruby-analyzer.md
+++ b/docs/modules/scanner/design/ruby-analyzer.md
@@ -1,6 +1,6 @@
 # Ruby Analyzer Parity Design (SCANNER-ENG-0009)

-**Status:** Draft • Owner: Ruby Analyzer Guild • Updated: 2025-11-02
+**Status:** Implemented • Owner: Ruby Analyzer Guild • Updated: 2025-11-10

 ## 1. Goals & Non-Goals
 - **Goals**
@@ -70,10 +70,9 @@
 ### 4.4 Runtime Graph Builder
 - Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
 - Implementation phases:
-  1. Parse AST using tree-sitter Ruby embedded under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` with deterministic bindings.
-  2. Generate edges `entrypoint -> file` and `file -> package` with reason codes (`require-static`, `autoload-zeitwerk`, `autoload-const_missing`).
-  3. Identify framework entrypoints (Rails controllers, Rack middleware, Sidekiq workers) via heuristics defined in `SCANNER-ANALYZERS-RUBY-28-*` tasks.
-- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine.
+  1. **MVP (shipped in Sprint 138):** lightweight scanning with deterministic regex patterns scoped to Ruby sources. It captures explicit `require*` and `autoload` statements, records the referencing files, and links back to packages when a matching lock entry exists.
+  2. **Planned follow-up:** integrate tree-sitter Ruby under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` for full AST coverage (Zeitwerk constants, conditional requires, dynamic module loading). This phase remains tracked under SCANNER-ANALYZERS-RUBY-28-003.
+- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine. Entrypoint detection currently keys off file location plus usage hints; richer framework-aware mapping will accompany the tree-sitter phase.

 ### 4.5 Capability & Surface Signals
 - Emit evidence documents for:
@@ -95,11 +94,13 @@
 | `ruby_packages.json` | Array `{id, name, version, source, provenance, groups[], platform}` | SBOM Composer, Policy Engine |
 | `ruby_runtime_edges.json` | Edges `{from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
 | `ruby_capabilities.json` | Capability `{kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |
+| `ruby_observation.json` | Summary document (packages, runtime edges, capability flags) | Surface manifest, Policy explain traces |

 All records follow AOC appender rules (immutable, tenant-scoped) and include `hash`, `layerDigest`, and `timestamp` normalized to UTC ISO-8601.

 ## 6. Testing Strategy
 - **Fixtures**: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
+- **Fixtures**: Added `git-sources` scenario covering git/path dependencies, bundler groups, and vendor cache evidence for declared-only toggling.
 - **Determinism**: Golden snapshots for package lists and capability outputs across repeated runs.
 - **Integration**: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
 - **Policy**: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.
@@ -121,15 +122,15 @@ All records follow AOC appender rules (immutable, tenant-scoped) and include `ha
 - Need alignment with Export Center on Ruby-specific manifest emissions.

 ## 9. Licensing & Offline Packaging (SCANNER-LIC-0001)
-- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-02).
+- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-10).
 - **Obligations**:
-  1. Include both MIT license texts in `/third-party-licenses/` and in Offline Kit manifests.
-  2. Update `NOTICE.md` to acknowledge embedded grammars per company policy.
-  3. Record the grammar commit hashes in build metadata; regenerate generated C/WASM artifacts deterministically.
-  4. Ensure build pipeline uses `tree-sitter-cli` only as a build-time tool (not redistributed) to avoid extra licensing obligations.
+  1. Keep MIT license texts in `/third-party-licenses/` and ship them with Offline Kits (fulfilled via `build_offline_kit.py` copying the directory into staging).
+  2. Track acknowledgements in `NOTICE.md` (completed).
+  3. Record grammar provenance in build metadata once native parsers ship; current MVP uses regex-only parsing and does **not** bundle tree-sitter artifacts yet, so no generated sources are redistributed.
+  4. When tree-sitter integration lands, ensure `tree-sitter-cli` remains a build-time tool only.
 - **Deliverables**:
-  - SCANNER-LIC-0001 to capture Legal sign-off and update packaging scripts.
-  - Export Center to mirror license files into Offline Kit bundle.
+  - SCANNER-LIC-0001 tracks Legal sign-off; Offline Kit packaging now mirrors `third-party-licenses/`.
+  - The Export Center recipe inherits the copied directory with deterministic hashing.

 ---
 *References:*
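The regex-only MVP described in section 4.4 of `ruby-analyzer.md` can be approximated with a short Python sketch. The following are assumptions for illustration, not the analyzer's actual code:

- the patterns;
- the `(from, to, reason, package)` tuple shape;
- using the matched keyword as the reason code;
- the lock-name heuristic that maps `require "git/gem"` to `git-gem`.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical patterns: only literal-string, line-leading statements are matched,
# so commented-out lines and dynamic requires (require name_var) are skipped.
REQUIRE_RE = re.compile(
    r"""^[ \t]*(require_relative|require)[ \t]*\(?[ \t]*["']([^"']+)["']""", re.MULTILINE)
AUTOLOAD_RE = re.compile(
    r"""^[ \t]*autoload[ \t]*\(?[ \t]*:(\w+)[ \t]*,[ \t]*["']([^"']+)["']""", re.MULTILINE)

Edge = Tuple[str, str, str, Optional[str]]  # (from_file, target, reason, package)


def match_package(target: str, lock_names: Set[str]) -> Optional[str]:
    """Link a require target to a lockfile entry (assumed naming heuristic)."""
    for candidate in (target.split("/")[0], target.replace("/", "-")):
        if candidate in lock_names:
            return candidate
    return None


def scan_file(path: str, text: str, lock_names: Set[str]) -> List[Edge]:
    """Collect require/autoload edges for one Ruby source file."""
    edges = set()
    for m in REQUIRE_RE.finditer(text):
        reason, target = m.group(1), m.group(2)
        # require_relative points at project files, so it is never linked to a gem.
        pkg = match_package(target, lock_names) if reason == "require" else None
        edges.add((path, target, reason, pkg))
    for m in AUTOLOAD_RE.finditer(text):
        edges.add((path, m.group(2), "autoload", match_package(m.group(2), lock_names)))
    # Sort on a None-safe key so output order is stable across runs.
    return sorted(edges, key=lambda e: (e[0], e[1], e[2], e[3] or ""))
```

Anchoring each pattern to the start of a line keeps the scan cheap and deterministic. It deliberately misses conditional or computed requires, which is the gap the planned tree-sitter phase is meant to close.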
--- a/docs/modules/scanner/determinism-score.md
+++ b/docs/modules/scanner/determinism-score.md
@@ -0,0 +1,87 @@
+# Scanner Determinism Score Guide
+
+> **Status:** Draft – Sprint 186/202/203  
+> **Owners:** Scanner Guild · QA Guild · DevEx/CLI Guild · DevOps Guild
+
+## 1. Goal
+
+Quantify how repeatable a scanner release is by re-running scans under frozen conditions and reporting the ratio of bit-for-bit identical outputs. The determinism score lets customers and auditors confirm that Stella Ops scans are replayable and trustworthy.
+
+## 2. Test harness overview (`SCAN-DETER-186-009`)
+
+1. **Inputs:** image digests, policy bundle SHA, feed snapshot SHA, scanner container digest, platform (linux/amd64 by default).
+2. **Execution loop:** run the scanner *N* times (default 10) with:
+   * `--fixed-clock <timestamp>`
+   * `RNG_SEED=1337`
+   * `SCANNER_MAX_CONCURRENCY=1`
+   * feeds/policy tarballs mounted read-only
+   * `--network=none`, `--cpuset-cpus=0`, `--memory=2G`
+3. **Canonicalisation:** normalise JSON outputs (SBOM, VEX, findings, logs) using the same serializer as production (`StellaOps.Scanner.Replay` helpers).
+4. **Hashing:** compute SHA-256 for each canonical artefact per run.
+5. **Score calculation:** `identical_runs / total_runs` (per image and overall). A run is “identical” if all artefact hashes match the baseline (run 1).
+
+The harness persists the full run set under CAS, enabling regression tests and Offline Kit inclusion.
+
+## 3. Output artefacts (`SCAN-DETER-186-010`)
+
+* `determinism.json` – per-image runs, identical counts, score, policy/feed hashes.
+* `run_i/*.json` – canonicalised outputs for debugging.
+* `diffs/` – optional diff samples when runs diverge.
+
+Example `determinism.json`:
+
+```json
+{
+  "release": "scanner-0.14.3",
+  "platform": "linux/amd64",
+  "policy_sha": "a1b2c3…",
+  "feeds_sha": "d4e5f6…",
+  "images": [
+    {
+      "digest": "sha256:abc…",
+      "runs": 10,
+      "identical": 10,
+      "score": 1.0,
+      "artifact_hashes": {
+        "sbom.cdx.json": "sha256:11…",
+        "vex.json": "sha256:22…",
+        "findings.json": "sha256:33…"
+      }
+    }
+  ],
+  "overall_score": 1.0
+}
+```
+
+## 4. CI integration (`DEVOPS-SCAN-90-004`)
+
+* GitHub/Gitea pipeline stages run the determinism harness for the release matrix.
+* Fail the job when `overall_score < threshold` (default 0.95) or any image falls below 0.90.
+* Upload `determinism.json` and artefacts as build outputs; attach them to release notes and Offline Kits.
+
+## 5. CLI support (`CLI-DETER-70-003/004`)
+
+* `stella detscore run` – executes the harness locally with the same frozen-clock and seed settings; exits non-zero when the score falls below the configured threshold.
+* `stella detscore report` – summarises one or more `determinism.json` files for release notes, showing per-image scores and flagging non-deterministic artefacts.
+
+## 6. Policy & UI consumption
+
+* Policy Engine can enforce determinism thresholds (e.g., block promotion if score < 0.95) using the `determinism.json` evidence.
+* The UI surfaces the score alongside scans (e.g., a badge in the scan detail view); tracked by task `UI-SBOM-DET-01`.
+
+## 7. Evidence & replay
+
+* Include `determinism.json` and canonical run outputs in Replay bundles (`docs/replay/DETERMINISTIC_REPLAY.md`).
+* DSSE-sign determinism results before adding them to Evidence Locker.
+
+## 8. Implementation checklist
+
+| Area | Task ID | Notes |
+|------|---------|-------|
+| Harness | `SCAN-DETER-186-009` | Deterministic execution + hashing |
+| Artefacts | `SCAN-DETER-186-010` | Publish JSON, CAS storage |
+| CLI | `CLI-DETER-70-003/004` | Local runs + reporting |
+| DevOps | `DEVOPS-SCAN-90-004` | CI enforcement |
+| Docs | `DOCS-DETER-70-002` | (this document) |
+
+Update this guide with links to code once tasks move to **DONE**.
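The score calculation in `determinism-score.md` can be sketched in Python: hash every canonical artefact per run, compare each run against run 1, then gate on the CI thresholds. Several parts are assumptions:

- `json.dumps(..., sort_keys=True)` stands in for the production canonical serializer (the `StellaOps.Scanner.Replay` helpers).
- `overall_score` pools runs across images; the guide does not define this aggregation.
- The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple


def canonical_bytes(doc: Any) -> bytes:
    """Stand-in canonical JSON: sorted keys, compact separators, UTF-8."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def run_hashes(artifacts: Dict[str, Any]) -> Dict[str, str]:
    """SHA-256 of every canonical artefact produced by one run."""
    return {name: "sha256:" + hashlib.sha256(canonical_bytes(doc)).hexdigest()
            for name, doc in sorted(artifacts.items())}


def score_image(runs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """identical_runs / total_runs, with run 1 as the baseline (it counts as identical)."""
    baseline = run_hashes(runs[0])
    identical = sum(1 for run in runs if run_hashes(run) == baseline)
    return {"runs": len(runs), "identical": identical,
            "score": identical / len(runs), "artifact_hashes": baseline}


def gate(images: List[Dict[str, Any]], overall_min: float = 0.95,
         image_min: float = 0.90) -> Tuple[float, bool]:
    """Return (overall_score, passed) using the CI defaults from the guide."""
    overall = sum(i["identical"] for i in images) / sum(i["runs"] for i in images)
    passed = overall >= overall_min and all(i["score"] >= image_min for i in images)
    return overall, passed
```

Because the hashes are taken over canonicalised output, runs that differ only in key order still count as identical. Any change in content, such as a different value in the SBOM, lowers the score.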
--- a/docs/modules/scanner/entropy.md
+++ b/docs/modules/scanner/entropy.md
@@ -0,0 +1,126 @@
+# Entropy Analysis for Executable Layers
+
+> **Status:** Draft – Sprint 186/209  
+> **Owners:** Scanner Guild · Policy Guild · UI Guild · Docs Guild
+
+## 1. Overview
+
+Entropy analysis highlights opaque regions inside container layers (packed binaries, stripped blobs, embedded firmware) so Stella Ops can prioritise artefacts that are hard to audit. The scanner computes per-file entropy metrics, reports opaque ratios per layer, and feeds penalties into the trust algebra.
+
+## 2. Scanner pipeline (`SCAN-ENTROPY-186-011/012`)
+
+* **Target files:** ELF, PE/COFF, Mach-O executables and large raw blobs (>16 KB). Archive formats (zip/tar) are unpacked by existing analyzers before entropy processing.
+* **Section analysis:**  
+  * ELF – `.text`, `.rodata`, `.data`, custom sections.  
+  * PE – section table entries (`IMAGE_SECTION_HEADER`).  
+  * Mach-O – LC_SEGMENT/LC_SEGMENT_64 sections.
+* **Sliding window:** 4 KB window with 1 KB stride. Entropy calculated using Shannon entropy:
+
+  \[
+  H = -\sum_{i=0}^{255} p_i \log_2 p_i
+  \]
+
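+  Here \(p_i\) is the fraction of bytes in the window that have value \(i\), so \(H\) ranges from 0 to 8 bits/byte. Windows with `H ≥ 7.2` bits/byte are marked “opaque”.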
+* **Heuristics & hints:**
+  * Flag entire files with no symbols or stripped debug info.
+  * Detect known packer section names (`.UPX*`, `.aspack`, etc.).
+  * Record offsets, window sizes, and entropy values to support explainability.
+* **Outputs:**
+  * `entropy.report.json` (per-file details, windows, hints).
+  * `layer_summary.json` (opaque byte ratios per layer and overall image).
+  * Penalty score contributed to the trust algebra (`entropy_penalty`).
+
+All JSON output is canonical (sorted keys, UTF-8) and included in DSSE attestations/replay bundles.
+
+## 3. JSON Schemas
+
+### 3.1 `entropy.report.json`
+
+```jsonc
+{
+  "schema": "stellaops.entropy/report@1",
+  "imageDigest": "sha256:…",
+  "layerDigest": "sha256:…",
+  "files": [
+    {
+      "path": "/opt/app/libblob.so",
+      "size": 5242880,
+      "opaqueBytes": 1342177,
+      "opaqueRatio": 0.25,
+      "flags": ["stripped", "section:.UPX0"],
+      "windows": [
+        { "offset": 0, "length": 4096, "entropy": 7.45 },
+        { "offset": 1024, "length": 4096, "entropy": 7.38 }
+      ]
+    }
+  ]
+}
+```
+
+### 3.2 `layer_summary.json`
+
+```jsonc
+{
+  "schema": "stellaops.entropy/layer-summary@1",
+  "imageDigest": "sha256:…",
+  "layers": [
+    {
+      "digest": "sha256:layer4…",
+      "opaqueBytes": 2306867,
+      "totalBytes": 10485760,
+      "opaqueRatio": 0.22,
+      "indicators": ["packed", "no-symbols"]
+    }
+  ],
+  "imageOpaqueRatio": 0.18,
+  "entropyPenalty": 0.12
+}
+```
+
+## 4. Policy integration (`POLICY-RISK-90-001`)
+
+* Policy Engine receives `entropy_penalty` and per-layer ratios via scan evidence.
+* Default thresholds:
+  * Block when `imageOpaqueRatio > 0.15` and provenance is unknown.
+  * Warn when any executable has `opaqueRatio > 0.30`.
+* Penalty weights are configurable per tenant. Policy explanations include:
+  * Highest-entropy files and offsets.
+  * Reason code (packed, no symbols, runtime reachable).
+
+## 5. UI experience (`UI-ENTROPY-40-001/002`)
+
+* **Heatmaps:** render entropy along each file's byte offsets (green → red).
+* **Layer donut:** show opaque % per layer with tooltips linking to file list.
+* **“Why risky?” chips:** highlight triggers such as *Packed-like*, *Stripped*, *No symbols*.
+* Policy banners explain configured thresholds and mitigation (add provenance, unpack, or accept risk).
+* Provide direct download links to `entropy.report.json` for audits.
+
+## 6. CLI / API hooks
+
+* CLI – `stella scan artifacts --entropy` option prints top opaque files and penalties.
+* API – `GET /api/v1/scans/{id}/entropy` serves summary + evidence references.
+* Notify templates can include entropy penalties to escalate opaque images.
+
+## 7. Trust algebra
+
+The penalty is computed as:
+
+\[
+\text{entropyPenalty} = K \sum_{\text{layers}} \left( \frac{\text{opaqueBytes}}{\text{totalBytes}} \times \frac{\text{layerBytes}}{\text{imageBytes}} \right)
+\]
+
+* Default `K = 0.5`.
+* Cap penalty at 0.3 to avoid over-weighting tiny blobs.
+* Combine with other trust signals (reachability, provenance) to prioritise audits.
+
+## 8. Implementation checklist
+
+| Area | Task ID | Notes |
+|------|---------|-------|
+| Scanner analysis | `SCAN-ENTROPY-186-011` | Sliding window entropy & heuristics |
+| Evidence output | `SCAN-ENTROPY-186-012` | JSON reports + DSSE |
+| Policy integration | `POLICY-RISK-90-001` | Trust weight + explanations |
+| UI | `UI-ENTROPY-40-001/002` | Visualisation & messaging |
+| Docs | `DOCS-ENTROPY-70-004` | (this guide) |
+
+Update this document as thresholds change or additional packer signatures are introduced.
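The sliding-window analysis (section 2) and the penalty formula (section 7) of `entropy.md` can be sketched in Python. The guide does not specify the following, so they are assumptions:

- Bytes covered by any opaque window count as opaque, so overlapping windows are not double-counted.
- A final tail window is added when the file length is not stride-aligned.
- `layerBytes` is a per-layer input field.

The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

WINDOW, STRIDE, OPAQUE_H = 4096, 1024, 7.2  # 4 KB window, 1 KB stride, bits/byte


def shannon_entropy(chunk: bytes) -> float:
    """H = -sum(p_i * log2(p_i)) over the byte values present in the chunk."""
    n = len(chunk)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def scan_windows(data: bytes) -> Tuple[List[Dict[str, float]], int]:
    """Return per-window entropy records and the count of opaque bytes."""
    offsets = list(range(0, max(len(data) - WINDOW, 0) + 1, STRIDE))
    if offsets[-1] + WINDOW < len(data):
        offsets.append(len(data) - WINDOW)  # tail window so the last bytes are covered
    opaque = bytearray(len(data))  # 1 = byte lies inside at least one opaque window
    windows = []
    for off in offsets:
        chunk = data[off:off + WINDOW]
        h = shannon_entropy(chunk)
        windows.append({"offset": off, "length": len(chunk), "entropy": round(h, 2)})
        if h >= OPAQUE_H:
            opaque[off:off + len(chunk)] = b"\x01" * len(chunk)
    return windows, sum(opaque)


def entropy_penalty(layers: List[Dict[str, int]], k: float = 0.5, cap: float = 0.3) -> float:
    """K * sum(opaque/total * layerBytes/imageBytes), capped at `cap`."""
    image_bytes = sum(layer["layerBytes"] for layer in layers)
    if image_bytes == 0:
        return 0.0
    raw = k * sum(
        (layer["opaqueBytes"] / layer["totalBytes"]) * (layer["layerBytes"] / image_bytes)
        for layer in layers if layer["totalBytes"] > 0)
    return min(raw, cap)
```

Suppose `totalBytes` equals `layerBytes` for every layer. Then each term reduces to `opaqueBytes / imageBytes`, and the uncapped penalty is simply `K` times the image's overall opaque-byte fraction.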