feat: Add Promotion-Time Attestations for Stella Ops

- Introduced a new document for promotion-time attestations, detailing the purpose, predicate schema, producer workflow, verification flow, APIs, and security considerations.
- Implemented the `stella.ops/promotion@v1` predicate schema to capture promotion evidence including image digest, SBOM/VEX artifacts, and Rekor proof.
- Defined producer responsibilities and workflows for CLI orchestration, signer responsibilities, and Export Center integration.
- Added verification steps for auditors to validate promotion attestations offline.

feat: Create Symbol Manifest v1 Specification

- Developed a specification for Symbol Manifest v1 to provide a deterministic format for publishing debug symbols and source maps.
- Defined the manifest structure, including schema, entries, source maps, toolchain, and provenance.
- Outlined upload and verification processes, resolve APIs, runtime proxy, caching, and offline bundle generation.
- Included security considerations and related tasks for implementation.

chore: Add Ruby Analyzer with Git Sources

- Created a Gemfile and Gemfile.lock for Ruby analyzer with dependencies on git-gem, httparty, and path-gem.
- Implemented main application logic to utilize the defined gems and output their versions.
- Added expected JSON output for the Ruby analyzer to validate the integration of the new gems and their functionalities.
- Developed internal observation classes for Ruby packages, runtime edges, and capabilities, including serialization logic for observations.

test: Add tests for Ruby Analyzer

- Created test fixtures for Ruby analyzer, including Gemfile, Gemfile.lock, main application, and expected JSON output.
- Ensured that the tests validate the correct integration and functionality of the Ruby analyzer with the specified gems.
2025-11-11 15:30:22 +02:00
parent 56c687253f
commit c2c6b58b41
56 changed files with 2305 additions and 198 deletions
--- a/docs/modules/scanner/design/ruby-analyzer.md
+++ b/docs/modules/scanner/design/ruby-analyzer.md
@@ -1,6 +1,6 @@
 # Ruby Analyzer Parity Design (SCANNER-ENG-0009)

-**Status:** Draft • Owner: Ruby Analyzer Guild • Updated: 2025-11-02
+**Status:** Implemented • Owner: Ruby Analyzer Guild • Updated: 2025-11-10

 ## 1. Goals & Non-Goals
 - **Goals**
@@ -70,10 +70,9 @@
 ### 4.4 Runtime Graph Builder
 - Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
 - Implementation phases:
-  1. Parse AST using tree-sitter Ruby embedded under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` with deterministic bindings.
-  2. Generate edges `entrypoint -> file` and `file -> package` with reason codes (`require-static`, `autoload-zeitwerk`, `autoload-const_missing`).
-  3. Identify framework entrypoints (Rails controllers, Rack middleware, Sidekiq workers) via heuristics defined in `SCANNER-ANALYZERS-RUBY-28-*` tasks.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine.
+  1. **MVP (shipped in Sprint 138):** perform lightweight scanning using deterministic regex patterns scoped to Ruby sources. Captures explicit `require*` and `autoload` statements, records referencing files, and links back to packages when a matching lock entry exists.
+  2. **Planned follow-up:** integrate tree-sitter Ruby under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` for full AST coverage (Zeitwerk constants, conditional requires, dynamic module loading). This phase remains tracked under SCANNER-ANALYZERS-RUBY-28-003.
+- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine. Entrypoint detection currently keys off file location plus usage hints; richer framework-aware mapping will accompany the tree-sitter phase.

 ### 4.5 Capability & Surface Signals
 - Emit evidence documents for:
@@ -95,11 +94,13 @@
 | `ruby_packages.json` | Array `{id, name, version, source, provenance, groups[], platform}` | SBOM Composer, Policy Engine |
 | `ruby_runtime_edges.json` | Edges `{from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
 | `ruby_capabilities.json` | Capability `{kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |
+| `ruby_observation.json` | Summary document (packages, runtime edges, capability flags) | Surface manifest, Policy explain traces |

 All records follow AOC appender rules (immutable, tenant-scoped) and include `hash`, `layerDigest`, and `timestamp` normalized to UTC ISO-8601.

 ## 6. Testing Strategy
 - **Fixtures**: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
+- **Fixtures**: Added `git-sources` scenario covering git/path dependencies, bundler groups, and vendor cache evidence for declared-only toggling.
 - **Determinism**: Golden snapshots for package lists and capability outputs across repeated runs.
 - **Integration**: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
 - **Policy**: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.
@@ -121,15 +122,15 @@ All records follow AOC appender rules (immutable, tenant-scoped) and include `ha
 - Need alignment with Export Center on Ruby-specific manifest emissions.

 ## 9. Licensing & Offline Packaging (SCANNER-LIC-0001)
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-02).
+- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-10).
 - **Obligations**:
-  1. Include both MIT license texts in `/third-party-licenses/` and in Offline Kit manifests.
-  2. Update `NOTICE.md` to acknowledge embedded grammars per company policy.
-  3. Record the grammar commit hashes in build metadata; regenerate generated C/WASM artifacts deterministically.
-  4. Ensure build pipeline uses `tree-sitter-cli` only as a build-time tool (not redistributed) to avoid extra licensing obligations.
+  1. Keep MIT license texts in `/third-party-licenses/` and ship them with Offline Kits (fulfilled via `build_offline_kit.py` copying the directory into staging).
+  2. Track acknowledgements in `NOTICE.md` (completed).
+  3. Record grammar provenance in build metadata once native parsers ship; current MVP uses regex-only parsing and does **not** bundle tree-sitter artifacts yet, so no generated sources are redistributed.
+  4. When tree-sitter integration lands, ensure `tree-sitter-cli` remains a build-time tool only.
 - **Deliverables**:
-  - SCANNER-LIC-0001 to capture Legal sign-off and update packaging scripts.
-  - Export Center to mirror license files into Offline Kit bundle.
+  - SCANNER-LIC-0001 tracks Legal sign-off; Offline Kit packaging now mirrors `third-party-licenses/`.
+  - Export Center recipe inherits the copied directory with deterministic hashing.

 ---
 *References:*
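
The MVP phase described in section 4.4 of `ruby-analyzer.md` (regex scanning for explicit `require*` and `autoload` statements, linked back to lock entries) could be sketched roughly as below. This is a minimal Python illustration, not the shipped analyzer: the patterns, the `Edge` type, the `scan_ruby_source` function, and the reason codes other than `require-static` are assumptions made for the example, and only literal string arguments are matched.

```python
import re
from typing import Iterator, NamedTuple, Optional, Set


class Edge(NamedTuple):
    source_file: str        # file containing the statement
    target: str             # literal argument of require/autoload
    reason: str             # reason code (only "require-static" appears in the doc)
    package: Optional[str]  # matching lock entry, if any


# Hypothetical patterns: anchored at line start, so commented-out
# statements ("# require 'x'") never match.
_REQUIRE = re.compile(r"""^\s*(require_relative|require)\s*\(?\s*['"]([^'"]+)['"]""")
_AUTOLOAD = re.compile(r"""^\s*autoload\s*\(?\s*:(\w+)\s*,\s*['"]([^'"]+)['"]""")


def _link(target: str, locked: Set[str]) -> Optional[str]:
    """Assumed heuristic: the first path segment names the gem."""
    head = target.split("/", 1)[0]
    return head if head in locked else None


def scan_ruby_source(path: str, text: str, locked: Set[str]) -> Iterator[Edge]:
    """Yield one edge per explicit require/require_relative/autoload line."""
    for line in text.splitlines():
        m = _REQUIRE.match(line)
        if m:
            keyword, target = m.groups()
            if keyword == "require":
                yield Edge(path, target, "require-static", _link(target, locked))
            else:
                # require_relative always points at a local file.
                yield Edge(path, target, "require-relative", None)
            continue
        m = _AUTOLOAD.match(line)
        if m:
            target = m.group(2)
            yield Edge(path, target, "autoload-static", _link(target, locked))
```

Because matching is line-based, conditional or dynamically built requires are either captured without their condition or missed entirely, which is the gap the planned tree-sitter phase is meant to close.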
--- a/docs/modules/scanner/determinism-score.md
+++ b/docs/modules/scanner/determinism-score.md
@@ -0,0 +1,87 @@
+# Scanner Determinism Score Guide
+
+> **Status:** Draft – Sprint 186/202/203  
+> **Owners:** Scanner Guild · QA Guild · DevEx/CLI Guild · DevOps Guild
+
+## 1. Goal
+
+Quantify how repeatable a scanner release is by re-running scans under frozen conditions and reporting the ratio of bit-for-bit identical outputs. The determinism score lets customers and auditors confirm that Stella Ops scans are replayable and trustworthy.
+
+## 2. Test harness overview (`SCAN-DETER-186-009`)
+
+1. **Inputs:** image digests, policy bundle SHA, feed snapshot SHA, scanner container digest, platform (linux/amd64 by default).
+2. **Execution loop:** run the scanner *N* times (default 10) with:
+   * `--fixed-clock <timestamp>`
+   * `RNG_SEED=1337`
+   * `SCANNER_MAX_CONCURRENCY=1`
+   * feeds/policy tarballs mounted read-only
+   * `--network=none`, `--cpuset-cpus=0`, `--memory=2G`
+3. **Canonicalisation:** normalise JSON outputs (SBOM, VEX, findings, logs) using the same serializer as production (`StellaOps.Scanner.Replay` helpers).
+4. **Hashing:** compute SHA-256 for each canonical artefact per run.
+5. **Score calculation:** `identical_runs / total_runs` (per image and overall). A run is “identical” if all artefact hashes match the baseline (run 1).
+
+The harness persists the full run set under CAS so that regression tests can reuse it and Offline Kits can include it.
+
+## 3. Output artefacts (`SCAN-DETER-186-010`)
+
+* `determinism.json` – per-image runs, identical counts, score, policy/feed hashes.
+* `run_i/*.json` – canonicalised outputs for debugging.
+* `diffs/` – optional diff samples when runs diverge.
+
+Example `determinism.json`:
+
+```json
+{
+  "release": "scanner-0.14.3",
+  "platform": "linux/amd64",
+  "policy_sha": "a1b2c3…",
+  "feeds_sha": "d4e5f6…",
+  "images": [
+    {
+      "digest": "sha256:abc…",
+      "runs": 10,
+      "identical": 10,
+      "score": 1.0,
+      "artifact_hashes": {
+        "sbom.cdx.json": "sha256:11…",
+        "vex.json": "sha256:22…",
+        "findings.json": "sha256:33…"
+      }
+    }
+  ],
+  "overall_score": 1.0
+}
+```
+
+## 4. CI integration (`DEVOPS-SCAN-90-004`)
+
+* GitHub/Gitea pipeline stages run the determinism harness for the release matrix.
+* Fail the job when `overall_score < threshold` (default 0.95) or any image falls below 0.90.
+* Upload `determinism.json` and artefacts as build outputs; attach to release notes and Offline Kits.
+
+## 5. CLI support (`CLI-DETER-70-003/004`)
+
+* `stella detscore run` – executes the harness locally, honoring the same frozen-clock and seed settings; exits non-zero when score falls below the configured threshold.
+* `stella detscore report` – summarises one or more `determinism.json` files for release notes, showing per-image scores and flagging non-deterministic artefacts.
+
+## 6. Policy & UI consumption
+
+* Policy Engine can enforce determinism thresholds (e.g., block promotion if score < 0.95) using the `determinism.json` evidence.
+* UI surfaces the score alongside scans (e.g., a badge in the scan detail view); tracked as task `UI-SBOM-DET-01`.
+
+## 7. Evidence & replay
+
+* Include `determinism.json` and canonical run outputs in Replay bundles (`docs/replay/DETERMINISTIC_REPLAY.md`).
+* DSSE-sign determinism results before adding them to Evidence Locker.
+
+## 8. Implementation checklist
+
+| Area | Task ID | Notes |
+|------|---------|-------|
+| Harness | `SCAN-DETER-186-009` | Deterministic execution + hashing |
+| Artefacts | `SCAN-DETER-186-010` | Publish JSON, CAS storage |
+| CLI | `CLI-DETER-70-003/004` | Local runs + reporting |
+| DevOps | `DEVOPS-SCAN-90-004` | CI enforcement |
+| Docs | `DOCS-DETER-70-002` | (this document) |
+
+Update this guide with links to code once tasks move to **DONE**.
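
The canonicalise, hash, and score steps from section 2 of `determinism-score.md`, together with the CI thresholds from section 4, can be sketched as follows. This is an illustration under stated assumptions: canonical JSON is approximated with sorted keys and compact separators (the production serializer in `StellaOps.Scanner.Replay` is not shown in this diff), and the function names `canonical_hash`, `score_image`, and `gate` are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Run = Dict[str, object]  # artefact name -> parsed JSON document


def canonical_hash(doc: object) -> str:
    """SHA-256 over an assumed canonical form: sorted keys, compact, UTF-8."""
    data = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


def score_image(runs: List[Run]) -> dict:
    """A run is identical when every artefact hash matches run 1 (the baseline)."""
    hashed = [{name: canonical_hash(doc) for name, doc in run.items()}
              for run in runs]
    baseline = hashed[0]
    identical = sum(1 for h in hashed if h == baseline)
    return {
        "runs": len(runs),
        "identical": identical,
        "score": identical / len(runs),
        "artifact_hashes": baseline,
    }


def gate(images: List[dict], overall_threshold: float = 0.95,
         image_threshold: float = 0.90) -> Tuple[float, bool]:
    """Return (overall_score, passed) using the default CI thresholds."""
    total = sum(img["runs"] for img in images)
    identical = sum(img["identical"] for img in images)
    overall = identical / total
    passed = overall >= overall_threshold and all(
        img["score"] >= image_threshold for img in images)
    return overall, passed
```

Since run 1 is the baseline, it always counts as identical, so the lowest possible per-image score is `1 / N` rather than zero.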
--- a/docs/modules/scanner/entropy.md
+++ b/docs/modules/scanner/entropy.md
@@ -0,0 +1,126 @@
+# Entropy Analysis for Executable Layers
+
+> **Status:** Draft – Sprint 186/209  
+> **Owners:** Scanner Guild · Policy Guild · UI Guild · Docs Guild
+
+## 1. Overview
+
+Entropy analysis highlights opaque regions inside container layers (packed binaries, stripped blobs, embedded firmware) so Stella Ops can prioritise artefacts that are hard to audit. The scanner computes per-file entropy metrics, reports opaque ratios per layer, and feeds penalties into the trust algebra.
+
+## 2. Scanner pipeline (`SCAN-ENTROPY-186-011/012`)
+
+* **Target files:** ELF, PE/COFF, Mach-O executables and large raw blobs (>16 KB). Archive formats (zip/tar) are unpacked by existing analyzers before entropy processing.
+* **Section analysis:**  
+  * ELF – `.text`, `.rodata`, `.data`, custom sections.  
+  * PE – section table entries (`IMAGE_SECTION_HEADER`).  
+  * Mach-O – LC_SEGMENT/LC_SEGMENT_64 sections.
+* **Sliding window:** 4 KB window with 1 KB stride. Entropy for each window is computed as Shannon entropy over byte values:
+
+  \[
+  H = -\sum_{i=0}^{255} p_i \log_2 p_i
+  \]
+
+  where `p_i` is the fraction of bytes in the window with value `i`. Windows with `H ≥ 7.2` bits/byte are marked “opaque”.
+* **Heuristics & hints:**
+  * Flag entire files with no symbols or stripped debug info.
+  * Detect known packer section names (`.UPX*`, `.aspack`, etc.).
+  * Record offsets, window sizes, and entropy values to support explainability.
+* **Outputs:**
+  * `entropy.report.json` (per-file details, windows, hints).
+  * `layer_summary.json` (opaque byte ratios per layer and overall image).
+  * Penalty score contributed to the trust algebra (`entropy_penalty`).
+
+All JSON output is canonical (sorted keys, UTF-8) and included in DSSE attestations/replay bundles.
+
+## 3. JSON Schemas
+
+### 3.1 `entropy.report.json`
+
+```jsonc
+{
+  "schema": "stellaops.entropy/report@1",
+  "imageDigest": "sha256:…",
+  "layerDigest": "sha256:…",
+  "files": [
+    {
+      "path": "/opt/app/libblob.so",
+      "size": 5242880,
+      "opaqueBytes": 1342177,
+      "opaqueRatio": 0.26,
+      "flags": ["stripped", "section:.UPX0"],
+      "windows": [
+        { "offset": 0, "length": 4096, "entropy": 7.45 },
+        { "offset": 1024, "length": 4096, "entropy": 7.38 }
+      ]
+    }
+  ]
+}
+```
+
+### 3.2 `layer_summary.json`
+
+```jsonc
+{
+  "schema": "stellaops.entropy/layer-summary@1",
+  "imageDigest": "sha256:…",
+  "layers": [
+    {
+      "digest": "sha256:layer4…",
+      "opaqueBytes": 2306867,
+      "totalBytes": 10485760,
+      "opaqueRatio": 0.22,
+      "indicators": ["packed", "no-symbols"]
+    }
+  ],
+  "imageOpaqueRatio": 0.18,
+  "entropyPenalty": 0.12
+}
+```
+
+## 4. Policy integration (`POLICY-RISK-90-001`)
+
+* Policy Engine receives `entropy_penalty` and per-layer ratios via scan evidence.
+* Default thresholds:
+  * Block when `imageOpaqueRatio > 0.15` and provenance is unknown.
+  * Warn when any executable has `opaqueRatio > 0.30`.
+* Penalty weights are configurable per tenant. Policy explanations include:
+  * Highest-entropy files and offsets.
+  * Reason code (packed, no symbols, runtime reachable).
+
+## 5. UI experience (`UI-ENTROPY-40-001/002`)
+
+* **Heatmaps:** render entropy along the file timeline (green → red).
+* **Layer donut:** show opaque % per layer with tooltips linking to file list.
+* **“Why risky?” chips:** highlight triggers such as *Packed-like*, *Stripped*, *No symbols*.
+* Policy banners explain configured thresholds and mitigation (add provenance, unpack, or accept risk).
+* Provide direct download links to `entropy.report.json` for audits.
+
+## 6. CLI / API hooks
+
+* CLI – `stella scan artifacts --entropy` option prints top opaque files and penalties.
+* API – `GET /api/v1/scans/{id}/entropy` serves summary + evidence references.
+* Notify templates can include entropy penalties to escalate opaque images.
+
+## 7. Trust algebra
+
+The penalty is computed as:
+
+\[
+\text{entropyPenalty} = K \sum_{\text{layers}} \left( \frac{\text{opaqueBytes}}{\text{totalBytes}} \times \frac{\text{layerBytes}}{\text{imageBytes}} \right)
+\]
+
+* Default `K = 0.5`.
+* Cap penalty at 0.3 to avoid over-weighting tiny blobs.
+* Combine with other trust signals (reachability, provenance) to prioritise audits.
+
+## 8. Implementation checklist
+
+| Area | Task ID | Notes |
+|------|---------|-------|
+| Scanner analysis | `SCAN-ENTROPY-186-011` | Sliding window entropy & heuristics |
+| Evidence output | `SCAN-ENTROPY-186-012` | JSON reports + DSSE |
+| Policy integration | `POLICY-RISK-90-001` | Trust weight + explanations |
+| UI | `UI-ENTROPY-40-001/002` | Visualisation & messaging |
+| Docs | `DOCS-ENTROPY-70-004` | (this guide) |
+
+Update this document as thresholds change or additional packer signatures are introduced.
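
The sliding-window analysis from section 2 of `entropy.md` and the penalty formula from section 7 can be sketched as below. This is a rough illustration, not the scanner's implementation: the function names are hypothetical, and the sketch assumes that `layerBytes` in the formula equals each layer's `totalBytes` and that `imageBytes` is the sum over all layers.

```python
import math
from collections import Counter
from typing import Dict, List

WINDOW = 4096            # 4 KB window
STRIDE = 1024            # 1 KB stride
OPAQUE_THRESHOLD = 7.2   # bits/byte


def shannon_entropy(window: bytes) -> float:
    """H = sum(p_i * log2(1 / p_i)) over the byte values present in the window."""
    if not window:
        return 0.0
    n = len(window)
    return sum((c / n) * math.log2(n / c) for c in Counter(window).values())


def opaque_windows(data: bytes) -> List[Dict[str, float]]:
    """Slide a 4 KB window in 1 KB steps and keep windows with H >= 7.2."""
    found = []
    last_start = max(len(data) - WINDOW, 0)
    for offset in range(0, last_start + 1, STRIDE):
        chunk = data[offset:offset + WINDOW]
        h = shannon_entropy(chunk)
        if h >= OPAQUE_THRESHOLD:
            found.append({"offset": offset, "length": len(chunk),
                          "entropy": round(h, 2)})
    return found


def entropy_penalty(layers: List[dict], k: float = 0.5, cap: float = 0.3) -> float:
    """K * sum(opaqueRatio_layer * layer share of image bytes), capped at 0.3."""
    image_bytes = sum(layer["totalBytes"] for layer in layers)
    if image_bytes == 0:
        return 0.0
    weighted = sum(
        (layer["opaqueBytes"] / layer["totalBytes"])
        * (layer["totalBytes"] / image_bytes)
        for layer in layers if layer["totalBytes"] > 0)
    return min(k * weighted, cap)
```

Under these assumptions the weighted sum reduces to the image-wide opaque byte ratio, so with the default `K = 0.5` the cap of 0.3 is reached once 60% of the image bytes are opaque.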