feat: Add Promotion-Time Attestations for Stella Ops
- Introduced a new document for promotion-time attestations, detailing the purpose, predicate schema, producer workflow, verification flow, APIs, and security considerations.
- Implemented the `stella.ops/promotion@v1` predicate schema to capture promotion evidence including image digest, SBOM/VEX artifacts, and Rekor proof.
- Defined producer responsibilities and workflows for CLI orchestration, signer responsibilities, and Export Center integration.
- Added verification steps for auditors to validate promotion attestations offline.

feat: Create Symbol Manifest v1 Specification
- Developed a specification for Symbol Manifest v1 to provide a deterministic format for publishing debug symbols and source maps.
- Defined the manifest structure, including schema, entries, source maps, toolchain, and provenance.
- Outlined upload and verification processes, resolve APIs, runtime proxy, caching, and offline bundle generation.
- Included security considerations and related tasks for implementation.

chore: Add Ruby Analyzer with Git Sources
- Created a Gemfile and Gemfile.lock for Ruby analyzer with dependencies on git-gem, httparty, and path-gem.
- Implemented main application logic to utilize the defined gems and output their versions.
- Added expected JSON output for the Ruby analyzer to validate the integration of the new gems and their functionalities.
- Developed internal observation classes for Ruby packages, runtime edges, and capabilities, including serialization logic for observations.

test: Add tests for Ruby Analyzer
- Created test fixtures for Ruby analyzer, including Gemfile, Gemfile.lock, main application, and expected JSON output.
- Ensured that the tests validate the correct integration and functionality of the Ruby analyzer with the specified gems.
@@ -1,6 +1,6 @@
|
||||
# Ruby Analyzer Parity Design (SCANNER-ENG-0009)
|
||||
|
||||
**Status:** Draft • Owner: Ruby Analyzer Guild • Updated: 2025-11-02
|
||||
**Status:** Implemented • Owner: Ruby Analyzer Guild • Updated: 2025-11-10
|
||||
|
||||
## 1. Goals & Non-Goals
|
||||
- **Goals**
|
||||
@@ -70,10 +70,9 @@
|
||||
### 4.4 Runtime Graph Builder
|
||||
- Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
|
||||
- Implementation phases:
|
||||
1. Parse AST using tree-sitter Ruby embedded under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` with deterministic bindings.
|
||||
2. Generate edges `entrypoint -> file` and `file -> package` with reason codes (`require-static`, `autoload-zeitwerk`, `autoload-const_missing`).
|
||||
3. Identify framework entrypoints (Rails controllers, Rack middleware, Sidekiq workers) via heuristics defined in `SCANNER-ANALYZERS-RUBY-28-*` tasks.
|
||||
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine.
|
||||
1. **MVP (shipped in Sprint 138):** perform lightweight scanning using deterministic regex patterns scoped to Ruby sources. Captures explicit `require*` and `autoload` statements, records referencing files, and links back to packages when a matching lock entry exists.
|
||||
2. **Planned follow-up:** integrate tree-sitter Ruby under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` for full AST coverage (Zeitwerk constants, conditional requires, dynamic module loading). This phase remains tracked under SCANNER-ANALYZERS-RUBY-28-003.
|
||||
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine. Entrypoint detection currently keys off file location plus usage hints; richer framework-aware mapping will accompany the tree-sitter phase.
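The MVP phase described above can be pictured with a minimal sketch. It is written in Python purely for illustration: the two patterns, the `scan_requires` helper, and the `autoload-explicit` reason code are assumptions, and the analyzer's real regexes are not shown in this document.

```python
import re

# Hypothetical patterns; the analyzer's real regexes are not shown in this document.
# Only statements with a literal string argument are matched.
REQUIRE_RE = re.compile(r"""^\s*(require_relative|require)\s*\(?\s*['"]([^'"]+)['"]""")
AUTOLOAD_RE = re.compile(r"""^\s*autoload\s*\(?\s*:(\w+)\s*,\s*['"]([^'"]+)['"]""")


def scan_requires(path, source):
    """Return runtime edges found in one Ruby source file, in line order."""
    edges = []
    for line in source.splitlines():
        match = REQUIRE_RE.match(line)
        if match:
            # `require-static` is a reason code named in the design doc.
            edges.append({"from": path, "to": match.group(2), "reason": "require-static"})
            continue
        match = AUTOLOAD_RE.match(line)
        if match:
            # `autoload-explicit` is an assumed reason code, for illustration only.
            edges.append({"from": path, "to": match.group(2), "reason": "autoload-explicit"})
    return edges
```

Because the scan walks lines in order and matches only literal arguments, repeated runs over the same file yield the same edge list. Conditional or computed requires are skipped, which is the gap the planned tree-sitter phase is meant to close.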
|
||||
|
||||
### 4.5 Capability & Surface Signals
|
||||
- Emit evidence documents for:
|
||||
@@ -95,11 +94,13 @@
|
||||
| `ruby_packages.json` | Array `{id, name, version, source, provenance, groups[], platform}` | SBOM Composer, Policy Engine |
| `ruby_runtime_edges.json` | Edges `{from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
| `ruby_capabilities.json` | Capability `{kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |
| `ruby_observation.json` | Summary document (packages, runtime edges, capability flags) | Surface manifest, Policy explain traces |
|
||||
|
||||
All records follow AOC appender rules (immutable, tenant-scoped) and include `hash`, `layerDigest`, and `timestamp` normalized to UTC ISO-8601.
|
||||
|
||||
## 6. Testing Strategy
|
||||
- **Fixtures**: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
- **Fixtures**: Added `git-sources` scenario covering git/path dependencies, bundler groups, and vendor cache evidence for declared-only toggling.
- **Determinism**: Golden snapshots for package lists and capability outputs across repeated runs.
- **Integration**: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
- **Policy**: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.
|
||||
@@ -121,15 +122,15 @@ All records follow AOC appender rules (immutable, tenant-scoped) and include `ha
|
||||
- Need alignment with Export Center on Ruby-specific manifest emissions.
|
||||
|
||||
## 9. Licensing & Offline Packaging (SCANNER-LIC-0001)
|
||||
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-02).
|
||||
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-10).
|
||||
- **Obligations**:
|
||||
1. Include both MIT license texts in `/third-party-licenses/` and in Offline Kit manifests.
|
||||
2. Update `NOTICE.md` to acknowledge embedded grammars per company policy.
|
||||
3. Record the grammar commit hashes in build metadata; regenerate generated C/WASM artifacts deterministically.
|
||||
4. Ensure build pipeline uses `tree-sitter-cli` only as a build-time tool (not redistributed) to avoid extra licensing obligations.
|
||||
1. Keep MIT license texts in `/third-party-licenses/` and ship them with Offline Kits (fulfilled via `build_offline_kit.py` copying the directory into staging).
|
||||
2. Track acknowledgements in `NOTICE.md` (completed).
|
||||
3. Record grammar provenance in build metadata once native parsers ship; current MVP uses regex-only parsing and does **not** bundle tree-sitter artifacts yet, so no generated sources are redistributed.
|
||||
4. When tree-sitter integration lands, ensure `tree-sitter-cli` remains a build-time tool only.
|
||||
- **Deliverables**:
|
||||
- SCANNER-LIC-0001 to capture Legal sign-off and update packaging scripts.
|
||||
- Export Center to mirror license files into Offline Kit bundle.
|
||||
- SCANNER-LIC-0001 tracks Legal sign-off; Offline Kit packaging now mirrors `third-party-licenses/`.
- Export Center recipe inherits the copied directory with deterministic hashing.
|
||||
|
||||
---
|
||||
*References:*
|
||||
|
||||
87
docs/modules/scanner/determinism-score.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# Scanner Determinism Score Guide
|
||||
|
||||
> **Status:** Draft – Sprint 186/202/203
|
||||
> **Owners:** Scanner Guild · QA Guild · DevEx/CLI Guild · DevOps Guild
|
||||
|
||||
## 1. Goal
|
||||
|
||||
Quantify how repeatable a scanner release is by re-running scans under frozen conditions and reporting the ratio of bit-for-bit identical outputs. The determinism score lets customers and auditors confirm that Stella Ops scans are replayable and trustworthy.
|
||||
|
||||
## 2. Test harness overview (`SCAN-DETER-186-009`)
|
||||
|
||||
1. **Inputs:** image digests, policy bundle SHA, feed snapshot SHA, scanner container digest, platform (linux/amd64 by default).
2. **Execution loop:** run the scanner *N* times (default 10) with:
   * `--fixed-clock <timestamp>`
   * `RNG_SEED=1337`
   * `SCANNER_MAX_CONCURRENCY=1`
   * feeds/policy tarballs mounted read-only
   * `--network=none`, `--cpuset-cpus=0`, `--memory=2G`
3. **Canonicalisation:** normalise JSON outputs (SBOM, VEX, findings, logs) using the same serializer as production (`StellaOps.Scanner.Replay` helpers).
4. **Hashing:** compute SHA-256 for each canonical artefact per run.
5. **Score calculation:** `identical_runs / total_runs` (per image and overall). A run is “identical” if all artefact hashes match the baseline (run 1).
|
||||
|
||||
The harness persists the full run set under CAS so that it can back regression tests and be included in Offline Kits.
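The canonicalise, hash, and score steps above can be sketched as follows. This is a minimal Python illustration, not the harness itself: `canonical_bytes` stands in for the production serializer (sorted keys, compact separators, UTF-8 are assumptions here), and `artifact_hashes` and `determinism_score` are hypothetical helper names.

```python
import hashlib
import json


def canonical_bytes(doc):
    # Stand-in for the production serializer: sorted keys, compact, UTF-8.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def artifact_hashes(run):
    """Map artefact name -> 'sha256:<hex>' for one run (dict of name -> parsed JSON)."""
    return {
        name: "sha256:" + hashlib.sha256(canonical_bytes(doc)).hexdigest()
        for name, doc in sorted(run.items())
    }


def determinism_score(runs):
    """Compare every run against the baseline (run 1) and return the ratio of identical runs."""
    hashes = [artifact_hashes(run) for run in runs]
    baseline = hashes[0]
    identical = sum(1 for h in hashes if h == baseline)
    return {"runs": len(runs), "identical": identical, "score": identical / len(runs)}
```

The baseline counts as identical to itself, so a fully repeatable set of 10 runs reports `identical: 10` and `score: 1.0`.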
|
||||
|
||||
## 3. Output artefacts (`SCAN-DETER-186-010`)
|
||||
|
||||
* `determinism.json` – per-image runs, identical counts, score, policy/feed hashes.
* `run_i/*.json` – canonicalised outputs for debugging.
* `diffs/` – optional diff samples when runs diverge.
|
||||
|
||||
Example `determinism.json`:
|
||||
|
||||
```json
{
  "release": "scanner-0.14.3",
  "platform": "linux/amd64",
  "policy_sha": "a1b2c3…",
  "feeds_sha": "d4e5f6…",
  "images": [
    {
      "digest": "sha256:abc…",
      "runs": 10,
      "identical": 10,
      "score": 1.0,
      "artifact_hashes": {
        "sbom.cdx.json": "sha256:11…",
        "vex.json": "sha256:22…",
        "findings.json": "sha256:33…"
      }
    }
  ],
  "overall_score": 1.0
}
```
|
||||
|
||||
## 4. CI integration (`DEVOPS-SCAN-90-004`)
|
||||
|
||||
* GitHub/Gitea pipeline stages run the determinism harness for the release matrix.
* Fail the job when `overall_score < threshold` (default 0.95) or any image falls below 0.90.
* Upload `determinism.json` and artefacts as build outputs; attach to release notes and Offline Kits.
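A minimal sketch of the failure rule, assuming the `determinism.json` field names from the example in section 3. The `determinism_gate` function is a hypothetical helper, not part of any shipped pipeline.

```python
def determinism_gate(report, overall_threshold=0.95, image_floor=0.90):
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    if report["overall_score"] < overall_threshold:
        failures.append(
            f"overall_score {report['overall_score']:.2f} < {overall_threshold:.2f}"
        )
    for image in report["images"]:
        if image["score"] < image_floor:
            failures.append(
                f"{image['digest']} score {image['score']:.2f} < {image_floor:.2f}"
            )
    return failures
```

A pipeline step could load the report with `json.load`, print the returned failures, and exit non-zero when the list is not empty.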
|
||||
|
||||
## 5. CLI support (`CLI-DETER-70-003/004`)
|
||||
|
||||
* `stella detscore run` – executes the harness locally, honoring the same frozen-clock and seed settings; exits non-zero when score falls below the configured threshold.
* `stella detscore report` – summarises one or more `determinism.json` files for release notes, showing per-image scores and flagging non-deterministic artefacts.
|
||||
|
||||
## 6. Policy & UI consumption
|
||||
|
||||
* Policy Engine can enforce determinism thresholds (e.g., block promotion if score < 0.95) using the `determinism.json` evidence.
* UI surfaces the score alongside scans (e.g., a badge in the scan detail view); this work is tracked under task `UI-SBOM-DET-01`.
|
||||
|
||||
## 7. Evidence & replay
|
||||
|
||||
* Include `determinism.json` and canonical run outputs in Replay bundles (`docs/replay/DETERMINISTIC_REPLAY.md`).
* DSSE-sign determinism results before adding them to Evidence Locker.
|
||||
|
||||
## 8. Implementation checklist
|
||||
|
||||
| Area | Task ID | Notes |
|------|---------|-------|
| Harness | `SCAN-DETER-186-009` | Deterministic execution + hashing |
| Artefacts | `SCAN-DETER-186-010` | Publish JSON, CAS storage |
| CLI | `CLI-DETER-70-003/004` | Local runs + reporting |
| DevOps | `DEVOPS-SCAN-90-004` | CI enforcement |
| Docs | `DOCS-DETER-70-002` | (this document) |
|
||||
|
||||
Update this guide with links to code once tasks move to **DONE**.
|
||||
126
docs/modules/scanner/entropy.md
Normal file
@@ -0,0 +1,126 @@
|
||||
# Entropy Analysis for Executable Layers
|
||||
|
||||
> **Status:** Draft – Sprint 186/209
|
||||
> **Owners:** Scanner Guild · Policy Guild · UI Guild · Docs Guild
|
||||
|
||||
## 1. Overview
|
||||
|
||||
Entropy analysis highlights opaque regions inside container layers (packed binaries, stripped blobs, embedded firmware) so Stella Ops can prioritise artefacts that are hard to audit. The scanner computes per-file entropy metrics, reports opaque ratios per layer, and feeds penalties into the trust algebra.
|
||||
|
||||
## 2. Scanner pipeline (`SCAN-ENTROPY-186-011/012`)
|
||||
|
||||
* **Target files:** ELF, PE/COFF, Mach-O executables and large raw blobs (>16 KB). Archive formats (zip/tar) are unpacked by existing analyzers before entropy processing.
* **Section analysis:**
  * ELF – `.text`, `.rodata`, `.data`, custom sections.
  * PE – section table entries (`IMAGE_SECTION_HEADER`).
  * Mach-O – LC_SEGMENT/LC_SEGMENT_64 sections.
* **Sliding window:** 4 KB window with 1 KB stride. Entropy is computed per window as Shannon entropy over byte values:

  \[
  H = -\sum_{i=0}^{255} p_i \log_2 p_i
  \]

  where \(p_i\) is the fraction of bytes in the window with value \(i\). Windows with `H ≥ 7.2` bits/byte are marked “opaque”.
* **Heuristics & hints:**
  * Flag entire files with no symbols or stripped debug info.
  * Detect known packer section names (`.UPX*`, `.aspack`, etc.).
  * Record offsets, window sizes, and entropy values to support explainability.
* **Outputs:**
  * `entropy.report.json` (per-file details, windows, hints).
  * `layer_summary.json` (opaque byte ratios per layer and overall image).
  * Penalty score contributed to the trust algebra (`entropy_penalty`).
|
||||
|
||||
All JSON output is canonical (sorted keys, UTF-8) and included in DSSE attestations/replay bundles.
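The sliding-window step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scanner's implementation: `shannon_entropy` and `opaque_windows` are hypothetical names, and files shorter than one window are treated as a single short window.

```python
import math
from collections import Counter

WINDOW = 4096            # 4 KB window
STRIDE = 1024            # 1 KB stride
OPAQUE_THRESHOLD = 7.2   # bits/byte


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def opaque_windows(data: bytes):
    """Return the windows whose entropy meets the opaque threshold."""
    windows = []
    last_start = max(len(data) - WINDOW, 0)
    for offset in range(0, last_start + 1, STRIDE):
        chunk = data[offset:offset + WINDOW]
        h = shannon_entropy(chunk)
        if h >= OPAQUE_THRESHOLD:
            windows.append({"offset": offset, "length": len(chunk), "entropy": round(h, 2)})
    return windows
```

The returned entries mirror the `windows` array of `entropy.report.json`. How overlapping opaque windows are merged into a single `opaqueBytes` count is not specified here, so the sketch leaves that step out.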
|
||||
|
||||
## 3. JSON Schemas
|
||||
|
||||
### 3.1 `entropy.report.json`
|
||||
|
||||
```jsonc
{
  "schema": "stellaops.entropy/report@1",
  "imageDigest": "sha256:…",
  "layerDigest": "sha256:…",
  "files": [
    {
      "path": "/opt/app/libblob.so",
      "size": 5242880,
      "opaqueBytes": 1342177,
      "opaqueRatio": 0.25,
      "flags": ["stripped", "section:.UPX0"],
      "windows": [
        { "offset": 0, "length": 4096, "entropy": 7.45 },
        { "offset": 1024, "length": 4096, "entropy": 7.38 }
      ]
    }
  ]
}
```
|
||||
|
||||
### 3.2 `layer_summary.json`
|
||||
|
||||
```jsonc
{
  "schema": "stellaops.entropy/layer-summary@1",
  "imageDigest": "sha256:…",
  "layers": [
    {
      "digest": "sha256:layer4…",
      "opaqueBytes": 2306867,
      "totalBytes": 10485760,
      "opaqueRatio": 0.22,
      "indicators": ["packed", "no-symbols"]
    }
  ],
  "imageOpaqueRatio": 0.18,
  "entropyPenalty": 0.12
}
```
|
||||
|
||||
## 4. Policy integration (`POLICY-RISK-90-001`)
|
||||
|
||||
* Policy Engine receives `entropy_penalty` and per-layer ratios via scan evidence.
* Default thresholds:
  * Block when `imageOpaqueRatio > 0.15` and provenance unknown.
  * Warn when any executable has `opaqueRatio > 0.30`.
* Penalty weights are configurable per tenant. Policy explanations include:
  * Highest-entropy files and offsets.
  * Reason code (packed, no symbols, runtime reachable).
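The default thresholds can be expressed as a small decision function. This Python sketch is only an illustration: `evaluate_entropy_policy` and the `provenance_known` flag are hypothetical, and real rules would be written in the Policy Engine's own language, which this guide does not show.

```python
def evaluate_entropy_policy(summary, files, provenance_known,
                            block_ratio=0.15, warn_ratio=0.30):
    """Return (verdict, subject) pairs for the default entropy thresholds."""
    verdicts = []
    # Block: opaque image with unknown provenance.
    if summary["imageOpaqueRatio"] > block_ratio and not provenance_known:
        verdicts.append(("block", summary["imageDigest"]))
    # Warn: any single file above the per-file ratio.
    for entry in files:
        if entry["opaqueRatio"] > warn_ratio:
            verdicts.append(("warn", entry["path"]))
    return verdicts
```

The inputs reuse field names from `layer_summary.json` and `entropy.report.json`. Filtering `files` down to executables is assumed to happen before the call.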
|
||||
|
||||
## 5. UI experience (`UI-ENTROPY-40-001/002`)
|
||||
|
||||
* **Heatmaps:** render entropy along the file timeline (green → red).
* **Layer donut:** show opaque % per layer with tooltips linking to file list.
* **“Why risky?” chips:** highlight triggers such as *Packed-like*, *Stripped*, *No symbols*.
* Policy banners explain configured thresholds and mitigation (add provenance, unpack, or accept risk).
* Provide direct download links to `entropy.report.json` for audits.
|
||||
|
||||
## 6. CLI / API hooks
|
||||
|
||||
* CLI – `stella scan artifacts --entropy` option prints top opaque files and penalties.
* API – `GET /api/v1/scans/{id}/entropy` serves summary + evidence references.
* Notify templates can include entropy penalties to escalate opaque images.
|
||||
|
||||
## 7. Trust algebra
|
||||
|
||||
The penalty is computed as:
|
||||
|
||||
\[
\text{entropyPenalty} = K \sum_{\text{layers}} \left( \frac{\text{opaqueBytes}}{\text{totalBytes}} \times \frac{\text{layerBytes}}{\text{imageBytes}} \right)
\]
|
||||
|
||||
* Default `K = 0.5`.
* Cap penalty at 0.3 to avoid over-weighting tiny blobs.
* Combine with other trust signals (reachability, provenance) to prioritise audits.
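A worked sketch of the penalty formula in Python. It assumes that `layerBytes` equals each layer's `totalBytes` and that `imageBytes` is the sum of `totalBytes` across layers. The guide does not state either, and `entropy_penalty` is a hypothetical helper.

```python
def entropy_penalty(layers, k=0.5, cap=0.3):
    """Weighted opaque ratio across layers, scaled by K and capped.

    Assumes layerBytes == totalBytes per layer and imageBytes == sum of totalBytes.
    """
    image_bytes = sum(layer["totalBytes"] for layer in layers)
    if image_bytes == 0:
        return 0.0
    weighted = sum(
        (layer["opaqueBytes"] / layer["totalBytes"]) * (layer["totalBytes"] / image_bytes)
        for layer in layers
        if layer["totalBytes"] > 0
    )
    return min(k * weighted, cap)
```

For two equally sized layers, one 20% opaque and one fully transparent, the weighted ratio is 0.1 and the penalty is 0.05. A single layer that is 90% opaque would score 0.45 before the cap and is clamped to 0.3.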
|
||||
|
||||
## 8. Implementation checklist
|
||||
|
||||
| Area | Task ID | Notes |
|------|---------|-------|
| Scanner analysis | `SCAN-ENTROPY-186-011` | Sliding window entropy & heuristics |
| Evidence output | `SCAN-ENTROPY-186-012` | JSON reports + DSSE |
| Policy integration | `POLICY-RISK-90-001` | Trust weight + explanations |
| UI | `UI-ENTROPY-40-001/002` | Visualisation & messaging |
| Docs | `DOCS-ENTROPY-70-004` | (this guide) |
|
||||
|
||||
Update this document as thresholds change or additional packer signatures are introduced.
|
||||