feat: Add Promotion-Time Attestations for Stella Ops

- Introduced a new document for promotion-time attestations, detailing the purpose, predicate schema, producer workflow, verification flow, APIs, and security considerations.
- Implemented the `stella.ops/promotion@v1` predicate schema to capture promotion evidence including image digest, SBOM/VEX artifacts, and Rekor proof.
- Defined producer responsibilities and workflows for CLI orchestration, signer responsibilities, and Export Center integration.
- Added verification steps for auditors to validate promotion attestations offline.

feat: Create Symbol Manifest v1 Specification

- Developed a specification for Symbol Manifest v1 to provide a deterministic format for publishing debug symbols and source maps.
- Defined the manifest structure, including schema, entries, source maps, toolchain, and provenance.
- Outlined upload and verification processes, resolve APIs, runtime proxy, caching, and offline bundle generation.
- Included security considerations and related tasks for implementation.

chore: Add Ruby Analyzer with Git Sources

- Created a Gemfile and Gemfile.lock for Ruby analyzer with dependencies on git-gem, httparty, and path-gem.
- Implemented main application logic to utilize the defined gems and output their versions.
- Added expected JSON output for the Ruby analyzer to validate the integration of the new gems and their functionalities.
- Developed internal observation classes for Ruby packages, runtime edges, and capabilities, including serialization logic for observations.

test: Add tests for Ruby Analyzer

- Created test fixtures for Ruby analyzer, including Gemfile, Gemfile.lock, main application, and expected JSON output.
- Ensured that the tests validate the correct integration and functionality of the Ruby analyzer with the specified gems.
This commit is contained in:
master
2025-11-11 15:30:22 +02:00
parent 56c687253f
commit c2c6b58b41
56 changed files with 2305 additions and 198 deletions


@@ -1,6 +1,6 @@
# Ruby Analyzer Parity Design (SCANNER-ENG-0009)
**Status:** Draft • Owner: Ruby Analyzer Guild • Updated: 2025-11-02
**Status:** Implemented • Owner: Ruby Analyzer Guild • Updated: 2025-11-10
## 1. Goals & Non-Goals
- **Goals**
@@ -70,10 +70,9 @@
### 4.4 Runtime Graph Builder
- Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
- Implementation phases:
1. Parse AST using tree-sitter Ruby embedded under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` with deterministic bindings.
2. Generate edges `entrypoint -> file` and `file -> package` with reason codes (`require-static`, `autoload-zeitwerk`, `autoload-const_missing`).
3. Identify framework entrypoints (Rails controllers, Rack middleware, Sidekiq workers) via heuristics defined in `SCANNER-ANALYZERS-RUBY-28-*` tasks.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine.
1. **MVP (shipped in Sprint 138):** perform lightweight scanning using deterministic regex patterns scoped to Ruby sources. Captures explicit `require*` and `autoload` statements, records referencing files, and links back to packages when a matching lock entry exists.
2. **Planned follow-up:** integrate tree-sitter Ruby under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` for full AST coverage (Zeitwerk constants, conditional requires, dynamic module loading). This phase remains tracked under SCANNER-ANALYZERS-RUBY-28-003.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine. Entrypoint detection currently keys off file location plus usage hints; richer framework-aware mapping will accompany the tree-sitter phase.
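The MVP phase described above can be illustrated with a small sketch. It is written in Python for brevity and is not the shipped analyzer code: the two regex patterns, the `scan_ruby_source` helper, and the edge field names are assumptions made for this example only.

```python
import re

# Hypothetical patterns: the analyzer's real regexes are not shown in this document.
REQUIRE_RE = re.compile(r"""^\s*(require|require_relative)\s*\(?\s*['"]([^'"]+)['"]""")
AUTOLOAD_RE = re.compile(r"""^\s*autoload\s*\(?\s*:?(\w+)\s*,\s*['"]([^'"]+)['"]""")


def scan_ruby_source(path, text, locked_gems):
    """Collect edges for explicit require*/autoload statements in one file."""
    edges = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = REQUIRE_RE.match(line)
        if match:
            kind, target = match.group(1), match.group(2)
        else:
            match = AUTOLOAD_RE.match(line)
            if not match:
                continue
            kind, target = "autoload", match.group(2)
        # Link back to a package only when a matching lock entry exists.
        root = target.split("/")[0]
        edges.append({
            "from": path,
            "line": line_no,
            "to": target,
            "reason": kind,
            "package": root if root in locked_gems else None,
        })
    return edges
```

Conditional or dynamically built `require` calls are deliberately out of reach for a line-based scan like this, which is the gap the planned tree-sitter phase is meant to close.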
### 4.5 Capability & Surface Signals
- Emit evidence documents for:
@@ -95,11 +94,13 @@
| `ruby_packages.json` | Array `{id, name, version, source, provenance, groups[], platform}` | SBOM Composer, Policy Engine |
| `ruby_runtime_edges.json` | Edges `{from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
| `ruby_capabilities.json` | Capability `{kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |
| `ruby_observation.json` | Summary document (packages, runtime edges, capability flags) | Surface manifest, Policy explain traces |
All records follow AOC appender rules (immutable, tenant-scoped) and include `hash`, `layerDigest`, and `timestamp` normalized to UTC ISO-8601.
## 6. Testing Strategy
- **Fixtures**: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
- **Fixtures**: Added `git-sources` scenario covering git/path dependencies, bundler groups, and vendor cache evidence for declared-only toggling.
- **Determinism**: Golden snapshots for package lists and capability outputs across repeated runs.
- **Integration**: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
- **Policy**: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.
@@ -121,15 +122,15 @@ All records follow AOC appender rules (immutable, tenant-scoped) and include `ha
- Need alignment with Export Center on Ruby-specific manifest emissions.
## 9. Licensing & Offline Packaging (SCANNER-LIC-0001)
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-02).
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-10).
- **Obligations**:
1. Include both MIT license texts in `/third-party-licenses/` and in Offline Kit manifests.
2. Update `NOTICE.md` to acknowledge embedded grammars per company policy.
3. Record the grammar commit hashes in build metadata; regenerate generated C/WASM artifacts deterministically.
4. Ensure build pipeline uses `tree-sitter-cli` only as a build-time tool (not redistributed) to avoid extra licensing obligations.
1. Keep MIT license texts in `/third-party-licenses/` and ship them with Offline Kits (fulfilled via `build_offline_kit.py` copying the directory into staging).
2. Track acknowledgements in `NOTICE.md` (completed).
3. Record grammar provenance in build metadata once native parsers ship; current MVP uses regex-only parsing and does **not** bundle tree-sitter artifacts yet, so no generated sources are redistributed.
4. When tree-sitter integration lands, ensure `tree-sitter-cli` remains a build-time tool only.
- **Deliverables**:
- SCANNER-LIC-0001 to capture Legal sign-off and update packaging scripts.
- Export Center to mirror license files into Offline Kit bundle.
- SCANNER-LIC-0001 tracks Legal sign-off; Offline Kit packaging now mirrors `third-party-licenses/`.
- Export Center recipe inherits the copied directory with deterministic hashing.
---
*References:*


@@ -0,0 +1,87 @@
# Scanner Determinism Score Guide
> **Status:** Draft · Sprint 186/202/203
> **Owners:** Scanner Guild · QA Guild · DevEx/CLI Guild · DevOps Guild
## 1. Goal
Quantify how repeatable a scanner release is by re-running scans under frozen conditions and reporting the ratio of bit-for-bit identical outputs. The determinism score lets customers and auditors confirm that StellaOps scans are replayable and trustworthy.
## 2. Test harness overview (`SCAN-DETER-186-009`)
1. **Inputs:** image digests, policy bundle SHA, feed snapshot SHA, scanner container digest, platform (linux/amd64 by default).
2. **Execution loop:** run the scanner *N* times (default 10) with:
* `--fixed-clock <timestamp>`
* `RNG_SEED=1337`
* `SCANNER_MAX_CONCURRENCY=1`
* feeds/policy tarballs mounted read-only
* `--network=none`, `--cpuset-cpus=0`, `--memory=2G`
3. **Canonicalisation:** normalise JSON outputs (SBOM, VEX, findings, logs) using the same serializer as production (`StellaOps.Scanner.Replay` helpers).
4. **Hashing:** compute SHA-256 for each canonical artefact per run.
5. **Score calculation:** `identical_runs / total_runs` (per image and overall). A run is “identical” if all artefact hashes match the baseline (run 1).
The harness persists the full run set under CAS so it can be reused by regression tests and included in Offline Kits.
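Steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the harness itself: `json.dumps` with sorted keys stands in for the production `StellaOps.Scanner.Replay` serializer, and the function names `canonical_hash` and `determinism_score` are hypothetical.

```python
import hashlib
import json


def canonical_hash(document):
    """SHA-256 over a canonical JSON encoding (sorted keys, compact, UTF-8).

    Assumption: this stand-in approximates the production canonicaliser.
    """
    encoded = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def determinism_score(runs):
    """Score one image; `runs` is a list of {artifact_name: parsed_json}.

    The first entry is treated as the baseline (run 1).
    """
    hashes = [
        {name: canonical_hash(doc) for name, doc in run.items()} for run in runs
    ]
    baseline = hashes[0]
    # A run is identical only if every artefact hash matches the baseline.
    identical = sum(1 for run_hashes in hashes if run_hashes == baseline)
    return {
        "runs": len(runs),
        "identical": identical,
        "score": identical / len(runs),
        "artifact_hashes": baseline,
    }
```

Because the baseline always matches itself, the lowest possible score under this definition is `1 / N` rather than zero.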
## 3. Output artefacts (`SCAN-DETER-186-010`)
* `determinism.json`: per-image runs, identical counts, score, policy/feed hashes.
* `run_i/*.json`: canonicalised outputs for debugging.
* `diffs/`: optional diff samples when runs diverge.
Example `determinism.json`:
```json
{
"release": "scanner-0.14.3",
"platform": "linux/amd64",
"policy_sha": "a1b2c3…",
"feeds_sha": "d4e5f6…",
"images": [
{
"digest": "sha256:abc…",
"runs": 10,
"identical": 10,
"score": 1.0,
"artifact_hashes": {
"sbom.cdx.json": "sha256:11…",
"vex.json": "sha256:22…",
"findings.json": "sha256:33…"
}
}
],
"overall_score": 1.0
}
```
## 4. CI integration (`DEVOPS-SCAN-90-004`)
* GitHub/Gitea pipeline stages run the determinism harness for the release matrix.
* Fail the job when `overall_score < threshold` (default 0.95) or any image falls below 0.90.
* Upload `determinism.json` and artefacts as build outputs; attach to release notes and Offline Kits.
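The failure rule above could be checked with a small gate like the one below. It is a sketch under assumptions: the `gate` helper and its parameter names are hypothetical, and it relies only on the `overall_score` and per-image `score` fields shown in the example `determinism.json`.

```python
def gate(report, overall_threshold=0.95, image_floor=0.90):
    """Return a list of failure messages; an empty list means the job passes."""
    failures = []
    if report["overall_score"] < overall_threshold:
        failures.append(
            f"overall_score {report['overall_score']} < {overall_threshold}"
        )
    for image in report["images"]:
        if image["score"] < image_floor:
            failures.append(
                f"{image['digest']} score {image['score']} < {image_floor}"
            )
    return failures
```

A pipeline step would load the JSON file, call `gate`, print any messages, and exit non-zero when the list is not empty.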
## 5. CLI support (`CLI-DETER-70-003/004`)
* `stella detscore run` executes the harness locally, honoring the same frozen-clock and seed settings; exits non-zero when score falls below the configured threshold.
* `stella detscore report` summarises one or more `determinism.json` files for release notes, showing per-image scores and flagging non-deterministic artefacts.
## 6. Policy & UI consumption
* Policy Engine can enforce determinism thresholds (e.g., block promotion if score < 0.95) using the `determinism.json` evidence.
* UI surfaces the score alongside scans (e.g., badge in scan detail view) referencing task `UI-SBOM-DET-01`.
## 7. Evidence & replay
* Include `determinism.json` and canonical run outputs in Replay bundles (`docs/replay/DETERMINISTIC_REPLAY.md`).
* DSSE-sign determinism results before adding them to Evidence Locker.
## 8. Implementation checklist
| Area | Task ID | Notes |
|------|---------|-------|
| Harness | `SCAN-DETER-186-009` | Deterministic execution + hashing |
| Artefacts | `SCAN-DETER-186-010` | Publish JSON, CAS storage |
| CLI | `CLI-DETER-70-003/004` | Local runs + reporting |
| DevOps | `DEVOPS-SCAN-90-004` | CI enforcement |
| Docs | `DOCS-DETER-70-002` | (this document) |
Update this guide with links to code once tasks move to **DONE**.


@@ -0,0 +1,126 @@
# Entropy Analysis for Executable Layers
> **Status:** Draft · Sprint 186/209
> **Owners:** Scanner Guild · Policy Guild · UI Guild · Docs Guild
## 1. Overview
Entropy analysis highlights opaque regions inside container layers (packed binaries, stripped blobs, embedded firmware) so StellaOps can prioritise artefacts that are hard to audit. The scanner computes per-file entropy metrics, reports opaque ratios per layer, and feeds penalties into the trust algebra.
## 2. Scanner pipeline (`SCAN-ENTROPY-186-011/012`)
* **Target files:** ELF, PE/COFF, Mach-O executables and large raw blobs (>16KB). Archive formats (zip/tar) are unpacked by existing analyzers before entropy processing.
* **Section analysis:**
* ELF: `.text`, `.rodata`, `.data`, custom sections.
* PE: section table entries (`IMAGE_SECTION_HEADER`).
* Mach-O: `LC_SEGMENT`/`LC_SEGMENT_64` sections.
* **Sliding window:** 4KB window with 1KB stride. Each window is scored with Shannon entropy over byte values, where `p_i` is the fraction of bytes in the window equal to `i`:
\[
H = -\sum_{i=0}^{255} p_i \log_2 p_i
\]
Windows with `H ≥ 7.2` bits/byte are marked “opaque”.
* **Heuristics & hints:**
* Flag entire files with no symbols or stripped debug info.
* Detect known packer section names (`.UPX*`, `.aspack`, etc.).
* Record offsets, window sizes, and entropy values to support explainability.
* **Outputs:**
* `entropy.report.json` (per-file details, windows, hints).
* `layer_summary.json` (opaque byte ratios per layer and overall image).
* Penalty score contributed to the trust algebra (`entropy_penalty`).
All JSON output is canonical (sorted keys, UTF-8) and included in DSSE attestations/replay bundles.
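The sliding-window step can be sketched in a few lines. This is an illustration under assumptions, not the scanner's implementation: the function names are hypothetical, and how overlapping opaque windows are merged into `opaqueBytes` is not specified here, so the sketch only lists the windows.

```python
import math
from collections import Counter

WINDOW = 4096            # 4KB window
STRIDE = 1024            # 1KB stride
OPAQUE_THRESHOLD = 7.2   # bits/byte


def shannon_entropy(chunk):
    """Shannon entropy in bits per byte for a bytes object."""
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def opaque_windows(data):
    """Return the windows whose entropy meets the opaque threshold."""
    windows = []
    last_start = max(len(data) - WINDOW, 0)
    for offset in range(0, last_start + 1, STRIDE):
        chunk = data[offset:offset + WINDOW]
        entropy = shannon_entropy(chunk)
        if entropy >= OPAQUE_THRESHOLD:
            windows.append(
                {"offset": offset, "length": len(chunk), "entropy": round(entropy, 2)}
            )
    return windows
```

A window containing every byte value equally often reaches the maximum of 8 bits/byte, while a run of identical bytes scores 0, so compressed or encrypted regions stand out clearly against padding and plain text.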
## 3. JSON Schemas
### 3.1 `entropy.report.json`
```jsonc
{
"schema": "stellaops.entropy/report@1",
"imageDigest": "sha256:…",
"layerDigest": "sha256:…",
"files": [
{
"path": "/opt/app/libblob.so",
"size": 5242880,
"opaqueBytes": 1342177,
"opaqueRatio": 0.26,
"flags": ["stripped", "section:.UPX0"],
"windows": [
{ "offset": 0, "length": 4096, "entropy": 7.45 },
{ "offset": 1024, "length": 4096, "entropy": 7.38 }
]
}
]
}
```
### 3.2 `layer_summary.json`
```jsonc
{
"schema": "stellaops.entropy/layer-summary@1",
"imageDigest": "sha256:…",
"layers": [
{
"digest": "sha256:layer4…",
"opaqueBytes": 2306867,
"totalBytes": 10485760,
"opaqueRatio": 0.22,
"indicators": ["packed", "no-symbols"]
}
],
"imageOpaqueRatio": 0.18,
"entropyPenalty": 0.12
}
```
## 4. Policy integration (`POLICY-RISK-90-001`)
* Policy Engine receives `entropy_penalty` and per-layer ratios via scan evidence.
* Default thresholds:
* Block when `imageOpaqueRatio > 0.15` and provenance is unknown.
* Warn when any executable has `opaqueRatio > 0.30`.
* Penalty weights are configurable per tenant. Policy explanations include:
* Highest-entropy files and offsets.
* Reason code (packed, no symbols, runtime reachable).
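The default thresholds can be expressed as a small decision function. This is a sketch only: `evaluate_entropy_policy` and the `provenance_known` flag are hypothetical, and it assumes every entry in `entropy.report.json` is an executable.

```python
def evaluate_entropy_policy(layer_summary, entropy_report, provenance_known,
                            block_ratio=0.15, warn_ratio=0.30):
    """Apply the default block/warn thresholds to the two JSON documents."""
    # Block: opaque share of the whole image is too high and provenance is unknown.
    blocked = (
        layer_summary["imageOpaqueRatio"] > block_ratio and not provenance_known
    )
    # Warn: list every file whose own opaque ratio exceeds the warning threshold.
    warnings = [
        f["path"] for f in entropy_report["files"] if f["opaqueRatio"] > warn_ratio
    ]
    if blocked:
        verdict = "block"
    elif warnings:
        verdict = "warn"
    else:
        verdict = "pass"
    return {"verdict": verdict, "warnFiles": warnings}
```

Passing the thresholds as parameters mirrors the note that weights are configurable per tenant.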
## 5. UI experience (`UI-ENTROPY-40-001/002`)
* **Heatmaps:** render entropy along the file timeline (green → red).
* **Layer donut:** show opaque % per layer with tooltips linking to file list.
* **“Why risky?” chips:** highlight triggers such as *Packed-like*, *Stripped*, *No symbols*.
* Policy banners explain configured thresholds and mitigation (add provenance, unpack, or accept risk).
* Provide direct download links to `entropy.report.json` for audits.
## 6. CLI / API hooks
* CLI `stella scan artifacts --entropy` option prints top opaque files and penalties.
* API `GET /api/v1/scans/{id}/entropy` serves summary + evidence references.
* Notify templates can include entropy penalties to escalate opaque images.
## 7. Trust algebra
The penalty is computed as:
\[
\text{entropyPenalty} = K \sum_{\text{layers}} \left( \frac{\text{opaqueBytes}}{\text{totalBytes}} \times \frac{\text{layerBytes}}{\text{imageBytes}} \right)
\]
* Default `K = 0.5`.
* Cap penalty at 0.3 to avoid over-weighting tiny blobs.
* Combine with other trust signals (reachability, provenance) to prioritise audits.
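A minimal sketch of the penalty computation follows. It assumes that `layerBytes` equals each layer's `totalBytes` and that `imageBytes` is the sum over all layers. Neither is stated explicitly above, and the `entropy_penalty` helper is hypothetical.

```python
def entropy_penalty(layers, k=0.5, cap=0.3):
    """Weighted opaque ratio across layers, scaled by K and capped."""
    image_bytes = sum(layer["totalBytes"] for layer in layers)
    if image_bytes == 0:
        return 0.0
    weighted = sum(
        (layer["opaqueBytes"] / layer["totalBytes"])
        * (layer["totalBytes"] / image_bytes)
        for layer in layers
        if layer["totalBytes"] > 0
    )
    return min(k * weighted, cap)
```

Under these assumptions each term reduces to `opaqueBytes / imageBytes`, so the uncapped penalty is simply `K` times the image-wide opaque ratio.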
## 8. Implementation checklist
| Area | Task ID | Notes |
|------|---------|-------|
| Scanner analysis | `SCAN-ENTROPY-186-011` | Sliding window entropy & heuristics |
| Evidence output | `SCAN-ENTROPY-186-012` | JSON reports + DSSE |
| Policy integration | `POLICY-RISK-90-001` | Trust weight + explanations |
| UI | `UI-ENTROPY-40-001/002` | Visualisation & messaging |
| Docs | `DOCS-ENTROPY-70-004` | (this guide) |
Update this document as thresholds change or additional packer signatures are introduced.