feat: Add Promotion-Time Attestations for Stella Ops

- Introduced a new document for promotion-time attestations, detailing the purpose, predicate schema, producer workflow, verification flow, APIs, and security considerations.
- Implemented the `stella.ops/promotion@v1` predicate schema to capture promotion evidence including image digest, SBOM/VEX artifacts, and Rekor proof.
- Defined producer responsibilities and workflows for CLI orchestration, signer responsibilities, and Export Center integration.
- Added verification steps for auditors to validate promotion attestations offline.

feat: Create Symbol Manifest v1 Specification

- Developed a specification for Symbol Manifest v1 to provide a deterministic format for publishing debug symbols and source maps.
- Defined the manifest structure, including schema, entries, source maps, toolchain, and provenance.
- Outlined upload and verification processes, resolve APIs, runtime proxy, caching, and offline bundle generation.
- Included security considerations and related tasks for implementation.

chore: Add Ruby Analyzer with Git Sources

- Created a Gemfile and Gemfile.lock for Ruby analyzer with dependencies on git-gem, httparty, and path-gem.
- Implemented main application logic to utilize the defined gems and output their versions.
- Added expected JSON output for the Ruby analyzer to validate the integration of the new gems and their functionalities.
- Developed internal observation classes for Ruby packages, runtime edges, and capabilities, including serialization logic for observations.

test: Add tests for Ruby Analyzer

- Created test fixtures for Ruby analyzer, including Gemfile, Gemfile.lock, main application, and expected JSON output.
- Ensured that the tests validate the correct integration and functionality of the Ruby analyzer with the specified gems.
2025-11-11 15:30:22 +02:00
parent 56c687253f
commit c2c6b58b41
56 changed files with 2305 additions and 198 deletions


@@ -23,6 +23,7 @@ Policy Engine compiles and evaluates Stella DSL policies deterministically, prod
- Governance and scope mapping in ../../security/policy-governance.md.
- Readiness briefs: ../policy/secret-leak-detection-readiness.md, ../policy/windows-package-readiness.md.
- Readiness briefs: ../scanner/design/macos-analyzer.md, ../scanner/design/windows-analyzer.md, ../policy/secret-leak-detection-readiness.md, ../policy/windows-package-readiness.md.
- Ruby capability predicates design: ./design/ruby-capability-predicates.md.
## Backlog references
- DOCS-POLICY-20-001 … DOCS-POLICY-20-012 (completed baseline).


@@ -0,0 +1,5 @@
# Policy Engine Guild — Active Tasks
| Task ID | State | Notes |
| --- | --- | --- |
| `SCANNER-POLICY-0001` | DONE (2025-11-10) | Ruby component predicates implemented in engine/tests, DSL docs updated, offline kit verifies `seed-data/analyzers/ruby/git-sources`. |


@@ -0,0 +1,82 @@
# Ruby Capability & Source Predicates (SCANNER-POLICY-0001)
**Status:** Implemented · Owner: Policy Guild · Updated: 2025-11-10
**Scope:** Extend Policy Engine DSL to consume Ruby analyzer metadata (`groups`, `declaredOnly`, capabilities, git/path provenance) emitted in Sprint 138.
---
## 1. Goals
1. Allow policies to express intent around Bundler groups (e.g., blocking `development` gems in production promotes).
2. Expose Ruby capability evidence (exec/net/serialization/job schedulers) as first-class predicates.
3. Differentiate package provenance: registry, git, path/vendor cache.
4. Ensure new predicates work in offline/air-gapped evaluation and export deterministically.
Non-goals: UI wiring (handled by Policy Studio team), policy templates rollout (tracked separately in DOCS-POLICY backlog).
## 2. Source Metadata
Scanner now emits the following fields per Ruby component:
| Field | Type | Example | Notes |
|-------|------|---------|-------|
| `groups` | `string` (semicolon-separated list) | `development;test` | Aggregated from manifest + lockfile. |
| `declaredOnly` | `bool` (string `"true"/"false"`) | `"false"` | False indicates vendor cache evidence present. |
| `source` | `string` | `git:https://github.com/example/git-gem.git@<rev>` | Registry (`https://`), `git:`, `path:`, `vendor-cache`. |
| `artifact` | `string?` | `vendor/cache/path-gem-2.1.3.gem` | Only when cached artefact observed. |
| Capability flags | `string -> bool` | `capability.exec = "true"` etc. | Includes scheduler sub-keys. |
## 3. Proposed Predicates
| Predicate | Signature | Description |
|-----------|-----------|-------------|
| `ruby.group(name: string)` | `bool` | True if component belongs to Bundler group `name`. |
| `ruby.groups()` | `set<string>` | Returns all groups for aggregations. |
| `ruby.declared_only()` | `bool` | True when component has no vendor/installed evidence. |
| `ruby.source(kind?: string)` | `bool` | Kind matches prefix (`registry`, `git`, `path`, `vendor-cache`). |
| `ruby.capability(name: string)` | `bool` | Supported names: `exec`, `net`, `serialization`, `scheduler`, scheduler subtypes (`scheduler.activejob`, etc.). |
| `ruby.capability_any(names: set<string>)` | `bool` | Utility predicate to check multiple capabilities. |
Implementation detail: compile-time validation ensures predicate usage only within Ruby component scope (similar to `node.group` pattern).
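A minimal sketch of how these predicates might evaluate against the Section 2 metadata, written in Python purely for illustration. The `RubyComponent` class, its method names, and the rule that a parent capability such as `scheduler` matches when any sub-key is set are assumptions; they are not the engine's actual `PolicyPredicateRegistry` handlers.

```python
from dataclasses import dataclass, field


@dataclass
class RubyComponent:
    """Hypothetical view over the string metadata emitted per Ruby component."""
    metadata: dict = field(default_factory=dict)

    def groups(self) -> frozenset:
        # `groups` arrives as a semicolon-separated string, e.g. "development;test".
        raw = self.metadata.get("groups", "")
        return frozenset(g.strip() for g in raw.split(";") if g.strip())

    def group(self, name: str) -> bool:
        return name in self.groups()

    def declared_only(self) -> bool:
        # Booleans are serialised as the strings "true"/"false".
        return self.metadata.get("declaredOnly", "false").lower() == "true"

    def source_kind(self) -> str:
        # Prefixes follow the `source` row of the metadata table.
        source = self.metadata.get("source", "")
        if source.startswith("git:"):
            return "git"
        if source.startswith("path:"):
            return "path"
        if source.startswith("vendor-cache"):
            return "vendor-cache"
        if source.startswith("https://"):
            return "registry"
        return "unknown"

    def source(self, kind: str) -> bool:
        return self.source_kind() == kind

    def capability(self, name: str) -> bool:
        key = f"capability.{name}"
        if self.metadata.get(key, "false").lower() == "true":
            return True
        # Assumption: "scheduler" also matches "capability.scheduler.activejob".
        prefix = key + "."
        return any(k.startswith(prefix) and v.lower() == "true"
                   for k, v in self.metadata.items())

    def capability_any(self, names) -> bool:
        return any(self.capability(n) for n in names)


component = RubyComponent({
    "groups": "development;test",
    "declaredOnly": "false",
    "source": "git:https://github.com/example/git-gem.git@<rev>",
    "capability.scheduler.activejob": "true",
})
```

With this sample, `component.group("development")`, `component.source("git")`, and `component.capability("scheduler")` all evaluate to true, while `component.declared_only()` is false.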
## 4. DSL & Engine Changes
1. **Schema mapping:** Update `ComponentFacts` model to surface new Ruby metadata in evaluation context.
2. **Predicate registry:** Add Ruby-specific predicate handlers to `PolicyPredicateRegistry` with deterministic ordering.
3. **Explain traces:** Include matched predicates + metadata in explain output.
4. **Exports:** Ensure Offline Kit bundles include updated predicate metadata (no runtime fetch).
## 5. Policy Templates (follow-up)
Create sample rules under `policy/templates/ruby`:
- Block `ruby.group("development")` when `promotion.target == "prod"`.
- Flag `ruby.capability("exec")` components unless allowlisted.
- Require `ruby.source("git")` packages to provide pinned hash allowlists.
Tracking: DOCS-POLICY follow-up (not part of SCANNER-POLICY-0001 initial kick-off).
## 6. Testing Strategy
- Unit tests for each predicate (true/false cases, unsupported values).
- Integration test tying sample Scanner payload to simulated policy evaluation.
- Determinism run: repeated evaluation with same snapshot must yield identical explain trace hash.
- Offline regression: ensure `seed-data/analyzers/ruby/git-sources` fixture flows through offline-kit policy evaluation script.
## 7. Timeline & Dependencies
| Step | Owner | Target |
|------|-------|--------|
| Predicate implementation + tests | Policy Engine Guild | Sprint 138 (in progress) |
| Offline kit regression update | Policy + Ops | Sprint 138 |
| Policy templates & docs | Docs Guild | Sprint 139 |
Dependencies: Scanner metadata in place (SCANNER-ENG-0016 DONE); no additional service contracts required.
## 8. Open Questions
1. Should `declaredOnly` interact with existing waiver semantics (e.g., treat as lower severity)? → Needs risk review.
2. Do we expose scheduler sub-types individually or aggregate under `ruby.capability("scheduler")` only? → Proposed to expose both for flexibility.
3. Is git URL normalization required (strip credentials, hash fragments)? → Ensure sanitization before evaluation.
Please comment in `docs/modules/policy/design/ruby-capability-predicates.md` or via SCANNER-POLICY-0001 sprint entry.
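For open question 3, one possible shape of the sanitisation step is sketched below in Python. It assumes the `git:` prefix and the `@<rev>` suffix have already been split off, so the function only sees a plain remote URL; the function name is hypothetical and IPv6 literal hosts are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_git_url(url: str) -> str:
    """Drop userinfo (credentials) and any #fragment from a git remote URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased and excludes userinfo
    # Rebuild netloc from host and port only, which discards "user:token@".
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))


clean = sanitize_git_url("https://user:token@github.com/example/git-gem.git#main")
```

Here `clean` is `https://github.com/example/git-gem.git`. Running this before evaluation keeps secrets out of explain traces and makes `ruby.source("git")` comparisons independent of who cloned the repository.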


@@ -1,6 +1,6 @@
# Ruby Analyzer Parity Design (SCANNER-ENG-0009)
**Status:** Draft • Owner: Ruby Analyzer Guild • Updated: 2025-11-02
**Status:** Implemented • Owner: Ruby Analyzer Guild • Updated: 2025-11-10
## 1. Goals & Non-Goals
- **Goals**
@@ -70,10 +70,9 @@
### 4.4 Runtime Graph Builder
- Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
- Implementation phases:
1. Parse AST using tree-sitter Ruby embedded under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` with deterministic bindings.
2. Generate edges `entrypoint -> file` and `file -> package` with reason codes (`require-static`, `autoload-zeitwerk`, `autoload-const_missing`).
3. Identify framework entrypoints (Rails controllers, Rack middleware, Sidekiq workers) via heuristics defined in `SCANNER-ANALYZERS-RUBY-28-*` tasks.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine.
1. **MVP (shipped in Sprint 138):** perform lightweight scanning using deterministic regex patterns scoped to Ruby sources. Captures explicit `require*` and `autoload` statements, records referencing files, and links back to packages when a matching lock entry exists.
2. **Planned follow-up:** integrate tree-sitter Ruby under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` for full AST coverage (Zeitwerk constants, conditional requires, dynamic module loading). This phase remains tracked under SCANNER-ANALYZERS-RUBY-28-003.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine. Entrypoint detection currently keys off file location plus usage hints; richer framework-aware mapping will accompany the tree-sitter phase.
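The MVP scanning approach can be illustrated with a short Python sketch. The regular expressions, the `scan_ruby_source` function, and the use of the statement keyword as the `reason` value are assumptions made for this example; the shipped analyzer's patterns and reason codes are not reproduced in this document.

```python
import re

# Assumed patterns: match statements at the start of a line, so commented-out
# requires ("# require ...") are skipped.
REQUIRE_RE = re.compile(
    r"""^[ \t]*(require_relative|require)[ \t]*\(?[ \t]*['"]([^'"]+)['"]""",
    re.MULTILINE)
AUTOLOAD_RE = re.compile(
    r"""^[ \t]*autoload[ \t]*\(?[ \t]*:(\w+)[ \t]*,[ \t]*['"]([^'"]+)['"]""",
    re.MULTILINE)


def link_package(target: str, locked_gems: set):
    # Link back to a package only when the top-level name has a lock entry.
    top_level = target.split("/", 1)[0]
    return top_level if top_level in locked_gems else None


def scan_ruby_source(path: str, text: str, locked_gems: set) -> list:
    edges = []
    for match in REQUIRE_RE.finditer(text):
        statement, target = match.groups()
        # require_relative always points at a local file, never at a gem.
        package = link_package(target, locked_gems) if statement == "require" else None
        edges.append({"from": path, "to": target, "reason": statement, "package": package})
    for match in AUTOLOAD_RE.finditer(text):
        _constant, target = match.groups()
        edges.append({"from": path, "to": target, "reason": "autoload",
                      "package": link_package(target, locked_gems)})
    # Sort so repeated scans of the same input yield byte-identical output.
    return sorted(edges, key=lambda e: (e["from"], e["to"], e["reason"]))
```

A regex pass like this cannot see conditional or dynamically built requires, which is the gap the planned tree-sitter phase is meant to close.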
### 4.5 Capability & Surface Signals
- Emit evidence documents for:
@@ -95,11 +94,13 @@
| `ruby_packages.json` | Array `{id, name, version, source, provenance, groups[], platform}` | SBOM Composer, Policy Engine |
| `ruby_runtime_edges.json` | Edges `{from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
| `ruby_capabilities.json` | Capability `{kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |
| `ruby_observation.json` | Summary document (packages, runtime edges, capability flags) | Surface manifest, Policy explain traces |
All records follow AOC appender rules (immutable, tenant-scoped) and include `hash`, `layerDigest`, and `timestamp` normalized to UTC ISO-8601.
## 6. Testing Strategy
- **Fixtures**: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
- **Fixtures**: Added `git-sources` scenario covering git/path dependencies, bundler groups, and vendor cache evidence for declared-only toggling.
- **Determinism**: Golden snapshots for package lists and capability outputs across repeated runs.
- **Integration**: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
- **Policy**: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.
@@ -121,15 +122,15 @@ All records follow AOC appender rules (immutable, tenant-scoped) and include `ha
- Need alignment with Export Center on Ruby-specific manifest emissions.
## 9. Licensing & Offline Packaging (SCANNER-LIC-0001)
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-02).
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-10).
- **Obligations**:
1. Include both MIT license texts in `/third-party-licenses/` and in Offline Kit manifests.
2. Update `NOTICE.md` to acknowledge embedded grammars per company policy.
3. Record the grammar commit hashes in build metadata; regenerate generated C/WASM artifacts deterministically.
4. Ensure build pipeline uses `tree-sitter-cli` only as a build-time tool (not redistributed) to avoid extra licensing obligations.
1. Keep MIT license texts in `/third-party-licenses/` and ship them with Offline Kits (fulfilled via `build_offline_kit.py` copying the directory into staging).
2. Track acknowledgements in `NOTICE.md` (completed).
3. Record grammar provenance in build metadata once native parsers ship; current MVP uses regex-only parsing and does **not** bundle tree-sitter artifacts yet, so no generated sources are redistributed.
4. When tree-sitter integration lands, ensure `tree-sitter-cli` remains a build-time tool only.
- **Deliverables**:
- SCANNER-LIC-0001 to capture Legal sign-off and update packaging scripts.
- Export Center to mirror license files into Offline Kit bundle.
- SCANNER-LIC-0001 tracks Legal sign-off; Offline Kit packaging now mirrors `third-party-licenses/`.
- Export Center recipe inherits the copied directory with deterministic hashing.
---
*References:*


@@ -0,0 +1,87 @@
# Scanner Determinism Score Guide
> **Status:** Draft · Sprint 186/202/203
> **Owners:** Scanner Guild · QA Guild · DevEx/CLI Guild · DevOps Guild
## 1. Goal
Quantify how repeatable a scanner release is by re-running scans under frozen conditions and reporting the ratio of bit-for-bit identical outputs. The determinism score lets customers and auditors confirm that StellaOps scans are replayable and trustworthy.
## 2. Test harness overview (`SCAN-DETER-186-009`)
1. **Inputs:** image digests, policy bundle SHA, feed snapshot SHA, scanner container digest, platform (linux/amd64 by default).
2. **Execution loop:** run the scanner *N* times (default 10) with:
* `--fixed-clock <timestamp>`
* `RNG_SEED=1337`
* `SCANNER_MAX_CONCURRENCY=1`
* feeds/policy tarballs mounted read-only
* `--network=none`, `--cpuset-cpus=0`, `--memory=2G`
3. **Canonicalisation:** normalise JSON outputs (SBOM, VEX, findings, logs) using the same serializer as production (`StellaOps.Scanner.Replay` helpers).
4. **Hashing:** compute SHA-256 for each canonical artefact per run.
5. **Score calculation:** `identical_runs / total_runs` (per image and overall). A run is “identical” if all artefact hashes match the baseline (run 1).
The harness persists the full run set in content-addressed storage (CAS), so regression tests can reuse it and Offline Kits can bundle it.
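Steps 3 to 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions: `json.dumps` with sorted keys stands in for the production serializer (`StellaOps.Scanner.Replay` helpers), and the function names and the in-memory shape of a run (a mapping from artefact name to parsed JSON) are hypothetical.

```python
import hashlib
import json


def canonical_bytes(document) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(document, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def artifact_hashes(run: dict) -> dict:
    # One SHA-256 per canonical artefact in a single run.
    return {name: "sha256:" + hashlib.sha256(canonical_bytes(doc)).hexdigest()
            for name, doc in sorted(run.items())}


def determinism_score(runs: list) -> dict:
    if not runs:
        raise ValueError("at least one run is required")
    hashed = [artifact_hashes(run) for run in runs]
    baseline = hashed[0]  # run 1 is the baseline and counts as identical to itself
    identical = sum(1 for h in hashed if h == baseline)
    return {"runs": len(runs), "identical": identical,
            "score": identical / len(runs), "artifact_hashes": baseline}
```

Because the comparison is made on canonical bytes, two runs that differ only in key order still count as identical, while any change in content lowers the score.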
## 3. Output artefacts (`SCAN-DETER-186-010`)
* `determinism.json`: per-image runs, identical counts, score, policy/feed hashes.
* `run_i/*.json`: canonicalised outputs for debugging.
* `diffs/`: optional diff samples when runs diverge.
Example `determinism.json`:
```json
{
"release": "scanner-0.14.3",
"platform": "linux/amd64",
"policy_sha": "a1b2c3…",
"feeds_sha": "d4e5f6…",
"images": [
{
"digest": "sha256:abc…",
"runs": 10,
"identical": 10,
"score": 1.0,
"artifact_hashes": {
"sbom.cdx.json": "sha256:11…",
"vex.json": "sha256:22…",
"findings.json": "sha256:33…"
}
}
],
"overall_score": 1.0
}
```
## 4. CI integration (`DEVOPS-SCAN-90-004`)
* GitHub/Gitea pipeline stages run the determinism harness for the release matrix.
* Fail the job when `overall_score < threshold` (default 0.95) or any image falls below 0.90.
* Upload `determinism.json` and artefacts as build outputs; attach to release notes and Offline kits.
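The gating rule could be implemented roughly as follows. This Python sketch assumes the `determinism.json` layout shown in Section 3 has already been parsed into a dictionary; the `gate` function and its parameter names are hypothetical, and only the two default thresholds come from this guide.

```python
def gate(report: dict, overall_threshold: float = 0.95,
         image_floor: float = 0.90) -> list:
    """Return human-readable failures; an empty list means the job may pass."""
    failures = []
    if report["overall_score"] < overall_threshold:
        failures.append(
            f"overall_score {report['overall_score']} < {overall_threshold}")
    for image in report["images"]:
        if image["score"] < image_floor:
            failures.append(
                f"{image['digest']} score {image['score']} < {image_floor}")
    return failures
```

A pipeline step would load the file with `json.load`, call `gate`, print the failures, and exit non-zero when the list is not empty.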
## 5. CLI support (`CLI-DETER-70-003/004`)
* `stella detscore run` executes the harness locally, honoring the same frozen-clock and seed settings; exits non-zero when score falls below the configured threshold.
* `stella detscore report` summarises one or more `determinism.json` files for release notes, showing per-image scores and flagging non-deterministic artefacts.
## 6. Policy & UI consumption
* Policy Engine can enforce determinism thresholds (e.g., block promotion if score < 0.95) using the `determinism.json` evidence.
* UI surfaces the score alongside scans (e.g., badge in scan detail view) referencing task `UI-SBOM-DET-01`.
## 7. Evidence & replay
* Include `determinism.json` and canonical run outputs in Replay bundles (`docs/replay/DETERMINISTIC_REPLAY.md`).
* DSSE-sign determinism results before adding them to Evidence Locker.
## 8. Implementation checklist
| Area | Task ID | Notes |
|------|---------|-------|
| Harness | `SCAN-DETER-186-009` | Deterministic execution + hashing |
| Artefacts | `SCAN-DETER-186-010` | Publish JSON, CAS storage |
| CLI | `CLI-DETER-70-003/004` | Local runs + reporting |
| DevOps | `DEVOPS-SCAN-90-004` | CI enforcement |
| Docs | `DOCS-DETER-70-002` | (this document) |
Update this guide with links to code once tasks move to **DONE**.


@@ -0,0 +1,126 @@
# Entropy Analysis for Executable Layers
> **Status:** Draft · Sprint 186/209
> **Owners:** Scanner Guild · Policy Guild · UI Guild · Docs Guild
## 1. Overview
Entropy analysis highlights opaque regions inside container layers (packed binaries, stripped blobs, embedded firmware) so StellaOps can prioritise artefacts that are hard to audit. The scanner computes per-file entropy metrics, reports opaque ratios per layer, and feeds penalties into the trust algebra.
## 2. Scanner pipeline (`SCAN-ENTROPY-186-011/012`)
* **Target files:** ELF, PE/COFF, Mach-O executables and large raw blobs (>16KB). Archive formats (zip/tar) are unpacked by existing analyzers before entropy processing.
* **Section analysis:**
* ELF `.text`, `.rodata`, `.data`, custom sections.
* PE section table entries (`IMAGE_SECTION_HEADER`).
* Mach-O LC_SEGMENT/LC_SEGMENT_64 sections.
* **Sliding window:** 4KB window with 1KB stride. Each window's entropy is the Shannon entropy of its byte values, where \(p_i\) is the fraction of bytes in the window equal to \(i\):
\[
H = -\sum_{i=0}^{255} p_i \log_2 p_i
\]
Windows with `H ≥ 7.2` bits/byte are marked “opaque”.
* **Heuristics & hints:**
* Flag entire files with no symbols or stripped debug info.
* Detect known packer section names (`.UPX*`, `.aspack`, etc.).
* Record offsets, window sizes, and entropy values to support explainability.
* **Outputs:**
* `entropy.report.json` (per-file details, windows, hints).
* `layer_summary.json` (opaque byte ratios per layer and overall image).
* Penalty score contributed to the trust algebra (`entropy_penalty`).
All JSON output is canonical (sorted keys, UTF-8) and included in DSSE attestations/replay bundles.
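The sliding-window step can be sketched in Python as below. The window, stride, and threshold values come from this section; the function names are hypothetical, and the sketch deliberately stops at listing opaque windows because the rule for merging overlapping windows into `opaqueBytes` is not specified here.

```python
import math
from collections import Counter

WINDOW = 4096           # 4KB window
STRIDE = 1024           # 1KB stride
OPAQUE_THRESHOLD = 7.2  # bits per byte


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(chunk).values())


def opaque_windows(data: bytes) -> list:
    """Return the windows whose entropy meets the opaque threshold."""
    windows = []
    # Inputs shorter than one window are scored once, as a single short window.
    last_start = max(len(data) - WINDOW, 0)
    for offset in range(0, last_start + 1, STRIDE):
        chunk = data[offset:offset + WINDOW]
        entropy = shannon_entropy(chunk)
        if entropy >= OPAQUE_THRESHOLD:
            windows.append({"offset": offset, "length": len(chunk),
                            "entropy": round(entropy, 2)})
    return windows
```

A buffer of zero bytes scores 0 bits/byte and produces no windows, while compressed or encrypted data approaches the 8 bits/byte maximum and is reported at every stride.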
## 3. JSON Schemas
### 3.1 `entropy.report.json`
```jsonc
{
"schema": "stellaops.entropy/report@1",
"imageDigest": "sha256:…",
"layerDigest": "sha256:…",
"files": [
{
"path": "/opt/app/libblob.so",
"size": 5242880,
"opaqueBytes": 1342177,
"opaqueRatio": 0.25,
"flags": ["stripped", "section:.UPX0"],
"windows": [
{ "offset": 0, "length": 4096, "entropy": 7.45 },
{ "offset": 1024, "length": 4096, "entropy": 7.38 }
]
}
]
}
```
### 3.2 `layer_summary.json`
```jsonc
{
"schema": "stellaops.entropy/layer-summary@1",
"imageDigest": "sha256:…",
"layers": [
{
"digest": "sha256:layer4…",
"opaqueBytes": 2306867,
"totalBytes": 10485760,
"opaqueRatio": 0.22,
"indicators": ["packed", "no-symbols"]
}
],
"imageOpaqueRatio": 0.18,
"entropyPenalty": 0.12
}
```
## 4. Policy integration (`POLICY-RISK-90-001`)
* Policy Engine receives `entropy_penalty` and per-layer ratios via scan evidence.
* Default thresholds:
* Block when `imageOpaqueRatio > 0.15` and provenance unknown.
* Warn when any executable has `opaqueRatio > 0.30`.
* Penalty weights are configurable per tenant. Policy explanations include:
* Highest-entropy files and offsets.
* Reason code (packed, no symbols, runtime reachable).
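The default thresholds translate into a small decision function. The Python sketch below is illustrative only: `entropy_verdict`, its arguments, and the choice to let a block take precedence over a warning are assumptions, not the Policy Engine's actual rule encoding.

```python
def entropy_verdict(image_opaque_ratio: float, provenance_known: bool,
                    executable_ratios: list) -> str:
    """Apply the default entropy thresholds and return block, warn, or pass."""
    # Block: the image is largely opaque and nothing vouches for its origin.
    if image_opaque_ratio > 0.15 and not provenance_known:
        return "block"
    # Warn: at least one executable is more than 30% opaque.
    if any(ratio > 0.30 for ratio in executable_ratios):
        return "warn"
    return "pass"
```

For the Section 3.2 example (`imageOpaqueRatio` of 0.18) the verdict is `block` when provenance is unknown; once provenance is supplied, the same image passes unless one of its executables exceeds the 0.30 ratio.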
## 5. UI experience (`UI-ENTROPY-40-001/002`)
* **Heatmaps:** render entropy along the file timeline (green → red).
* **Layer donut:** show opaque % per layer with tooltips linking to file list.
* **“Why risky?” chips:** highlight triggers such as *Packed-like*, *Stripped*, *No symbols*.
* Policy banners explain configured thresholds and mitigation (add provenance, unpack, or accept risk).
* Provide direct download links to `entropy.report.json` for audits.
## 6. CLI / API hooks
* CLI `stella scan artifacts --entropy` option prints top opaque files and penalties.
* API `GET /api/v1/scans/{id}/entropy` serves summary + evidence references.
* Notify templates can include entropy penalties to escalate opaque images.
## 7. Trust algebra
The penalty is computed as:
\[
\text{entropyPenalty} = K \sum_{\text{layers}} \left( \frac{\text{opaqueBytes}}{\text{totalBytes}} \times \frac{\text{layerBytes}}{\text{imageBytes}} \right)
\]
* Default `K = 0.5`.
* Cap penalty at 0.3 to avoid over-weighting tiny blobs.
* Combine with other trust signals (reachability, provenance) to prioritise audits.
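As a worked illustration, the formula and cap can be written in Python. The sketch assumes that `totalBytes` and `layerBytes` both denote the size of a layer and that `imageBytes` is the sum of all layer sizes; the function name and the input shape (the `layers` array of `layer_summary.json`) are hypothetical.

```python
def entropy_penalty(layers: list, k: float = 0.5, cap: float = 0.3) -> float:
    """Weight each layer's opaque ratio by its share of the image, then cap."""
    image_bytes = sum(layer["totalBytes"] for layer in layers)
    if image_bytes == 0:
        return 0.0
    weighted = sum(
        (layer["opaqueBytes"] / layer["totalBytes"])   # opaque ratio of the layer
        * (layer["totalBytes"] / image_bytes)          # layer's share of the image
        for layer in layers if layer["totalBytes"] > 0)
    return min(k * weighted, cap)
```

Under these assumptions a single 10485760-byte layer with 2306867 opaque bytes yields a penalty of about 0.11, and a fully opaque image is capped at 0.3 rather than reaching 0.5.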
## 8. Implementation checklist
| Area | Task ID | Notes |
|------|---------|-------|
| Scanner analysis | `SCAN-ENTROPY-186-011` | Sliding window entropy & heuristics |
| Evidence output | `SCAN-ENTROPY-186-012` | JSON reports + DSSE |
| Policy integration | `POLICY-RISK-90-001` | Trust weight + explanations |
| UI | `UI-ENTROPY-40-001/002` | Visualisation & messaging |
| Docs | `DOCS-ENTROPY-70-004` | (this guide) |
Update this document as thresholds change or additional packer signatures are introduced.