# Ruby Capability & Source Predicates (SCANNER-POLICY-0001)

**Status:** Implemented · Owner: Policy Guild · Updated: 2025-11-10  
**Scope:** Extend Policy Engine DSL to consume Ruby analyzer metadata (`groups`, `declaredOnly`, capabilities, git/path provenance) emitted in Sprint 138.

---

## 1. Goals

1. Allow policies to express intent around Bundler groups (e.g., blocking `development` gems in production promotions).
2. Expose Ruby capability evidence (exec/net/serialization/job schedulers) as first-class predicates.
3. Differentiate package provenance: registry, git, path/vendor cache.
4. Ensure new predicates work in offline/air-gapped evaluation and export deterministically.

Non-goals: UI wiring (handled by Policy Studio team), policy templates rollout (tracked separately in DOCS-POLICY backlog).

## 2. Source Metadata

Scanner now emits the following fields per Ruby component:

| Field | Type | Example | Notes |
|-------|------|---------|-------|
| `groups` | `string` (semicolon-separated list) | `development;test` | Aggregated from manifest + lockfile. |
| `declaredOnly` | `bool` (string `"true"/"false"`) | `"false"` | `"false"` indicates vendor cache evidence is present. |
| `source` | `string` | `git:https://github.com/example/git-gem.git@<rev>` | Prefix identifies the kind: registry (`https://`), `git:`, `path:`, or `vendor-cache`. |
| `artifact` | `string?` | `vendor/cache/path-gem-2.1.3.gem` | Present only when a cached artifact is observed. |
| Capability flags | `string -> bool` | `capability.exec = "true"` etc. | Includes scheduler sub-keys. |
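
As a rough illustration of how these fields might be normalized before predicate evaluation, the sketch below parses one component's metadata in Python. `RubyFacts` and `parse_ruby_metadata` are hypothetical names, not part of the Policy Engine. The `capability.<name>` key layout for scheduler sub-keys, the `unknown` fallback kind, and the defaults for missing fields are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubyFacts:
    """Hypothetical normalized view of one Ruby component's metadata."""
    groups: frozenset
    declared_only: bool
    source_kind: str
    capabilities: frozenset


def _source_kind(source: str) -> str:
    # Classify by prefix, following the Notes column of the table above.
    if source.startswith("git:"):
        return "git"
    if source.startswith("path:"):
        return "path"
    if source.startswith("vendor-cache"):
        return "vendor-cache"
    if source.startswith("https://"):
        return "registry"
    return "unknown"  # assumption: anything else is left unclassified


def parse_ruby_metadata(meta: dict) -> RubyFacts:
    # `groups` is a semicolon-separated string; empty entries are dropped.
    groups = frozenset(
        g.strip() for g in meta.get("groups", "").split(";") if g.strip()
    )
    # `declaredOnly` arrives as the string "true"/"false".
    # Assumption: a missing field is treated as "false".
    declared_only = meta.get("declaredOnly", "false").strip().lower() == "true"
    # Capability flags are string -> "true"/"false" pairs.
    # Assumption: keys look like `capability.exec` or `capability.scheduler.activejob`.
    prefix = "capability."
    capabilities = frozenset(
        key[len(prefix):]
        for key, value in meta.items()
        if key.startswith(prefix) and value.strip().lower() == "true"
    )
    return RubyFacts(
        groups, declared_only, _source_kind(meta.get("source", "")), capabilities
    )


facts = parse_ruby_metadata({
    "groups": "development;test",
    "declaredOnly": "false",
    "source": "git:https://github.com/example/git-gem.git@abc123",
    "capability.exec": "true",
    "capability.net": "false",
})
```

Freezing the result and using `frozenset` keeps the parsed facts immutable, which makes it easier to reason about repeatable evaluation.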

## 3. Proposed Predicates

| Predicate | Signature | Description |
|-----------|-----------|-------------|
| `ruby.group(name: string)` | `bool` | True if component belongs to Bundler group `name`. |
| `ruby.groups()` | `set<string>` | Returns all groups for aggregations. |
| `ruby.declared_only()` | `bool` | True when component has no vendor/installed evidence. |
| `ruby.source(kind?: string)` | `bool` | True when the component's source matches `kind`, determined by source prefix (`registry`, `git`, `path`, `vendor-cache`). |
| `ruby.capability(name: string)` | `bool` | Supported names: `exec`, `net`, `serialization`, `scheduler`, scheduler subtypes (`scheduler.activejob`, etc.). |
| `ruby.capability_any(names: set<string>)` | `bool` | Utility predicate: true if the component has any of the named capabilities. |

Implementation detail: compile-time validation ensures these predicates are used only within Ruby component scope (similar to the `node.group` pattern).
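
The predicate semantics in the table can be sketched in Python as plain functions over a component's facts. This is only an illustration under stated assumptions: `RubyFacts` and the `ruby_*` function names are hypothetical, treating `scheduler` as matched by any `scheduler.*` subtype is an assumption, and so is raising `ValueError` for unsupported capability names. The behavior of `ruby.source()` with `kind` omitted is not specified above, so the sketch requires `kind`.

```python
from dataclasses import dataclass

SUPPORTED_CAPABILITIES = {"exec", "net", "serialization", "scheduler"}


@dataclass(frozen=True)
class RubyFacts:
    """Hypothetical evaluation-context view of one Ruby component."""
    groups: frozenset = frozenset()
    declared_only: bool = False
    source_kind: str = "registry"
    capabilities: frozenset = frozenset()


def ruby_group(facts: RubyFacts, name: str) -> bool:
    return name in facts.groups


def ruby_groups(facts: RubyFacts) -> set:
    return set(facts.groups)


def ruby_declared_only(facts: RubyFacts) -> bool:
    return facts.declared_only


def ruby_source(facts: RubyFacts, kind: str) -> bool:
    return facts.source_kind == kind


def ruby_capability(facts: RubyFacts, name: str) -> bool:
    # Assumption: only the base name is validated, so subtypes such as
    # `scheduler.activejob` pass through.
    base = name.split(".", 1)[0]
    if base not in SUPPORTED_CAPABILITIES:
        raise ValueError(f"unsupported capability: {name}")
    if name in facts.capabilities:
        return True
    # Assumption: a parent name matches any of its subtypes.
    return any(c.startswith(name + ".") for c in facts.capabilities)


def ruby_capability_any(facts: RubyFacts, names: set) -> bool:
    # Sorted iteration keeps evaluation order independent of set ordering.
    return any(ruby_capability(facts, n) for n in sorted(names))


facts = RubyFacts(
    groups=frozenset({"development"}),
    capabilities=frozenset({"scheduler.activejob"}),
)
```

With these facts, `ruby_capability(facts, "scheduler")` and `ruby_capability(facts, "scheduler.activejob")` both hold, while `ruby_capability(facts, "exec")` does not.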

## 4. DSL & Engine Changes

1. **Schema mapping:** Update `ComponentFacts` model to surface new Ruby metadata in evaluation context.
2. **Predicate registry:** Add Ruby-specific predicate handlers to `PolicyPredicateRegistry` with deterministic ordering.
3. **Explain traces:** Include matched predicates + metadata in explain output.
4. **Exports:** Ensure Offline Kit bundles include updated predicate metadata (no runtime fetch).
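
To make items 2 and 3 more concrete, here is a minimal Python sketch of a registry that lists and evaluates predicates in a stable order and records each result in an explain trace. `SketchPredicateRegistry` and its methods are hypothetical and do not describe the real `PolicyPredicateRegistry` API; sorting by predicate name and arguments is just one way to get deterministic ordering.

```python
class SketchPredicateRegistry:
    """Hypothetical registry; not the engine's PolicyPredicateRegistry."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        if name in self._handlers:
            raise ValueError(f"duplicate predicate: {name}")
        self._handlers[name] = handler

    def names(self):
        # Sorted so the listing does not depend on registration order.
        return sorted(self._handlers)

    def explain(self, facts, calls):
        """Evaluate (name, args) calls in sorted order and return a trace."""
        trace = []
        for name, args in sorted(calls):
            matched = bool(self._handlers[name](facts, *args))
            trace.append(
                {"predicate": name, "args": list(args), "matched": matched}
            )
        return trace


registry = SketchPredicateRegistry()
# Registered out of alphabetical order on purpose.
registry.register("ruby.group", lambda facts, name: name in facts["groups"])
registry.register("ruby.declared_only", lambda facts: facts["declared_only"])

facts = {"groups": {"development", "test"}, "declared_only": False}
trace = registry.explain(
    facts,
    [("ruby.group", ("development",)), ("ruby.declared_only", ())],
)
```

Because both `names()` and `explain()` sort their inputs, the trace comes out in the same order regardless of how handlers were registered or how calls were listed.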

## 5. Policy Templates (follow-up)

Create sample rules under `policy/templates/ruby`:

- Block `ruby.group("development")` when `promotion.target == "prod"`.
- Flag `ruby.capability("exec")` components unless allowlisted.
- Require `ruby.source("git")` packages to provide pinned hash allowlists.

Tracking: DOCS-POLICY follow-up (not part of SCANNER-POLICY-0001 initial kick-off).
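
Until the templates exist, the intent of the three sample rules can be approximated in Python. This is a simulation, not Policy DSL syntax: the function name, the component dictionary shape, the `exec_allowlist` and `pinned_revisions` inputs, and the action labels (`block`, `flag`, `require`) are all assumptions made for illustration.

```python
def evaluate_ruby_rules(component, promotion_target, exec_allowlist, pinned_revisions):
    """Return (action, reason) findings for one hypothetical Ruby component."""
    findings = []

    # Rule 1: block development-group gems when promoting to prod.
    if promotion_target == "prod" and "development" in component["groups"]:
        findings.append(("block", "development-group gem in prod promotion"))

    # Rule 2: flag exec-capable components unless allowlisted by name.
    if "exec" in component["capabilities"] and component["name"] not in exec_allowlist:
        findings.append(("flag", "exec capability not allowlisted"))

    # Rule 3: git-sourced packages need a pinned revision in the allowlist.
    if component["source_kind"] == "git":
        allowed = pinned_revisions.get(component["name"], set())
        if component.get("revision") not in allowed:
            findings.append(("require", "git source lacks a pinned, allowlisted revision"))

    return findings


component = {
    "name": "git-gem",
    "groups": {"development"},
    "capabilities": {"exec"},
    "source_kind": "git",
    "revision": "abc123",
}

strict = evaluate_ruby_rules(component, "prod", set(), {})
relaxed = evaluate_ruby_rules(
    component, "staging", {"git-gem"}, {"git-gem": {"abc123"}}
)
```

In this sketch `strict` contains one finding per rule, while `relaxed` is empty because the promotion target is not `prod` and both allowlists cover the component.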

## 6. Testing Strategy

- Unit tests for each predicate (true/false cases, unsupported values).
- Integration test tying sample Scanner payload to simulated policy evaluation.
- Determinism run: repeated evaluation of the same snapshot must yield an identical explain-trace hash.
- Offline regression: ensure `seed-data/analyzers/ruby/git-sources` fixture flows through offline-kit policy evaluation script.
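
The determinism run could look roughly like the Python sketch below, which canonicalizes an explain trace to JSON and compares SHA-256 digests across evaluations. `evaluate_snapshot` is a hypothetical stand-in for the real policy evaluation, and the trace shape and choice of hash are assumptions.

```python
import hashlib
import json


def explain_trace_hash(trace) -> str:
    # Canonical JSON: sorted keys and fixed separators, so equal traces
    # always serialize to the same bytes.
    canonical = json.dumps(trace, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate_snapshot(snapshot):
    """Hypothetical stand-in for policy evaluation over a Scanner snapshot."""
    trace = []
    # Sort components and groups so input ordering cannot leak into the trace.
    for component in sorted(snapshot, key=lambda c: c["name"]):
        groups = sorted(g for g in component.get("groups", "").split(";") if g)
        trace.append({
            "component": component["name"],
            "predicate": 'ruby.group("development")',
            "matched": "development" in groups,
            "groups": groups,
        })
    return trace


snapshot = [
    {"name": "path-gem", "groups": "test"},
    {"name": "git-gem", "groups": "test;development"},
]

first = explain_trace_hash(evaluate_snapshot(snapshot))
second = explain_trace_hash(evaluate_snapshot(snapshot))
reordered = explain_trace_hash(evaluate_snapshot(list(reversed(snapshot))))
```

Comparing `first`, `second`, and `reordered` checks both repeatability and independence from the order in which components appear in the payload.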

## 7. Timeline & Dependencies

| Step | Owner | Target |
|------|-------|--------|
| Predicate implementation + tests | Policy Engine Guild | Sprint 138 (in progress) |
| Offline kit regression update | Policy + Ops | Sprint 138 |
| Policy templates & docs | Docs Guild | Sprint 139 |

Dependencies: Scanner metadata in place (SCANNER-ENG-0016 DONE); no additional service contracts required.

## 8. Open Questions

1. Should `declaredOnly` interact with existing waiver semantics (e.g., treat as lower severity)? → Needs risk review.
2. Do we expose scheduler sub-types individually or aggregate under `ruby.capability("scheduler")` only? → Proposed to expose both for flexibility.
3. Is git URL normalization required (strip credentials, hash fragments)? → Ensure sanitization before evaluation.
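
For question 3, one possible shape of the sanitization step is sketched below in Python using the standard `urllib.parse` module. `sanitize_git_url` is a hypothetical helper; it assumes the `git:` prefix has already been removed and that only userinfo and the fragment need stripping. Whether further normalization is required remains open.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_git_url(url: str) -> str:
    """Drop embedded credentials and the fragment from a git URL."""
    parts = urlsplit(url)
    # Everything before the last "@" in the network location is userinfo;
    # keep only the host (and port, if any).
    host = parts.netloc.rpartition("@")[2]
    # Rebuild the URL with an empty fragment.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))


with_credentials = sanitize_git_url(
    "https://user:token@github.com/example/git-gem.git#main"
)
with_revision = sanitize_git_url("https://github.com/example/git-gem.git@abc123")
```

Here `with_credentials` becomes `https://github.com/example/git-gem.git`, and `with_revision` is returned unchanged because the `@abc123` suffix sits in the path, not in the network location.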

Please comment in `docs/modules/policy/design/ruby-capability-predicates.md` or via SCANNER-POLICY-0001 sprint entry.