2025-11-29 11:08:08 +02:00
parent 7e7be4d2fd
commit 3488b22c0c
102 changed files with 18487 additions and 969 deletions
--- a/docs/product-advisories/29-Nov-2025
+++ b/docs/product-advisories/29-Nov-2025
@@ -0,0 +1,394 @@
+# Policy Simulation and Shadow Gates
+
+**Version:** 1.0
+**Date:** 2025-11-29
+**Status:** Canonical
+
+This advisory defines the product rationale, simulation semantics, and implementation strategy for Policy Engine simulation features, covering shadow runs, coverage fixtures, and promotion gates.
+
+---
+
+## 1. Executive Summary
+
+Policy simulation enables **safe testing of policy changes** before production deployment. Key capabilities:
+
+- **Shadow Runs** - Execute policies without enforcement
+- **Diff Summaries** - Compare old vs new policy outcomes
+- **Coverage Fixtures** - Validate expected findings
+- **Promotion Gates** - Block promotion until tests pass
+- **Deterministic Replay** - Reproduce simulation results
+
+---
+
+## 2. Market Drivers
+
+### 2.1 Target Segments
+
+| Segment | Simulation Requirements | Use Case |
+|---------|------------------------|----------|
+| **Policy Authors** | Preview changes | Development workflow |
+| **Security Leads** | Approve promotions | Change management |
+| **Compliance** | Audit trail | Policy change evidence |
+| **DevSecOps** | CI integration | Automated testing |
+
+### 2.2 Competitive Positioning
+
+Most vulnerability tools lack policy simulation. Stella Ops differentiates with:
+- **Shadow execution** without production impact
+- **Diff visualization** of policy changes
+- **Coverage testing** with fixture validation
+- **Promotion gates** for governance
+- **Deterministic replay** for audit
+
+---
+
+## 3. Simulation Modes
+
+### 3.1 Shadow Run
+
+Execute a policy against real data without enforcing its verdicts:
+
+```bash
+stella policy simulate \
+  --policy my-policy:v2 \
+  --scope "tenant:acme-corp,namespace:production" \
+  --shadow
+```
+
+**Behavior:**
+- Evaluates all findings
+- Records verdicts to shadow collections
+- No enforcement actions
+- No notifications triggered
+- Metrics tagged with `shadow=true`
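
The behavior listed above can be sketched in a few lines. This is a minimal illustration, not the Policy Engine's code: the `Sinks` class, the `evaluate` toy policy, the `policy_verdict` metric name, and the unsuffixed production collection name are all assumptions made for the example. Only the `_shadow` collection suffix and the `shadow=true` tag come from this advisory.

```python
from dataclasses import dataclass, field


@dataclass
class Sinks:
    """In-memory stand-ins for storage, enforcement, notifications, and metrics."""
    collections: dict = field(default_factory=dict)
    enforced: list = field(default_factory=list)
    notified: list = field(default_factory=list)
    metrics: list = field(default_factory=list)


def evaluate(finding):
    # Hypothetical toy policy: block critical findings, warn on everything else.
    return "blocked" if finding["severity"] == "critical" else "warned"


def run_policy(policy_id, findings, sinks, shadow):
    # Shadow verdicts go to a separate collection so they never mix with production data.
    suffix = "_shadow" if shadow else ""
    name = f"effective_finding_{policy_id}{suffix}"
    collection = sinks.collections.setdefault(name, [])
    for finding in findings:
        verdict = evaluate(finding)
        collection.append({"findingId": finding["id"], "verdict": verdict})
        # Every metric carries the shadow tag so dashboards can filter on it.
        sinks.metrics.append(
            {"name": "policy_verdict", "verdict": verdict, "shadow": str(shadow).lower()}
        )
        if not shadow:
            # Enforcement and notifications happen only outside shadow mode.
            if verdict == "blocked":
                sinks.enforced.append(finding["id"])
            sinks.notified.append(finding["id"])
    return collection


sinks = Sinks()
findings = [
    {"id": "finding-1", "severity": "critical"},
    {"id": "finding-2", "severity": "high"},
]
run_policy("my-policy", findings, sinks, shadow=True)
print(sorted(sinks.collections))      # ['effective_finding_my-policy_shadow']
print(sinks.enforced, sinks.notified)  # [] []
```

The single `shadow` flag controls both where verdicts are written and whether side effects fire, which keeps the evaluation path itself identical in both modes.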
+
+### 3.2 Diff Run
+
+Compare two policy versions:
+
+```bash
+stella policy diff \
+  --old my-policy:v1 \
+  --new my-policy:v2 \
+  --scope "tenant:acme-corp"
+```
+
+**Output:**
+```json
+{
+  "summary": {
+    "added": 12,
+    "removed": 5,
+    "changed": 8,
+    "unchanged": 234
+  },
+  "changes": [
+    {
+      "findingId": "finding-123",
+      "cve": "CVE-2025-12345",
+      "oldVerdict": "warned",
+      "newVerdict": "blocked",
+      "reason": "rule 'critical-cves' now matches"
+    }
+  ]
+}
+```
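
To make the shape of this output concrete, the sketch below derives the `summary` counts and the `changes` list from two verdict maps. It assumes that "added" means a finding has a verdict only under the new policy and "removed" means it has one only under the old policy; the advisory does not define these terms, and `diff_verdicts` is a hypothetical helper, not the shipped diff algorithm. The `reason` field is omitted because it needs rule-level traces.

```python
def diff_verdicts(old, new):
    """Compare two {findingId: verdict} maps and build a diff report."""
    summary = {"added": 0, "removed": 0, "changed": 0, "unchanged": 0}
    changes = []
    # Sorted iteration keeps the report order stable across runs.
    for finding_id in sorted(old.keys() | new.keys()):
        old_verdict = old.get(finding_id)
        new_verdict = new.get(finding_id)
        if old_verdict == new_verdict:
            summary["unchanged"] += 1
            continue
        if old_verdict is None:
            kind = "added"
        elif new_verdict is None:
            kind = "removed"
        else:
            kind = "changed"
        summary[kind] += 1
        changes.append(
            {"findingId": finding_id, "oldVerdict": old_verdict, "newVerdict": new_verdict}
        )
    return {"summary": summary, "changes": changes}


old = {"finding-123": "warned", "finding-200": "blocked", "finding-300": "ignored"}
new = {"finding-123": "blocked", "finding-200": "blocked", "finding-400": "warned"}
report = diff_verdicts(old, new)
print(report["summary"])
# {'added': 1, 'removed': 1, 'changed': 1, 'unchanged': 1}
```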
+
+### 3.3 Coverage Run
+
+Validate policy against fixture expectations:
+
+```bash
+stella policy coverage \
+  --policy my-policy:v2 \
+  --fixtures fixtures/policy-tests.yaml
+```
+
+---
+
+## 4. Coverage Fixtures
+
+### 4.1 Fixture Format
+
+```yaml
+apiVersion: stellaops.io/policy-test.v1
+kind: PolicyFixture
+metadata:
+  name: critical-cve-blocking
+  policy: my-policy
+
+fixtures:
+  - name: "Block critical CVE in production"
+    input:
+      finding:
+        cve: "CVE-2025-12345"
+        severity: critical
+        ecosystem: npm
+        component: "lodash@4.17.20"
+      context:
+        namespace: production
+        labels:
+          tier: frontend
+    expected:
+      verdict: blocked
+      rulesMatched: ["critical-cves", "production-strict"]
+
+  - name: "Warn on high CVE in staging"
+    input:
+      finding:
+        cve: "CVE-2025-12346"
+        severity: high
+        ecosystem: npm
+    expected:
+      verdict: warned
+
+  - name: "Ignore low CVE with VEX"
+    input:
+      finding:
+        cve: "CVE-2025-12347"
+        severity: low
+        vexStatus: not_affected
+        vexJustification: "component_not_present"
+    expected:
+      verdict: ignored
+```
+
+### 4.2 Fixture Results
+
+```json
+{
+  "total": 25,
+  "passed": 23,
+  "failed": 2,
+  "failures": [
+    {
+      "fixture": "Block critical CVE in production",
+      "expected": {"verdict": "blocked"},
+      "actual": {"verdict": "warned"},
+      "diff": "rule 'critical-cves' did not match due to missing label"
+    }
+  ]
+}
+```
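
A coverage runner that produces a report of this shape could look roughly like the following. The fixtures are written as Python dicts that mirror the YAML in 4.1 so the sketch needs no YAML parser. `toy_policy`, its rule names other than `critical-cves` and `production-strict`, and the fourth fixture are hypothetical and exist only to show one pass path and one failure path. The sketch compares only the keys a fixture declares under `expected`, which is an assumption about matching semantics.

```python
def toy_policy(finding, context):
    """Hypothetical stand-in for a compiled policy: returns (verdict, rules_matched)."""
    if finding.get("vexStatus") == "not_affected":
        return "ignored", ["vex-not-affected"]
    if finding.get("severity") == "critical" and context.get("namespace") == "production":
        return "blocked", ["critical-cves", "production-strict"]
    if finding.get("severity") in ("critical", "high"):
        return "warned", ["high-cves"]
    return "ignored", []


def run_coverage(fixtures, policy):
    failures = []
    for fixture in fixtures:
        given = fixture["input"]
        verdict, rules = policy(given["finding"], given.get("context", {}))
        actual = {"verdict": verdict, "rulesMatched": rules}
        expected = fixture["expected"]
        # Compare only the keys the fixture declares; undeclared keys are not checked.
        mismatched = {k: actual.get(k) for k, v in expected.items() if actual.get(k) != v}
        if mismatched:
            failures.append(
                {"fixture": fixture["name"], "expected": expected, "actual": mismatched}
            )
    total = len(fixtures)
    return {
        "total": total,
        "passed": total - len(failures),
        "failed": len(failures),
        "failures": failures,
    }


fixtures = [
    {"name": "Block critical CVE in production",
     "input": {"finding": {"cve": "CVE-2025-12345", "severity": "critical"},
               "context": {"namespace": "production"}},
     "expected": {"verdict": "blocked",
                  "rulesMatched": ["critical-cves", "production-strict"]}},
    {"name": "Warn on high CVE in staging",
     "input": {"finding": {"cve": "CVE-2025-12346", "severity": "high"}},
     "expected": {"verdict": "warned"}},
    {"name": "Ignore low CVE with VEX",
     "input": {"finding": {"cve": "CVE-2025-12347", "severity": "low",
                           "vexStatus": "not_affected"}},
     "expected": {"verdict": "ignored"}},
    # Hypothetical extra fixture: the context is missing, so the toy policy only warns.
    {"name": "Block critical CVE without context",
     "input": {"finding": {"cve": "CVE-2025-12345", "severity": "critical"}},
     "expected": {"verdict": "blocked"}},
]
report = run_coverage(fixtures, toy_policy)
print(report["total"], report["passed"], report["failed"])  # 4 3 1
```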
+
+---
+
+## 5. Promotion Gates
+
+### 5.1 Gate Requirements
+
+A policy must clear the following gates before it can be promoted from draft to active:
+
+| Gate | Requirement | Enforcement |
+|------|-------------|-------------|
+| Shadow Run | Complete without errors | Required |
+| Coverage | 100% of fixtures pass | Required |
+| Diff Review | Changes reviewed | Optional |
+| Approval | Human sign-off | Configurable |
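
One way to read the table is as a function from gate status to the list of gates that still block promotion. The sketch below is an assumption-laden illustration: `GateStatus`, `blocking_gates`, and the `approval_required` switch are hypothetical names, and the gate configuration schema is still listed as planned work in section 7.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateStatus:
    shadow_run_clean: bool      # shadow run completed without errors
    fixtures_passed: int
    fixtures_total: int
    diff_reviewed: bool = False
    approved: bool = False


def blocking_gates(status, approval_required=True):
    """Return the names of gates that currently block promotion."""
    blocked = []
    if not status.shadow_run_clean:
        blocked.append("Shadow Run")          # Required
    if status.fixtures_passed != status.fixtures_total:
        blocked.append("Coverage")            # Required: every fixture must pass
    # Diff Review is Optional, so it is recorded but never blocks.
    if approval_required and not status.approved:
        blocked.append("Approval")            # Configurable
    return blocked


print(blocking_gates(GateStatus(True, 23, 25)))                 # ['Coverage', 'Approval']
print(blocking_gates(GateStatus(True, 25, 25, approved=True)))  # []
```

Note that a policy with zero fixtures passes the Coverage check in this sketch; whether an empty fixture set should count as passing is a product decision the table does not settle.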
+
+### 5.2 Promotion Workflow
+
+```mermaid
+stateDiagram-v2
+    [*] --> Draft
+    Draft --> Shadow: Start shadow run
+    Shadow --> Coverage: Run coverage tests
+    Coverage --> Review: Pass fixtures
+    Review --> Approval: Review diff
+    Approval --> Active: Approve
+    Coverage --> Draft: Fix failures
+    Approval --> Draft: Reject
+```
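
The diagram above maps directly onto a transition table. The following is a sketch only, since the promotion state machine is listed as planned in section 7.3: the state names are taken from the diagram, while the event names (`start_shadow_run`, `pass_fixtures`, and so on) are hypothetical identifiers derived from the edge labels.

```python
# (current state, event) -> next state, one entry per edge in the diagram.
TRANSITIONS = {
    ("Draft", "start_shadow_run"): "Shadow",
    ("Shadow", "run_coverage"): "Coverage",
    ("Coverage", "pass_fixtures"): "Review",
    ("Coverage", "fix_failures"): "Draft",
    ("Review", "review_diff"): "Approval",
    ("Approval", "approve"): "Active",
    ("Approval", "reject"): "Draft",
}


def advance(state, event):
    """Apply one event, rejecting any transition the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def replay(events, state="Draft"):
    """Fold a sequence of events over the state machine."""
    for event in events:
        state = advance(state, event)
    return state


happy_path = ["start_shadow_run", "run_coverage", "pass_fixtures", "review_diff", "approve"]
print(replay(happy_path))  # Active
```

Keeping the transitions in a plain table makes illegal moves, such as approving a policy that is still in `Draft`, fail loudly instead of silently succeeding.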
+
+### 5.3 CLI Commands
+
+```bash
+# Start shadow run
+stella policy promote start --policy my-policy:v2
+
+# Check promotion status
+stella policy promote status --policy my-policy:v2
+
+# Complete promotion (requires approval)
+stella policy promote complete --policy my-policy:v2 --comment "Reviewed and approved"
+```
+
+---
+
+## 6. Determinism Requirements
+
+### 6.1 Simulation Guarantees
+
+| Property | Guarantee |
+|----------|-----------|
+| Input ordering | Stable sort by (tenant, policyId, findingKey) |
+| Rule evaluation | First-match semantics |
+| Timestamp handling | Injected TimeProvider |
+| Random values | Injected IRandom |
+
+### 6.2 Replay Hash
+
+Each simulation computes a replay hash:
+```
+determinismHash = SHA256(policyVersion + inputsHash + rulesHash)
+```
+
+Replays with the same hash must produce identical results.
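
A minimal sketch of such a hash follows, using Python's standard `hashlib` and `json` modules. Several details are assumptions, because the formula above does not specify them: the components are joined with a newline separator rather than bare concatenation, inputs and rules are hashed as canonical JSON, and the finding field names (`tenant`, `policyId`, `findingKey`) follow the sort key from 6.1.

```python
import hashlib
import json


def canonical_hash(value):
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    payload = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def inputs_hash(findings):
    # Stable ordering by (tenant, policyId, findingKey), as required in 6.1.
    ordered = sorted(findings, key=lambda f: (f["tenant"], f["policyId"], f["findingKey"]))
    return canonical_hash(ordered)


def determinism_hash(policy_version, findings, rules):
    # Assumption: a newline separator keeps component boundaries unambiguous.
    parts = [policy_version, inputs_hash(findings), canonical_hash(rules)]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


findings = [
    {"tenant": "acme-corp", "policyId": "my-policy", "findingKey": "finding-2"},
    {"tenant": "acme-corp", "policyId": "my-policy", "findingKey": "finding-1"},
]
rules = [{"name": "critical-cves", "verdict": "blocked"}]

first = determinism_hash("my-policy:v2", findings, rules)
second = determinism_hash("my-policy:v2", list(reversed(findings)), rules)
print(first == second)  # True: input order does not affect the hash
```

Sorting before hashing is what makes the hash independent of the order in which findings arrive, so the same logical inputs always map to the same replay hash.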
+
+---
+
+## 7. Implementation Strategy
+
+### 7.1 Phase 1: Shadow Runs (Complete)
+
+- [x] Shadow collection isolation
+- [x] Shadow metrics tagging
+- [x] Shadow run API
+- [x] CLI integration
+
+### 7.2 Phase 2: Diff & Coverage (In Progress)
+
+- [x] Policy diff algorithm
+- [x] Diff visualization
+- [ ] Coverage fixture parser (POLICY-COV-50-001)
+- [ ] Coverage runner (POLICY-COV-50-002)
+
+### 7.3 Phase 3: Promotion Gates (Planned)
+
+- [ ] Gate configuration schema
+- [ ] Promotion state machine
+- [ ] Approval workflow integration
+- [ ] Console UI for review
+
+---
+
+## 8. API Surface
+
+### 8.1 Simulation APIs
+
+| Endpoint | Method | Scope | Description |
+|----------|--------|-------|-------------|
+| `/api/policy/simulate` | POST | `policy:simulate` | Start simulation |
+| `/api/policy/simulate/{id}` | GET | `policy:read` | Get simulation status |
+| `/api/policy/simulate/{id}/results` | GET | `policy:read` | Get results |
+
+### 8.2 Diff APIs
+
+| Endpoint | Method | Scope | Description |
+|----------|--------|-------|-------------|
+| `/api/policy/diff` | POST | `policy:read` | Compare versions |
+
+### 8.3 Coverage APIs
+
+| Endpoint | Method | Scope | Description |
+|----------|--------|-------|-------------|
+| `/api/policy/coverage` | POST | `policy:simulate` | Run coverage |
+| `/api/policy/coverage/{id}` | GET | `policy:read` | Get results |
+
+### 8.4 Promotion APIs
+
+| Endpoint | Method | Scope | Description |
+|----------|--------|-------|-------------|
+| `/api/policy/promote` | POST | `policy:promote` | Start promotion |
+| `/api/policy/promote/{id}` | GET | `policy:read` | Get status |
+| `/api/policy/promote/{id}/approve` | POST | `policy:approve` | Approve promotion |
+| `/api/policy/promote/{id}/reject` | POST | `policy:approve` | Reject promotion |
+
+---
+
+## 9. Storage Model
+
+### 9.1 Collections
+
+| Collection | Purpose |
+|------------|---------|
+| `policy_simulations` | Simulation records |
+| `policy_simulation_results` | Per-finding results |
+| `policy_coverage_runs` | Coverage executions |
+| `policy_promotions` | Promotion state |
+
+### 9.2 Shadow Isolation
+
+Shadow results are stored in separate per-policy collections named `effective_finding_{policyId}_shadow`:
+- Never mixed with production data
+- TTL-based cleanup (default 7 days)
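
The naming and expiry rules can be illustrated with a small sketch. It is an assumption-based example: the `createdAt` field and the `purge_expired` helper are hypothetical, and a real deployment would more likely rely on the database's own TTL mechanism than on application code. The current time is passed in explicitly, in line with the injected-clock requirement in 6.1.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_SHADOW_TTL = timedelta(days=7)


def shadow_collection_name(policy_id):
    """Collection name pattern from this section."""
    return f"effective_finding_{policy_id}_shadow"


def purge_expired(documents, now, ttl=DEFAULT_SHADOW_TTL):
    """Keep only shadow documents younger than the TTL; `now` is injected for determinism."""
    cutoff = now - ttl
    return [doc for doc in documents if doc["createdAt"] > cutoff]


now = datetime(2025, 11, 29, tzinfo=timezone.utc)
documents = [
    {"findingId": "finding-1", "createdAt": now - timedelta(days=1)},
    {"findingId": "finding-2", "createdAt": now - timedelta(days=8)},
]
print(shadow_collection_name("my-policy"))                      # effective_finding_my-policy_shadow
print([d["findingId"] for d in purge_expired(documents, now)])  # ['finding-1']
```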
+
+---
+
+## 10. Observability
+
+### 10.1 Metrics
+
+- `policy_simulation_duration_seconds{mode}`
+- `policy_coverage_pass_rate{policy}`
+- `policy_promotion_gate_status{gate,status}`
+- `policy_diff_changes_total{changeType}`
+
+### 10.2 Audit Events
+
+- `policy.simulation.started`
+- `policy.simulation.completed`
+- `policy.coverage.passed`
+- `policy.coverage.failed`
+- `policy.promotion.approved`
+- `policy.promotion.rejected`
+
+---
+
+## 11. Console Integration
+
+### 11.1 Policy Editor
+
+- Inline simulation button
+- Real-time diff preview
+- Coverage status badge
+
+### 11.2 Promotion Dashboard
+
+- Pending promotions list
+- Gate status visualization
+- Approve/reject actions
+
+---
+
+## 12. Related Documentation
+
+| Resource | Location |
+|----------|----------|
+| Policy architecture | `docs/modules/policy/architecture.md` |
+| DSL reference | `docs/policy/dsl.md` |
+| Lifecycle guide | `docs/policy/lifecycle.md` |
+| Runtime guide | `docs/policy/runtime.md` |
+
+---
+
+## 13. Sprint Mapping
+
+- **Primary Sprint:** SPRINT_0185_0001_0001_policy_simulation.md (NEW)
+- **Related Sprints:**
+  - SPRINT_0120_0000_0001_policy_reasoning.md
+  - SPRINT_0121_0001_0001_policy_reasoning.md
+
+**Key Task IDs:**
+- `POLICY-SIM-40-001` - Shadow runs (DONE)
+- `POLICY-DIFF-41-001` - Diff algorithm (DONE)
+- `POLICY-COV-50-001` - Coverage fixtures (IN PROGRESS)
+- `POLICY-COV-50-002` - Coverage runner (IN PROGRESS)
+- `POLICY-PROM-55-001` - Promotion gates (TODO)
+
+---
+
+## 14. Success Metrics
+
+| Metric | Target |
+|--------|--------|
+| Simulation latency | < 2 min (10k findings) |
+| Coverage accuracy | 100% fixture matching |
+| Promotion gate enforcement | 100% adherence |
+| Shadow isolation | Zero production leakage |
+| Replay determinism | 100% hash match |
+
+---
+
+*Last updated: 2025-11-29*