# Policy Simulation and Shadow Gates

**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical

This advisory defines the product rationale, simulation semantics, and implementation strategy for Policy Engine simulation features, covering shadow runs, coverage fixtures, and promotion gates.

---

## 1. Executive Summary

Policy simulation enables **safe testing of policy changes** before production deployment. Key capabilities:

- **Shadow Runs** - Execute policies without enforcement
- **Diff Summaries** - Compare old vs. new policy outcomes
- **Coverage Fixtures** - Validate expected findings
- **Promotion Gates** - Block promotion until tests pass
- **Deterministic Replay** - Reproduce simulation results

---

## 2. Market Drivers

### 2.1 Target Segments

| Segment | Simulation Requirements | Use Case |
|---------|------------------------|----------|
| **Policy Authors** | Preview changes | Development workflow |
| **Security Leads** | Approve promotions | Change management |
| **Compliance** | Audit trail | Policy change evidence |
| **DevSecOps** | CI integration | Automated testing |

### 2.2 Competitive Positioning

Most vulnerability tools lack policy simulation. Stella Ops differentiates with:

- **Shadow execution** without production impact
- **Diff visualization** of policy changes
- **Coverage testing** with fixture validation
- **Promotion gates** for governance
- **Deterministic replay** for audit

---

## 3. Simulation Modes

### 3.1 Shadow Run

Execute a policy against real data without enforcement:

```bash
stella policy simulate \
  --policy my-policy:v2 \
  --scope "tenant:acme-corp,namespace:production" \
  --shadow
```

**Behavior:**

- Evaluates all findings in the requested scope
- Records verdicts to shadow collections
- No enforcement actions
- No notifications triggered
- Metrics tagged with `shadow=true`
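
In practice, this behavior means every verdict is recorded while all side effects are suppressed. The Python sketch below uses a hypothetical `VerdictHandler` class, which is not a Stella Ops API. It also assumes that outside shadow mode only `blocked` verdicts trigger enforcement and notifications.

```python
from dataclasses import dataclass, field


@dataclass
class VerdictHandler:
    """Hypothetical per-run handler that applies or suppresses side effects."""

    shadow: bool
    verdicts: list = field(default_factory=list)       # recorded in both modes
    enforced: list = field(default_factory=list)       # finding IDs acted upon
    notifications: list = field(default_factory=list)  # messages that would be sent
    metric_labels: list = field(default_factory=list)  # labels attached to each metric

    def handle(self, finding_id: str, verdict: str) -> None:
        # Every verdict is recorded; which collection it lands in is a storage concern.
        self.verdicts.append((finding_id, verdict))
        # Metrics are always emitted, tagged so shadow traffic can be filtered out.
        self.metric_labels.append({"shadow": "true" if self.shadow else "false"})
        if self.shadow:
            return  # shadow mode: no enforcement actions, no notifications
        if verdict == "blocked":  # assumption: only "blocked" triggers side effects
            self.enforced.append(finding_id)
            self.notifications.append(f"{finding_id} blocked")


# Usage: the same verdict in shadow mode records it but enforces nothing.
shadow_run = VerdictHandler(shadow=True)
shadow_run.handle("finding-123", "blocked")
```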

### 3.2 Diff Run

Compare the outcomes of two policy versions over the same scope:

```bash
stella policy diff \
  --old my-policy:v1 \
  --new my-policy:v2 \
  --scope "tenant:acme-corp"
```

**Output:**

```json
{
  "summary": {
    "added": 12,
    "removed": 5,
    "changed": 8,
    "unchanged": 234
  },
  "changes": [
    {
      "findingId": "finding-123",
      "cve": "CVE-2025-12345",
      "oldVerdict": "warned",
      "newVerdict": "blocked",
      "reason": "rule 'critical-cves' now matches"
    }
  ]
}
```
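
One way to derive the `summary` counts is to compare the per-finding verdicts produced by the two versions. The sketch assumes each run yields a map from finding ID to verdict. It reads `added` and `removed` as findings that have a verdict under only the new version or only the old version. The function name `diff_verdicts` is hypothetical, and the `cve` and `reason` fields of the real output are omitted.

```python
from __future__ import annotations


def diff_verdicts(old: dict[str, str], new: dict[str, str]) -> dict:
    """Classify findings by comparing old/new verdict maps (finding ID -> verdict)."""
    summary = {"added": 0, "removed": 0, "changed": 0, "unchanged": 0}
    changes = []
    # Iterating the sorted union of IDs keeps the output order deterministic.
    for finding_id in sorted(old.keys() | new.keys()):
        before, after = old.get(finding_id), new.get(finding_id)
        if before is None:
            summary["added"] += 1
        elif after is None:
            summary["removed"] += 1
        elif before != after:
            summary["changed"] += 1
            changes.append({"findingId": finding_id,
                            "oldVerdict": before, "newVerdict": after})
        else:
            summary["unchanged"] += 1
    return {"summary": summary, "changes": changes}
```

Sorting the IDs keeps `changes` in a stable order, which supports the determinism requirements in section 6.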

### 3.3 Coverage Run

Validate a policy against fixture expectations:

```bash
stella policy coverage \
  --policy my-policy:v2 \
  --fixtures fixtures/policy-tests.yaml
```

---

## 4. Coverage Fixtures

### 4.1 Fixture Format

```yaml
apiVersion: stellaops.io/policy-test.v1
kind: PolicyFixture
metadata:
  name: critical-cve-blocking
  policy: my-policy

fixtures:
  - name: "Block critical CVE in production"
    input:
      finding:
        cve: "CVE-2025-12345"
        severity: critical
        ecosystem: npm
        component: "lodash@4.17.20"
      context:
        namespace: production
        labels:
          tier: frontend
    expected:
      verdict: blocked
      rulesMatched: ["critical-cves", "production-strict"]

  - name: "Warn on high CVE in staging"
    input:
      finding:
        cve: "CVE-2025-12346"
        severity: high
        ecosystem: npm
    expected:
      verdict: warned

  - name: "Ignore low CVE with VEX"
    input:
      finding:
        cve: "CVE-2025-12347"
        severity: low
        vexStatus: not_affected
        vexJustification: "component_not_present"
    expected:
      verdict: ignored
```

### 4.2 Fixture Results

Example coverage report (only the first failure is shown):

```json
{
  "total": 25,
  "passed": 23,
  "failed": 2,
  "failures": [
    {
      "fixture": "Block critical CVE in production",
      "expected": {"verdict": "blocked"},
      "actual": {"verdict": "warned"},
      "diff": "rule 'critical-cves' did not match due to missing label"
    }
  ]
}
```
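
At its core, a coverage runner evaluates each fixture's `input` and compares the result with its `expected` block. In this sketch, `run_coverage` and the `evaluate` callback are hypothetical. Only the keys a fixture lists under `expected` are checked, so the staging fixture makes no assertion about `rulesMatched`. The sketch does not produce the human-readable `diff` explanation shown above, and it compares lists such as `rulesMatched` in order.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical evaluator: fixture input -> actual outcome, e.g. {"verdict": "warned"}.
Evaluate = Callable[[dict], dict]


def run_coverage(fixtures: list[dict], evaluate: Evaluate) -> dict:
    """Run each fixture and compare only the fields its `expected` block declares."""
    failures = []
    for fx in fixtures:
        actual = evaluate(fx["input"])
        expected = fx["expected"]
        # Collect the declared fields whose actual value differs from the expectation.
        mismatched = {k: actual.get(k) for k, v in expected.items() if actual.get(k) != v}
        if mismatched:
            failures.append({
                "fixture": fx["name"],
                "expected": {k: expected[k] for k in mismatched},
                "actual": mismatched,
            })
    total = len(fixtures)
    return {"total": total, "passed": total - len(failures),
            "failed": len(failures), "failures": failures}
```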

---

## 5. Promotion Gates

### 5.1 Gate Requirements

Before a policy can be promoted from draft to active, it must satisfy the following gates:

| Gate | Requirement | Enforcement |
|------|-------------|-------------|
| Shadow Run | Completes without errors | Required |
| Coverage | 100% of fixtures pass | Required |
| Diff Review | Changes reviewed | Optional |
| Approval | Human sign-off | Configurable |
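
The gate table can be expressed as a predicate over the evidence collected for one promotion. The sketch assumes a hypothetical `PromotionEvidence` record. The `require_*` flags stand in for the Optional and Configurable enforcement levels. Treating an empty fixture set as a coverage failure is an assumption of this sketch; the table does not say so.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PromotionEvidence:
    """Hypothetical record of what has happened so far for one promotion."""
    shadow_completed: bool
    shadow_errors: int
    fixtures_total: int
    fixtures_passed: int
    diff_reviewed: bool
    approved: bool


def blocking_gates(ev: PromotionEvidence, require_diff_review: bool = False,
                   require_approval: bool = True) -> list[str]:
    """Return the gates that still block promotion; an empty list means promotable."""
    blocked = []
    if not ev.shadow_completed or ev.shadow_errors > 0:
        blocked.append("shadow-run")
    # Coverage requires every fixture to pass; zero fixtures counts as failing here.
    if ev.fixtures_total == 0 or ev.fixtures_passed < ev.fixtures_total:
        blocked.append("coverage")
    if require_diff_review and not ev.diff_reviewed:
        blocked.append("diff-review")
    if require_approval and not ev.approved:
        blocked.append("approval")
    return blocked
```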

### 5.2 Promotion Workflow

```mermaid
stateDiagram-v2
    [*] --> Draft
    Draft --> Shadow: Start shadow run
    Shadow --> Coverage: Run coverage tests
    Coverage --> Review: Pass fixtures
    Review --> Approval: Review diff
    Approval --> Active: Approve
    Coverage --> Draft: Fix failures
    Approval --> Draft: Reject
```
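
The workflow diagram maps directly onto a transition table. The event names below are hypothetical labels for the diagram's edges. Any event not listed for the current state is rejected.

```python
# Allowed transitions from the state diagram: (from_state, event) -> to_state.
TRANSITIONS = {
    ("Draft", "start_shadow"): "Shadow",
    ("Shadow", "run_coverage"): "Coverage",
    ("Coverage", "pass_fixtures"): "Review",
    ("Coverage", "fix_failures"): "Draft",
    ("Review", "review_diff"): "Approval",
    ("Approval", "approve"): "Active",
    ("Approval", "reject"): "Draft",
}


class PromotionStateMachine:
    def __init__(self) -> None:
        self.state = "Draft"
        self.history = ["Draft"]  # visited states, useful for the audit trail

    def fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state
```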

### 5.3 CLI Commands

```bash
# Start shadow run
stella policy promote start --policy my-policy:v2

# Check promotion status
stella policy promote status --policy my-policy:v2

# Complete promotion (requires approval)
stella policy promote complete --policy my-policy:v2 --comment "Reviewed and approved"
```

---

## 6. Determinism Requirements

### 6.1 Simulation Guarantees

| Property | Guarantee |
|----------|-----------|
| Input ordering | Stable sort by (tenant, policyId, findingKey) |
| Rule evaluation | First-match semantics |
| Timestamp handling | Injected `TimeProvider` (no direct wall-clock reads) |
| Random values | Injected `IRandom` (no ambient randomness) |
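
The ordering, first-match, and injected-clock guarantees can be shown together in one sketch. The table above names `TimeProvider` and `IRandom`. Here a plain callable stands in for the clock, and randomness is left out. The `Finding` and `Rule` shapes and the `allowed` fallback verdict are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    tenant: str
    policy_id: str
    finding_key: str
    severity: str


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Finding], bool]
    verdict: str


def evaluate(findings: list[Finding], rules: list[Rule],
             now: Callable[[], datetime]) -> list[dict]:
    """Fixed input order, first-match rule selection, and an injected clock."""
    evaluated_at = now().isoformat()  # read the injected clock once; never the wall clock
    ordered = sorted(findings, key=lambda x: (x.tenant, x.policy_id, x.finding_key))
    results = []
    for f in ordered:
        # First-match semantics: the first rule in declaration order decides.
        winner: Optional[Rule] = next((r for r in rules if r.matches(f)), None)
        results.append({
            "findingKey": f.finding_key,
            "verdict": winner.verdict if winner else "allowed",  # hypothetical default
            "rule": winner.name if winner else None,
            "evaluatedAt": evaluated_at,
        })
    return results
```

With both input order and the clock pinned, evaluating the same findings in any input order returns an identical result list.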

### 6.2 Replay Hash

Each simulation computes (with `+` denoting concatenation):

```
determinismHash = SHA256(policyVersion + inputsHash + rulesHash)
```

Replays with the same `determinismHash` must produce identical results.
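
Here is a sketch of the replay hash. It assumes that `+` means string concatenation and that `inputsHash` is a SHA-256 over canonical JSON (sorted keys, compact separators). The formula specifies neither the canonicalization nor the newline separator between fields, so both are assumptions.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_hash(obj) -> str:
    """Hash of canonical JSON: sorted keys, no insignificant whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return sha256_hex(text.encode("utf-8"))


def determinism_hash(policy_version: str, inputs_hash: str, rules_hash: str) -> str:
    """SHA256(policyVersion + inputsHash + rulesHash), reading '+' as concatenation.

    The newline separator is an assumption of this sketch: without one, the same
    characters split at different field boundaries would hash identically.
    """
    payload = "\n".join([policy_version, inputs_hash, rules_hash])
    return sha256_hex(payload.encode("utf-8"))
```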

---

## 7. Implementation Strategy

### 7.1 Phase 1: Shadow Runs (Complete)

- [x] Shadow collection isolation
- [x] Shadow metrics tagging
- [x] Shadow run API
- [x] CLI integration

### 7.2 Phase 2: Diff & Coverage (In Progress)

- [x] Policy diff algorithm
- [x] Diff visualization
- [ ] Coverage fixture parser (POLICY-COV-50-001)
- [ ] Coverage runner (POLICY-COV-50-002)

### 7.3 Phase 3: Promotion Gates (Planned)

- [ ] Gate configuration schema
- [ ] Promotion state machine
- [ ] Approval workflow integration
- [ ] Console UI for review

---

## 8. API Surface

### 8.1 Simulation APIs

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/policy/simulate` | POST | `policy:simulate` | Start simulation |
| `/api/policy/simulate/{id}` | GET | `policy:read` | Get simulation status |
| `/api/policy/simulate/{id}/results` | GET | `policy:read` | Get results |

### 8.2 Diff APIs

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/policy/diff` | POST | `policy:read` | Compare versions |

### 8.3 Coverage APIs

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/policy/coverage` | POST | `policy:simulate` | Run coverage |
| `/api/policy/coverage/{id}` | GET | `policy:read` | Get results |

### 8.4 Promotion APIs

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/policy/promote` | POST | `policy:promote` | Start promotion |
| `/api/policy/promote/{id}` | GET | `policy:read` | Get status |
| `/api/policy/promote/{id}/approve` | POST | `policy:approve` | Approve promotion |
| `/api/policy/promote/{id}/reject` | POST | `policy:approve` | Reject promotion |
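
The scope columns in the API tables imply a simple lookup from route to required scope. The helper names in this sketch are hypothetical, and it is not the Policy Engine's authorization code. Unknown routes are denied by default.

```python
from __future__ import annotations

import re
from typing import Optional

# Required scope per (method, path template), transcribed from the API tables.
REQUIRED_SCOPES = [
    ("POST", "/api/policy/simulate", "policy:simulate"),
    ("GET", "/api/policy/simulate/{id}", "policy:read"),
    ("GET", "/api/policy/simulate/{id}/results", "policy:read"),
    ("POST", "/api/policy/diff", "policy:read"),
    ("POST", "/api/policy/coverage", "policy:simulate"),
    ("GET", "/api/policy/coverage/{id}", "policy:read"),
    ("POST", "/api/policy/promote", "policy:promote"),
    ("GET", "/api/policy/promote/{id}", "policy:read"),
    ("POST", "/api/policy/promote/{id}/approve", "policy:approve"),
    ("POST", "/api/policy/promote/{id}/reject", "policy:approve"),
]


def _pattern(template: str) -> re.Pattern:
    # Each "{id}" placeholder matches exactly one non-empty path segment.
    parts = [re.escape(p) for p in template.split("{id}")]
    return re.compile("[^/]+".join(parts))


_COMPILED = [(m, _pattern(t), s) for m, t, s in REQUIRED_SCOPES]


def required_scope(method: str, path: str) -> Optional[str]:
    for m, pat, scope in _COMPILED:
        if m == method and pat.fullmatch(path):
            return scope
    return None


def is_authorized(method: str, path: str, granted: set[str]) -> bool:
    scope = required_scope(method, path)
    return scope is not None and scope in granted  # unknown routes are denied
```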

---

## 9. Storage Model

### 9.1 Collections

| Collection | Purpose |
|------------|---------|
| `policy_simulations` | Simulation records |
| `policy_simulation_results` | Per-finding results |
| `policy_coverage_runs` | Coverage executions |
| `policy_promotions` | Promotion state |

### 9.2 Shadow Isolation

Shadow results are stored in separate per-policy collections named `effective_finding_{policyId}_shadow`. These collections are:

- Never mixed with production data
- Cleaned up by TTL (default: 7 days)
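
A sketch of the isolation and retention rules, using hypothetical helper names. It covers only the naming guard and the expiry predicate. If the backing store supports native TTL indexes (MongoDB does, via `expireAfterSeconds`), cleanup would usually be delegated to them instead.

```python
from datetime import datetime, timedelta

DEFAULT_SHADOW_TTL = timedelta(days=7)  # default retention for shadow results


def shadow_collection(policy_id: str) -> str:
    """Per-policy shadow collection name, following the documented pattern."""
    return f"effective_finding_{policy_id}_shadow"


def assert_shadow_target(collection: str) -> None:
    """Guard for shadow-mode writers: refuse any collection that is not a shadow one."""
    if not (collection.startswith("effective_finding_") and collection.endswith("_shadow")):
        raise PermissionError(f"shadow run may not write to {collection!r}")


def is_expired(written_at: datetime, now: datetime,
               ttl: timedelta = DEFAULT_SHADOW_TTL) -> bool:
    """True once a shadow record has outlived its TTL and may be cleaned up."""
    return now - written_at >= ttl
```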

---

## 10. Observability

### 10.1 Metrics

- `policy_simulation_duration_seconds{mode}`
- `policy_coverage_pass_rate{policy}`
- `policy_promotion_gate_status{gate,status}`
- `policy_diff_changes_total{changeType}`

### 10.2 Audit Events

- `policy.simulation.started`
- `policy.simulation.completed`
- `policy.coverage.passed`
- `policy.coverage.failed`
- `policy.promotion.approved`
- `policy.promotion.rejected`

---

## 11. Console Integration

### 11.1 Policy Editor

- Inline simulation button
- Real-time diff preview
- Coverage status badge

### 11.2 Promotion Dashboard

- Pending promotions list
- Gate status visualization
- Approve/reject actions

---

## 12. Related Documentation

| Resource | Location |
|----------|----------|
| Policy architecture | `docs/modules/policy/architecture.md` |
| DSL reference | `docs/policy/dsl.md` |
| Lifecycle guide | `docs/policy/lifecycle.md` |
| Runtime guide | `docs/policy/runtime.md` |

---

## 13. Sprint Mapping

- **Primary Sprint:** SPRINT_0185_0001_0001_policy_simulation.md (NEW)
- **Related Sprints:**
  - SPRINT_0120_0000_0001_policy_reasoning.md
  - SPRINT_0121_0001_0001_policy_reasoning.md

**Key Task IDs:**

- `POLICY-SIM-40-001` - Shadow runs (DONE)
- `POLICY-DIFF-41-001` - Diff algorithm (DONE)
- `POLICY-COV-50-001` - Coverage fixtures (IN PROGRESS)
- `POLICY-COV-50-002` - Coverage runner (IN PROGRESS)
- `POLICY-PROM-55-001` - Promotion gates (TODO)

---

## 14. Success Metrics

| Metric | Target |
|--------|--------|
| Simulation latency | < 2 min for 10k findings |
| Coverage accuracy | 100% fixture matching |
| Promotion gate enforcement | 100% adherence |
| Shadow isolation | Zero production leakage |
| Replay determinism | 100% hash match |

---

*Last updated: 2025-11-29*