Policy Engine FAQ
Answers to questions that Support, Ops, and Policy Guild teams receive most frequently. Pair this FAQ with the Policy Lifecycle, Runs, and CLI guide for deeper explanations.
Authoring & DSL
Q: Lint succeeds locally, but submit still fails with ERR_POL_001. Why?
A: The CLI requires lint and compile artefacts that are less than 24 hours old. Re-run stella policy lint and stella policy compile before submitting, and make sure you upload the latest diff files with --attach.
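A pre-submit check for the 24-hour window can be sketched as follows. This is a minimal illustration, not part of the stella CLI: the function name and the artefact file names are hypothetical, and it assumes file modification time is a reasonable proxy for artefact age.

```python
import tempfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24-hour window described above


def stale_artefacts(paths, now=None, max_age=MAX_AGE_SECONDS):
    """Return the paths that are missing or older than max_age seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in map(Path, paths):
        if not path.exists() or now - path.stat().st_mtime > max_age:
            stale.append(path)
    return stale


# Demo with a throwaway file standing in for a compile artefact.
with tempfile.TemporaryDirectory() as tmp:
    artefact = Path(tmp) / "policy.compiled"  # hypothetical file name
    artefact.write_text("stub")
    mtime = artefact.stat().st_mtime
    fresh = stale_artefacts([artefact], now=mtime + 3600)          # 1 hour old
    expired = stale_artefacts([artefact], now=mtime + 25 * 3600)   # 25 hours old
    missing = stale_artefacts([Path(tmp) / "absent.lint"])         # never created
```

If the returned list is non-empty, re-running lint and compile before submitting is the safer choice.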
Q: How do I layer tenant-specific overrides on top of the baseline policy?
A: Keep the baseline in tenant-global. For tenant overrides, create a policy referencing the baseline via CLI (stella policy new --from baseline@<version>), then adjust rules. Activation is per tenant.
Q: Can I import YAML/Rego policies from earlier releases?
A: No direct import. Use the migration script (stella policy migrate legacy.yaml) which outputs stella-dsl@1 skeletons. Review manually before submission.
Simulation & Determinism
Q: Simulation shows huge differences even though I only tweaked metadata. What did I miss?
A: Check whether your simulation used the same SBOM set and environment as previous runs. The CLI defaults to golden fixtures, while the UI can store custom presets. Large diffs may also indicate Concelier updates; compare advisory cursors in the Simulation tab.
Q: How do we guard against non-deterministic behaviour?
A: CI runs policy simulate twice with identical inputs and compares outputs (DEVOPS-POLICY-20-003). Any difference fails the pipeline. Locally you can use stella policy run replay to verify determinism.
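The run-twice-and-compare idea can be sketched in a few lines. This is an illustration of the general technique, not the actual CI job: the evaluate callable stands in for a simulation run, and the function names are hypothetical. Results are hashed in a canonical JSON form so that dictionary key order alone cannot cause a false alarm.

```python
import hashlib
import itertools
import json


def canonical_digest(result):
    """Hash a JSON-serialisable result with sorted keys so key order cannot matter."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_deterministic(evaluate, inputs, runs=2):
    """Call evaluate(inputs) `runs` times and report whether all digests match."""
    digests = {canonical_digest(evaluate(inputs)) for _ in range(runs)}
    return len(digests) == 1


# Two stand-in evaluators: one pure, one leaking hidden state into its output.
_counter = itertools.count()


def stable(inputs):
    return {"findings": sorted(inputs)}


def flaky(inputs):
    return {"findings": sorted(inputs), "seq": next(_counter)}


stable_ok = is_deterministic(stable, ["b", "a"])
flaky_ok = is_deterministic(flaky, ["b", "a"])
```

Here the flaky evaluator fails the check because a counter, much like a wall-clock read, changes between otherwise identical runs.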
Q: What happens if the determinism guard (ERR_POL_004) triggers?
A: Policy Engine halts the run, raises policy.run.failed with code ERR_POL_004, and switches to incident mode (100% sampling). Review recent code changes; the guard is often triggered by new helpers that call DateTime.Now or use non-allowlisted HTTP clients.
VEX & Suppressions
Q: A vendor marked a CVE not_affected but the policy still blocks. Why?
A: Check the required justifications. The baseline policy only accepts the component_not_present and vulnerable_code_not_present justifications. Other statuses need explicit rules. Use stella findings explain to see which VEX statement was considered.
Q: Can we quiet a finding indefinitely?
A: Avoid indefinite quiets. Policy DSL requires an until timestamp. If the use case is permanent, move the rule into baseline logic with strong justification and documentation.
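The time-boxing rule can be illustrated with a small review helper. This is a sketch only: the record shape (a dict with rule and until keys) is hypothetical and is not the stella-dsl@1 syntax, and timestamps without an offset are assumed to be UTC.

```python
from datetime import datetime, timezone


def quiet_problems(quiets, now):
    """Return (rule, problem) pairs for quiets with a missing or expired until."""
    problems = []
    for quiet in quiets:
        raw = quiet.get("until")
        if not raw:
            problems.append((quiet["rule"], "missing until"))
            continue
        until = datetime.fromisoformat(raw)
        if until.tzinfo is None:
            # Assumption: timestamps without an offset are UTC.
            until = until.replace(tzinfo=timezone.utc)
        if until <= now:
            problems.append((quiet["rule"], "until already passed"))
    return problems


now = datetime(2025, 10, 26, tzinfo=timezone.utc)
quiets = [
    {"rule": "r1", "until": "2025-12-31T00:00:00+00:00"},  # still valid
    {"rule": "r2"},                                         # indefinite quiet
    {"rule": "r3", "until": "2025-01-01T00:00:00+00:00"},  # expired
]
problems = quiet_problems(quiets, now)
```

A review of this kind surfaces both quiets that were never time-boxed and quiets that have expired and should be renewed or removed.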
Q: How do we detect overuse of suppressions?
A: Observability exports the policy_suppressions_total metric, and the CLI provides stella policy stats. Review both weekly; Support flags tenants whose suppressions grow faster than their remediation tickets.
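The weekly comparison can be sketched as below. The input shape is hypothetical (it is not the output format of stella policy stats or of the metric), and "grows faster" is interpreted here as a larger absolute week-over-week increase, which is an assumption.

```python
def tenants_to_flag(weekly_counts):
    """Return tenants whose suppression growth outpaced ticket growth.

    weekly_counts maps tenant -> dict of previous and current totals
    (hypothetical shape, chosen for illustration).
    """
    flagged = []
    for tenant, c in sorted(weekly_counts.items()):
        suppression_growth = c["suppressions_now"] - c["suppressions_prev"]
        ticket_growth = c["tickets_now"] - c["tickets_prev"]
        if suppression_growth > ticket_growth:
            flagged.append(tenant)
    return flagged


sample = {
    "tenant-a": {"suppressions_prev": 10, "suppressions_now": 25,
                 "tickets_prev": 40, "tickets_now": 44},
    "tenant-b": {"suppressions_prev": 5, "suppressions_now": 6,
                 "tickets_prev": 12, "tickets_now": 20},
}
flagged = tenants_to_flag(sample)
```

In this sample only tenant-a is flagged: its suppressions grew by 15 while its remediation tickets grew by 4.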
Runs & Operations
Q: Incremental runs are backlogged. What should we check first?
A: Inspect the policy_run_queue_depth and policy_delta_backlog_age_seconds dashboards. If queue depth is high, scale worker replicas or investigate upstream change storms (Concelier/Excititor). Use stella policy run list --status failed to see recent errors.
Q: Full runs take longer than 30 min. Is that a breach?
A: The goal is ≤ 30 min, but large tenants may temporarily exceed it. Ensure Mongo indexes are current and that worker nodes meet the sizing guidance (4 vCPU). Consider sharding runs by SBOM group.
Q: How do I replay a run for audit evidence?
A: stella policy run replay <runId> --output replay.tgz produces a sealed bundle. Upload to evidence locker or attach to incident tickets.
Approvals & Governance
Q: Can authors approve their own policies?
A: No. Authority denies approval if approved_by == submitted_by. Assign at least two reviewers (one security, one product).
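The two rules in this answer can be expressed as simple checks. This is an illustrative sketch, not Authority's implementation: the function names and the (user, role) reviewer shape are hypothetical, and excluding the author from role coverage is an assumption.

```python
REQUIRED_REVIEW_ROLES = {"security", "product"}


def approval_allowed(submitted_by, approved_by):
    """Mirror the rule above: an approval is denied when approved_by == submitted_by."""
    return approved_by != submitted_by


def missing_review_roles(reviewers, submitted_by):
    """Return required roles not yet covered by a reviewer other than the author.

    reviewers is a list of (user, role) pairs (hypothetical shape).
    """
    covered = {role for user, role in reviewers if user != submitted_by}
    return sorted(REQUIRED_REVIEW_ROLES - covered)


self_approval = approval_allowed("alice", "alice")
peer_approval = approval_allowed("alice", "bob")
gaps = missing_review_roles([("bob", "security")], submitted_by="alice")
```

With only a security reviewer assigned, the gap list reports that a product reviewer is still needed.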
Q: What scopes do bots need for CI pipelines?
A: Typically policy:read, policy:simulate, policy:runs. Only grant policy:run if the pipeline should trigger runs. Never give CI tokens policy:approve.
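A token audit along these lines can be sketched as follows. The scope names come from the answer above; the function name and the report shape are hypothetical, and how scopes are read from a real token is left out.

```python
CI_BASE_SCOPES = {"policy:read", "policy:simulate", "policy:runs"}
CI_FORBIDDEN_SCOPES = {"policy:approve"}


def audit_ci_scopes(granted, triggers_runs=False):
    """Report forbidden and unexpected scopes on a CI token."""
    granted = set(granted)
    allowed = set(CI_BASE_SCOPES)
    if triggers_runs:
        # policy:run is only expected when the pipeline triggers runs.
        allowed.add("policy:run")
    return {
        "forbidden": sorted(granted & CI_FORBIDDEN_SCOPES),
        "unexpected": sorted(granted - allowed - CI_FORBIDDEN_SCOPES),
    }


report = audit_ci_scopes(
    ["policy:read", "policy:simulate", "policy:runs", "policy:run", "policy:approve"]
)
```

For a pipeline that does not trigger runs, this report lists policy:approve as forbidden and policy:run as unexpected.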
Q: How do we manage policies in air-gapped deployments?
A: Use stella policy bundle export --sealed on a connected site, transfer the bundle via approved media, then run stella policy bundle import inside the enclave. Enable the --sealed flag in the CLI/UI to block accidental outbound calls.
Troubleshooting
Q: API calls return 403 despite valid token.
A: Verify that the token's scope includes the specific operation (policy:activate vs policy:run). Check that the tenant header matches the token's tenant. Inspect Authority logs and the policy_scope_denied_total metric for the denial reason.
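The first two checks can be run against decoded token claims before digging into logs. This is a sketch under assumptions: it expects a space-delimited scope claim, and the tenant claim name and the function itself are hypothetical rather than taken from Authority.

```python
def diagnose_403(token_claims, required_scope, tenant_header):
    """Return likely reasons for a 403, based on scope and tenant mismatch."""
    reasons = []
    granted = set(token_claims.get("scope", "").split())
    if required_scope not in granted:
        reasons.append(f"token lacks scope {required_scope}")
    # Assumption: the token carries its tenant in a "tenant" claim.
    if token_claims.get("tenant") != tenant_header:
        reasons.append("tenant header does not match token tenant")
    return reasons


claims = {"scope": "policy:read policy:run", "tenant": "acme"}
scope_issue = diagnose_403(claims, "policy:activate", "acme")
tenant_issue = diagnose_403(claims, "policy:run", "other")
no_issue = diagnose_403(claims, "policy:run", "acme")
```

An empty result means neither check explains the denial, so the Authority logs are the next place to look.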
Q: stella policy run exits with code 30.
A: Network/transport error. Check connectivity to Policy Engine endpoint, TLS configuration, and CLI proxy settings.
Q: Explain drawer shows no VEX data.
A: Either no VEX statement matched or the tenant lacks findings:read scope. If VEX should exist, confirm Excititor ingestion and policy joiners (see Observability dashboards).
Compliance Checklist
- FAQ linked from Console help menu and CLI stella policy help.
- Entries reviewed quarterly by Policy & Support Guilds.
- Answers cross-reference lifecycle, runs, observability, and governance docs.
- Incident/Escalation contact details kept current in Support playbooks.
- FAQ translated for supported locales (if applicable).
Last updated: 2025-10-26 (Sprint 20).