# Policy Engine FAQ
Answers to questions that Support, Ops, and Policy Guild teams receive most frequently. Pair this FAQ with the [Policy Lifecycle](../policy/lifecycle.md), [Runs](../policy/runs.md), and [CLI guide](../modules/cli/guides/policy.md) for deeper explanations.
---
## Authoring & DSL
**Q:** *Lint succeeds locally, but submit still fails with `ERR_POL_001`. Why?*
**A:** The CLI requires lint and compile artefacts that are less than 24 hours old. Re-run `stella policy lint` and `stella policy compile` before submitting, and upload the latest diff files with `--attach`.
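To spot stale artefacts before a submit attempt, a pre-submit script can check file ages. The following is a minimal sketch, assuming freshness is judged by file modification time (the FAQ does not say how the CLI measures it); `stale_artefacts` and the paths passed to it are hypothetical.

```python
import os
import time

# The 24-hour window described in the answer above.
MAX_ARTEFACT_AGE_SECONDS = 24 * 60 * 60


def stale_artefacts(paths, now=None):
    """Return the paths that are missing or older than the freshness window.

    Assumption: age is measured from the file's modification time.
    """
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            stale.append(path)  # a missing artefact is treated as stale
            continue
        if age > MAX_ARTEFACT_AGE_SECONDS:
            stale.append(path)
    return stale
```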
**Q:** *How do I layer tenant-specific overrides on top of the baseline policy?*
**A:** Keep the baseline in `tenant-global`. For tenant overrides, create a policy referencing the baseline via CLI (`stella policy new --from baseline@<version>`), then adjust rules. Activation is per tenant.
**Q:** *Can I import YAML/Rego policies from earlier releases?*
**A:** No direct import. Use the migration script (`stella policy migrate legacy.yaml`), which outputs `stella-dsl@1` skeletons. Review them manually before submission.
---
## Simulation & Determinism
**Q:** *Simulation shows huge differences even though I only tweaked metadata. What did I miss?*
**A:** Check whether your simulation used the same SBOM set and environment as previous runs. The CLI defaults to golden fixtures, while the UI can store custom presets. Large diffs may also indicate Concelier updates; compare advisory cursors in the Simulation tab.
**Q:** *How do we guard against non-deterministic behaviour?*
**A:** CI runs `policy simulate` twice with identical inputs and compares outputs (`DEVOPS-POLICY-20-003`). Any difference fails the pipeline. Locally you can use `stella policy run replay` to verify determinism.
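The run-twice-and-compare idea can be sketched in a few lines. This is an illustration only, not the actual CI job: `simulate` is a hypothetical callable standing in for a simulation, and outputs are assumed to be JSON-serialisable.

```python
import hashlib
import json


def output_digest(result):
    """Hash a canonical JSON rendering so key order cannot affect the digest."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(simulate, inputs):
    """Run the simulation twice on identical inputs and compare digests."""
    first = output_digest(simulate(inputs))
    second = output_digest(simulate(inputs))
    return first == second
```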
**Q:** *What happens if the determinism guard (`ERR_POL_004`) triggers?*
**A:** Policy Engine halts the run, raises `policy.run.failed` with code `ERR_POL_004`, and switches to incident mode (100% sampling). Review recent code changes; the cause is often a new helper that calls `DateTime.Now` or uses a non-allowlisted HTTP client.
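The wall-clock problem is easy to reproduce in any language. The sketch below uses Python rather than the engine's own code, and `advisory_age_days` is a hypothetical helper: reading the current time inside the helper makes results drift between runs, while passing the evaluation timestamp in as an input keeps replays identical.

```python
from datetime import datetime, timezone


def advisory_age_days_nondeterministic(published):
    """Anti-pattern: reads the wall clock, so two runs can disagree."""
    return (datetime.now(timezone.utc) - published).days


def advisory_age_days(published, evaluated_at):
    """Deterministic variant: the evaluation time is an explicit input."""
    return (evaluated_at - published).days
```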
---
## VEX & Suppressions
**Q:** *A vendor marked a CVE `not_affected` but the policy still blocks. Why?*
**A:** Check the required justifications. Baseline policy only accepts `component_not_present` and `vulnerable_code_not_present`. Other statuses need explicit rules. Use `stella findings explain` to see which VEX statement was considered.
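As a rough model of the baseline behaviour described above, the check below accepts a `not_affected` statement only when its justification is one of the two listed values. The dictionary field names `status` and `justification` are assumptions for illustration, not the engine's schema.

```python
# The two justifications the baseline policy accepts, per the answer above.
ACCEPTED_JUSTIFICATIONS = {
    "component_not_present",
    "vulnerable_code_not_present",
}


def baseline_accepts(statement):
    """Return True if a VEX statement would clear a finding under baseline rules.

    Assumption: statements are dicts with 'status' and 'justification' keys.
    """
    return (
        statement.get("status") == "not_affected"
        and statement.get("justification") in ACCEPTED_JUSTIFICATIONS
    )
```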
**Q:** *Can we quiet a finding indefinitely?*
**A:** Avoid indefinite quiets. Policy DSL requires an `until` timestamp. If the use case is permanent, move the rule into baseline logic with strong justification and documentation.
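The effect of the mandatory `until` timestamp can be sketched as follows. This is not the DSL's own validation; the dict shape and the ISO 8601 string with an explicit UTC offset are assumptions.

```python
from datetime import datetime, timezone


def suppression_active(suppression, now):
    """Return True while the suppression has not yet expired.

    Assumptions: 'until' is an ISO 8601 string with an explicit offset,
    and 'now' is a timezone-aware datetime.
    """
    until = suppression.get("until")
    if until is None:
        # Mirrors the rule above: a quiet without an end date is rejected.
        raise ValueError("suppression must carry an 'until' timestamp")
    return now < datetime.fromisoformat(until)
```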
**Q:** *How do we detect overuse of suppressions?*
**A:** Observability exports the `policy_suppressions_total` metric, and the CLI provides `stella policy stats`. Review both weekly; Support flags tenants whose suppressions grow faster than remediation tickets.
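The "grow faster than remediation tickets" comparison might look like the sketch below. The inputs are hypothetical weekly cumulative counts per tenant; how Support actually collects them is not specified here.

```python
def suppressions_outpacing_remediation(suppression_counts, ticket_counts):
    """Compare growth of two cumulative series over the same review window.

    Both arguments are sequences of cumulative totals, oldest first.
    """
    suppression_growth = suppression_counts[-1] - suppression_counts[0]
    ticket_growth = ticket_counts[-1] - ticket_counts[0]
    return suppression_growth > ticket_growth
```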
---
## Runs & Operations
**Q:** *Incremental runs are backlogged. What should we check first?*
**A:** Inspect the `policy_run_queue_depth` and `policy_delta_backlog_age_seconds` dashboards. If queue depth is high, scale worker replicas or investigate upstream change storms (Concelier/Excititor). Use `stella policy run list --status failed` for recent errors.
**Q:** *Full runs take longer than 30 min. Is that a breach?*
**A:** The goal is ≤ 30 min, but large tenants may exceed it temporarily. Ensure Mongo indexes are current and that worker nodes meet sizing (4 vCPU). Consider sharding runs by SBOM group.
**Q:** *How do I replay a run for audit evidence?*
**A:** `stella policy run replay <runId> --output replay.tgz` produces a sealed bundle. Upload it to the evidence locker or attach it to incident tickets.
---
## Approvals & Governance
**Q:** *Can authors approve their own policies?*
**A:** No. Authority denies approval if `approved_by == submitted_by`. Assign at least two reviewers (one security, one product).
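The approval rules above can be modelled as a small pre-check. This is a sketch, not Authority's implementation; `approval_problems` and the `(reviewer, role)` tuples are hypothetical, with the role names taken from the answer.

```python
def approval_problems(submitted_by, approvals):
    """List the governance problems with a set of approvals.

    approvals is a list of (reviewer, role) tuples; an empty result means
    the approvals satisfy the rules described above.
    """
    problems = []
    if any(reviewer == submitted_by for reviewer, _ in approvals):
        problems.append("author cannot approve their own policy")

    # Only approvals from someone other than the author count.
    independent = [(r, role) for r, role in approvals if r != submitted_by]
    if len({r for r, _ in independent}) < 2:
        problems.append("need at least two independent reviewers")

    roles = {role for _, role in independent}
    for required in ("security", "product"):
        if required not in roles:
            problems.append(f"missing {required} reviewer")
    return problems
```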
**Q:** *What scopes do bots need for CI pipelines?*
**A:** Typically `policy:read`, `policy:simulate`, `policy:runs`. Only grant `policy:run` if the pipeline should trigger runs. Never give CI tokens `policy:approve`.
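A token audit along these lines can catch over-privileged CI credentials before they ship. The scope names come from the answer above; the function and the report shape are illustrative assumptions.

```python
BASELINE_CI_SCOPES = {"policy:read", "policy:simulate", "policy:runs"}
FORBIDDEN_CI_SCOPES = {"policy:approve"}


def audit_ci_token(scopes, triggers_runs=False):
    """Report forbidden, unexpected, and missing scopes for a CI token."""
    granted = set(scopes)
    allowed = set(BASELINE_CI_SCOPES)
    if triggers_runs:
        allowed.add("policy:run")  # only for pipelines that trigger runs
    return {
        "forbidden": sorted(granted & FORBIDDEN_CI_SCOPES),
        "unexpected": sorted(granted - allowed - FORBIDDEN_CI_SCOPES),
        "missing": sorted(allowed - granted),
    }
```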
**Q:** *How do we manage policies in air-gapped deployments?*
**A:** Use `stella policy bundle export --sealed` on a connected site, transfer the bundle via approved media, then run `stella policy bundle import` inside the enclave. Enable the `--sealed` flag in the CLI/UI to block accidental outbound calls.
---
## Troubleshooting
**Q:** *API calls return `403` despite valid token.*
**A:** Verify scope includes the specific operation (`policy:activate` vs `policy:run`). Check tenant header matches token tenant. Inspect Authority logs for denial reason (`policy_scope_denied_total` metric).
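The two client-side checks can be run before digging into Authority logs. The sketch below assumes the token's scopes and tenant have already been decoded; `diagnose_403` and its parameters are hypothetical.

```python
def diagnose_403(token_scopes, token_tenant, required_scope, tenant_header):
    """Return likely client-side reasons for a 403, most specific first."""
    reasons = []
    if required_scope not in token_scopes:
        reasons.append(f"token lacks scope {required_scope}")
    if tenant_header != token_tenant:
        reasons.append("tenant header does not match token tenant")
    return reasons
```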
**Q:** *`stella policy run` exits with code `30`.*
**A:** Network/transport error. Check connectivity to Policy Engine endpoint, TLS configuration, and CLI proxy settings.
**Q:** *Explain drawer shows no VEX data.*
**A:** Either no VEX statement matched or the tenant lacks `findings:read` scope. If VEX should exist, confirm Excititor ingestion and policy joiners (see Observability dashboards).
---
## Compliance Checklist
- [ ] FAQ linked from Console help menu and CLI `stella policy help`.
- [ ] Entries reviewed quarterly by Policy & Support Guilds.
- [ ] Answers cross-reference lifecycle, runs, observability, and governance docs.
- [ ] Incident/Escalation contact details kept current in Support playbooks.
- [ ] FAQ translated for supported locales (if applicable).
---
*Last updated: 2025-10-26 (Sprint 20).*