house keeping work

StellaOps Bot
2025-12-19 22:19:08 +02:00
parent 91f3610b9d
commit 5b57b04484
64 changed files with 4702 additions and 4 deletions


@@ -0,0 +1,268 @@
## 1) Product direction: make “Unknowns” a first-class risk primitive
### Non-negotiable product principles
1. **Unknowns are not suppressed findings**
   * They are a distinct state with distinct governance.
2. **Unknowns must be policy-addressable**
   * If policy cannot block or allow them explicitly, the feature is incomplete.
3. **Unknowns must be attested**
   * Every signed decision must carry “what we don’t know” in a machine-readable way.
4. **Unknowns must be default-on**
   * Users may adjust thresholds, but they must not be able to “turn off unknown tracking.”
### Definition: what counts as an “unknown”
PMs must ensure that “unknown” is not vague. Define **reason-coded unknowns**, for example:
* **U-RCH**: Reachability unknown (call path indeterminate)
* **U-ID**: Component identity unknown (ambiguous package / missing digest / unresolved PURL)
* **U-PROV**: Provenance unknown (cannot map binary → source/build)
* **U-VEX**: VEX conflict or missing applicability statement
* **U-FEED**: Knowledge source missing (offline feed gaps, mirror stale)
* **U-CONFIG**: Config/runtime gate unknown (feature flag not observable)
* **U-ANALYZER**: Analyzer limitation (language/framework unsupported)
Each unknown must have:
* `reason_code` (one of a stable enum)
* `scope` (component, binary, symbol, package, image, repo)
* `evidence_refs` (what we inspected)
* `assumptions` (what would need to be true/false)
* `remediation_hint` (how to reduce unknown)
**Acceptance criterion:** every unknown surfaced to users can be traced to a reason code and remediation hint.
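One way to make this definition concrete is a small record type. The following is a minimal sketch, not a prescribed schema: the class names `ReasonCode`, `Scope`, and `Unknown` are illustrative assumptions, while the field names and enum values come from the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class ReasonCode(Enum):
    """Stable enum of the reason codes listed above."""
    U_RCH = "U-RCH"
    U_ID = "U-ID"
    U_PROV = "U-PROV"
    U_VEX = "U-VEX"
    U_FEED = "U-FEED"
    U_CONFIG = "U-CONFIG"
    U_ANALYZER = "U-ANALYZER"


class Scope(Enum):
    COMPONENT = "component"
    BINARY = "binary"
    SYMBOL = "symbol"
    PACKAGE = "package"
    IMAGE = "image"
    REPO = "repo"


@dataclass(frozen=True)
class Unknown:
    reason_code: ReasonCode
    scope: Scope
    evidence_refs: tuple[str, ...]   # what was inspected
    assumptions: tuple[str, ...]     # what would need to be true/false
    remediation_hint: str            # how to reduce the unknown

    def __post_init__(self) -> None:
        # Enforce the acceptance criterion: no unknown without a remediation hint.
        if not self.remediation_hint.strip():
            raise ValueError("remediation_hint must not be empty")
```

Rejecting empty hints at construction time means an analyzer cannot emit an unknown that violates the acceptance criterion.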
---
## 2) Policy direction: “unknown budgets” must be enforceable and environment-aware
### Policy model requirements
Policy must support:
* Thresholds by environment (dev/test/stage/prod)
* Thresholds by unknown type (reachability vs provenance vs feed, etc.)
* Severity weighting (e.g., unknown on internet-facing service is worse)
* Exception workflow (time-bound, owner-bound)
* Deterministic evaluation (same inputs → same result)
### Recommended default policy posture (ship as opinionated defaults)
These defaults are intentionally strict in prod:
**Prod (default)**
* `unknown_reachable == 0` (fail build/deploy otherwise)
* `unknown_provenance == 0` (fail otherwise)
* `unknown_total <= 3` (fail if exceeded)
* `unknown_feed == 0` (fail otherwise; “we didn’t have data” is unacceptable for prod)
**Stage**
* `unknown_reachable <= 1`
* `unknown_provenance <= 1`
* `unknown_total <= 10`
**Dev**
* Never hard fail by default; warn + ticket/PR annotation
* Still compute unknowns and show trendlines (so teams see drift)
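The defaults above could be encoded as a per-environment table of upper bounds. This is a sketch under assumptions: the `DEFAULT_BUDGETS` name, the dictionary layout, and the `evaluate_budget` helper are hypothetical, and only the counter names and limits come from the lists above.

```python
# Hypothetical encoding of the default budgets; each value is the maximum allowed count.
DEFAULT_BUDGETS = {
    "prod": {
        "unknown_reachable": 0,
        "unknown_provenance": 0,
        "unknown_feed": 0,
        "unknown_total": 3,
    },
    "stage": {
        "unknown_reachable": 1,
        "unknown_provenance": 1,
        "unknown_total": 10,
    },
    "dev": {},  # never hard-fail by default; callers warn instead
}


def evaluate_budget(env: str, counts: dict[str, int]) -> list[str]:
    """Return the violated budget keys in sorted order (empty list means pass)."""
    budget = DEFAULT_BUDGETS[env]
    # Sorting keeps the result deterministic: same inputs, same output.
    return sorted(key for key, limit in budget.items() if counts.get(key, 0) > limit)
```

For example, `evaluate_budget("prod", {"unknown_reachable": 1})` reports `unknown_reachable` as violated, while the same counts in `dev` produce no hard failure.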
### Exception policy (required to avoid “disable unknowns” pressure)
Implement **explicit exceptions** rather than toggles:
* Exception must include: `owner`, `expiry`, `justification`, `scope`, `risk_ack`
* Exception must be emitted into attestations and reports (“this passed with exception X”).
**Acceptance criterion:** there is no “turn off unknowns” knob; only thresholds and expiring exceptions.
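An expiring exception might be modeled as follows. This is a minimal sketch: the class name `UnknownException`, the `exception_id` field, and the `active_exceptions` helper are assumptions added for illustration, and the remaining fields mirror the required list above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UnknownException:
    exception_id: str   # hypothetical ID, used to report "this passed with exception X"
    owner: str
    expiry: datetime    # expected to be timezone-aware
    justification: str
    scope: str
    risk_ack: bool

    def is_active(self, now: datetime) -> bool:
        """An exception applies only if risk was acknowledged and it has not expired."""
        return self.risk_ack and now < self.expiry


def active_exceptions(exceptions, now):
    """Return active exceptions in a stable order, ready to emit into reports."""
    live = (e for e in exceptions if e.is_active(now))
    return sorted(live, key=lambda e: e.exception_id)
```

Because expiry is checked at evaluation time, an exception lapses without anyone having to remember to remove it.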
---
## 3) Reporting direction: unknowns must be visible, triaged, and trendable
### Required reporting surfaces
1. **Release / PR report**
   * Unknown summary at top:
     * total unknowns
     * unknowns by reason code
     * unknowns blocking policy vs not
   * “What changed?” vs previous baseline (unknown delta)
2. **Dashboard (portfolio view)**
   * Unknowns over time
   * Top teams/services by unknown count
   * Top unknown causes (reason codes)
3. **Operational triage view**
   * “Unknown queue” sortable by:
     * environment impact (prod/stage)
     * exposure class (internet-facing/internal)
     * reason code
     * last-seen time
     * owner
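The “What changed?” delta in the release report could be computed per reason code. The sketch below assumes each scan is reduced to a flat list of reason-code strings; the function name `unknown_delta` is hypothetical.

```python
from collections import Counter


def unknown_delta(baseline: list[str], current: list[str]) -> dict[str, int]:
    """Per-reason-code change in unknown count between a baseline and the current scan.

    Positive values mean new unknowns appeared; negative values mean unknowns were resolved.
    Reason codes whose count did not change are omitted.
    """
    before, after = Counter(baseline), Counter(current)
    codes = sorted(set(before) | set(after))
    return {code: after[code] - before[code] for code in codes if after[code] != before[code]}
```

Omitting unchanged codes keeps the report focused on drift rather than restating the totals already shown in the summary.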
### Reporting should drive action, not anxiety
Every unknown row must include:
* Why it’s unknown (reason code + short explanation)
* What evidence is missing
* How to reduce unknown (concrete steps)
* Expected effect (e.g., “adding debug symbols will likely reduce U-RCH by ~X”)
**Key PM instruction:** treat unknowns like an **SLO**. Teams should be able to commit to “unknowns in prod must trend to zero.”
---
## 4) Attestations direction: unknowns must be cryptographically bound to decisions
Every signed decision/attestation must include an “unknowns summary” section.
### Attestation requirements
Include at minimum:
* `unknown_total`
* `unknown_by_reason_code` (map of reason→count)
* `unknown_blocking_count`
* `unknown_details_digest` (hash of the full list if too large)
* `policy_thresholds_applied` (the exact thresholds used)
* `exceptions_applied` (IDs + expiries)
* `knowledge_snapshot_id` (feeds/policy bundle hash if you support offline snapshots)
**Why this matters:** if you sign a “pass,” you must also sign what you *didn’t know* at the time. Otherwise the signature is not audit-grade.
**Acceptance criterion:** any downstream verifier can reject a signed “pass” based solely on unknown fields (e.g., “reject if `unknown_reachable > 0` in prod”).
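A sketch of how the unknowns section might be assembled before signing, and how a verifier could act on it alone. The function names, the `"id"` key on each unknown, the `sha256:` prefix, and the use of canonical JSON for the digest are all assumptions; only the output field names come from the list above. Signing itself is out of scope here.

```python
import hashlib
import json
from collections import Counter


def unknowns_summary(unknowns, blocking_ids, thresholds, exceptions, snapshot_id):
    """Build the unknowns section of an attestation payload.

    `unknowns` is a list of dicts that each carry at least "id" and "reason_code".
    """
    ordered = sorted(unknowns, key=lambda u: u["id"])
    # Canonical JSON (sorted keys, no whitespace) so the digest is reproducible.
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode()
    by_reason = Counter(u["reason_code"] for u in ordered)
    return {
        "unknown_total": len(ordered),
        "unknown_by_reason_code": dict(sorted(by_reason.items())),
        "unknown_blocking_count": sum(1 for u in ordered if u["id"] in blocking_ids),
        "unknown_details_digest": "sha256:" + hashlib.sha256(canonical).hexdigest(),
        "policy_thresholds_applied": thresholds,
        "exceptions_applied": exceptions,
        "knowledge_snapshot_id": snapshot_id,
    }


def verifier_accepts(summary, max_by_reason):
    """Downstream check that relies only on the signed unknown fields."""
    counts = summary["unknown_by_reason_code"]
    return all(counts.get(code, 0) <= limit for code, limit in max_by_reason.items())
```

A verifier enforcing “no reachability unknowns in prod” would call `verifier_accepts(summary, {"U-RCH": 0})` without needing access to the full list of unknowns.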
---
## 5) Development direction: implement unknown propagation as a first-class data flow
### Core engineering tasks (must be done in this order)
#### A. Define the canonical “Tri-state” evaluation type
For any security claim, the evaluator must return:
* `TRUE` (evidence supports)
* `FALSE` (evidence refutes)
* `UNKNOWN` (insufficient evidence)
Do not represent unknown as nulls or missing fields. It must be explicit.
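An explicit tri-state type can be a plain enum. The sketch below also shows one possible way to combine results, using Kleene three-valued logic; that combination rule is an assumption for illustration, not something this guidance mandates, and the names `Tri`, `tri_and`, and `tri_or` are hypothetical.

```python
from enum import Enum


class Tri(Enum):
    TRUE = "TRUE"        # evidence supports the claim
    FALSE = "FALSE"      # evidence refutes the claim
    UNKNOWN = "UNKNOWN"  # insufficient evidence


def tri_and(a: Tri, b: Tri) -> Tri:
    """Kleene AND: FALSE dominates, then UNKNOWN, otherwise TRUE."""
    if Tri.FALSE in (a, b):
        return Tri.FALSE
    if Tri.UNKNOWN in (a, b):
        return Tri.UNKNOWN
    return Tri.TRUE


def tri_or(a: Tri, b: Tri) -> Tri:
    """Kleene OR: TRUE dominates, then UNKNOWN, otherwise FALSE."""
    if Tri.TRUE in (a, b):
        return Tri.TRUE
    if Tri.UNKNOWN in (a, b):
        return Tri.UNKNOWN
    return Tri.FALSE
```

With this rule an `UNKNOWN` input is never silently coerced to `TRUE` or `FALSE`; it survives unless the other operand already decides the result.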
#### B. Build the unknown aggregator and reason-code framework
* A single aggregation layer computes:
  * unknown counts per scope
  * unknown counts per reason code
  * unknown “blockers” based on policy
* This must be deterministic and stable (no random ordering, stable IDs).
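Determinism here mostly comes down to content-derived IDs and sorted output. A minimal sketch, assuming each unknown is reduced to a `(reason_code, scope, subject)` tuple; the tuple shape, the function names, and the 16-character ID length are illustrative assumptions.

```python
import hashlib
from collections import Counter


def stable_id(reason_code: str, scope: str, subject: str) -> str:
    """Content-derived ID, so the same unknown gets the same ID on every run."""
    raw = "\x00".join((reason_code, scope, subject)).encode()
    return hashlib.sha256(raw).hexdigest()[:16]


def aggregate(unknowns, blocking_codes):
    """Aggregate an iterable of (reason_code, scope, subject) tuples."""
    unique = sorted(set(unknowns))  # de-duplicate, then fix the ordering
    by_scope = Counter(u[1] for u in unique)
    by_reason = Counter(u[0] for u in unique)
    return {
        "ids": [stable_id(*u) for u in unique],
        "by_scope": dict(sorted(by_scope.items())),
        "by_reason_code": dict(sorted(by_reason.items())),
        "blockers": [stable_id(*u) for u in unique if u[0] in blocking_codes],
    }
```

Feeding the same unknowns in a different order yields an identical result, which is the property reproducible attestations depend on.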
#### C. Ensure analyzers emit unknowns instead of silently failing
Any analyzer that cannot conclude must emit:
* `UNKNOWN` + reason code + evidence pointers
Examples:
* call graph incomplete → `U-RCH`
* stripped binary cannot map symbols → `U-PROV`
* unsupported language → `U-ANALYZER`
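One way to guarantee this is to run every analyzer through a wrapper that converts failures into unknowns. The sketch below is hypothetical throughout: the exception classes, the `FAILURE_TO_REASON` mapping, the example analyzers, and the choice to map unrecognized failures to `U-ANALYZER` are assumptions, not an existing API.

```python
class UnsupportedLanguage(Exception):
    """Hypothetical error: the analyzer does not support the target's language."""


class IncompleteCallGraph(Exception):
    """Hypothetical error: the call graph could not be fully resolved."""


# Hypothetical mapping from failure type to reason code.
FAILURE_TO_REASON = {
    UnsupportedLanguage: "U-ANALYZER",
    IncompleteCallGraph: "U-RCH",
}


def run_analyzer(analyzer, target: str) -> dict:
    """Run `analyzer(target)`; convert any failure into an explicit UNKNOWN."""
    try:
        return {"verdict": analyzer(target), "target": target}
    except Exception as exc:
        # Assumption: failures without a specific mapping count as analyzer limitations.
        reason = next(
            (code for cls, code in FAILURE_TO_REASON.items() if isinstance(exc, cls)),
            "U-ANALYZER",
        )
        return {
            "verdict": "UNKNOWN",
            "reason_code": reason,
            "evidence_refs": [target],
            "detail": str(exc),
            "target": target,
        }


def graph_analyzer(target: str) -> str:
    """Example analyzer that cannot conclude."""
    raise IncompleteCallGraph("dynamic dispatch could not be resolved")


def crashing_analyzer(target: str) -> str:
    """Example analyzer with an unexpected bug."""
    raise RuntimeError("unexpected parser state")
```

Routing all analyzers through a single wrapper means a crash can never be mistaken for a clean result.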
#### D. Provide “reduce unknown” instrumentation hooks
Attach remediation metadata:
* “add build flags …”
* “upload debug symbols …”
* “enable source mapping …”
* “mirror feeds …”
Actionable remediation guidance is how you prevent user backlash against strict unknown gating.
---
## 6) Make it default rather than optional: rollout plan without breaking adoption
### Phase 1: compute + display (no blocking)
* Unknowns computed for all scans
* Reports show unknown budgets and what would have failed in prod
* Collect baseline metrics for 2 to 4 weeks of typical usage
### Phase 2: soft gating
* In prod-like pipelines: fail only on `unknown_reachable > 0`
* Everything else warns + requires owner acknowledgement
### Phase 3: full policy enforcement
* Enforce default thresholds
* Exceptions require expiry and are visible in attestations
### Phase 4: governance integration
* Unknowns become part of:
* release readiness checks
* quarterly risk reviews
* vendor compliance audits
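The gating behavior of phases 1 to 3 can be expressed as a single function of the rollout phase and the list of violated budget keys. This is a sketch: the `gate` name and the outcome strings are assumptions, and phase 4 is treated the same as phase 3 because it adds governance rather than new gating rules.

```python
def gate(phase: int, violations: list[str]) -> str:
    """Map budget violations to a pipeline outcome for the given rollout phase."""
    if not violations:
        return "pass"
    if phase == 1:
        return "report"  # compute + display only; show what would have failed
    if phase == 2:
        # Soft gating: only reachability unknowns fail; the rest warn.
        return "fail" if "unknown_reachable" in violations else "warn"
    return "fail"  # phase 3 and later: full enforcement
```

Keeping the phase as an explicit input lets the same policy run in “what would have failed” mode before enforcement is switched on.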
**Dev Manager instruction:** invest in tooling that reduces unknowns early (symbol capture, provenance mapping, better analyzers). Otherwise “unknown gating” becomes politically unsustainable.
---
## 7) “Definition of Done” checklist for PMs and Dev Managers
### PM DoD
* [ ] Unknowns are explicitly defined with stable reason codes
* [ ] Policy can fail on unknowns with environment-scoped thresholds
* [ ] Reports show unknown deltas and remediation guidance
* [ ] Exceptions are time-bound and appear everywhere (UI + API + attestations)
* [ ] Unknowns cannot be disabled; only thresholds/exceptions are configurable
### Engineering DoD
* [ ] Tri-state evaluation implemented end-to-end
* [ ] Analyzer failures never disappear; they become unknowns
* [ ] Unknown aggregation is deterministic and reproducible
* [ ] Signed attestation includes unknown summary + policy thresholds + exceptions
* [ ] CI/CD integration can enforce “fail if unknowns > N in prod”
---
## 8) Concrete policy examples you can standardize internally
### Minimal policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
### Balanced policy (prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_provenance > 0`
* OR `unknown_total > 3`
### Risk-sensitive policy (internet-facing prod)
* Block deploy if:
* `unknown_reachable > 0`
* OR `unknown_total > 1`
* OR any unknown affects a component with known remotely-exploitable CVEs
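The three policies translate directly into predicates over the unknown counters. A minimal sketch: the `Counts` type and function names are hypothetical, and `unknown_on_remote_cve` is an assumed counter standing in for “any unknown affects a component with known remotely-exploitable CVEs”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    unknown_reachable: int = 0
    unknown_provenance: int = 0
    unknown_total: int = 0
    # Hypothetical counter: unknowns on components with known remotely-exploitable CVEs.
    unknown_on_remote_cve: int = 0


def blocks_minimal(c: Counts) -> bool:
    return c.unknown_reachable > 0 or c.unknown_provenance > 0


def blocks_balanced(c: Counts) -> bool:
    return blocks_minimal(c) or c.unknown_total > 3


def blocks_risk_sensitive(c: Counts) -> bool:
    return c.unknown_reachable > 0 or c.unknown_total > 1 or c.unknown_on_remote_cve > 0
```

Each predicate returns `True` when the deploy should be blocked, so a pipeline step only needs to pick the predicate that matches the service's exposure class.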