git.stella-ops.org/docs/product-advisories/unprocessed/19-Dec-2025 - Moat #7.md
2025-12-19 22:19:08 +02:00


1) Product direction: make “Unknowns” a first-class risk primitive

Non-negotiable product principles

  1. Unknowns are not suppressed findings

    • They are a distinct state with distinct governance.
  2. Unknowns must be policy-addressable

    • If policy cannot block or allow them explicitly, the feature is incomplete.
  3. Unknowns must be attested

    • Every signed decision must carry “what we don’t know” in a machine-readable way.
  4. Unknowns must be default-on

    • Users may adjust thresholds, but they must not be able to “turn off unknown tracking.”

Definition: what counts as an “unknown”

PMs must ensure that “unknown” is not vague. Define reason-coded unknowns, for example:

  • U-RCH: Reachability unknown (call path indeterminate)
  • U-ID: Component identity unknown (ambiguous package / missing digest / unresolved PURL)
  • U-PROV: Provenance unknown (cannot map binary → source/build)
  • U-VEX: VEX conflict or missing applicability statement
  • U-FEED: Knowledge source missing (offline feed gaps, mirror stale)
  • U-CONFIG: Config/runtime gate unknown (feature flag not observable)
  • U-ANALYZER: Analyzer limitation (language/framework unsupported)

Each unknown must have:

  • reason_code (one of a stable enum)
  • scope (component, binary, symbol, package, image, repo)
  • evidence_refs (what we inspected)
  • assumptions (what would need to be true/false)
  • remediation_hint (how to reduce unknown)

Acceptance criterion: every unknown surfaced to users can be traced to a reason code and remediation hint.
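
As an illustration of the record shape described above, here is a minimal sketch. The field names come from the list above; the class names, the tuple types, and the sample values (including the evidence reference string) are assumptions made for the example, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import Enum


class ReasonCode(Enum):
    """Stable enum of reason codes; values mirror the codes listed above."""
    U_RCH = "U-RCH"
    U_ID = "U-ID"
    U_PROV = "U-PROV"
    U_VEX = "U-VEX"
    U_FEED = "U-FEED"
    U_CONFIG = "U-CONFIG"
    U_ANALYZER = "U-ANALYZER"


class Scope(Enum):
    COMPONENT = "component"
    BINARY = "binary"
    SYMBOL = "symbol"
    PACKAGE = "package"
    IMAGE = "image"
    REPO = "repo"


@dataclass(frozen=True)
class Unknown:
    reason_code: ReasonCode
    scope: Scope
    evidence_refs: tuple[str, ...]   # what was inspected
    assumptions: tuple[str, ...]     # what would need to be true/false
    remediation_hint: str            # how to reduce the unknown

    def __post_init__(self) -> None:
        # Enforce the acceptance criterion at construction time:
        # every unknown carries a valid reason code and a non-empty hint.
        if not isinstance(self.reason_code, ReasonCode):
            raise TypeError("reason_code must be a ReasonCode")
        if not self.remediation_hint.strip():
            raise ValueError("remediation_hint must not be empty")


# Hypothetical sample record; the evidence reference is made up.
example = Unknown(
    reason_code=ReasonCode.U_RCH,
    scope=Scope.SYMBOL,
    evidence_refs=("callgraph:sha256:abc123",),
    assumptions=("the vulnerable function is never reached via reflection",),
    remediation_hint="upload debug symbols so the call graph can be resolved",
)
```

Rejecting records without a hint at construction time is one way to make the acceptance criterion hold by design rather than by review.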


2) Policy direction: “unknown budgets” must be enforceable and environment-aware

Policy model requirements

Policy must support:

  • Thresholds by environment (dev/test/stage/prod)
  • Thresholds by unknown type (reachability vs provenance vs feed, etc.)
  • Severity weighting (e.g., an unknown on an internet-facing service counts for more than the same unknown on an internal one)
  • Exception workflow (time-bound, owner-bound)
  • Deterministic evaluation (same inputs → same result)

These defaults are intentionally strict in prod:

Prod (default)

  • unknown_reachable == 0 (fail build/deploy)
  • unknown_provenance == 0 (fail)
  • unknown_total <= 3 (fail if exceeded)
  • unknown_feed == 0 (fail; “we didn’t have data” is unacceptable for prod)

Stage

  • unknown_reachable <= 1
  • unknown_provenance <= 1
  • unknown_total <= 10

Dev

  • Never hard fail by default; warn + ticket/PR annotation
  • Still compute unknowns and show trendlines (so teams see drift)
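
The default budgets above could be expressed as data and checked with a small deterministic function. This is a sketch only: the counter names and limits are taken from the lists above, while `DEFAULT_BUDGETS`, `evaluate`, and the message format are hypothetical.

```python
# Default budgets per environment, copied from the lists above.
# A counter that is absent from an environment has no hard limit there.
DEFAULT_BUDGETS = {
    "prod": {
        "unknown_reachable": 0,
        "unknown_provenance": 0,
        "unknown_feed": 0,
        "unknown_total": 3,
    },
    "stage": {
        "unknown_reachable": 1,
        "unknown_provenance": 1,
        "unknown_total": 10,
    },
    "dev": {},  # dev never hard-fails by default
}


def evaluate(env, counts):
    """Return (passed, violations) for the given environment.

    Budgets are visited in sorted order so that the same inputs always
    produce the same violation list (deterministic evaluation).
    """
    budgets = DEFAULT_BUDGETS[env]
    violations = [
        f"{name}={counts.get(name, 0)} exceeds budget {limit}"
        for name, limit in sorted(budgets.items())
        if counts.get(name, 0) > limit
    ]
    return (not violations, violations)
```

With these defaults, `evaluate("prod", {"unknown_reachable": 1, "unknown_total": 1})` fails on reachability, while the same counts pass in stage.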

Exception policy (required to avoid “disable unknowns” pressure)

Implement explicit exceptions rather than toggles:

  • Exception must include: owner, expiry, justification, scope, risk_ack
  • Exception must be emitted into attestations and reports (“this passed with exception X”).

Acceptance criterion: there is no “turn off unknowns” knob; only thresholds and expiring exceptions.
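
One possible shape for such an exception is sketched below. The fields owner, expiry, justification, scope, and risk_ack come from the list above; `exception_id`, the helper names, and the attestation entry layout are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PolicyException:
    exception_id: str      # hypothetical identifier, e.g. "EX-2"
    owner: str
    expiry: datetime       # timezone-aware; there is no "never expires" value
    justification: str
    scope: str
    risk_ack: bool

    def is_active(self, now: datetime) -> bool:
        # An exception only counts if the risk was acknowledged
        # and the expiry has not yet passed.
        return self.risk_ack and now < self.expiry

    def to_attestation_entry(self) -> dict:
        # Minimal record to embed in attestations and reports.
        return {"id": self.exception_id, "expiry": self.expiry.isoformat()}


def active_exceptions(exceptions, now):
    """Return the active exceptions in a stable order (sorted by id)."""
    return sorted(
        (e for e in exceptions if e.is_active(now)),
        key=lambda e: e.exception_id,
    )
```

Because expiry is a required field and is checked on every evaluation, an exception cannot quietly become a permanent toggle.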


3) Reporting direction: unknowns must be visible, triaged, and trendable

Required reporting surfaces

  1. Release / PR report

    • Unknown summary at top:

      • total unknowns
      • unknowns by reason code
      • unknowns blocking policy vs not
    • “What changed?” vs previous baseline (unknown delta)

  2. Dashboard (portfolio view)

    • Unknowns over time
    • Top teams/services by unknown count
    • Top unknown causes (reason codes)
  3. Operational triage view

    • “Unknown queue” sortable by:

      • environment impact (prod/stage)
      • exposure class (internet-facing/internal)
      • reason code
      • last-seen time
      • owner
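
A sortable unknown queue along these columns might look like the following sketch. The row keys, the rank tables, and the `sort_queue` helper are hypothetical; the `id` tie-breaker is added so that equal rows always appear in the same order.

```python
# Lower rank sorts first: prod before stage, internet-facing before internal.
ENV_RANK = {"prod": 0, "stage": 1, "test": 2, "dev": 3}
EXPOSURE_RANK = {"internet-facing": 0, "internal": 1}

# One key function per sortable column of the triage view.
SORT_KEYS = {
    "environment": lambda row: ENV_RANK[row["environment"]],
    "exposure": lambda row: EXPOSURE_RANK[row["exposure"]],
    "reason_code": lambda row: row["reason_code"],
    "last_seen": lambda row: row["last_seen"],  # ISO 8601 UTC strings sort chronologically
    "owner": lambda row: row["owner"],
}


def sort_queue(rows, by):
    """Sort triage rows by one column, breaking ties on the row id."""
    key = SORT_KEYS[by]
    return sorted(rows, key=lambda row: (key(row), row["id"]))
```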

Reporting should drive action, not anxiety

Every unknown row must include:

  • Why it’s unknown (reason code + short explanation)
  • What evidence is missing
  • How to reduce unknown (concrete steps)
  • Expected effect (e.g., “adding debug symbols will likely reduce U-RCH by ~X”)

Key PM instruction: treat unknowns like an SLO. Teams should be able to commit to “unknowns in prod must trend to zero.”


4) Attestations direction: unknowns must be cryptographically bound to decisions

Every signed decision/attestation must include an “unknowns summary” section.

Attestation requirements

Include at minimum:

  • unknown_total
  • unknown_by_reason_code (map of reason→count)
  • unknown_blocking_count
  • unknown_details_digest (hash of the full list if too large)
  • policy_thresholds_applied (the exact thresholds used)
  • exceptions_applied (IDs + expiries)
  • knowledge_snapshot_id (feeds/policy bundle hash if you support offline snapshots)

Why this matters: if you sign a “pass,” you must also sign what you didn’t know at the time. Otherwise the signature is not audit-grade.

Acceptance criterion: any downstream verifier can reject a signed “pass” based solely on unknown fields (e.g., “reject if unknown_reachable>0 in prod”).
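
To make the summary section concrete, the sketch below builds the fields listed above and shows a downstream check. It is illustrative only: the input record shape (`id`, `reason_code`), the choice of SHA-256 over sorted compact JSON for the digest, and the mapping of “unknown_reachable” to the U-RCH count are all assumptions. Signing itself is out of scope here.

```python
import hashlib
import json
from collections import Counter


def build_unknowns_summary(unknowns, blocking_ids, thresholds, exceptions, snapshot_id):
    """Build the unknowns summary that would be embedded in an attestation."""
    # Sort by id and serialize with sorted keys so the digest does not
    # depend on the order in which analyzers reported their unknowns.
    ordered = sorted(unknowns, key=lambda u: u["id"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    by_reason = Counter(u["reason_code"] for u in ordered)
    return {
        "unknown_total": len(ordered),
        "unknown_by_reason_code": dict(sorted(by_reason.items())),
        "unknown_blocking_count": sum(1 for u in ordered if u["id"] in blocking_ids),
        "unknown_details_digest": "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "policy_thresholds_applied": thresholds,
        "exceptions_applied": exceptions,
        "knowledge_snapshot_id": snapshot_id,
    }


def verifier_accepts(summary, env):
    """Example downstream rule: reject a prod pass with reachability unknowns."""
    # Assumption: reachability unknowns are those counted under U-RCH.
    if env == "prod" and summary["unknown_by_reason_code"].get("U-RCH", 0) > 0:
        return False
    return True
```

The verifier reads only the summary fields, which is the point of the acceptance criterion: it needs no access to the scanner or the full finding list.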


5) Development direction: implement unknown propagation as a first-class data flow

Core engineering tasks (must be done in this order)

A. Define the canonical “Tri-state” evaluation type

For any security claim, the evaluator must return:

  • TRUE (evidence supports)
  • FALSE (evidence refutes)
  • UNKNOWN (insufficient evidence)

Do not represent unknown as nulls or missing fields. It must be explicit.
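
A tri-state type can be sketched as an explicit enum. The text above does not say how tri-state values combine; the sketch assumes Kleene's strong three-valued logic, under which UNKNOWN propagates unless the other operand already decides the result.

```python
from enum import Enum


class Tri(Enum):
    TRUE = "TRUE"        # evidence supports the claim
    FALSE = "FALSE"      # evidence refutes the claim
    UNKNOWN = "UNKNOWN"  # insufficient evidence


def tri_and(a: Tri, b: Tri) -> Tri:
    # A single FALSE decides a conjunction, even next to an UNKNOWN.
    if a is Tri.FALSE or b is Tri.FALSE:
        return Tri.FALSE
    if a is Tri.UNKNOWN or b is Tri.UNKNOWN:
        return Tri.UNKNOWN
    return Tri.TRUE


def tri_or(a: Tri, b: Tri) -> Tri:
    # A single TRUE decides a disjunction, even next to an UNKNOWN.
    if a is Tri.TRUE or b is Tri.TRUE:
        return Tri.TRUE
    if a is Tri.UNKNOWN or b is Tri.UNKNOWN:
        return Tri.UNKNOWN
    return Tri.FALSE


def tri_not(a: Tri) -> Tri:
    # Negating an UNKNOWN is still UNKNOWN.
    if a is Tri.UNKNOWN:
        return Tri.UNKNOWN
    return Tri.FALSE if a is Tri.TRUE else Tri.TRUE
```

Using a dedicated enum rather than `Optional[bool]` means a missing value raises an error instead of silently reading as “unknown” or “false”.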

B. Build the unknown aggregator and reason-code framework

  • A single aggregation layer computes:

    • unknown counts per scope
    • unknown counts per reason code
    • unknown “blockers” based on policy
  • This must be deterministic and stable (no random ordering, stable IDs).
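
One way to get determinism and stable IDs is to derive each ID from the unknown's content and to sort before counting. The sketch below assumes each unknown is a dict with `reason_code`, `scope`, and a hypothetical `subject` field naming the affected item; the 16-character truncated SHA-256 ID is likewise an illustrative choice.

```python
import hashlib
from collections import Counter


def stable_id(reason_code, scope, subject):
    """Derive an ID from content, so re-running a scan yields the same ID."""
    key = "\n".join((reason_code, scope, subject)).encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def aggregate(unknowns, blocking_reasons):
    """Aggregate unknowns into counts that do not depend on input order."""
    # Keying by stable ID also de-duplicates repeated reports of one unknown.
    by_id = {
        stable_id(u["reason_code"], u["scope"], u["subject"]): u
        for u in unknowns
    }
    ordered = [by_id[k] for k in sorted(by_id)]
    scope_counts = Counter(u["scope"] for u in ordered)
    reason_counts = Counter(u["reason_code"] for u in ordered)
    return {
        "ids": sorted(by_id),
        "by_scope": dict(sorted(scope_counts.items())),
        "by_reason_code": dict(sorted(reason_counts.items())),
        "blockers": sum(1 for u in ordered if u["reason_code"] in blocking_reasons),
    }
```

Here `blocking_reasons` stands in for whatever the policy layer decides is blocking; a real policy would likely look at thresholds rather than a plain set of reason codes.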

C. Ensure analyzers emit unknowns instead of silently failing

Any analyzer that cannot conclude must emit:

  • UNKNOWN + reason code + evidence pointers

Examples:

  • call graph incomplete → U-RCH
  • stripped binary cannot map symbols → U-PROV
  • unsupported language → U-ANALYZER
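
A thin wrapper around each analyzer is one way to guarantee that an inconclusive run becomes an explicit UNKNOWN rather than a missing result. The exception classes, the `run_analyzer` helper, and the result dict layout below are hypothetical; the mapping to reason codes follows the examples above.

```python
class IncompleteCallGraph(Exception):
    """Raised by an analyzer when the call graph cannot be completed."""


class StrippedBinary(Exception):
    """Raised when symbols cannot be mapped because the binary is stripped."""


class UnsupportedLanguage(Exception):
    """Raised when the analyzer does not support the language/framework."""


# Failure type -> reason code, following the examples above.
REASON_BY_ERROR = {
    IncompleteCallGraph: "U-RCH",
    StrippedBinary: "U-PROV",
    UnsupportedLanguage: "U-ANALYZER",
}


def run_analyzer(analyze, artifact_ref):
    """Run `analyze` and always return an explicit verdict.

    `analyze` is any callable returning True/False for a claim; known
    "cannot conclude" failures are converted into UNKNOWN results that
    carry a reason code and evidence pointers instead of being dropped.
    """
    try:
        holds = analyze(artifact_ref)
    except tuple(REASON_BY_ERROR) as exc:
        return {
            "verdict": "UNKNOWN",
            "reason_code": REASON_BY_ERROR[type(exc)],
            "evidence_refs": [artifact_ref],
            "detail": str(exc),
        }
    return {"verdict": "TRUE" if holds else "FALSE", "evidence_refs": [artifact_ref]}
```

Unexpected exceptions are deliberately not caught here, so genuine bugs still surface instead of being relabelled as unknowns.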

D. Provide “reduce unknown” instrumentation hooks

Attach remediation metadata:

  • “add build flags …”
  • “upload debug symbols …”
  • “enable source mapping …”
  • “mirror feeds …”

This is how you prevent user backlash: a gate that blocks without telling users how to clear the block will be treated as arbitrary.


6) Make it default rather than optional: rollout plan without breaking adoption

Phase 1: compute + display (no blocking)

  • Unknowns computed for all scans
  • Reports show unknown budgets and what would have failed in prod
  • Collect baseline metrics for 2–4 weeks of typical usage

Phase 2: soft gating

  • In prod-like pipelines: fail only on unknown_reachable > 0
  • Everything else warns + requires owner acknowledgement

Phase 3: full policy enforcement

  • Enforce default thresholds
  • Exceptions require expiry and are visible in attestations

Phase 4: governance integration

  • Unknowns become part of:

    • release readiness checks
    • quarterly risk reviews
    • vendor compliance audits
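
The gating behaviour of Phases 1 to 3 can be sketched as a single function of the rollout phase. The `gate` helper and its return shape are hypothetical; Phase 4 adds governance processes rather than new gating logic, so it is treated like Phase 3 here.

```python
def gate(phase, counts, thresholds):
    """Decide whether to block, given the rollout phase.

    `thresholds` maps counter names to maximum allowed values (the prod
    budgets); `counts` maps counter names to observed values.
    """
    if phase not in (1, 2, 3, 4):
        raise ValueError("phase must be 1, 2, 3 or 4")
    # Counters over budget, in sorted order for reproducible output.
    over = sorted(
        name for name, limit in thresholds.items()
        if counts.get(name, 0) > limit
    )
    if phase == 1:
        # Compute + display only: report what would have failed, never block.
        return {"block": False, "would_fail": over, "needs_ack": []}
    if phase == 2:
        # Soft gating: only reachability unknowns block; the rest warn
        # and require owner acknowledgement.
        return {
            "block": counts.get("unknown_reachable", 0) > 0,
            "would_fail": over,
            "needs_ack": [name for name in over if name != "unknown_reachable"],
        }
    # Phases 3 and 4: full enforcement of the thresholds.
    return {"block": bool(over), "would_fail": over, "needs_ack": []}
```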

Dev Manager instruction: invest in tooling that reduces unknowns early (symbol capture, provenance mapping, better analyzers). Otherwise “unknown gating” becomes politically unsustainable.


7) “Definition of Done” checklist for PMs and Dev Managers

PM DoD

  • Unknowns are explicitly defined with stable reason codes
  • Policy can fail on unknowns with environment-scoped thresholds
  • Reports show unknown deltas and remediation guidance
  • Exceptions are time-bound and appear everywhere (UI + API + attestations)
  • Unknowns cannot be disabled; only thresholds/exceptions are configurable

Engineering DoD

  • Tri-state evaluation implemented end-to-end
  • Analyzer failures never disappear; they become unknowns
  • Unknown aggregation is deterministic and reproducible
  • Signed attestation includes unknown summary + policy thresholds + exceptions
  • CI/CD integration can enforce “fail if unknowns > N in prod”

8) Concrete policy examples you can standardize internally

Minimal policy (prod)

  • Block deploy if:

    • unknown_reachable > 0
    • OR unknown_provenance > 0

Balanced policy (prod)

  • Block deploy if:

    • unknown_reachable > 0
    • OR unknown_provenance > 0
    • OR unknown_total > 3

Risk-sensitive policy (internet-facing prod)

  • Block deploy if:

    • unknown_reachable > 0
    • OR unknown_total > 1
    • OR any unknown affects a component with known remotely-exploitable CVEs
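
The three policies above translate directly into predicates over the unknown counters. In this sketch the function names are hypothetical, and the last condition of the risk-sensitive policy is passed in as a precomputed boolean, since how to detect “a component with known remotely-exploitable CVEs” is not specified here.

```python
def minimal_blocks(counts):
    """Minimal policy (prod)."""
    return counts["unknown_reachable"] > 0 or counts["unknown_provenance"] > 0


def balanced_blocks(counts):
    """Balanced policy (prod): minimal policy plus a total budget of 3."""
    return minimal_blocks(counts) or counts["unknown_total"] > 3


def risk_sensitive_blocks(counts, touches_remote_exploitable_cve):
    """Risk-sensitive policy (internet-facing prod).

    `touches_remote_exploitable_cve` is True when any unknown affects a
    component with known remotely-exploitable CVEs; computing it is
    outside the scope of this sketch.
    """
    return (
        counts["unknown_reachable"] > 0
        or counts["unknown_total"] > 1
        or touches_remote_exploitable_cve
    )
```

For example, counts of `{"unknown_reachable": 0, "unknown_provenance": 0, "unknown_total": 2}` pass the minimal and balanced policies but are blocked by the risk-sensitive one.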