git.stella-ops.org/docs/product-advisories/27-Nov-2025 - Managing Ambiguity Through an Unknowns Registry.md

Here's a crisp, ready-to-ship concept you can drop into StellaOps: an Unknowns Registry that captures ambiguous scanner artifacts (stripped binaries, unverifiable packages, orphaned PURLs, missing digests) and treats them as first-class citizens with probabilistic severity and trust decay, so you stay transparent without blocking delivery.

What this solves (in plain terms)

  • No silent drops: every “can't verify / can't resolve” is tracked, not discarded.
  • Quantified risk: unknowns still roll into a portfolio-level risk number with confidence intervals.
  • Trust over time: stale unknowns get riskier the longer they remain unresolved.
  • Client confidence: visibility + trajectory (are unknowns shrinking?) becomes a maturity signal.

Core data model (CycloneDX/SPDX compatible, attaches to your SBOM spine)

UnknownArtifact:
  id: urn:stella:unknowns:<uuid>
  observedAt: <RFC3339>
  origin:
    source: scanner|ingest|runtime
    feed: <name/version>
    evidence: [ filePath, containerDigest, buildId, sectionHints ]
  identifiers:
    purl?: <string>        # orphan/incomplete PURL allowed
    hash?: <sha256|null>   # missing digest allowed
    cpe?: <string|null>
  classification:
    type: binary|library|package|script|config|other
    reason: stripped_binary|missing_signature|no_feed_match|ambiguous_name|checksum_mismatch|other
  metrics:
    baseUnkScore: 0..1
    confidence: 0..1       # model confidence in the *score*
    trust: 0..1            # provenance trust (sig/attest, feed quality)
    decayPolicyId: <ref>
  resolution:
    status: unresolved|suppressed|mitigated|confirmed-benign|confirmed-risk
    updatedAt: <RFC3339>
    notes: <text>
  links:
    scanId: <ref>
    componentId?: <ref to SBOM component if later mapped>
    attestations?: [ dsse, in-toto, rekorRef ]

Scoring (simple, explainable, deterministic)

  • Unknown Risk (UR): UR_t = clamp( (B * (1 + A)) * D_t * (1 - T) , 0, 1 )

    • B = baseUnkScore (heuristics: file entropy, section hints, ELF flags, import tables, size, location)

    • A = Environment Amplifier (runtime proximity: container entrypoint? PID namespace? network caps?)

    • T = Trust (sig/attest/registry reputation/feed pedigree normalized to 0..1)

    • D_t = Trust-decay multiplier over time t:

      • Linear: D_t = 1 + k * daysOpen (e.g., k = 0.01)
      • or Exponential: D_t = e^(λ * daysOpen) (e.g., λ = 0.005)
  • Portfolio rollup: use P90 of UR_t across images + sum of top-N UR_t to avoid dilution.
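To make the formula concrete, here is a minimal TypeScript sketch (the helper `urLinear` is ours, not a StellaOps API) evaluated with linear decay, k = 0.01, and the factor values from the explainability example later in this document:

```typescript
// Worked example of UR_t = clamp(B * (1 + A) * D_t * (1 - T), 0, 1)
// with linear decay D_t = min(1 + k * daysOpen, cap).
function urLinear(
  B: number, A: number, T: number,
  daysOpen: number, k: number, cap = 2.0
): number {
  const D = Math.min(1 + k * daysOpen, cap); // trust-decay multiplier
  const raw = B * (1 + A) * D * (1 - T);
  return Math.max(0, Math.min(raw, 1));      // clamp to 0..1
}

// B = 0.7 (stripped binary), A = 0.2 (entrypoint), T = 0.1, 17 days open:
// D_17 = 1.17, so UR ≈ 0.7 * 1.2 * 1.17 * 0.9 ≈ 0.88
const ur = urLinear(0.7, 0.2, 0.1, 17, 0.01);
```

An unresolved unknown at UR ≈ 0.88 would already trip the 0.8 promotion gate suggested in the policies below.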

Policies & SLOs

  • SLO: Unknowns burndown ≤ X% week-over-week; median age ≤ Y days.
  • Gates: block promotion when (a) any UR_t ≥ 0.8, or (b) more than M unknowns with age > Z days.
  • Suppressions: require justification + expiry; suppression reduces A but does not zero D_t.

Trust-decay policies (pluggable)

DecayPolicy:
  id: decay:default:v1
  kind: linear|exponential|custom
  params:
    k: 0.01        # linear slope per day
    cap: 2.0       # max multiplier

Scanner hooks (where to emit Unknowns)

  • Binary scan: stripped ELF/Mach-O/PE; missing buildID; abnormal sections; implausible symbol map.
  • Package map: PURL inferred from path without registry proof; mismatched checksum; vendor fork detected.
  • Attestation: DSSE missing / invalid; Sigstore chain unverifiable; Rekor entry not found.
  • Feeds: component seen in runtime but absent from SBOM (or vice versa).

Deterministic generation (for replay/audits)

  • Include Unknowns in the Scan Manifest (your deterministic bundle): inputs, ruleset hash, feed hashes, lattice policy version, and the exact classifier thresholds that produced B, A, T. That lets you replay and reproduce UR_t byte-for-byte during audits.
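As a sketch of that replay property, one can derive a stable digest over the manifest fields; the field names below are illustrative, and a production version would canonicalize nested objects recursively (e.g., RFC 8785 JCS) rather than only the top level:

```typescript
import { createHash } from "crypto";

// Illustrative manifest digest: sort top-level keys so serialization is
// byte-stable, then SHA-256 the result. Nested key order is NOT canonicalized
// here; a real implementation would recurse.
function manifestDigest(manifest: Record<string, unknown>): string {
  const canonical = JSON.stringify(
    Object.keys(manifest).sort().map((k) => [k, manifest[k]])
  );
  return createHash("sha256").update(canonical).digest("hex");
}
```

The same inputs always produce the same digest, so an auditor can re-run the scan and compare digests instead of diffing raw inputs.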

API surface (StellaOps.Authority)

POST /unknowns/ingest            # bulk ingest from Scanner/Vexer
GET  /unknowns?imageDigest=…     # list + filters (status, age, UR buckets)
PATCH /unknowns/{id}/resolve     # set status, add evidence, set suppression (with expiry)
GET  /unknowns/stats             # burn-downs, age histograms, P90 UR_t, top-N contributors

UI slices (Trust Algebra Studio)

  • Risk ribbon: Unknowns count, P90 UR_t, median age, trend sparkline.
  • Aging board: columns by age buckets; cards show reason, UR_t, T, decay policy, evidence.
  • What-if slider: adjust k/λ and see the retroactive effect on release readiness (deterministic preview).
  • Explainability panel: show B, A, T, D_t factors with succinct evidence (e.g., “ELF stripped; no .symtab; no Sigstore; runtime hits PID 1 → A=0.2; trust=0.1; day 17 → D=1.17”).

Heuristics for baseUnkScore (B) (portable across ELF/PE/Mach-O)

  • Section/segment anomalies; entropy outliers; import tables linking to risky APIs; executable heap/stack flags.
  • Location & role (PATH proximity to entrypoint, init scripts).
  • Size/type priors (tiny droppers vs fat libs).
  • Known-unknown patterns (packer signatures, UPX traces without unpack attestation).

Governance & reporting

  • Weekly Unknowns Report: totals, burndown, median age, “top unresolved origins” (by registry / base image), and “time-to-first-evidence.”
  • Vendor scorecards: if a supplier causes recurring unknowns, reflect it in Adaptive Trust Economics.

Minimal implementation slice (2 sprints)

  1. Sprint A (backend + ingest): model + tables, ingest endpoint, deterministic manifest hook, basic scoring (linear decay), list/query API, rollup metrics.
  2. Sprint B (UI + policy): Unknowns board, what-if slider, suppress/resolve workflow w/ expiry, weekly report, policy gates in CI.

If you want, I can generate:

  • the exact PostgreSQL schema + indices (including age/UR materialized views),
  • .NET 10 DTOs, handlers, and a deterministic scoring library with unit tests,
  • a Grafana/ClickHouse dashboard spec for burndown and P90 tracking,
  • and a one-page client-facing explainer your sales team can use.

Perfect, let's turn the concept into a concrete, dev-ready implementation plan.

I'll assume you already have:

    • A scanner pipeline (binaries, SBOMs, attestations)
    • A backend service (StellaOps.Authority)
    • A UI (Trust Algebra Studio)
    • Observability (OpenTelemetry, ClickHouse/Presto)

You can adapt naming and tech stack as needed.


0. Scope & success criteria

Goals

  1. Persist all “unknown-ish” scanner findings (stripped binaries, unverifiable PURLs, missing digests, etc.) as first-class entities.
  2. Compute a deterministic Unknown Risk (UR) per artifact and roll it up per image/application.
  3. Apply trust decay over time and expose burndown metrics.
  4. Provide UI workflows to triage, suppress, and resolve unknowns.
  5. Enforce release gates based on unknown risk and age.

Non-goals (for v1)

  • No full ML; use deterministic heuristics + tunable weights.
  • No cross-org multi-tenant policy — single org/single policy set.
  • No per-developer responsibility/assignment yet (can add later).

1. Architecture & components

1.1 New/updated components

  1. Unknowns Registry (backend submodule)

    • Lives in your existing backend (e.g., StellaOps.Authority.Unknowns).
    • Owns DB schema, scoring logic, and API.
  2. Scanner integration

    • Extend StellaOps.Scanner (and/or Vexer) to emit “unknown” findings into the registry via HTTP or message bus.
  3. UI: Unknowns in Trust Algebra Studio

    • New section/tab: “Unknowns” under each image/app.
    • Global “Unknowns board” for portfolio view.
  4. Analytics & jobs

    • Periodic job to recompute trust decay & UR.
    • Weekly report generator (e.g., pushing into ClickHouse, Slack, or email).

2. Data model (DB schema)

Use a relational DB; here's a concrete schema you can translate into migrations.

2.1 Tables

unknown_artifacts

Represents the current state of each unknown.

  • id (UUID, PK)
  • created_at (timestamp)
  • updated_at (timestamp)
  • first_observed_at (timestamp, NOT NULL)
  • last_observed_at (timestamp, NOT NULL)
  • origin_source (enum: scanner, runtime, ingest)
  • origin_feed (text) e.g., binary-scanner@1.4.3
  • origin_scan_id (UUID / text) foreign key to scan_runs if you have it
  • image_digest (text, indexed) to tie to container/image
  • component_id (UUID, nullable) SBOM component when later mapped
  • file_path (text, nullable)
  • build_id (text, nullable) ELF/Mach-O/PE build ID if any
  • purl (text, nullable)
  • hash_sha256 (text, nullable)
  • cpe (text, nullable)
  • classification_type (enum: binary, library, package, script, config, other)
  • classification_reason (enum: stripped_binary, missing_signature, no_feed_match, ambiguous_name, checksum_mismatch, other)
  • status (enum: unresolved, suppressed, mitigated, confirmed_benign, confirmed_risk)
  • status_changed_at (timestamp)
  • status_changed_by (text / user-id)
  • notes (text)
  • decay_policy_id (FK → decay_policies)
  • base_unk_score (double, 0..1)
  • env_amplifier (double, 0..1)
  • trust (double, 0..1)
  • current_decay_multiplier (double)
  • current_ur (double, 0..1) Unknown Risk at last recompute
  • current_confidence (double, 0..1) confidence in current_ur
  • is_deleted (bool) soft delete

Indexes

  • idx_unknown_artifacts_image_digest_status
  • idx_unknown_artifacts_status_created_at
  • idx_unknown_artifacts_current_ur
  • idx_unknown_artifacts_last_observed_at

unknown_artifact_events

Append-only event log for auditable changes.

  • id (UUID, PK)
  • unknown_artifact_id (FK → unknown_artifacts)
  • created_at (timestamp)
  • actor (text / user-id / system)
  • event_type (enum: created, reobserved, status_changed, note_added, metrics_recomputed, linked_component, suppression_applied, suppression_expired)
  • payload (JSONB) diff or event-specific details

Index: idx_unknown_artifact_events_artifact_id_created_at

decay_policies

Defines how trust decay works.

  • id (text, PK) e.g., decay:default:v1
  • kind (enum: linear, exponential)
  • param_k (double, nullable) for linear: slope
  • param_lambda (double, nullable) for exponential
  • cap (double, default 2.0)
  • description (text)
  • is_default (bool)

unknown_suppressions

Optional; you can also reuse unknown_artifacts.status, but a separate table lets you keep multiple suppressions over time.

  • id (UUID, PK)
  • unknown_artifact_id (FK)
  • created_at (timestamp)
  • created_by (text)
  • reason (text)
  • expires_at (timestamp, nullable)
  • active (bool)

Index: idx_unknown_suppressions_artifact_active_expires_at

unknown_image_rollups

Precomputed rollups per image (for fast dashboards/gates).

  • id (UUID, PK)
  • image_digest (text, indexed)
  • computed_at (timestamp)
  • unknown_count_total (int)
  • unknown_count_unresolved (int)
  • unknown_count_high_ur (int) e.g., UR ≥ 0.8
  • p50_ur (double)
  • p90_ur (double)
  • top_n_ur_sum (double)
  • median_age_days (double)

3. Scoring engine implementation

Create a small, deterministic scoring library so the same code can be used in:

  • Backend ingest path (for immediate UR)
  • Batch recompute job
  • “What-if” UI simulations (optionally via stateless API)

3.1 Data types

Define a core model, e.g.:

type DecayPolicy = {
  kind: "linear" | "exponential";
  k?: number;       // linear slope per day
  lambda?: number;  // exponential rate per day
  cap: number;      // max multiplier
};

type UnknownMetricsInput = {
  baseUnkScore: number;  // B
  envAmplifier: number;  // A
  trust: number;         // T
  daysOpen: number;      // t
  decayPolicy: DecayPolicy;
};

type UnknownMetricsOutput = {
  decayMultiplier: number; // D_t
  unknownRisk: number;     // UR_t
};

3.2 Algorithm

function computeDecayMultiplier(
  daysOpen: number,
  policy: DecayPolicy
): number {
  if (policy.kind === "linear") {
    const raw = 1 + (policy.k ?? 0) * daysOpen;
    return Math.min(raw, policy.cap);
  }
  if (policy.kind === "exponential") {
    const lambda = policy.lambda ?? 0;
    const raw = Math.exp(lambda * daysOpen);
    return Math.min(raw, policy.cap);
  }
  return 1;
}

function computeUnknownRisk(input: UnknownMetricsInput): UnknownMetricsOutput {
  const { baseUnkScore: B, envAmplifier: A, trust: T, daysOpen, decayPolicy } = input;

  const D_t = computeDecayMultiplier(daysOpen, decayPolicy);
  const raw = (B * (1 + A)) * D_t * (1 - T);

  const unknownRisk = Math.max(0, Math.min(raw, 1)); // clamp 0..1

  return { decayMultiplier: D_t, unknownRisk };
}

3.3 Heuristics for B, A, T

Implement these as pure functions with configurationdriven weights:

  • B (base unknown score):

    • Start from prior: by classification_type (binary > library > config).

    • Adjust up for:

      • Stripped binary (no symbols, high entropy)
      • Suspicious segments (executable stack/heap)
      • Known packer signatures (UPX, etc.)
    • Adjust down for:

      • Large, well-known dependency path (/usr/lib/...)
      • Known safe signatures (if partially known).
  • A (environment amplifier):

    • +0.2 if artifact is part of container entrypoint (PID 1).
    • +0.1 if file is in a PATH dir (e.g., /usr/local/bin).
    • +0.1 if the runtime has network access or elevated capability flags.
    • Cap at 0.5 for v1.
  • T (trust):

    • Start at 0.5.
    • +0.3 if registry/signature/attestation chain verified.
    • +0.1 if source registry is “trusted vendor list”.
    • -0.3 if checksum mismatch or feed conflict.
    • Clamp 0..1.

Store the raw factors (B, A, T) on the artifact for transparency and later replays.
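As an illustration, factor derivation from the rawSignals payload (defined in section 4.1) might look like the sketch below; the weights are placeholder values standing in for the configuration-driven ones, and the vendor-list / checksum-mismatch adjustments are omitted for brevity:

```typescript
type RawSignals = {
  entropy: number;      // bits per byte, 0..8
  hasSymbols: boolean;
  isEntrypoint: boolean;
  inPathDir: boolean;
};

const clamp01 = (x: number) => Math.max(0, Math.min(x, 1));

// Placeholder weights; real values come from configuration.
function deriveFactors(s: RawSignals, attestationVerified: boolean) {
  // B: binary prior, bumped for stripped / high-entropy files.
  let B = 0.4;
  if (!s.hasSymbols) B += 0.2;   // stripped binary
  if (s.entropy > 7.0) B += 0.1; // packed payloads sit near 8 bits/byte
  // A: runtime proximity, capped at 0.5 for v1.
  let A = 0;
  if (s.isEntrypoint) A += 0.2;
  if (s.inPathDir) A += 0.1;
  A = Math.min(A, 0.5);
  // T: start at 0.5, reward a verified attestation chain.
  let T = 0.5;
  if (attestationVerified) T += 0.3;
  return { B: clamp01(B), A, T: clamp01(T) };
}
```

Because each step is a pure function of the stored signals, the same factors fall out on replay, which is what the deterministic manifest in section 2 of the concept doc relies on.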


4. Scanner integration

4.1 Emission format (from scanner → backend)

Define a minimal ingestion contract (JSON over HTTP or a message):

{
  "scanId": "urn:scan:1234",
  "imageDigest": "sha256:abc123...",
  "observedAt": "2025-11-27T12:34:56Z",
  "unknowns": [
    {
      "externalId": "scanner-unique-id-1",
      "originSource": "scanner",
      "originFeed": "binary-scanner@1.4.3",
      "filePath": "/usr/local/bin/stripped",
      "buildId": null,
      "purl": null,
      "hashSha256": "aa...",
      "cpe": null,
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "rawSignals": {
        "entropy": 7.4,
        "hasSymbols": false,
        "isEntrypoint": true,
        "inPathDir": true
      }
    }
  ]
}

The backend maps rawSignals to B, A, T.

4.2 Idempotency

  • Define uniqueness key on (image_digest, file_path, hash_sha256) for v1.

  • On ingest:

    • If an artifact exists:

      • Update last_observed_at.
      • Recompute age (now - first_observed_at) and UR.
      • Add reobserved event.
    • If not:

      • Insert new row with first_observed_at = observedAt.
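The upsert above can be sketched with an in-memory store; a production version would be a single SQL upsert keyed on the (image_digest, file_path, hash_sha256) triple:

```typescript
type Row = { firstObservedAt: string; lastObservedAt: string; timesSeen: number };

// Stand-in for the unknown_artifacts table, keyed by the v1 uniqueness triple.
const store = new Map<string, Row>();

function ingestUnknown(
  imageDigest: string, filePath: string, sha256: string, observedAt: string
): Row {
  const key = `${imageDigest}|${filePath}|${sha256}`;
  const existing = store.get(key);
  if (existing) {
    // Re-observation: first_observed_at is immutable, last_observed_at advances.
    existing.lastObservedAt = observedAt;
    existing.timesSeen += 1;
    return existing;
  }
  const row: Row = { firstObservedAt: observedAt, lastObservedAt: observedAt, timesSeen: 1 };
  store.set(key, row);
  return row;
}
```

Keeping first_observed_at immutable is what makes the age, and therefore the decay multiplier, grow monotonically until the unknown is resolved.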

4.3 HTTP endpoint

POST /internal/unknowns/ingest

  • Auth: internal service token.
  • Returns a per-unknown mapping to internal id and computed UR.

Error handling:

  • If invalid payload → 400 with list of errors.
  • Partial failure: process valid unknowns, return failedUnknowns array with reasons.

5. Backend API for UI & CI

5.1 List unknowns

GET /unknowns

Query params:

  • imageDigest (optional)
  • status (optional multi: unresolved, suppressed, etc.)
  • minUr, maxUr (optional)
  • maxAgeDays (optional)
  • page, pageSize

Response:

{
  "items": [
    {
      "id": "urn:stella:unknowns:uuid",
      "imageDigest": "sha256:...",
      "filePath": "/usr/local/bin/stripped",
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "status": "unresolved",
      "firstObservedAt": "...",
      "lastObservedAt": "...",
      "ageDays": 17,
      "baseUnkScore": 0.7,
      "envAmplifier": 0.2,
      "trust": 0.1,
      "decayPolicyId": "decay:default:v1",
      "decayMultiplier": 1.17,
      "currentUr": 0.84,
      "currentConfidence": 0.8
    }
  ],
  "total": 123
}

5.2 Get single unknown + event history

GET /unknowns/{id}

Include:

  • The artifact.
  • Latest metrics.
  • Recent events (with pagination).

5.3 Update status / suppression

PATCH /unknowns/{id}

Body options:

{
  "status": "suppressed",
  "notes": "Reviewed; internal diagnostics binary.",
  "suppression": {
    "expiresAt": "2025-12-31T00:00:00Z"
  }
}

Backend:

  • Validates the transition (e.g., a suppressed unknown cannot return to “unresolved” without a recorded event).
  • Writes to unknown_suppressions.
  • Writes status_changed + suppression_applied events.
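One way to express that validation is a small transition table; the allowed moves below are an assumption for illustration, not a confirmed StellaOps state machine:

```typescript
type Status = "unresolved" | "suppressed" | "mitigated" | "confirmed_benign" | "confirmed_risk";

// Hypothetical transition map: terminal benign findings stay closed,
// suppression expiry can reopen, confirmed risk can still be mitigated.
const allowed: Record<Status, Status[]> = {
  unresolved: ["suppressed", "mitigated", "confirmed_benign", "confirmed_risk"],
  suppressed: ["unresolved", "mitigated", "confirmed_benign", "confirmed_risk"],
  mitigated: ["unresolved", "confirmed_benign", "confirmed_risk"],
  confirmed_benign: [],
  confirmed_risk: ["mitigated"],
};

function canTransition(from: Status, to: Status): boolean {
  return allowed[from].includes(to);
}
```

Invalid PATCH requests would be rejected with a 409 before any event is written.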

5.4 Image rollups

GET /images/{imageDigest}/unknowns/summary

Response:

{
  "imageDigest": "sha256:...",
  "computedAt": "...",
  "unknownCountTotal": 40,
  "unknownCountUnresolved": 30,
  "unknownCountHighUr": 4,
  "p50Ur": 0.35,
  "p90Ur": 0.82,
  "topNUrSum": 2.4,
  "medianAgeDays": 9
}

This is what CI and UI will mostly query.


6. Trust-decay job & rollup computation

6.1 Periodic recompute job

Schedule (e.g., every hour):

  1. Fetch unknown_artifacts where:

    • status IN ('unresolved', 'suppressed', 'mitigated')
    • last_observed_at >= now() - interval '90 days' (tunable)
  2. Compute daysOpen = now() - first_observed_at.

  3. Compute D_t and UR_t with scoring library.

  4. Update unknown_artifacts.current_ur, current_decay_multiplier.

  5. Append a metrics_recomputed event (throttled, e.g., only when UR changed by > 0.01).
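The UR-change gate in step 5 might be implemented as below; `computeUr` stands in for the deterministic scoring library:

```typescript
type Artifact = { id: string; currentUr: number };

// Recompute UR for each artifact; return IDs that warrant a
// metrics_recomputed event (UR moved by more than the threshold).
function recomputeBatch(
  artifacts: Artifact[],
  computeUr: (a: Artifact) => number,
  threshold = 0.01
): string[] {
  const changed: string[] = [];
  for (const a of artifacts) {
    const newUr = computeUr(a);
    if (Math.abs(newUr - a.currentUr) > threshold) changed.push(a.id);
    a.currentUr = newUr; // persisted as current_ur in a real job
  }
  return changed;
}
```

Gating the event log this way keeps the append-only table from filling with sub-0.01 decay drift between runs.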

6.2 Rollup job

Every X minutes:

  1. For each image_digest with active unknowns:

    • Compute:

      • unknown_count_total
      • unknown_count_unresolved (status = unresolved)
      • unknown_count_high_ur (UR ≥ threshold)
      • p50 / p90 UR (use DB percentile or compute in app)
      • top_n_ur_sum (sum of top 5 UR)
      • median_age_days
  2. Upsert into unknown_image_rollups.
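A sketch of the rollup math, assuming nearest-rank percentiles, a top-5 sum, and a non-empty set of unresolved unknowns:

```typescript
// Rollup over the UR values and ages (in days) of one image's unknowns.
// Percentiles use the nearest-rank method; inputs must be non-empty.
function rollupStats(urs: number[], ages: number[], highThreshold = 0.8, topN = 5) {
  const sorted = [...urs].sort((a, b) => a - b);
  const pct = (p: number) =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  const topNUrSum = [...urs].sort((a, b) => b - a).slice(0, topN).reduce((s, x) => s + x, 0);
  const sortedAges = [...ages].sort((a, b) => a - b);
  return {
    unknownCountHighUr: urs.filter((u) => u >= highThreshold).length,
    p50Ur: pct(50),
    p90Ur: pct(90),
    topNUrSum,
    medianAgeDays: sortedAges[Math.floor(sortedAges.length / 2)],
  };
}
```

In production the percentiles could equally come from the database (e.g., an ordered-set aggregate) instead of application code; the nearest-rank choice here is one of several defensible definitions.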


7. CI / promotion gating

Expose a simple policy evaluation API for CI and deploy pipelines.

7.1 Policy definition (config)

Example YAML:

unknownsPolicy:
  blockIf:
    - kind: "anyUrAboveThreshold"
      threshold: 0.8
    - kind: "countAboveAge"
      maxCount: 5
      ageDays: 14
  warnIf:
    - kind: "unknownCountAbove"
      maxCount: 50
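A hypothetical evaluator for rules of this shape (the Summary fields mirror the section 5.4 rollup, but the exact contract is an assumption):

```typescript
type Summary = {
  maxUr: number;
  unknownCountUnresolved: number;
  countAboveAge: (ageDays: number) => number; // unresolved unknowns older than N days
};

type Rule =
  | { kind: "anyUrAboveThreshold"; threshold: number }
  | { kind: "countAboveAge"; maxCount: number; ageDays: number }
  | { kind: "unknownCountAbove"; maxCount: number };

function violates(rule: Rule, s: Summary): boolean {
  switch (rule.kind) {
    case "anyUrAboveThreshold": return s.maxUr >= rule.threshold;
    case "countAboveAge": return s.countAboveAge(rule.ageDays) > rule.maxCount;
    case "unknownCountAbove": return s.unknownCountUnresolved > rule.maxCount;
    default: return false;
  }
}

// Block rules take precedence over warn rules.
function evaluatePolicy(blockIf: Rule[], warnIf: Rule[], s: Summary): "ok" | "warn" | "block" {
  if (blockIf.some((r) => violates(r, s))) return "block";
  if (warnIf.some((r) => violates(r, s))) return "warn";
  return "ok";
}
```

Because evaluation reads only the precomputed rollup, the CI gate stays O(rules) per image rather than scanning individual unknowns.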

7.2 Policy evaluation endpoint

GET /policy/unknowns/evaluate?imageDigest=sha256:...

Response:

{
  "imageDigest": "sha256:...",
  "result": "block", // "ok" | "warn" | "block"
  "reasons": [
    {
      "kind": "anyUrAboveThreshold",
      "detail": "1 unknown with UR>=0.8 (max allowed: 0)"
    }
  ],
  "summary": {
    "unknownCountUnresolved": 30,
    "p90Ur": 0.82,
    "medianAgeDays": 17
  }
}

CI can decide to fail build/deploy based on result.


8. UI implementation (Trust Algebra Studio)

8.1 Image detail page: “Unknowns” tab

Components:

  1. Header metrics ribbon

    • Unknowns unresolved, p90 UR, median age, weekly trend sparkline.
    • Fetch from /images/{digest}/unknowns/summary.
  2. Unknowns table

    • Columns:

      • Status pill
      • UR (with color + tooltip showing B, A, T, D_t)
      • Classification type/reason
      • File path
      • Age
      • Last observed
    • Filters:

      • Status, UR range, age range, reason, type.
  3. Row drawer / detail panel

    • Show:

      • All core fields.

      • Evidence:

        • origin (scanner, feed, runtime)
        • raw signals (entropy, sections, etc)
        • SBOM component link (if any)
      • Timeline (events list)

    • Actions:

      • Change status (unresolved → suppressed/mitigated/confirmed).
      • Add note.
      • Set/extend suppression expiry.

8.2 Global “Unknowns board”

Goals:

  • Portfolio view; triage across many images.

Features:

  • Filters by:

    • Team/application/service
    • Time range for first observed
    • UR bucket (0–0.3, 0.3–0.6, 0.6–1)
  • Cards/rows per image:

    • Unknown counts, p90 UR, median age.
    • Trend of unknown count (last N weeks).
  • Click through to the image-detail tab.

8.3 “What-if” slider (optional v1.1)

On an image or org-level:

  • Slider(s) to visualize effect of:

    • k / lambda change (decay speed).
    • Trust baseline changes (simulate better attestations).
  • Implement by calling a stateless endpoint:

    • POST /unknowns/what-if with:

      • Current unknowns list IDs
      • Proposed decay policy
    • Returns recalculated URs and hypothetical gate result (but does not persist).


9. Observability & analytics

9.1 Metrics

Emit structured events/metrics (OpenTelemetry, etc.):

  • Counters:

    • unknowns_ingested_total (labels: source, classification_type, reason)
    • unknowns_resolved_total (labels: status)
  • Gauges:

    • unknowns_unresolved_count per image/service.
    • unknowns_p90_ur per image/service.
    • unknowns_median_age_days.

9.2 Weekly report generator

Batch job:

  1. Compute, per org or team:

    • Total unknowns.

    • New unknowns this week.

    • Resolved unknowns this week.

    • Median age.

    • Top 10 images by:

      • Highest p90 UR.
      • Largest number of long-lived unknowns (> X days).
  2. Persist into analytics store (ClickHouse) + push into:

    • Slack channel / email with a short plaintext summary and link to UI.

10. Security & compliance

  • Ensure all APIs require authentication & proper scopes:

    • Scanner ingest: internal service token only.
    • UI APIs: user identity + RBAC (e.g., team can only see their images).
  • Audit log:

    • unknown_artifact_events must be immutable and queryable by compliance teams.
  • PII:

    • Avoid storing user PII in notes; if necessary, apply redaction.

11. Suggested delivery plan (sprints/epics)

Sprint 1: Foundations & ingest path

  • DB migrations: unknown_artifacts, unknown_artifact_events, decay_policies.
  • Implement scoring library (B, A, T, UR_t, D_t).
  • Implement /internal/unknowns/ingest endpoint with idempotency.
  • Extend scanner to emit unknowns and integrate with ingest.
  • Basic GET /unknowns?imageDigest=... API.
  • Seed decay:default:v1 policy.

Exit criteria: Unknowns created and UR computed from real scans; queryable via API.


Sprint 2: Decay, rollups, and CI hook

  • Implement periodic job to recompute decay & UR.
  • Implement rollup job + unknown_image_rollups table.
  • Implement GET /images/{digest}/unknowns/summary.
  • Implement policy evaluation endpoint for CI.
  • Wire CI to block/warn based on policy.

Exit criteria: CI gate can fail a build due to highrisk unknowns; rollups visible via API.


Sprint 3: UI (Unknowns tab + board)

  • Image detail “Unknowns” tab:

    • Metrics ribbon, table, filters.
    • Row drawer with evidence & history.
  • Global “Unknowns board” page.

  • Integrate with APIs.

  • Add basic “explainability tooltip” for UR.

Exit criteria: Security team can triage unknowns via UI; product teams can see their exposure.


Sprint 4: Suppression workflow & reporting

  • Implement PATCH /unknowns/{id} + suppression rules & expiries.
  • Extend periodic jobs to auto-expire suppressions.
  • Weekly unknowns report job → analytics + Slack/email.
  • Add “trend” sparklines and unknowns burndown in UI.

Exit criteria: Unknowns can be suppressed with justification; org gets weekly burndown trends.


If you'd like, I can next:

  • Turn this into concrete tickets (Jira-style) with story points and acceptance criteria, or
  • Generate example migration scripts (SQL) and API contract files (OpenAPI snippet) that your devs can copy-paste.