Here's a crisp, ready-to-ship concept you can drop into Stella Ops: an **Unknowns Registry** that captures ambiguous scanner artifacts (stripped binaries, unverifiable packages, orphaned PURLs, missing digests) and treats them as first-class citizens with probabilistic severity and trust-decay, so you stay transparent without blocking delivery.

### What this solves (in plain terms)

* **No silent drops:** every "can't verify / can't resolve" is tracked, not discarded.
* **Quantified risk:** unknowns still roll into a portfolio-level risk number with confidence intervals.
* **Trust over time:** stale unknowns get *riskier* the longer they remain unresolved.
* **Client confidence:** visibility + trajectory (are unknowns shrinking?) becomes a maturity signal.

### Core data model (CycloneDX/SPDX compatible, attaches to your SBOM spine)

```yaml
UnknownArtifact:
  id: urn:stella:unknowns:
  observedAt:
  origin:
    source: scanner|ingest|runtime
    feed:
    evidence: [ filePath, containerDigest, buildId, sectionHints ]
  identifiers:
    purl?:     # orphan/incomplete PURL allowed
    hash?:     # missing digest allowed
    cpe?:
  classification:
    type: binary|library|package|script|config|other
    reason: stripped_binary|missing_signature|no_feed_match|ambiguous_name|checksum_mismatch|other
  metrics:
    baseUnkScore: 0..1
    confidence: 0..1   # model confidence in the *score*
    trust: 0..1        # provenance trust (sig/attest, feed quality)
    decayPolicyId:
  resolution:
    status: unresolved|suppressed|mitigated|confirmed-benign|confirmed-risk
    updatedAt:
    notes:
  links:
    scanId:
    componentId?:
    attestations?: [ dsse, in-toto, rekorRef ]
```

### Scoring (simple, explainable, deterministic)

* **Unknown Risk (UR):** `UR_t = clamp( (B * (1 + A)) * D_t * (1 - T), 0, 1 )`
  * `B` = `baseUnkScore` (heuristics: file entropy, section hints, ELF flags, import tables, size, location)
  * `A` = **Environment Amplifier** (runtime proximity: container entrypoint? PID namespace? network caps?)
  * `T` = **Trust** (sig/attest/registry reputation/feed pedigree, normalized to 0..1)
  * `D_t` = **Trust-decay multiplier** over time `t`:
    * Linear: `D_t = 1 + k * daysOpen` (e.g., `k = 0.01`)
    * or Exponential: `D_t = e^(λ * daysOpen)` (e.g., `λ = 0.005`)
  * A worked example follows at the end of this overview.
* **Portfolio roll-up:** use **P90 of UR_t** across images + **sum of top-N UR_t** to avoid dilution.

### Policies & SLOs

* **SLO:** *Unknowns burn-down* ≤ X% week-over-week; *median age* ≤ Y days.
* **Gates:** block promotion when (a) any `UR_t ≥ 0.8`, or (b) more than `M` unknowns with age > `Z` days.
* **Suppressions:** require justification + expiry; suppression reduces `A` but does **not** zero `D_t`.

### Trust-decay policies (pluggable)

```yaml
DecayPolicy:
  id: decay:default:v1
  kind: linear|exponential|custom
  params:
    k: 0.01    # linear slope per day
    cap: 2.0   # max multiplier
```

### Scanner hooks (where to emit Unknowns)

* **Binary scan:** stripped ELF/Mach-O/PE; missing build-ID; abnormal sections; implausible symbol map.
* **Package map:** PURL inferred from path without registry proof; mismatched checksum; vendor fork detected.
* **Attestation:** DSSE missing or invalid; Sigstore chain unverifiable; Rekor entry not found.
* **Feeds:** component seen in runtime but absent from the SBOM (or vice versa).

### Deterministic generation (for replay/audits)

* Include **Unknowns** in the **Scan Manifest** (your deterministic bundle): inputs, ruleset hash, feed hashes, lattice policy version, and the exact classifier thresholds that produced `B`, `A`, `T`. That lets you replay and reproduce `UR_t` byte-for-byte during audits.
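To make the scoring formula concrete, here is a worked example with illustrative numbers (consistent with the explainability-panel and API samples later in this note): a stripped binary with `B = 0.7`, runtime amplifier `A = 0.2`, trust `T = 0.1`, open for 17 days under the default linear policy (`k = 0.01`):

```
D_t  = 1 + 0.01 * 17 = 1.17
UR_t = 0.7 * (1 + 0.2) * 1.17 * (1 - 0.1)
     = 0.84 * 1.17 * 0.9
     ≈ 0.88
```

At 0.88 this artifact would trip the `UR_t ≥ 0.8` promotion gate on its own; resolving it, or raising `T` with a verified attestation, is the fastest way back under the threshold.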
### API surface (StellaOps.Authority)

```
POST  /unknowns/ingest            # bulk ingest from Scanner/Vexer
GET   /unknowns?imageDigest=…     # list + filters (status, age, UR buckets)
PATCH /unknowns/{id}/resolve      # set status, add evidence, set suppression (with expiry)
GET   /unknowns/stats             # burn-downs, age histograms, P90 UR_t, top-N contributors
```

### UI slices (Trust Algebra Studio)

* **Risk ribbon:** Unknowns count, P90 UR_t, median age, trend sparkline.
* **Aging board:** columns by age buckets; cards show reason, UR_t, `T`, decay policy, evidence.
* **What-if slider:** adjust `k`/`λ` and see the retroactive effect on release readiness (deterministic preview).
* **Explainability panel:** show the `B`, `A`, `T`, `D_t` factors with succinct evidence (e.g., "ELF stripped; no .symtab; no Sigstore; runtime hits PID 1 → A=0.2; trust=0.1; day 17 → D=1.17").

### Heuristics for `baseUnkScore` (B) (portable across ELF/PE/Mach-O)

* Section/segment anomalies; entropy outliers; import tables linking to risky APIs; executable heap/stack flags.
* Location & role (PATH proximity to entrypoint, init scripts).
* Size/type priors (tiny droppers vs fat libs).
* Known-unknown patterns (packer signatures, `UPX` traces without an unpack attestation).

### Governance & reporting

* **Weekly Unknowns Report:** totals, burn-down, median age, "top unresolved origins" (by registry / base image), and "time-to-first-evidence."
* **Vendor scorecards:** if a supplier causes recurring unknowns, reflect it in **Adaptive Trust Economics**.

### Minimal implementation slice (2 sprints)

1. **Sprint A (backend + ingest):** model + tables, ingest endpoint, deterministic manifest hook, basic scoring (linear decay), list/query API, roll-up metrics.
2. **Sprint B (UI + policy):** Unknowns board, what-if slider, suppress/resolve workflow with expiry, weekly report, policy gates in CI.

---

If you want, I can generate:

* the exact **PostgreSQL schema + indices** (including age/UR materialized views),
* **.NET 10** DTOs, handlers, and a **deterministic scoring library** with unit tests,
* a **Grafana/ClickHouse** dashboard spec for burn-down and P90 tracking,
* and a one-page client-facing explainer your sales team can use.

---

Perfect, let's turn the concept into a concrete, dev-ready implementation plan. I'll assume:

* You already have:
  * a scanner pipeline (binaries, SBOMs, attestations),
  * a backend service (StellaOps.Authority),
  * a UI (Trust Algebra Studio),
  * observability (OpenTelemetry, ClickHouse/Presto).

You can adapt naming and tech stack as needed.

---

## 0. Scope & success criteria

**Goals**

1. Persist all "unknown-ish" scanner findings (stripped binaries, unverifiable PURLs, missing digests, etc.) as first-class entities.
2. Compute a deterministic **Unknown Risk (UR)** per artifact and roll it up per image/application.
3. Apply **trust-decay** over time and expose burn-down metrics.
4. Provide UI workflows to triage, suppress, and resolve unknowns.
5. Enforce release gates based on unknown risk and age.

**Non-goals (for v1)**

* No full ML; use deterministic heuristics + tunable weights.
* No cross-org multi-tenant policy; a single org and a single policy set.
* No per-developer responsibility/assignment yet (can be added later).
---

## 1. Architecture & components

### 1.1 New/updated components

1. **Unknowns Registry (backend submodule)**
   * Lives in your existing backend (e.g., `StellaOps.Authority.Unknowns`).
   * Owns the DB schema, scoring logic, and API.
2. **Scanner integration**
   * Extend `StellaOps.Scanner` (and/or `Vexer`) to emit "unknown" findings into the registry via HTTP or a message bus.
3. **UI: Unknowns in Trust Algebra Studio**
   * New section/tab: "Unknowns" under each image/app.
   * Global "Unknowns board" for the portfolio view.
4. **Analytics & jobs**
   * Periodic job to recompute trust-decay & UR.
   * Weekly report generator (e.g., pushing into ClickHouse, Slack, or email).

---

## 2. Data model (DB schema)

Use a relational DB; here's a concrete schema you can translate into migrations.

### 2.1 Tables

#### `unknown_artifacts`

Represents the current state of each unknown.

* `id` (UUID, PK)
* `created_at` (timestamp)
* `updated_at` (timestamp)
* `first_observed_at` (timestamp, NOT NULL)
* `last_observed_at` (timestamp, NOT NULL)
* `origin_source` (enum: `scanner`, `runtime`, `ingest`)
* `origin_feed` (text) – e.g., `binary-scanner@1.4.3`
* `origin_scan_id` (UUID / text) – foreign key to `scan_runs` if you have it
* `image_digest` (text, indexed) – ties the unknown to a container/image
* `component_id` (UUID, nullable) – SBOM component once mapped
* `file_path` (text, nullable)
* `build_id` (text, nullable) – ELF/Mach-O/PE build ID, if any
* `purl` (text, nullable)
* `hash_sha256` (text, nullable)
* `cpe` (text, nullable)
* `classification_type` (enum: `binary`, `library`, `package`, `script`, `config`, `other`)
* `classification_reason` (enum: `stripped_binary`, `missing_signature`, `no_feed_match`, `ambiguous_name`, `checksum_mismatch`, `other`)
* `status` (enum: `unresolved`, `suppressed`, `mitigated`, `confirmed_benign`, `confirmed_risk`)
* `status_changed_at` (timestamp)
* `status_changed_by` (text / user ID)
* `notes` (text)
* `decay_policy_id` (FK → `decay_policies`)
* `base_unk_score` (double, 0..1)
* `env_amplifier` (double, 0..1)
* `trust` (double, 0..1)
* `current_decay_multiplier` (double)
* `current_ur` (double, 0..1) – Unknown Risk at last recompute
* `current_confidence` (double, 0..1) – confidence in `current_ur`
* `is_deleted` (bool) – soft delete

**Indexes**

* `idx_unknown_artifacts_image_digest_status`
* `idx_unknown_artifacts_status_created_at`
* `idx_unknown_artifacts_current_ur`
* `idx_unknown_artifacts_last_observed_at`

#### `unknown_artifact_events`

Append-only event log for auditable changes (see the example payload at the end of this section).

* `id` (UUID, PK)
* `unknown_artifact_id` (FK → `unknown_artifacts`)
* `created_at` (timestamp)
* `actor` (text / user ID / system)
* `event_type` (enum: `created`, `reobserved`, `status_changed`, `note_added`, `metrics_recomputed`, `linked_component`, `suppression_applied`, `suppression_expired`)
* `payload` (JSONB) – diff or event-specific details

Index: `idx_unknown_artifact_events_artifact_id_created_at`

#### `decay_policies`

Defines how trust-decay works.

* `id` (text, PK) – e.g., `decay:default:v1`
* `kind` (enum: `linear`, `exponential`)
* `param_k` (double, nullable) – slope for linear decay
* `param_lambda` (double, nullable) – rate for exponential decay
* `cap` (double, default 2.0)
* `description` (text)
* `is_default` (bool)

#### `unknown_suppressions`

Optional; you could reuse `unknown_artifacts.status`, but a separate table lets you keep multiple suppressions over time.

* `id` (UUID, PK)
* `unknown_artifact_id` (FK)
* `created_at` (timestamp)
* `created_by` (text)
* `reason` (text)
* `expires_at` (timestamp, nullable)
* `active` (bool)

Index: `idx_unknown_suppressions_artifact_active_expires_at`

#### `unknown_image_rollups`

Precomputed rollups per image (for fast dashboards/gates).

* `id` (UUID, PK)
* `image_digest` (text, indexed)
* `computed_at` (timestamp)
* `unknown_count_total` (int)
* `unknown_count_unresolved` (int)
* `unknown_count_high_ur` (int) – e.g., UR ≥ 0.8
* `p50_ur` (double)
* `p90_ur` (double)
* `top_n_ur_sum` (double)
* `median_age_days` (double)
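To make the JSONB `payload` column concrete, a `status_changed` event row might carry a diff-style payload like the following; the exact shape is an assumption for illustration, not a fixed contract:

```jsonc
{
  "eventType": "status_changed",
  "actor": "user:a.reviewer",
  "payload": {
    // illustrative diff shape; adapt to your audit conventions
    "from": { "status": "unresolved" },
    "to": { "status": "suppressed" },
    "reason": "Reviewed; internal diagnostics binary.",
    "suppressionExpiresAt": "2025-12-31T00:00:00Z"
  }
}
```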
---

## 3. Scoring engine implementation

Create a small, deterministic scoring library so the same code can be used in:

* the backend ingest path (for immediate UR),
* the batch recompute job,
* "what-if" UI simulations (optionally via a stateless API).

### 3.1 Data types

Define a core model, e.g.:

```ts
type DecayPolicy = {
  kind: "linear" | "exponential";
  k?: number;      // linear slope per day
  lambda?: number; // exponential rate per day
  cap: number;     // max multiplier
};

type UnknownMetricsInput = {
  baseUnkScore: number; // B
  envAmplifier: number; // A
  trust: number;        // T
  daysOpen: number;     // t
  decayPolicy: DecayPolicy;
};

type UnknownMetricsOutput = {
  decayMultiplier: number; // D_t
  unknownRisk: number;     // UR_t
};
```

### 3.2 Algorithm

```ts
function computeDecayMultiplier(
  daysOpen: number,
  policy: DecayPolicy
): number {
  if (policy.kind === "linear") {
    const raw = 1 + (policy.k ?? 0) * daysOpen;
    return Math.min(raw, policy.cap);
  }
  if (policy.kind === "exponential") {
    const lambda = policy.lambda ?? 0;
    const raw = Math.exp(lambda * daysOpen);
    return Math.min(raw, policy.cap);
  }
  return 1; // unknown/custom kinds fall back to no decay
}

function computeUnknownRisk(input: UnknownMetricsInput): UnknownMetricsOutput {
  const { baseUnkScore: B, envAmplifier: A, trust: T, daysOpen, decayPolicy } = input;

  const D_t = computeDecayMultiplier(daysOpen, decayPolicy);
  const raw = (B * (1 + A)) * D_t * (1 - T);
  const unknownRisk = Math.max(0, Math.min(raw, 1)); // clamp 0..1

  return { decayMultiplier: D_t, unknownRisk };
}
```

### 3.3 Heuristics for `B`, `A`, `T`

Implement these as pure functions with configuration-driven weights (a sketch follows this list):

* `B` (base unknown score):
  * Start from a prior by `classification_type` (binary > library > config).
  * Adjust up for:
    * stripped binary (no symbols, high entropy),
    * suspicious segments (executable stack/heap),
    * known packer signatures (UPX, etc.).
  * Adjust down for:
    * large, well-known dependency paths (`/usr/lib/...`),
    * known safe signatures (if partially known).
* `A` (environment amplifier):
  * +0.2 if the artifact is part of the container entrypoint (PID 1).
  * +0.1 if the file is in a PATH dir (e.g., `/usr/local/bin`).
  * +0.1 if the runtime has network access or elevated capability flags.
  * Cap at 0.5 for v1.
* `T` (trust):
  * Start at 0.5.
  * +0.3 if the registry/signature/attestation chain is verified.
  * +0.1 if the source registry is on the "trusted vendor" list.
  * −0.3 if checksum mismatch or feed conflict.
  * Clamp 0..1.

Store the raw factors (`B`, `A`, `T`) on the artifact for transparency and later replays.
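As a minimal sketch of those pure functions: signal names follow the `rawSignals` example in the ingest contract of section 4.1 below, and all weights here are illustrative placeholders for the configuration-driven values described above.

```ts
// Sketch only: weights are illustrative stand-ins for config-driven values.
type RawSignals = {
  entropy: number;        // Shannon entropy of the file, 0..8
  hasSymbols: boolean;
  isEntrypoint: boolean;  // container entrypoint (PID 1)
  inPathDir: boolean;     // e.g., /usr/local/bin
  hasNetworkCaps?: boolean;
  signatureVerified?: boolean;
  trustedVendor?: boolean;
  checksumMismatch?: boolean;
};

const clamp01 = (x: number): number => Math.max(0, Math.min(x, 1));

// B: prior by classification type, adjusted by binary-level signals.
function computeBaseScore(classificationType: string, s: RawSignals): number {
  const priors: Record<string, number> = {
    binary: 0.5, library: 0.35, package: 0.3, script: 0.25, config: 0.15,
  };
  let b = priors[classificationType] ?? 0.3;
  if (!s.hasSymbols) b += 0.15;  // stripped binary
  if (s.entropy > 7.0) b += 0.1; // entropy outlier (possible packer)
  return clamp01(b);
}

// A: runtime proximity, capped at 0.5 for v1.
function computeEnvAmplifier(s: RawSignals): number {
  let a = 0;
  if (s.isEntrypoint) a += 0.2;
  if (s.inPathDir) a += 0.1;
  if (s.hasNetworkCaps) a += 0.1;
  return Math.min(a, 0.5);
}

// T: provenance trust, starting from a neutral 0.5.
function computeTrust(s: RawSignals): number {
  let t = 0.5;
  if (s.signatureVerified) t += 0.3;
  if (s.trustedVendor) t += 0.1;
  if (s.checksumMismatch) t -= 0.3;
  return clamp01(t);
}
```

Because the functions are pure and the weights live in configuration, the same inputs can be replayed during audits to reproduce `B`, `A`, `T` exactly.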
---

## 4. Scanner integration

### 4.1 Emission format (from scanner → backend)

Define a minimal ingestion contract (JSON over HTTP or a message):

```jsonc
{
  "scanId": "urn:scan:1234",
  "imageDigest": "sha256:abc123...",
  "observedAt": "2025-11-27T12:34:56Z",
  "unknowns": [
    {
      "externalId": "scanner-unique-id-1",
      "originSource": "scanner",
      "originFeed": "binary-scanner@1.4.3",
      "filePath": "/usr/local/bin/stripped",
      "buildId": null,
      "purl": null,
      "hashSha256": "aa...",
      "cpe": null,
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "rawSignals": {
        "entropy": 7.4,
        "hasSymbols": false,
        "isEntrypoint": true,
        "inPathDir": true
      }
    }
  ]
}
```

The backend maps `rawSignals` → `B`, `A`, `T`.

### 4.2 Idempotency

* Define the uniqueness key as `(image_digest, file_path, hash_sha256)` for v1.
* On ingest:
  * If the artifact exists:
    * Update `last_observed_at`.
    * Recompute age (`now - first_observed_at`) and UR.
    * Add a `reobserved` event.
  * If not:
    * Insert a new row with `first_observed_at = observedAt`.

### 4.3 HTTP endpoint

`POST /internal/unknowns/ingest`

* Auth: internal service token.
* Returns a per-unknown mapping to the internal `id` and computed UR.

Error handling:

* Invalid payload → 400 with a list of errors.
* Partial failure: process the valid unknowns, return a `failedUnknowns` array with reasons.

---

## 5. Backend API for UI & CI

### 5.1 List unknowns

`GET /unknowns`

Query params:

* `imageDigest` (optional)
* `status` (optional, multi: unresolved, suppressed, etc.)
* `minUr`, `maxUr` (optional)
* `maxAgeDays` (optional)
* `page`, `pageSize`

Response:

```jsonc
{
  "items": [
    {
      "id": "urn:stella:unknowns:uuid",
      "imageDigest": "sha256:...",
      "filePath": "/usr/local/bin/stripped",
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "status": "unresolved",
      "firstObservedAt": "...",
      "lastObservedAt": "...",
      "ageDays": 17,
      "baseUnkScore": 0.7,
      "envAmplifier": 0.2,
      "trust": 0.1,
      "decayPolicyId": "decay:default:v1",
      "decayMultiplier": 1.17,
      "currentUr": 0.88,
      "currentConfidence": 0.8
    }
  ],
  "total": 123
}
```

### 5.2 Get a single unknown + event history

`GET /unknowns/{id}`

Include:

* the artifact,
* the latest metrics,
* recent events (with pagination).

### 5.3 Update status / suppression

`PATCH /unknowns/{id}`

Body options:

```jsonc
{
  "status": "suppressed",
  "notes": "Reviewed; internal diagnostics binary.",
  "suppression": {
    "expiresAt": "2025-12-31T00:00:00Z"
  }
}
```

Backend:

* Validates the transition (cannot move back to "unresolved" without an event).
* Writes to `unknown_suppressions`.
* Writes `status_changed` + `suppression_applied` events.

### 5.4 Image rollups

`GET /images/{imageDigest}/unknowns/summary`

Response:

```jsonc
{
  "imageDigest": "sha256:...",
  "computedAt": "...",
  "unknownCountTotal": 40,
  "unknownCountUnresolved": 30,
  "unknownCountHighUr": 4,
  "p50Ur": 0.35,
  "p90Ur": 0.82,
  "topNUrSum": 2.4,
  "medianAgeDays": 9
}
```

This is what CI and the UI will mostly query.

---

## 6. Trust-decay job & rollup computation

### 6.1 Periodic recompute job

Schedule (e.g., every hour):

1. Fetch `unknown_artifacts` where:
   * `status IN ('unresolved', 'suppressed', 'mitigated')`
   * `last_observed_at >= now() - interval '90 days'` (tunable)
2. Compute `daysOpen = now() - first_observed_at`.
3. Compute `D_t` and `UR_t` with the scoring library.
4. Update `unknown_artifacts.current_ur` and `current_decay_multiplier`.
5. Append a `metrics_recomputed` event (batched; e.g., only when UR changed by more than 0.01).

### 6.2 Rollup job

Every X minutes (a sketch of the metric computation follows this list):

1. For each `image_digest` with active unknowns, compute:
   * `unknown_count_total`
   * `unknown_count_unresolved` (`status = unresolved`)
   * `unknown_count_high_ur` (UR ≥ threshold)
   * `p50` / `p90` UR (use a DB percentile function or compute in the app)
   * `top_n_ur_sum` (sum of the top 5 URs)
   * `median_age_days`
2. Upsert into `unknown_image_rollups`.
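If you compute the percentiles app-side, a minimal sketch could look like this (nearest-rank percentiles; the `ActiveUnknown` shape and the `topN = 5` default are assumptions mirroring the fields above):

```ts
// Sketch: app-side rollup metrics using nearest-rank percentiles.
type ActiveUnknown = { currentUr: number; ageDays: number; status: string };

function percentile(sortedAscending: number[], p: number): number {
  if (sortedAscending.length === 0) return 0;
  const rank = Math.ceil(p * sortedAscending.length); // nearest-rank method
  return sortedAscending[Math.max(0, rank - 1)];
}

function computeImageRollup(
  unknowns: ActiveUnknown[],
  highUrThreshold = 0.8, // mirrors the "UR >= 0.8" gate
  topN = 5
) {
  const urs = unknowns.map((u) => u.currentUr).sort((a, b) => a - b);
  const ages = unknowns.map((u) => u.ageDays).sort((a, b) => a - b);
  const topNUrSum = urs.slice(-topN).reduce((sum, ur) => sum + ur, 0);

  return {
    unknownCountTotal: unknowns.length,
    unknownCountUnresolved: unknowns.filter((u) => u.status === "unresolved").length,
    unknownCountHighUr: urs.filter((ur) => ur >= highUrThreshold).length,
    p50Ur: percentile(urs, 0.5),
    p90Ur: percentile(urs, 0.9),
    topNUrSum,
    medianAgeDays: percentile(ages, 0.5),
  };
}
```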
---

## 7. CI / promotion gating

Expose a simple policy evaluation API for CI and deploy pipelines.

### 7.1 Policy definition (config)

Example YAML:

```yaml
unknownsPolicy:
  blockIf:
    - kind: "anyUrAboveThreshold"
      threshold: 0.8
    - kind: "countAboveAge"
      maxCount: 5
      ageDays: 14
  warnIf:
    - kind: "unknownCountAbove"
      maxCount: 50
```

### 7.2 Policy evaluation endpoint

`GET /policy/unknowns/evaluate?imageDigest=sha256:...`

Response:

```jsonc
{
  "imageDigest": "sha256:...",
  "result": "block", // "ok" | "warn" | "block"
  "reasons": [
    {
      "kind": "anyUrAboveThreshold",
      "detail": "1 unknown with UR>=0.8 (max allowed: 0)"
    }
  ],
  "summary": {
    "unknownCountUnresolved": 30,
    "p90Ur": 0.82,
    "medianAgeDays": 17
  }
}
```

CI can decide to fail the build/deploy based on `result`.
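A minimal sketch of how the evaluator might map the YAML policy above onto a result; the rule kinds mirror the example policy, and the `Rollup` shape (including the precomputed `countOlderThan` lookup) is an assumption:

```ts
// Sketch: evaluate blockIf/warnIf rules against a per-image rollup.
type Rollup = {
  unknownCountUnresolved: number;
  maxUr: number;                          // highest UR among unresolved unknowns
  countOlderThan: (days: number) => number; // unresolved unknowns older than N days
};

type Rule =
  | { kind: "anyUrAboveThreshold"; threshold: number }
  | { kind: "countAboveAge"; maxCount: number; ageDays: number }
  | { kind: "unknownCountAbove"; maxCount: number };

function ruleFires(rule: Rule, r: Rollup): string | null {
  switch (rule.kind) {
    case "anyUrAboveThreshold":
      return r.maxUr >= rule.threshold
        ? `unknown with UR>=${rule.threshold} present`
        : null;
    case "countAboveAge": {
      const n = r.countOlderThan(rule.ageDays);
      return n > rule.maxCount
        ? `${n} unknowns older than ${rule.ageDays} days (max allowed: ${rule.maxCount})`
        : null;
    }
    case "unknownCountAbove":
      return r.unknownCountUnresolved > rule.maxCount
        ? `${r.unknownCountUnresolved} unresolved unknowns (max allowed: ${rule.maxCount})`
        : null;
  }
}

function evaluateUnknownsPolicy(
  policy: { blockIf: Rule[]; warnIf: Rule[] },
  rollup: Rollup
): { result: "ok" | "warn" | "block"; reasons: string[] } {
  const block = policy.blockIf
    .map((rule) => ruleFires(rule, rollup))
    .filter((x): x is string => x !== null);
  if (block.length > 0) return { result: "block", reasons: block };

  const warn = policy.warnIf
    .map((rule) => ruleFires(rule, rollup))
    .filter((x): x is string => x !== null);
  return warn.length > 0
    ? { result: "warn", reasons: warn }
    : { result: "ok", reasons: [] };
}
```

Block rules are checked before warn rules, so a single high-UR unknown short-circuits to `block` regardless of the warn thresholds.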
---

## 8. UI implementation (Trust Algebra Studio)

### 8.1 Image detail page: "Unknowns" tab

Components:

1. **Header metrics ribbon**
   * Unresolved unknowns, p90 UR, median age, weekly trend sparkline.
   * Fetch from `/images/{digest}/unknowns/summary`.
2. **Unknowns table**
   * Columns:
     * status pill,
     * UR (with color + tooltip showing `B`, `A`, `T`, `D_t`),
     * classification type/reason,
     * file path,
     * age,
     * last observed.
   * Filters: status, UR range, age range, reason, type.
3. **Row drawer / detail panel**
   * Show:
     * all core fields,
     * evidence:
       * origin (scanner, feed, runtime),
       * raw signals (entropy, sections, etc.),
       * SBOM component link (if any),
     * timeline (events list).
   * Actions:
     * change status (unresolved → suppressed/mitigated/confirmed),
     * add a note,
     * set/extend suppression expiry.

### 8.2 Global "Unknowns board"

Goals:

* Portfolio view; triage across many images.

Features:

* Filters by:
  * team/application/service,
  * time range for first observed,
  * UR bucket (0–0.3, 0.3–0.6, 0.6–1).
* Cards/rows per image:
  * unknown counts, p90 UR, median age,
  * trend of unknown count (last N weeks),
  * click-through to the image-detail tab.

### 8.3 "What-if" slider (optional v1.1)

At image or org level:

* Slider(s) to visualize the effect of:
  * `k` / `λ` changes (decay speed),
  * trust baseline changes (simulating better attestations).
* Implement by calling a stateless endpoint:
  * `POST /unknowns/what-if` with:
    * the current unknown IDs,
    * the proposed decay policy.
  * Returns recalculated URs and the hypothetical gate result (but does **not** persist anything).

---

## 9. Observability & analytics

### 9.1 Metrics

Emit structured events/metrics (OpenTelemetry, etc.):

* Counters:
  * `unknowns_ingested_total` (labels: `source`, `classification_type`, `reason`)
  * `unknowns_resolved_total` (labels: `status`)
* Gauges:
  * `unknowns_unresolved_count` per image/service
  * `unknowns_p90_ur` per image/service
  * `unknowns_median_age_days`

### 9.2 Weekly report generator

Batch job:

1. Compute, per org or team:
   * total unknowns,
   * new unknowns this week,
   * resolved unknowns this week,
   * median age,
   * top 10 images by:
     * highest p90 UR,
     * largest number of long-lived unknowns (> X days).
2. Persist into the analytics store (ClickHouse) and push to:
   * a Slack channel / email with a short plain-text summary and a link to the UI.

---

## 10. Security & compliance

* Ensure all APIs require authentication & proper scopes:
  * scanner ingest: internal service token only;
  * UI APIs: user identity + RBAC (e.g., a team can only see its own images).
* Audit log:
  * `unknown_artifact_events` must be immutable and queryable by compliance teams.
* PII:
  * Avoid storing user PII in notes; if necessary, apply redaction.

---

## 11. Suggested delivery plan (sprints/epics)

### Sprint 1 – Foundations & ingest path

* [ ] DB migrations: `unknown_artifacts`, `unknown_artifact_events`, `decay_policies`.
* [ ] Implement the scoring library (`B`, `A`, `T`, `UR_t`, `D_t`).
* [ ] Implement the `/internal/unknowns/ingest` endpoint with idempotency.
* [ ] Extend the scanner to emit unknowns and integrate with ingest.
* [ ] Basic `GET /unknowns?imageDigest=...` API.
* [ ] Seed the `decay:default:v1` policy.

**Exit criteria:** Unknowns are created and UR is computed from real scans; queryable via API.

---

### Sprint 2 – Decay, rollups, and CI hook

* [ ] Implement the periodic job to recompute decay & UR.
* [ ] Implement the rollup job + `unknown_image_rollups` table.
* [ ] Implement `GET /images/{digest}/unknowns/summary`.
* [ ] Implement the policy evaluation endpoint for CI.
* [ ] Wire CI to block/warn based on policy.

**Exit criteria:** The CI gate can fail a build due to high-risk unknowns; rollups are visible via API.

---

### Sprint 3 – UI (Unknowns tab + board)

* [ ] Image detail "Unknowns" tab:
  * metrics ribbon, table, filters,
  * row drawer with evidence & history.
* [ ] Global "Unknowns board" page.
* [ ] Integrate with the APIs.
* [ ] Add a basic "explainability tooltip" for UR.

**Exit criteria:** The security team can triage unknowns via the UI; product teams can see their exposure.

---

### Sprint 4 – Suppression workflow & reporting

* [ ] Implement `PATCH /unknowns/{id}` + suppression rules & expiries.
* [ ] Extend the periodic jobs to auto-expire suppressions.
* [ ] Weekly unknowns report job → analytics + Slack/email.
* [ ] Add "trend" sparklines and an unknowns burn-down view in the UI.

**Exit criteria:** Unknowns can be suppressed with justification; the org gets weekly burn-down trends.

---

If you'd like, I can next:

* turn this into concrete tickets (Jira-style) with story points and acceptance criteria, or
* generate example migration scripts (SQL) and API contract files (OpenAPI snippet) that your devs can copy-paste.