Here’s a crisp, ready‑to‑ship concept you can drop into Stella Ops: an Unknowns Registry that captures ambiguous scanner artifacts (stripped binaries, unverifiable packages, orphaned PURLs, missing digests) and treats them as first‑class citizens with probabilistic severity and trust‑decay—so you stay transparent without blocking delivery.
What this solves (in plain terms)
- No silent drops: every “can’t verify / can’t resolve” is tracked, not discarded.
- Quantified risk: unknowns still roll into a portfolio‑level risk number with confidence intervals.
- Trust over time: stale unknowns get riskier the longer they remain unresolved.
- Client confidence: visibility + trajectory (are unknowns shrinking?) becomes a maturity signal.
Core data model (CycloneDX/SPDX compatible, attaches to your SBOM spine)
UnknownArtifact:
  id: urn:stella:unknowns:<uuid>
  observedAt: <RFC3339>
  origin:
    source: scanner|ingest|runtime
    feed: <name/version>
    evidence: [ filePath, containerDigest, buildId, sectionHints ]
  identifiers:
    purl?: <string>        # orphan/incomplete PURL allowed
    hash?: <sha256|null>   # missing digest allowed
    cpe?: <string|null>
  classification:
    type: binary|library|package|script|config|other
    reason: stripped_binary|missing_signature|no_feed_match|ambiguous_name|checksum_mismatch|other
  metrics:
    baseUnkScore: 0..1
    confidence: 0..1       # model confidence in the *score*
    trust: 0..1            # provenance trust (sig/attest, feed quality)
    decayPolicyId: <ref>
  resolution:
    status: unresolved|suppressed|mitigated|confirmed-benign|confirmed-risk
    updatedAt: <RFC3339>
    notes: <text>
  links:
    scanId: <ref>
    componentId?: <ref to SBOM component if later mapped>
    attestations?: [ dsse, in-toto, rekorRef ]
Scoring (simple, explainable, deterministic)
- Unknown Risk (UR): UR_t = clamp( (B * (1 + A)) * D_t * (1 - T), 0, 1 )
  - B = baseUnkScore (heuristics: file entropy, section hints, ELF flags, import tables, size, location)
  - A = Environment Amplifier (runtime proximity: container entrypoint? PID namespace? network caps?)
  - T = Trust (sig/attest/registry reputation/feed pedigree, normalized to 0..1)
  - D_t = Trust‑decay multiplier over time t:
    - Linear: D_t = 1 + k * daysOpen (e.g., k = 0.01)
    - or Exponential: D_t = e^(λ * daysOpen) (e.g., λ = 0.005)
- Portfolio roll‑up: use P90 of UR_t across images + sum of top‑N UR_t to avoid dilution.
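To make the formula concrete, here is a worked calculation using illustrative factor values (B = 0.7, A = 0.2, T = 0.1, 17 days open, linear decay with k = 0.01). The inputs are examples only, not calibrated scanner output.

```latex
\begin{aligned}
D_{17}  &= 1 + k \cdot \text{daysOpen} = 1 + 0.01 \cdot 17 = 1.17 \\
UR_{17} &= \operatorname{clamp}\bigl(B\,(1 + A)\,D_{17}\,(1 - T),\ 0,\ 1\bigr) \\
        &= \operatorname{clamp}(0.7 \cdot 1.2 \cdot 1.17 \cdot 0.9,\ 0,\ 1) \approx 0.88
\end{aligned}
```

On day 0 the same artifact scores 0.7 * 1.2 * 0.9 ≈ 0.76, so under these assumed values decay alone is enough to push it past a 0.8 gate threshold.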
Policies & SLOs
- SLO: Unknowns burn‑down ≤ X% week‑over‑week; Median age ≤ Y days.
- Gates: block promotion when (a) any UR_t ≥ 0.8, or (b) more than M unknowns with age > Z days.
- Suppressions: require justification + expiry; suppression reduces A but does not zero D_t.
Trust‑decay policies (pluggable)
DecayPolicy:
  id: decay:default:v1
  kind: linear|exponential|custom
  params:
    k: 0.01    # linear slope per day
    cap: 2.0   # max multiplier
Scanner hooks (where to emit Unknowns)
- Binary scan: stripped ELF/Mach‑O/PE; missing build‑ID; abnormal sections; impossible symbol map.
- Package map: PURL inferred from path without registry proof; mismatched checksum; vendor fork detected.
- Attestation: DSSE missing / invalid; Sigstore chain unverifiable; Rekor entry not found.
- Feeds: component seen in runtime but absent from SBOM (or vice versa).
Deterministic generation (for replay/audits)
- Include Unknowns in the Scan Manifest (your deterministic bundle): inputs, ruleset hash, feed hashes, lattice policy version, and the exact classifier thresholds that produced B, A, T. That lets you replay and reproduce UR_t byte‑for‑byte during audits.
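Byte-for-byte replay depends on serializing the manifest the same way every time. Below is a minimal sketch of one way to do that, assuming a hypothetical `canonicalize` helper that sorts object keys before serializing. The field names in the sample manifests are illustrative, not the actual Scan Manifest schema.

```typescript
// Hypothetical helper: stable JSON serialization with recursively sorted keys,
// so two manifests with the same content always yield the same string.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is significant, so it is preserved.
    return "[" + value.map((v) => canonicalize(v)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const fields = keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k] as Json));
  return "{" + fields.join(",") + "}";
}

// Illustrative manifest fragments; the field names are assumptions.
const manifestA: Json = { rulesetHash: "r1", thresholds: { entropy: 7.4 }, feedHashes: ["f1", "f2"] };
const manifestB: Json = { feedHashes: ["f1", "f2"], thresholds: { entropy: 7.4 }, rulesetHash: "r1" };
```

Both sample manifests serialize to the same string regardless of key order. That string can then be hashed and stored next to the ruleset and feed hashes.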
API surface (StellaOps.Authority)
POST /unknowns/ingest # bulk ingest from Scanner/Vexer
GET /unknowns?imageDigest=… # list + filters (status, age, UR buckets)
PATCH /unknowns/{id}/resolve # set status, add evidence, set suppression (with expiry)
GET /unknowns/stats # burn-downs, age histograms, P90 UR_t, top-N contributors
UI slices (Trust Algebra Studio)
- Risk ribbon: Unknowns count, P90 UR_t, median age, trend sparkline.
- Aging board: columns by age buckets; cards show reason, UR_t, T, decay policy, evidence.
- What‑if slider: adjust k/λ and see retroactive effect on release readiness (deterministic preview).
- Explainability panel: show B, A, T, D_t factors with succinct evidence (e.g., “ELF stripped; no .symtab; no Sigstore; runtime hits PID 1 → A=0.2; trust=0.1; day 17 → D=1.17”).
Heuristics for baseUnkScore (B) (portable across ELF/PE/Mach‑O)
- Section/segment anomalies; entropy outliers; import tables linking to risky APIs; executable heap/stack flags.
- Location & role (PATH proximity to entrypoint, init scripts).
- Size/type priors (tiny droppers vs fat libs).
- Known‑unknown patterns (packer signatures, UPX traces without unpack attest).
Governance & reporting
- Weekly Unknowns Report: totals, burn‑down, median age, “top unresolved origins” (by registry / base image), and “time‑to‑first‑evidence.”
- Vendor scorecards: if a supplier causes recurring unknowns, reflect it in Adaptive Trust Economics.
Minimal implementation slice (2 sprints)
- Sprint A (backend + ingest): model + tables, ingest endpoint, deterministic manifest hook, basic scoring (linear decay), list/query API, roll‑up metrics.
- Sprint B (UI + policy): Unknowns board, what‑if slider, suppress/resolve workflow w/ expiry, weekly report, policy gates in CI.
If you want, I can generate:
- the exact PostgreSQL schema + indices (including age/UR materialized views),
- .NET 10 DTOs, handlers, and a deterministic scoring library with unit tests,
- a Grafana/ClickHouse dashboard spec for burn‑down and P90 tracking,
- and a one‑page client‑facing explainer your sales team can use.
Perfect, let’s turn the concept into a concrete, dev‑ready implementation plan.
I’ll assume you already have:
- A scanner pipeline (binaries, SBOMs, attestations)
- A backend service (StellaOps.Authority)
- A UI (Trust Algebra Studio)
- Observability (OpenTelemetry, ClickHouse/Presto)
You can adapt naming and tech stack as needed.
0. Scope & success criteria
Goals
- Persist all “unknown-ish” scanner findings (stripped binaries, unverifiable PURLs, missing digests, etc.) as first‑class entities.
- Compute a deterministic Unknown Risk (UR) per artifact and roll it up per image/application.
- Apply trust‑decay over time and expose burn‑down metrics.
- Provide UI workflows to triage, suppress, and resolve unknowns.
- Enforce release gates based on unknown risk and age.
Non‑goals (for v1)
- No full ML; use deterministic heuristics + tunable weights.
- No cross‑org multi‑tenant policy — single org/single policy set.
- No per‑developer responsibility/assignment yet (can add later).
1. Architecture & components
1.1 New/updated components
- Unknowns Registry (backend submodule)
  - Lives in your existing backend (e.g., StellaOps.Authority.Unknowns).
  - Owns DB schema, scoring logic, and API.
- Scanner integration
  - Extend StellaOps.Scanner (and/or Vexer) to emit “unknown” findings into the registry via HTTP or message bus.
- UI: Unknowns in Trust Algebra Studio
  - New section/tab: “Unknowns” under each image/app.
  - Global “Unknowns board” for portfolio view.
- Analytics & jobs
  - Periodic job to recompute trust‑decay & UR.
  - Weekly report generator (e.g., pushing into ClickHouse, Slack, or email).
2. Data model (DB schema)
Use relational DB; here’s a concrete schema you can translate into migrations.
2.1 Tables
unknown_artifacts
Represents the current state of each unknown.
- id (UUID, PK)
- created_at (timestamp)
- updated_at (timestamp)
- first_observed_at (timestamp, NOT NULL)
- last_observed_at (timestamp, NOT NULL)
- origin_source (enum: scanner, runtime, ingest)
- origin_feed (text) – e.g., binary-scanner@1.4.3
- origin_scan_id (UUID / text) – foreign key to scan_runs if you have it
- image_digest (text, indexed) – to tie to container/image
- component_id (UUID, nullable) – SBOM component when later mapped
- file_path (text, nullable)
- build_id (text, nullable) – ELF/Mach-O/PE build ID if any
- purl (text, nullable)
- hash_sha256 (text, nullable)
- cpe (text, nullable)
- classification_type (enum: binary, library, package, script, config, other)
- classification_reason (enum: stripped_binary, missing_signature, no_feed_match, ambiguous_name, checksum_mismatch, other)
- status (enum: unresolved, suppressed, mitigated, confirmed_benign, confirmed_risk)
- status_changed_at (timestamp)
- status_changed_by (text / user-id)
- notes (text)
- decay_policy_id (FK → decay_policies)
- base_unk_score (double, 0..1)
- env_amplifier (double, 0..1)
- trust (double, 0..1)
- current_decay_multiplier (double)
- current_ur (double, 0..1) – Unknown Risk at last recompute
- current_confidence (double, 0..1) – confidence in current_ur
- is_deleted (bool) – soft delete
Indexes
- idx_unknown_artifacts_image_digest_status
- idx_unknown_artifacts_status_created_at
- idx_unknown_artifacts_current_ur
- idx_unknown_artifacts_last_observed_at
unknown_artifact_events
Append-only event log for auditable changes.
- id (UUID, PK)
- unknown_artifact_id (FK → unknown_artifacts)
- created_at (timestamp)
- actor (text / user-id / system)
- event_type (enum: created, reobserved, status_changed, note_added, metrics_recomputed, linked_component, suppression_applied, suppression_expired)
- payload (JSONB) – diff or event‑specific details
Index: idx_unknown_artifact_events_artifact_id_created_at
decay_policies
Defines how trust‑decay works.
- id (text, PK) – e.g., decay:default:v1
- kind (enum: linear, exponential)
- param_k (double, nullable) – for linear: slope
- param_lambda (double, nullable) – for exponential
- cap (double, default 2.0)
- description (text)
- is_default (bool)
unknown_suppressions
Optional; you can also reuse unknown_artifacts.status, but a separate table lets you keep multiple suppressions over time.
- id (UUID, PK)
- unknown_artifact_id (FK)
- created_at (timestamp)
- created_by (text)
- reason (text)
- expires_at (timestamp, nullable)
- active (bool)
Index: idx_unknown_suppressions_artifact_active_expires_at
unknown_image_rollups
Precomputed rollups per image (for fast dashboards/gates).
- id (UUID, PK)
- image_digest (text, indexed)
- computed_at (timestamp)
- unknown_count_total (int)
- unknown_count_unresolved (int)
- unknown_count_high_ur (int) – e.g., UR ≥ 0.8
- p50_ur (double)
- p90_ur (double)
- top_n_ur_sum (double)
- median_age_days (double)
3. Scoring engine implementation
Create a small, deterministic scoring library so the same code can be used in:
- Backend ingest path (for immediate UR)
- Batch recompute job
- “What‑if” UI simulations (optionally via stateless API)
3.1 Data types
Define a core model, e.g.:
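// Named so that computeDecayMultiplier can reference it.
type DecayPolicy = {
  kind: "linear" | "exponential";
  k?: number;      // linear slope per day
  lambda?: number; // exponential rate per day
  cap: number;     // max multiplier
};

type UnknownMetricsInput = {
  baseUnkScore: number; // B
  envAmplifier: number; // A
  trust: number;        // T
  daysOpen: number;     // t
  decayPolicy: DecayPolicy;
};

type UnknownMetricsOutput = {
  decayMultiplier: number; // D_t
  unknownRisk: number;     // UR_t
};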
3.2 Algorithm
function computeDecayMultiplier(
  daysOpen: number,
  policy: DecayPolicy
): number {
  if (policy.kind === "linear") {
    const raw = 1 + (policy.k ?? 0) * daysOpen;
    return Math.min(raw, policy.cap);
  }
  if (policy.kind === "exponential") {
    const lambda = policy.lambda ?? 0;
    const raw = Math.exp(lambda * daysOpen);
    return Math.min(raw, policy.cap);
  }
  return 1;
}

function computeUnknownRisk(input: UnknownMetricsInput): UnknownMetricsOutput {
  const { baseUnkScore: B, envAmplifier: A, trust: T, daysOpen, decayPolicy } = input;
  const D_t = computeDecayMultiplier(daysOpen, decayPolicy);
  const raw = (B * (1 + A)) * D_t * (1 - T);
  const unknownRisk = Math.max(0, Math.min(raw, 1)); // clamp 0..1
  return { decayMultiplier: D_t, unknownRisk };
}
3.3 Heuristics for B, A, T
Implement these as pure functions with configuration‑driven weights:
- B (base unknown score):
  - Start from a prior by classification_type (binary > library > config).
  - Adjust up for:
    - Stripped binary (no symbols, high entropy)
    - Suspicious segments (executable stack/heap)
    - Known packer signatures (UPX, etc.)
  - Adjust down for:
    - Large, well‑known dependency path (/usr/lib/...)
    - Known safe signatures (if partially known).
- A (environment amplifier):
  - +0.2 if artifact is part of container entrypoint (PID 1).
  - +0.1 if file is in a PATH dir (e.g., /usr/local/bin).
  - +0.1 if the runtime has network capabilities or capability flags.
  - Cap at 0.5 for v1.
- T (trust):
  - Start at 0.5.
  - +0.3 if registry/signature/attestation chain verified.
  - +0.1 if source registry is on the “trusted vendor list”.
  - −0.3 if checksum mismatch or feed conflict.
  - Clamp 0..1.
Store the raw factors (B, A, T) on the artifact for transparency and later replays.
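The A and T rules above could be written as pure functions along these lines. This is a sketch: the signal names (`isEntrypoint`, `inPathDir`, `hasNetworkCaps`, `chainVerified`, `trustedVendor`, `checksumMismatch`) are assumed for illustration, and only the weights come from the v1 rules listed above.

```typescript
// Signal shapes are assumptions; only the weights come from the rules above.
type EnvSignals = { isEntrypoint: boolean; inPathDir: boolean; hasNetworkCaps: boolean };
type TrustSignals = { chainVerified: boolean; trustedVendor: boolean; checksumMismatch: boolean };

const clamp = (x: number, lo: number, hi: number): number => Math.max(lo, Math.min(x, hi));

// A: environment amplifier, capped at 0.5 for v1.
function computeEnvAmplifier(s: EnvSignals): number {
  let a = 0;
  if (s.isEntrypoint) a += 0.2;   // part of container entrypoint (PID 1)
  if (s.inPathDir) a += 0.1;      // lives in a PATH directory
  if (s.hasNetworkCaps) a += 0.1; // runtime has network capabilities
  return clamp(a, 0, 0.5);
}

// T: trust, starting at 0.5 and clamped to 0..1.
function computeTrust(s: TrustSignals): number {
  let t = 0.5;
  if (s.chainVerified) t += 0.3;    // registry/signature/attestation chain verified
  if (s.trustedVendor) t += 0.1;    // source registry on the trusted vendor list
  if (s.checksumMismatch) t -= 0.3; // checksum mismatch or feed conflict
  return clamp(t, 0, 1);
}
```

Keeping the weights in configuration rather than in code, as the section suggests, would mean passing them in as a parameter; they are inlined here only to keep the sketch short.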
4. Scanner integration
4.1 Emission format (from scanner → backend)
Define a minimal ingestion contract (JSON over HTTP or a message):
{
  "scanId": "urn:scan:1234",
  "imageDigest": "sha256:abc123...",
  "observedAt": "2025-11-27T12:34:56Z",
  "unknowns": [
    {
      "externalId": "scanner-unique-id-1",
      "originSource": "scanner",
      "originFeed": "binary-scanner@1.4.3",
      "filePath": "/usr/local/bin/stripped",
      "buildId": null,
      "purl": null,
      "hashSha256": "aa...",
      "cpe": null,
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "rawSignals": {
        "entropy": 7.4,
        "hasSymbols": false,
        "isEntrypoint": true,
        "inPathDir": true
      }
    }
  ]
}
The backend maps rawSignals → B, A, T.
4.2 Idempotency
- Define uniqueness key on (image_digest, file_path, hash_sha256) for v1.
- On ingest:
  - If an artifact exists:
    - Update last_observed_at.
    - Recompute age (now - first_observed_at) and UR.
    - Add reobserved event.
  - If not:
    - Insert new row with first_observed_at = observedAt.
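The ingest rules above can be sketched as follows, using an in-memory `Map` as a stand-in for the table. The type and function names are hypothetical. A real implementation would more likely rely on a unique index and a database upsert, and would also recompute UR on re-observation, which is omitted here.

```typescript
// In-memory stand-in for the unknown_artifacts table (assumption for the sketch).
type IncomingUnknown = { imageDigest: string; filePath: string | null; hashSha256: string | null };

type StoredUnknown = IncomingUnknown & {
  firstObservedAt: Date;
  lastObservedAt: Date;
  events: string[];
};

// v1 uniqueness key: (image_digest, file_path, hash_sha256).
function uniquenessKey(u: IncomingUnknown): string {
  return JSON.stringify([u.imageDigest, u.filePath, u.hashSha256]);
}

function ingestUnknown(
  store: Map<string, StoredUnknown>,
  u: IncomingUnknown,
  observedAt: Date
): StoredUnknown {
  const key = uniquenessKey(u);
  const existing = store.get(key);
  if (existing) {
    // Re-observation: keep first_observed_at, move last_observed_at forward.
    if (observedAt.getTime() > existing.lastObservedAt.getTime()) {
      existing.lastObservedAt = observedAt;
    }
    existing.events.push("reobserved");
    return existing;
  }
  const created: StoredUnknown = {
    imageDigest: u.imageDigest,
    filePath: u.filePath,
    hashSha256: u.hashSha256,
    firstObservedAt: observedAt,
    lastObservedAt: observedAt,
    events: ["created"],
  };
  store.set(key, created);
  return created;
}
```

Sending the same unknown twice leaves a single row with two events, which makes scanner retries safe.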
4.3 HTTP endpoint
POST /internal/unknowns/ingest
- Auth: internal service token.
- Returns per‑unknown mapping to internal id and computed UR.
Error handling:
- If invalid payload → 400 with list of errors.
- Partial failure: process valid unknowns, return failedUnknowns array with reasons.
5. Backend API for UI & CI
5.1 List unknowns
GET /unknowns
Query params:
- imageDigest (optional)
- status (optional multi: unresolved, suppressed, etc.)
- minUr, maxUr (optional)
- maxAgeDays (optional)
- page, pageSize
Response:
{
  "items": [
    {
      "id": "urn:stella:unknowns:uuid",
      "imageDigest": "sha256:...",
      "filePath": "/usr/local/bin/stripped",
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "status": "unresolved",
      "firstObservedAt": "...",
      "lastObservedAt": "...",
      "ageDays": 17,
      "baseUnkScore": 0.7,
      "envAmplifier": 0.2,
      "trust": 0.1,
      "decayPolicyId": "decay:default:v1",
      "decayMultiplier": 1.17,
      "currentUr": 0.88,
      "currentConfidence": 0.8
    }
  ],
  "total": 123
}
5.2 Get single unknown + event history
GET /unknowns/{id}
Include:
- The artifact.
- Latest metrics.
- Recent events (with pagination).
5.3 Update status / suppression
PATCH /unknowns/{id}
Body options:
{
  "status": "suppressed",
  "notes": "Reviewed; internal diagnostics binary.",
  "suppression": {
    "expiresAt": "2025-12-31T00:00:00Z"
  }
}
Backend:
- Validates transition (cannot un‑suppress to “unresolved” without event).
- Writes to unknown_suppressions.
- Writes status_changed + suppression_applied events.
5.4 Image rollups
GET /images/{imageDigest}/unknowns/summary
Response:
{
  "imageDigest": "sha256:...",
  "computedAt": "...",
  "unknownCountTotal": 40,
  "unknownCountUnresolved": 30,
  "unknownCountHighUr": 4,
  "p50Ur": 0.35,
  "p90Ur": 0.82,
  "topNUrSum": 2.4,
  "medianAgeDays": 9
}
This is what CI and UI will mostly query.
6. Trust‑decay job & rollup computation
6.1 Periodic recompute job
Schedule (e.g., every hour):
- Fetch unknown_artifacts where:
  - status IN ('unresolved', 'suppressed', 'mitigated')
  - last_observed_at >= now() - interval '90 days' (tunable)
- Compute daysOpen = now() - first_observed_at, expressed in days.
- Compute D_t and UR_t with the scoring library.
- Update unknown_artifacts.current_ur, current_decay_multiplier.
- Append a metrics_recomputed event, with a change threshold to limit event volume (e.g., only when UR changed > 0.01).
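One iteration of the recompute job might look like the sketch below. It assumes a hypothetical row shape with the decay policy already joined in, and it handles linear decay only to stay short. The 0.01 event threshold is the example value mentioned above.

```typescript
// Assumed row shape: a subset of unknown_artifacts columns plus the joined policy.
type ArtifactRow = {
  id: string;
  firstObservedAt: Date;
  baseUnkScore: number; // B
  envAmplifier: number; // A
  trust: number;        // T
  currentUr: number;    // UR stored at the previous recompute
  decay: { k: number; cap: number }; // linear policy only in this sketch
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const EVENT_THRESHOLD = 0.01; // only log metrics_recomputed for changes above this

function recompute(
  row: ArtifactRow,
  now: Date
): { ur: number; decayMultiplier: number; emitEvent: boolean } {
  // Age is measured from first observation, in fractional days.
  const daysOpen = Math.max(0, (now.getTime() - row.firstObservedAt.getTime()) / MS_PER_DAY);
  const decayMultiplier = Math.min(1 + row.decay.k * daysOpen, row.decay.cap);
  const raw = row.baseUnkScore * (1 + row.envAmplifier) * decayMultiplier * (1 - row.trust);
  const ur = Math.max(0, Math.min(raw, 1)); // clamp 0..1
  return { ur, decayMultiplier, emitEvent: Math.abs(ur - row.currentUr) > EVENT_THRESHOLD };
}
```

Passing `now` in as a parameter rather than reading the clock inside the function keeps the computation replayable, in line with the deterministic-scoring goal.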
6.2 Rollup job
Every X minutes:
- For each image_digest with active unknowns, compute:
  - unknown_count_total
  - unknown_count_unresolved (status = unresolved)
  - unknown_count_high_ur (UR ≥ threshold)
  - p50 / p90 UR (use DB percentile or compute in app)
  - top_n_ur_sum (sum of top 5 UR)
  - median_age_days
- Upsert into unknown_image_rollups.
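If the rollup is computed in the application rather than in the database, it could look like this sketch. The row shape and function names are assumptions. The percentile uses the nearest-rank method, so a database percentile function that interpolates may return slightly different values, and the median of an even-sized set is the lower of the two middle values.

```typescript
type UnknownRow = { status: string; ur: number; ageDays: number };

// Nearest-rank percentile over a sorted copy of the input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] ?? 0;
}

function rollup(rows: UnknownRow[], highUrThreshold = 0.8, topN = 5) {
  const urs = rows.map((r) => r.ur);
  // Sum of the N largest URs, so a few severe unknowns are not diluted.
  const topNUrSum = urs
    .slice()
    .sort((a, b) => b - a)
    .slice(0, topN)
    .reduce((sum, x) => sum + x, 0);
  return {
    unknownCountTotal: rows.length,
    unknownCountUnresolved: rows.filter((r) => r.status === "unresolved").length,
    unknownCountHighUr: rows.filter((r) => r.ur >= highUrThreshold).length,
    p50Ur: percentile(urs, 50),
    p90Ur: percentile(urs, 90),
    topNUrSum,
    medianAgeDays: percentile(rows.map((r) => r.ageDays), 50),
  };
}
```

The returned field names mirror the summary response shown in section 5.4, so the result can be upserted or served without renaming.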
7. CI / promotion gating
Expose a simple policy evaluation API for CI and deploy pipelines.
7.1 Policy definition (config)
Example YAML:
unknownsPolicy:
  blockIf:
    - kind: "anyUrAboveThreshold"
      threshold: 0.8
    - kind: "countAboveAge"
      maxCount: 5
      ageDays: 14
  warnIf:
    - kind: "unknownCountAbove"
      maxCount: 50
7.2 Policy evaluation endpoint
GET /policy/unknowns/evaluate?imageDigest=sha256:...
Response:
{
  "imageDigest": "sha256:...",
  "result": "block", // "ok" | "warn" | "block"
  "reasons": [
    {
      "kind": "anyUrAboveThreshold",
      "detail": "1 unknown with UR>=0.8 (max allowed: 0)"
    }
  ],
  "summary": {
    "unknownCountUnresolved": 30,
    "p90Ur": 0.82,
    "medianAgeDays": 17
  }
}
CI can decide to fail build/deploy based on result.
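A minimal sketch of how the evaluation behind this endpoint could work, using the three rule kinds from the example YAML. The types and function names are hypothetical, and whether suppressed or mitigated unknowns should be excluded before evaluation is left open here.

```typescript
type Rule =
  | { kind: "anyUrAboveThreshold"; threshold: number }
  | { kind: "countAboveAge"; maxCount: number; ageDays: number }
  | { kind: "unknownCountAbove"; maxCount: number };

type Policy = { blockIf: Rule[]; warnIf: Rule[] };
type GateUnknown = { ur: number; ageDays: number };
type GateResult = { result: "ok" | "warn" | "block"; reasons: string[] };

function ruleFires(rule: Rule, unknowns: GateUnknown[]): boolean {
  switch (rule.kind) {
    case "anyUrAboveThreshold": {
      const threshold = rule.threshold;
      return unknowns.some((u) => u.ur >= threshold);
    }
    case "countAboveAge": {
      // Fires when more than maxCount unknowns are older than ageDays.
      const { maxCount, ageDays } = rule;
      return unknowns.filter((u) => u.ageDays > ageDays).length > maxCount;
    }
    case "unknownCountAbove":
      return unknowns.length > rule.maxCount;
  }
}

// Block rules take precedence over warn rules.
function evaluate(policy: Policy, unknowns: GateUnknown[]): GateResult {
  const blocks = policy.blockIf.filter((r) => ruleFires(r, unknowns)).map((r) => r.kind);
  if (blocks.length > 0) return { result: "block", reasons: blocks };
  const warns = policy.warnIf.filter((r) => ruleFires(r, unknowns)).map((r) => r.kind);
  if (warns.length > 0) return { result: "warn", reasons: warns };
  return { result: "ok", reasons: [] };
}
```

Because `evaluate` is a pure function of the policy and the unknowns list, the same code could also back a non-persisting preview of a proposed policy.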
8. UI implementation (Trust Algebra Studio)
8.1 Image detail page: “Unknowns” tab
Components:
- Header metrics ribbon
  - Unknowns unresolved, p90 UR, median age, weekly trend sparkline.
  - Fetch from /images/{digest}/unknowns/summary.
- Unknowns table
  - Columns:
    - Status pill
    - UR (with color + tooltip showing B, A, T, D_t)
    - Classification type/reason
    - File path
    - Age
    - Last observed
  - Filters:
    - Status, UR range, age range, reason, type.
- Row drawer / detail panel
  - Show:
    - All core fields.
    - Evidence:
      - origin (scanner, feed, runtime)
      - raw signals (entropy, sections, etc.)
      - SBOM component link (if any)
    - Timeline (events list)
  - Actions:
    - Change status (unresolved → suppressed/mitigated/confirmed).
    - Add note.
    - Set/extend suppression expiry.
8.2 Global “Unknowns board”
Goals:
- Portfolio view; triage across many images.
Features:
- Filters by:
  - Team/application/service
  - Time range for first observed
  - UR bucket (0–0.3, 0.3–0.6, 0.6–1)
- Cards/rows per image:
  - Unknown counts, p90 UR, median age.
  - Trend of unknown count (last N weeks).
- Click through to image‑detail tab.
8.3 “What‑if” slider (optional v1.1)
At image or org level:
- Slider(s) to visualize effect of:
  - k / lambda change (decay speed).
  - Trust baseline changes (simulate better attestations).
- Implement by calling a stateless endpoint:
  - POST /unknowns/what-if with:
    - Current unknowns list IDs
    - Proposed decay policy
  - Returns recalculated URs and hypothetical gate result (but does not persist).
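The core of such a what-if call can be a pure function over a snapshot of the current factors. The sketch below assumes hypothetical `Snapshot` and `Decay` types and a single UR-threshold gate (0.8, as in the gating example). It repeats the scoring formula so that the block stands alone.

```typescript
type Decay =
  | { kind: "linear"; k: number; cap: number }
  | { kind: "exponential"; lambda: number; cap: number };

// Factors as stored on each artifact at the time of the preview.
type Snapshot = { id: string; baseUnkScore: number; envAmplifier: number; trust: number; daysOpen: number };

function decayMultiplier(d: Decay, daysOpen: number): number {
  const raw = d.kind === "linear" ? 1 + d.k * daysOpen : Math.exp(d.lambda * daysOpen);
  return Math.min(raw, d.cap);
}

// Pure preview: recalculates UR under the proposed policy and persists nothing.
function whatIf(unknowns: Snapshot[], proposed: Decay, blockThreshold = 0.8) {
  const items = unknowns.map((u) => {
    const raw = u.baseUnkScore * (1 + u.envAmplifier) * decayMultiplier(proposed, u.daysOpen) * (1 - u.trust);
    return { id: u.id, ur: Math.max(0, Math.min(raw, 1)) };
  });
  return { items, wouldBlock: items.some((i) => i.ur >= blockThreshold) };
}
```

With B = 0.7, A = 0.2, T = 0.1 and 17 days open, a linear policy with k = 0.01 gives a UR of about 0.88 and would block, while k = 0 gives about 0.76 and would not.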
9. Observability & analytics
9.1 Metrics
Emit structured events/metrics (OpenTelemetry, etc.):
- Counters:
  - unknowns_ingested_total (labels: source, classification_type, reason)
  - unknowns_resolved_total (labels: status)
- Gauges:
  - unknowns_unresolved_count per image/service.
  - unknowns_p90_ur per image/service.
  - unknowns_median_age_days.
9.2 Weekly report generator
Batch job:
- Compute, per org or team:
  - Total unknowns.
  - New unknowns this week.
  - Resolved unknowns this week.
  - Median age.
  - Top 10 images by:
    - Highest p90 UR.
    - Largest number of long‑lived unknowns (> X days).
- Persist into analytics store (ClickHouse) + push into:
  - Slack channel / email with a short plain‑text summary and link to UI.
10. Security & compliance
- Ensure all APIs require authentication & proper scopes:
  - Scanner ingest: internal service token only.
  - UI APIs: user identity + RBAC (e.g., team can only see their images).
- Audit log:
  - unknown_artifact_events must be immutable and queryable by compliance teams.
- PII:
  - Avoid storing user PII in notes; if necessary, apply redaction.
11. Suggested delivery plan (sprints/epics)
Sprint 1 – Foundations & ingest path
- DB migrations: unknown_artifacts, unknown_artifact_events, decay_policies.
- Implement scoring library (B, A, T, UR_t, D_t).
- Implement /internal/unknowns/ingest endpoint with idempotency.
- Extend scanner to emit unknowns and integrate with ingest.
- Basic GET /unknowns?imageDigest=... API.
- Seed decay:default:v1 policy.
Exit criteria: Unknowns created and UR computed from real scans; queryable via API.
Sprint 2 – Decay, rollups, and CI hook
- Implement periodic job to recompute decay & UR.
- Implement rollup job + unknown_image_rollups table.
- Implement GET /images/{digest}/unknowns/summary.
- Implement policy evaluation endpoint for CI.
- Wire CI to block/warn based on policy.
Exit criteria: CI gate can fail a build due to high‑risk unknowns; rollups visible via API.
Sprint 3 – UI (Unknowns tab + board)
- Image detail “Unknowns” tab:
  - Metrics ribbon, table, filters.
  - Row drawer with evidence & history.
- Global “Unknowns board” page.
- Integrate with APIs.
- Add basic “explainability tooltip” for UR.
Exit criteria: Security team can triage unknowns via UI; product teams can see their exposure.
Sprint 4 – Suppression workflow & reporting
- Implement PATCH /unknowns/{id} + suppression rules & expiries.
- Extend periodic jobs to auto‑expire suppressions.
- Weekly unknowns report job → analytics + Slack/email.
- Add “trend” sparklines and unknowns burn‑down in UI.
Exit criteria: Unknowns can be suppressed with justification; org gets weekly burn‑down trends.
If you’d like, I can next:
- Turn this into concrete tickets (Jira-style) with story points and acceptance criteria, or
- Generate example migration scripts (SQL) and API contract files (OpenAPI snippet) that your devs can copy‑paste.