up
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
api-governance / spectral-lint (push) Has been cancelled
oas-ci / oas-validate (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
SDK Publish & Sign / sdk-publish (push) Has been cancelled
This commit is contained in:
@@ -0,0 +1,819 @@
|
||||
Here’s a crisp, ready‑to‑ship concept you can drop into Stella Ops: an **Unknowns Registry** that captures ambiguous scanner artifacts (stripped binaries, unverifiable packages, orphaned PURLs, missing digests) and treats them as first‑class citizens with probabilistic severity and trust‑decay—so you stay transparent without blocking delivery.
|
||||
|
||||
### What this solves (in plain terms)
|
||||
|
||||
* **No silent drops:** every “can’t verify / can’t resolve” is tracked, not discarded.
|
||||
* **Quantified risk:** unknowns still roll into a portfolio‑level risk number with confidence intervals.
|
||||
* **Trust over time:** stale unknowns get *riskier* the longer they remain unresolved.
|
||||
* **Client confidence:** visibility + trajectory (are unknowns shrinking?) becomes a maturity signal.
|
||||
|
||||
### Core data model (CycloneDX/SPDX compatible, attaches to your SBOM spine)
|
||||
|
||||
```yaml
UnknownArtifact:
  id: urn:stella:unknowns:<uuid>
  observedAt: <RFC3339>
  origin:
    source: scanner|ingest|runtime
    feed: <name/version>
    evidence: [ filePath, containerDigest, buildId, sectionHints ]
  identifiers:
    purl?: <string>          # orphan/incomplete PURL allowed
    hash?: <sha256|null>     # missing digest allowed
    cpe?: <string|null>
  classification:
    type: binary|library|package|script|config|other
    reason: stripped_binary|missing_signature|no_feed_match|ambiguous_name|checksum_mismatch|other
  metrics:
    baseUnkScore: 0..1
    confidence: 0..1         # model confidence in the *score*
    trust: 0..1              # provenance trust (sig/attest, feed quality)
    decayPolicyId: <ref>
  resolution:
    status: unresolved|suppressed|mitigated|confirmed-benign|confirmed-risk
    updatedAt: <RFC3339>
    notes: <text>
  links:
    scanId: <ref>
    componentId?: <ref to SBOM component if later mapped>
    attestations?: [ dsse, in-toto, rekorRef ]
```
|
||||
|
||||
### Scoring (simple, explainable, deterministic)
|
||||
|
||||
* **Unknown Risk (UR):**
  `UR_t = clamp( (B * (1 + A)) * D_t * (1 - T) , 0, 1 )`

  * `B` = `baseUnkScore` (heuristics: file entropy, section hints, ELF flags, import tables, size, location)
  * `A` = **Environment Amplifier** (runtime proximity: container entrypoint? PID namespace? network caps?)
  * `T` = **Trust** (sig/attest/registry reputation/feed pedigree normalized to 0..1)
  * `D_t` = **Trust-decay multiplier** over time `t`:

    * Linear: `D_t = 1 + k * daysOpen` (e.g., `k = 0.01`)
    * or Exponential: `D_t = e^(λ * daysOpen)` (e.g., `λ = 0.005`)
* **Portfolio roll-up:** use **P90 of UR_t** across images + **sum of top-N UR_t** to avoid dilution.
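As a worked illustration of the formula, assume a hypothetical unknown with `B = 0.7`, `A = 0.2`, `T = 0.1` that has been open for 17 days under the linear policy with `k = 0.01`. These values are chosen for illustration and do not come from a real scan.

```latex
\begin{aligned}
D_{17}  &= 1 + 0.01 \times 17 = 1.17 \\
UR_{17} &= \operatorname{clamp}\bigl(0.7 \times (1 + 0.2) \times 1.17 \times (1 - 0.1),\ 0,\ 1\bigr) \\
        &= \operatorname{clamp}(0.88452,\ 0,\ 1) \approx 0.88
\end{aligned}
```

On day 0 the same artifact would score `0.7 × 1.2 × 0.9 = 0.756`, so under these assumptions the growth from 0.756 to about 0.88 comes from decay alone.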
|
||||
|
||||
### Policies & SLOs
|
||||
|
||||
* **SLO:** *Unknowns burn-down* ≥ X% week-over-week (the unresolved count shrinks by at least X%); *median age* ≤ Y days.
|
||||
* **Gates:** block promotion when (a) any `UR_t ≥ 0.8`, or (b) more than `M` unknowns with age > `Z` days.
|
||||
* **Suppressions:** require justification + expiry; suppression reduces `A` but does **not** zero `D_t`.
|
||||
|
||||
### Trust‑decay policies (pluggable)
|
||||
|
||||
```yaml
DecayPolicy:
  id: decay:default:v1
  kind: linear|exponential|custom
  params:
    k: 0.01      # linear slope per day
    cap: 2.0     # max multiplier
```
|
||||
|
||||
### Scanner hooks (where to emit Unknowns)
|
||||
|
||||
* **Binary scan:** stripped ELF/Mach‑O/PE; missing build‑ID; abnormal sections; impossible symbol map.
|
||||
* **Package map:** PURL inferred from path without registry proof; mismatched checksum; vendor fork detected.
|
||||
* **Attestation:** DSSE missing / invalid; Sigstore chain unverifiable; Rekor entry not found.
|
||||
* **Feeds:** component seen in runtime but absent from SBOM (or vice versa).
|
||||
|
||||
### Deterministic generation (for replay/audits)
|
||||
|
||||
* Include **Unknowns** in the **Scan Manifest** (your deterministic bundle): inputs, ruleset hash, feed hashes, lattice policy version, and the exact classifier thresholds that produced `B`, `A`, `T`. That lets you replay and reproduce UR_t byte‑for‑byte during audits.
|
||||
|
||||
### API surface (StellaOps.Authority)
|
||||
|
||||
```
POST  /unknowns/ingest          # bulk ingest from Scanner/Vexer
GET   /unknowns?imageDigest=…   # list + filters (status, age, UR buckets)
PATCH /unknowns/{id}/resolve    # set status, add evidence, set suppression (with expiry)
GET   /unknowns/stats           # burn-downs, age histograms, P90 UR_t, top-N contributors
```
|
||||
|
||||
### UI slices (Trust Algebra Studio)
|
||||
|
||||
* **Risk ribbon:** Unknowns count, P90 UR_t, median age, trend sparkline.
|
||||
* **Aging board:** columns by age buckets; cards show reason, UR_t, `T`, decay policy, evidence.
|
||||
* **What‑if slider:** adjust `k`/`λ` and see retroactive effect on release readiness (deterministic preview).
|
||||
* **Explainability panel:** show `B`, `A`, `T`, `D_t` factors with succinct evidence (e.g., “ELF stripped; no .symtab; no Sigstore; runtime hits PID 1 → A=0.2; trust=0.1; day 17 → D=1.17”).
|
||||
|
||||
### Heuristics for `baseUnkScore (B)` (portable across ELF/PE/Mach‑O)
|
||||
|
||||
* Section/segment anomalies; entropy outliers; import tables linking to risky APIs; executable heap/stack flags.
|
||||
* Location & role (PATH proximity to entrypoint, init scripts).
|
||||
* Size/type priors (tiny droppers vs fat libs).
|
||||
* Known‑unknown patterns (packer signatures, `UPX` traces without unpack attest).
|
||||
|
||||
### Governance & reporting
|
||||
|
||||
* **Weekly Unknowns Report:** totals, burn‑down, median age, “top unresolved origins” (by registry / base image), and “time‑to‑first‑evidence.”
|
||||
* **Vendor scorecards:** if a supplier causes recurring unknowns, reflect it in **Adaptive Trust Economics**.
|
||||
|
||||
### Minimal implementation slice (2 sprints)
|
||||
|
||||
1. **Sprint A (backend + ingest):** model + tables, ingest endpoint, deterministic manifest hook, basic scoring (linear decay), list/query API, roll‑up metrics.
|
||||
2. **Sprint B (UI + policy):** Unknowns board, what‑if slider, suppress/resolve workflow w/ expiry, weekly report, policy gates in CI.
|
||||
|
||||
---
|
||||
|
||||
If you want, I can generate:
|
||||
|
||||
* the exact **PostgreSQL schema + indices** (including age/UR materialized views),
|
||||
* **.NET 10** DTOs, handlers, and a **deterministic scoring library** with unit tests,
|
||||
* a **Grafana/ClickHouse** dashboard spec for burn‑down and P90 tracking,
|
||||
* and a one‑page client‑facing explainer your sales team can use.
|
||||
Perfect, let’s turn the concept into a concrete, dev‑ready implementation plan.
|
||||
|
||||
I’ll assume:
|
||||
|
||||
* You already have:
|
||||
|
||||
* A scanner pipeline (binaries, SBOMs, attestations)
|
||||
* A backend service (StellaOps.Authority)
|
||||
* A UI (Trust Algebra Studio)
|
||||
* Observability (OpenTelemetry, ClickHouse/Presto)
|
||||
|
||||
You can adapt naming and tech stack as needed.
|
||||
|
||||
---
|
||||
|
||||
## 0. Scope & success criteria
|
||||
|
||||
**Goals**
|
||||
|
||||
1. Persist all “unknown-ish” scanner findings (stripped binaries, unverifiable PURLs, missing digests, etc.) as first‑class entities.
|
||||
2. Compute a deterministic **Unknown Risk (UR)** per artifact and roll it up per image/application.
|
||||
3. Apply **trust‑decay** over time and expose burn‑down metrics.
|
||||
4. Provide UI workflows to triage, suppress, and resolve unknowns.
|
||||
5. Enforce release gates based on unknown risk and age.
|
||||
|
||||
**Non‑goals (for v1)**
|
||||
|
||||
* No full ML; use deterministic heuristics + tunable weights.
|
||||
* No cross‑org multi‑tenant policy — single org/single policy set.
|
||||
* No per‑developer responsibility/assignment yet (can add later).
|
||||
|
||||
---
|
||||
|
||||
## 1. Architecture & components
|
||||
|
||||
### 1.1 New/updated components
|
||||
|
||||
1. **Unknowns Registry (backend submodule)**
|
||||
|
||||
* Lives in your existing backend (e.g., `StellaOps.Authority.Unknowns`).
|
||||
* Owns DB schema, scoring logic, and API.
|
||||
|
||||
2. **Scanner integration**
|
||||
|
||||
* Extend `StellaOps.Scanner` (and/or `Vexer`) to emit “unknown” findings into the registry via HTTP or message bus.
|
||||
|
||||
3. **UI: Unknowns in Trust Algebra Studio**
|
||||
|
||||
* New section/tab: “Unknowns” under each image/app.
|
||||
* Global “Unknowns board” for portfolio view.
|
||||
|
||||
4. **Analytics & jobs**
|
||||
|
||||
* Periodic job to recompute trust‑decay & UR.
|
||||
* Weekly report generator (e.g., pushing into ClickHouse, Slack, or email).
|
||||
|
||||
---
|
||||
|
||||
## 2. Data model (DB schema)
|
||||
|
||||
Use relational DB; here’s a concrete schema you can translate into migrations.
|
||||
|
||||
### 2.1 Tables
|
||||
|
||||
#### `unknown_artifacts`
|
||||
|
||||
Represents the current state of each unknown.
|
||||
|
||||
* `id` (UUID, PK)
|
||||
* `created_at` (timestamp)
|
||||
* `updated_at` (timestamp)
|
||||
* `first_observed_at` (timestamp, NOT NULL)
|
||||
* `last_observed_at` (timestamp, NOT NULL)
|
||||
* `origin_source` (enum: `scanner`, `runtime`, `ingest`)
|
||||
* `origin_feed` (text) – e.g., `binary-scanner@1.4.3`
|
||||
* `origin_scan_id` (UUID / text) – foreign key to `scan_runs` if you have it
|
||||
* `image_digest` (text, indexed) – to tie to container/image
|
||||
* `component_id` (UUID, nullable) – SBOM component when later mapped
|
||||
* `file_path` (text, nullable)
|
||||
* `build_id` (text, nullable) – ELF/Mach-O/PE build ID if any
|
||||
* `purl` (text, nullable)
|
||||
* `hash_sha256` (text, nullable)
|
||||
* `cpe` (text, nullable)
|
||||
* `classification_type` (enum: `binary`, `library`, `package`, `script`, `config`, `other`)
|
||||
* `classification_reason` (enum:
|
||||
`stripped_binary`, `missing_signature`, `no_feed_match`,
|
||||
`ambiguous_name`, `checksum_mismatch`, `other`)
|
||||
* `status` (enum:
|
||||
`unresolved`, `suppressed`, `mitigated`, `confirmed_benign`, `confirmed_risk`)
|
||||
* `status_changed_at` (timestamp)
|
||||
* `status_changed_by` (text / user-id)
|
||||
* `notes` (text)
|
||||
* `decay_policy_id` (FK → `decay_policies`)
|
||||
* `base_unk_score` (double, 0..1)
|
||||
* `env_amplifier` (double, 0..1)
|
||||
* `trust` (double, 0..1)
|
||||
* `current_decay_multiplier` (double)
|
||||
* `current_ur` (double, 0..1) – Unknown Risk at last recompute
|
||||
* `current_confidence` (double, 0..1) – confidence in `current_ur`
|
||||
* `is_deleted` (bool) – soft delete
|
||||
|
||||
**Indexes**
|
||||
|
||||
* `idx_unknown_artifacts_image_digest_status`
|
||||
* `idx_unknown_artifacts_status_created_at`
|
||||
* `idx_unknown_artifacts_current_ur`
|
||||
* `idx_unknown_artifacts_last_observed_at`
|
||||
|
||||
#### `unknown_artifact_events`
|
||||
|
||||
Append-only event log for auditable changes.
|
||||
|
||||
* `id` (UUID, PK)
|
||||
* `unknown_artifact_id` (FK → `unknown_artifacts`)
|
||||
* `created_at` (timestamp)
|
||||
* `actor` (text / user-id / system)
|
||||
* `event_type` (enum:
|
||||
`created`, `reobserved`, `status_changed`, `note_added`,
|
||||
`metrics_recomputed`, `linked_component`, `suppression_applied`, `suppression_expired`)
|
||||
* `payload` (JSONB) – diff or event‑specific details
|
||||
|
||||
Index: `idx_unknown_artifact_events_artifact_id_created_at`
|
||||
|
||||
#### `decay_policies`
|
||||
|
||||
Defines how trust‑decay works.
|
||||
|
||||
* `id` (text, PK) – e.g., `decay:default:v1`
|
||||
* `kind` (enum: `linear`, `exponential`)
|
||||
* `param_k` (double, nullable) – for linear: slope
|
||||
* `param_lambda` (double, nullable) – for exponential
|
||||
* `cap` (double, default 2.0)
|
||||
* `description` (text)
|
||||
* `is_default` (bool)
|
||||
|
||||
#### `unknown_suppressions`
|
||||
|
||||
Optional; can also reuse `unknown_artifacts.status` but separate table lets you have multiple suppressions over time.
|
||||
|
||||
* `id` (UUID, PK)
|
||||
* `unknown_artifact_id` (FK)
|
||||
* `created_at` (timestamp)
|
||||
* `created_by` (text)
|
||||
* `reason` (text)
|
||||
* `expires_at` (timestamp, nullable)
|
||||
* `active` (bool)
|
||||
|
||||
Index: `idx_unknown_suppressions_artifact_active_expires_at`
|
||||
|
||||
#### `unknown_image_rollups`
|
||||
|
||||
Precomputed rollups per image (for fast dashboards/gates).
|
||||
|
||||
* `id` (UUID, PK)
|
||||
* `image_digest` (text, indexed)
|
||||
* `computed_at` (timestamp)
|
||||
* `unknown_count_total` (int)
|
||||
* `unknown_count_unresolved` (int)
|
||||
* `unknown_count_high_ur` (int) – e.g., UR ≥ 0.8
|
||||
* `p50_ur` (double)
|
||||
* `p90_ur` (double)
|
||||
* `top_n_ur_sum` (double)
|
||||
* `median_age_days` (double)
|
||||
|
||||
---
|
||||
|
||||
## 3. Scoring engine implementation
|
||||
|
||||
Create a small, deterministic scoring library so the same code can be used in:
|
||||
|
||||
* Backend ingest path (for immediate UR)
|
||||
* Batch recompute job
|
||||
* “What‑if” UI simulations (optionally via stateless API)
|
||||
|
||||
### 3.1 Data types
|
||||
|
||||
Define a core model, e.g.:
|
||||
|
||||
```ts
type UnknownMetricsInput = {
  baseUnkScore: number; // B
  envAmplifier: number; // A
  trust: number;        // T
  daysOpen: number;     // t
  decayPolicy: {
    kind: "linear" | "exponential";
    k?: number;
    lambda?: number;
    cap: number;
  };
};

type UnknownMetricsOutput = {
  decayMultiplier: number; // D_t
  unknownRisk: number;     // UR_t
};
```
|
||||
|
||||
### 3.2 Algorithm
|
||||
|
||||
```ts
// Reuse the policy shape from UnknownMetricsInput so there is a single definition.
type DecayPolicy = UnknownMetricsInput["decayPolicy"];

// D_t: grows with age and is capped at policy.cap.
function computeDecayMultiplier(
  daysOpen: number,
  policy: DecayPolicy
): number {
  if (policy.kind === "linear") {
    const raw = 1 + (policy.k ?? 0) * daysOpen;
    return Math.min(raw, policy.cap);
  }
  if (policy.kind === "exponential") {
    const lambda = policy.lambda ?? 0;
    const raw = Math.exp(lambda * daysOpen);
    return Math.min(raw, policy.cap);
  }
  return 1; // unknown policy kind: no decay
}

// UR_t = clamp(B * (1 + A) * D_t * (1 - T), 0, 1)
function computeUnknownRisk(input: UnknownMetricsInput): UnknownMetricsOutput {
  const { baseUnkScore: B, envAmplifier: A, trust: T, daysOpen, decayPolicy } = input;

  const D_t = computeDecayMultiplier(daysOpen, decayPolicy);
  const raw = (B * (1 + A)) * D_t * (1 - T);

  const unknownRisk = Math.max(0, Math.min(raw, 1)); // clamp 0..1

  return { decayMultiplier: D_t, unknownRisk };
}
```
|
||||
|
||||
### 3.3 Heuristics for `B`, `A`, `T`
|
||||
|
||||
Implement these as pure functions with configuration‑driven weights:
|
||||
|
||||
* `B` (base unknown score):

  * Start from a prior by `classification_type` (binary > library > config).
  * Adjust up for:

    * Stripped binary (no symbols, high entropy)
    * Suspicious segments (executable stack/heap)
    * Known packer signatures (UPX, etc.)
  * Adjust down for:

    * Large, well-known dependency path (`/usr/lib/...`)
    * Known safe signatures (if partially known).

* `A` (environment amplifier):

  * +0.2 if artifact is part of container entrypoint (PID 1).
  * +0.1 if file is in a PATH dir (e.g., `/usr/local/bin`).
  * +0.1 if the runtime grants the container network capabilities.
  * Cap at 0.5 for v1.

* `T` (trust):

  * Start at 0.5.
  * +0.3 if registry/signature/attestation chain verified.
  * +0.1 if source registry is on the “trusted vendor list”.
  * −0.3 if checksum mismatch or feed conflict.
  * Clamp 0..1.
|
||||
|
||||
Store the raw factors (`B`, `A`, `T`) on the artifact for transparency and later replays.
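The `A` and `T` rules above could be expressed as pure functions along these lines. This is a minimal sketch: the `EnvSignals` and `TrustSignals` shapes and their field names are assumptions made for illustration, and the weights are hard-coded here rather than configuration-driven.

```typescript
// Hypothetical signal shapes; the field names are illustrative assumptions.
type EnvSignals = {
  isEntrypoint: boolean;   // artifact is part of the container entrypoint (PID 1)
  inPathDir: boolean;      // file lives in a PATH directory such as /usr/local/bin
  hasNetworkCaps: boolean; // runtime grants the container network capabilities
};

type TrustSignals = {
  chainVerified: boolean;          // registry/signature/attestation chain verified
  trustedVendor: boolean;          // source registry is on the trusted vendor list
  checksumOrFeedConflict: boolean; // checksum mismatch or feed conflict seen
};

function clamp01(x: number): number {
  return Math.max(0, Math.min(x, 1));
}

// A: additive bonuses, capped at 0.5 as stated for v1.
function computeEnvAmplifier(s: EnvSignals): number {
  let a = 0;
  if (s.isEntrypoint) a += 0.2;
  if (s.inPathDir) a += 0.1;
  if (s.hasNetworkCaps) a += 0.1;
  return Math.min(a, 0.5);
}

// T: starts at 0.5, adjusted by provenance signals, clamped to 0..1.
function computeTrust(s: TrustSignals): number {
  let t = 0.5;
  if (s.chainVerified) t += 0.3;
  if (s.trustedVendor) t += 0.1;
  if (s.checksumOrFeedConflict) t -= 0.3;
  return clamp01(t);
}
```

Keeping both functions free of I/O means the same code can run in the ingest path, the batch job, and a what-if preview.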
|
||||
|
||||
---
|
||||
|
||||
## 4. Scanner integration
|
||||
|
||||
### 4.1 Emission format (from scanner → backend)
|
||||
|
||||
Define a minimal ingestion contract (JSON over HTTP or a message):
|
||||
|
||||
```jsonc
{
  "scanId": "urn:scan:1234",
  "imageDigest": "sha256:abc123...",
  "observedAt": "2025-11-27T12:34:56Z",
  "unknowns": [
    {
      "externalId": "scanner-unique-id-1",
      "originSource": "scanner",
      "originFeed": "binary-scanner@1.4.3",
      "filePath": "/usr/local/bin/stripped",
      "buildId": null,
      "purl": null,
      "hashSha256": "aa...",
      "cpe": null,
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "rawSignals": {
        "entropy": 7.4,
        "hasSymbols": false,
        "isEntrypoint": true,
        "inPathDir": true
      }
    }
  ]
}
```
|
||||
|
||||
The backend maps `rawSignals` → `B`, `A`, `T`.
|
||||
|
||||
### 4.2 Idempotency
|
||||
|
||||
* Define uniqueness key on `(image_digest, file_path, hash_sha256)` for v1.
* On ingest:

  * If an artifact exists:

    * Update `last_observed_at`.
    * Recompute age (`now - first_observed_at`) and UR.
    * Add `reobserved` event.
  * If not:

    * Insert new row with `first_observed_at = observedAt`.
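The upsert rule above can be sketched with an in-memory `Map` standing in for the table. The type names, the `uniquenessKey` helper, and the null normalization are assumptions for illustration. The normalization is worth noting because `file_path` and `hash_sha256` are nullable, and a plain unique index in PostgreSQL treats NULLs as distinct by default, so rows with a missing hash would otherwise never collide.

```typescript
type IngestedUnknown = {
  imageDigest: string;
  filePath: string | null;
  hashSha256: string | null;
  observedAt: string; // RFC 3339 timestamp
};

type StoredUnknown = {
  key: string;
  firstObservedAt: string;
  lastObservedAt: string;
  events: string[]; // event types only, for brevity
};

// Assumption: nulls are normalized to "" so repeated observations with a
// missing path or hash map to the same key.
function uniquenessKey(u: IngestedUnknown): string {
  return JSON.stringify([u.imageDigest, u.filePath ?? "", u.hashSha256 ?? ""]);
}

function upsertUnknown(store: Map<string, StoredUnknown>, u: IngestedUnknown): StoredUnknown {
  const key = uniquenessKey(u);
  const existing = store.get(key);
  if (existing) {
    // Re-observation: first_observed_at is left untouched so age keeps growing.
    // Extra guard (an assumption): only move last_observed_at forward in time.
    if (Date.parse(u.observedAt) > Date.parse(existing.lastObservedAt)) {
      existing.lastObservedAt = u.observedAt;
    }
    existing.events.push("reobserved");
    return existing;
  }
  const created: StoredUnknown = {
    key,
    firstObservedAt: u.observedAt,
    lastObservedAt: u.observedAt,
    events: ["created"],
  };
  store.set(key, created);
  return created;
}
```

Ingesting the same `(imageDigest, filePath, hashSha256)` twice leaves one stored row with a `created` and a `reobserved` event.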
|
||||
|
||||
### 4.3 HTTP endpoint
|
||||
|
||||
`POST /internal/unknowns/ingest`
|
||||
|
||||
* Auth: internal service token.
|
||||
* Returns per‑unknown mapping to internal `id` and computed UR.
|
||||
|
||||
Error handling:
|
||||
|
||||
* If invalid payload → 400 with list of errors.
|
||||
* Partial failure: process valid unknowns, return `failedUnknowns` array with reasons.
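One way to get the partial-failure behaviour is to validate each unknown independently and collect failures next to the accepted items. This sketch assumes the envelope itself has already passed validation (otherwise the 400 path applies). The required fields checked here and the `FailedUnknown` shape are assumptions, not a confirmed contract.

```typescript
type IncomingUnknown = {
  externalId?: string;
  classificationType?: string;
  classificationReason?: string;
};

type FailedUnknown = { index: number; externalId: string | null; reasons: string[] };
type IngestOutcome = { accepted: IncomingUnknown[]; failedUnknowns: FailedUnknown[] };

const KNOWN_TYPES = ["binary", "library", "package", "script", "config", "other"];
const KNOWN_REASONS = [
  "stripped_binary", "missing_signature", "no_feed_match",
  "ambiguous_name", "checksum_mismatch", "other",
];

// Returns a list of human-readable problems; an empty list means the item is valid.
function validateUnknown(u: IncomingUnknown): string[] {
  const reasons: string[] = [];
  if (!u.externalId) reasons.push("externalId is required");
  if (!u.classificationType || KNOWN_TYPES.indexOf(u.classificationType) === -1) {
    reasons.push("classificationType is missing or not recognized");
  }
  if (!u.classificationReason || KNOWN_REASONS.indexOf(u.classificationReason) === -1) {
    reasons.push("classificationReason is missing or not recognized");
  }
  return reasons;
}

// Valid items are accepted even when siblings in the same batch fail.
function ingestBatch(unknowns: IncomingUnknown[]): IngestOutcome {
  const outcome: IngestOutcome = { accepted: [], failedUnknowns: [] };
  unknowns.forEach((u, index) => {
    const reasons = validateUnknown(u);
    if (reasons.length === 0) {
      outcome.accepted.push(u);
    } else {
      outcome.failedUnknowns.push({ index, externalId: u.externalId ?? null, reasons });
    }
  });
  return outcome;
}
```

Reporting the batch `index` alongside `externalId` lets the scanner match failures even when the identifier itself was the missing field.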
|
||||
|
||||
---
|
||||
|
||||
## 5. Backend API for UI & CI
|
||||
|
||||
### 5.1 List unknowns
|
||||
|
||||
`GET /unknowns`
|
||||
|
||||
Query params:
|
||||
|
||||
* `imageDigest` (optional)
|
||||
* `status` (optional multi: unresolved, suppressed, etc.)
|
||||
* `minUr`, `maxUr` (optional)
|
||||
* `maxAgeDays` (optional)
|
||||
* `page`, `pageSize`
|
||||
|
||||
Response:
|
||||
|
||||
```jsonc
{
  "items": [
    {
      "id": "urn:stella:unknowns:uuid",
      "imageDigest": "sha256:...",
      "filePath": "/usr/local/bin/stripped",
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "status": "unresolved",
      "firstObservedAt": "...",
      "lastObservedAt": "...",
      "ageDays": 17,
      "baseUnkScore": 0.7,
      "envAmplifier": 0.2,
      "trust": 0.1,
      "decayPolicyId": "decay:default:v1",
      "decayMultiplier": 1.17,
      "currentUr": 0.88, // 0.7 * (1 + 0.2) * 1.17 * (1 - 0.1) ≈ 0.88
      "currentConfidence": 0.8
    }
  ],
  "total": 123
}
```
|
||||
|
||||
### 5.2 Get single unknown + event history
|
||||
|
||||
`GET /unknowns/{id}`
|
||||
|
||||
Include:
|
||||
|
||||
* The artifact.
|
||||
* Latest metrics.
|
||||
* Recent events (with pagination).
|
||||
|
||||
### 5.3 Update status / suppression
|
||||
|
||||
`PATCH /unknowns/{id}`
|
||||
|
||||
Body options:
|
||||
|
||||
```jsonc
{
  "status": "suppressed",
  "notes": "Reviewed; internal diagnostics binary.",
  "suppression": {
    "expiresAt": "2025-12-31T00:00:00Z"
  }
}
```
|
||||
|
||||
Backend:
|
||||
|
||||
* Validates transition (cannot un‑suppress to “unresolved” without event).
|
||||
* Writes to `unknown_suppressions`.
|
||||
* Writes `status_changed` + `suppression_applied` events.
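A sketch of the validation step, under two assumptions taken from the suppression rule stated earlier (justification plus expiry are required): a suppression without notes or with a past expiry is rejected, and every accepted change yields the events to append. The `StatusPatch` and `PatchOutcome` shapes are hypothetical.

```typescript
type UnknownStatus =
  | "unresolved" | "suppressed" | "mitigated" | "confirmed_benign" | "confirmed_risk";

type StatusPatch = {
  status: UnknownStatus;
  notes?: string;
  suppression?: { expiresAt: string };
};

type PatchOutcome = { ok: boolean; events: string[]; error?: string };

function applyStatusPatch(current: UnknownStatus, patch: StatusPatch, now: Date): PatchOutcome {
  if (patch.status === current) {
    return { ok: false, events: [], error: "status is unchanged" };
  }
  if (patch.status === "suppressed") {
    // Suppressions need a justification and an expiry in the future.
    if (!patch.notes || patch.notes.trim() === "") {
      return { ok: false, events: [], error: "suppression requires a justification" };
    }
    const expiresAt = patch.suppression ? Date.parse(patch.suppression.expiresAt) : NaN;
    if (isNaN(expiresAt) || expiresAt <= now.getTime()) {
      return { ok: false, events: [], error: "suppression requires a future expiry" };
    }
    return { ok: true, events: ["status_changed", "suppression_applied"] };
  }
  // Any other transition, including suppressed -> unresolved, is allowed here
  // but always leaves a status_changed event behind.
  return { ok: true, events: ["status_changed"] };
}
```

Returning the event list from the validator keeps the rule that no status change happens without an audit record in one place.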
|
||||
|
||||
### 5.4 Image rollups
|
||||
|
||||
`GET /images/{imageDigest}/unknowns/summary`
|
||||
|
||||
Response:
|
||||
|
||||
```jsonc
{
  "imageDigest": "sha256:...",
  "computedAt": "...",
  "unknownCountTotal": 40,
  "unknownCountUnresolved": 30,
  "unknownCountHighUr": 4,
  "p50Ur": 0.35,
  "p90Ur": 0.82,
  "topNUrSum": 2.4,
  "medianAgeDays": 9
}
```
|
||||
|
||||
This is what CI and UI will mostly query.
|
||||
|
||||
---
|
||||
|
||||
## 6. Trust‑decay job & rollup computation
|
||||
|
||||
### 6.1 Periodic recompute job
|
||||
|
||||
Schedule (e.g., every hour):
|
||||
|
||||
1. Fetch `unknown_artifacts` where:

   * `status IN ('unresolved', 'suppressed', 'mitigated')`
   * `last_observed_at >= now() - interval '90 days'` (tunable)
2. Compute `daysOpen` as `now() - first_observed_at`, expressed in days.
3. Compute `D_t` and `UR_t` with the scoring library.
4. Update `unknown_artifacts.current_ur` and `current_decay_multiplier`.
5. Append a `metrics_recomputed` event, subject to a change threshold (e.g., only when UR changed by more than 0.01) so the event log is not flooded by hourly no-op updates.
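The per-artifact part of the job (steps 2, 3 and 5) might look like the sketch below. The scorer is passed in as a callback so the example stays self-contained; `TrackedUnknown`, `RecomputeResult`, and the `minDelta` parameter name are assumptions for illustration.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type TrackedUnknown = { firstObservedAt: string; currentUr: number };
type RecomputeResult = { daysOpen: number; nextUr: number; emitEvent: boolean };

// Age in fractional days; never negative, even with clock skew.
function daysOpenSince(firstObservedAt: string, now: Date): number {
  return Math.max(0, (now.getTime() - Date.parse(firstObservedAt)) / MS_PER_DAY);
}

// `score` stands in for the scoring library (daysOpen -> UR_t).
function recomputeUnknown(
  u: TrackedUnknown,
  now: Date,
  score: (daysOpen: number) => number,
  minDelta: number = 0.01
): RecomputeResult {
  const daysOpen = daysOpenSince(u.firstObservedAt, now);
  const nextUr = score(daysOpen);
  // Only record a metrics_recomputed event when the change is material.
  const emitEvent = Math.abs(nextUr - u.currentUr) > minDelta;
  return { daysOpen, nextUr, emitEvent };
}
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps recomputation replayable for audits.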
|
||||
|
||||
### 6.2 Rollup job
|
||||
|
||||
Every X minutes:
|
||||
|
||||
1. For each `image_digest` with active unknowns:
|
||||
|
||||
* Compute:
|
||||
|
||||
* `unknown_count_total`
|
||||
* `unknown_count_unresolved` (`status = unresolved`)
|
||||
* `unknown_count_high_ur` (UR ≥ threshold)
|
||||
* `p50` / `p90` UR (use DB percentile or compute in app)
|
||||
* `top_n_ur_sum` (sum of top 5 UR)
|
||||
* `median_age_days`
|
||||
2. Upsert into `unknown_image_rollups`.
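If the rollup is computed in the application rather than in the database, it could be sketched as follows. The percentile uses the nearest-rank method, which is an assumption: an interpolating database aggregate (such as PostgreSQL's `percentile_cont`) can return slightly different values for the same data, so pick one method and use it everywhere to keep gates deterministic. All type and function names are hypothetical.

```typescript
type RollupInput = { ur: number; ageDays: number; status: string };

type ImageRollup = {
  unknownCountTotal: number;
  unknownCountUnresolved: number;
  unknownCountHighUr: number;
  p50Ur: number;
  p90Ur: number;
  topNUrSum: number;
  medianAgeDays: number;
};

// Nearest-rank percentile: the value at position ceil(p/100 * n) in sorted order.
function percentileNearestRank(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Median with the usual average-of-middle-two rule for even counts.
function medianOf(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Sum of the n largest values.
function topNSum(values: number[], n: number): number {
  return values.slice().sort((a, b) => b - a).slice(0, n).reduce((sum, v) => sum + v, 0);
}

function computeRollup(items: RollupInput[], highUrThreshold: number = 0.8, topN: number = 5): ImageRollup {
  const urs = items.map((i) => i.ur);
  return {
    unknownCountTotal: items.length,
    unknownCountUnresolved: items.filter((i) => i.status === "unresolved").length,
    unknownCountHighUr: urs.filter((u) => u >= highUrThreshold).length,
    p50Ur: percentileNearestRank(urs, 50),
    p90Ur: percentileNearestRank(urs, 90),
    topNUrSum: topNSum(urs, topN),
    medianAgeDays: medianOf(items.map((i) => i.ageDays)),
  };
}
```

The top-N sum complements P90: a single severe unknown among many benign ones barely moves the percentile but shows up directly in the sum.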
|
||||
|
||||
---
|
||||
|
||||
## 7. CI / promotion gating
|
||||
|
||||
Expose a simple policy evaluation API for CI and deploy pipelines.
|
||||
|
||||
### 7.1 Policy definition (config)
|
||||
|
||||
Example YAML:
|
||||
|
||||
```yaml
unknownsPolicy:
  blockIf:
    - kind: "anyUrAboveThreshold"
      threshold: 0.8
    - kind: "countAboveAge"
      maxCount: 5
      ageDays: 14
  warnIf:
    - kind: "unknownCountAbove"
      maxCount: 50
```
|
||||
|
||||
### 7.2 Policy evaluation endpoint
|
||||
|
||||
`GET /policy/unknowns/evaluate?imageDigest=sha256:...`
|
||||
|
||||
Response:
|
||||
|
||||
```jsonc
{
  "imageDigest": "sha256:...",
  "result": "block", // "ok" | "warn" | "block"
  "reasons": [
    {
      "kind": "anyUrAboveThreshold",
      "detail": "1 unknown with UR>=0.8 (max allowed: 0)"
    }
  ],
  "summary": {
    "unknownCountUnresolved": 30,
    "p90Ur": 0.82,
    "medianAgeDays": 17
  }
}
```
|
||||
|
||||
CI can decide to fail build/deploy based on `result`.
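The evaluation behind that endpoint could be as small as the sketch below, which takes the three rule kinds from the example policy and applies them to a list of unknowns. The comparison semantics (`>=` for the UR threshold, strictly greater for counts and ages) follow the gate wording earlier in the document; the type names and the wording of `detail` are assumptions.

```typescript
type UnknownForPolicy = { ur: number; ageDays: number };

type PolicyRule =
  | { kind: "anyUrAboveThreshold"; threshold: number }
  | { kind: "countAboveAge"; maxCount: number; ageDays: number }
  | { kind: "unknownCountAbove"; maxCount: number };

type UnknownsPolicy = { blockIf: PolicyRule[]; warnIf: PolicyRule[] };
type PolicyReason = { kind: string; detail: string };
type PolicyVerdict = { result: "ok" | "warn" | "block"; reasons: PolicyReason[] };

// Returns a detail string when the rule fires, or null when it does not.
function ruleDetail(rule: PolicyRule, unknowns: UnknownForPolicy[]): string | null {
  switch (rule.kind) {
    case "anyUrAboveThreshold": {
      const threshold = rule.threshold;
      const hits = unknowns.filter((u) => u.ur >= threshold).length;
      return hits > 0 ? hits + " unknown(s) with UR>=" + threshold : null;
    }
    case "countAboveAge": {
      const ageDays = rule.ageDays;
      const old = unknowns.filter((u) => u.ageDays > ageDays).length;
      return old > rule.maxCount
        ? old + " unknown(s) older than " + ageDays + " days (max allowed: " + rule.maxCount + ")"
        : null;
    }
    case "unknownCountAbove":
      return unknowns.length > rule.maxCount
        ? unknowns.length + " unknown(s) (max allowed: " + rule.maxCount + ")"
        : null;
  }
}

function collectReasons(rules: PolicyRule[], unknowns: UnknownForPolicy[]): PolicyReason[] {
  const reasons: PolicyReason[] = [];
  rules.forEach((rule) => {
    const detail = ruleDetail(rule, unknowns);
    if (detail !== null) reasons.push({ kind: rule.kind, detail });
  });
  return reasons;
}

// Block rules take precedence over warn rules.
function evaluateUnknownsPolicy(policy: UnknownsPolicy, unknowns: UnknownForPolicy[]): PolicyVerdict {
  const blocks = collectReasons(policy.blockIf, unknowns);
  if (blocks.length > 0) return { result: "block", reasons: blocks };
  const warns = collectReasons(policy.warnIf, unknowns);
  if (warns.length > 0) return { result: "warn", reasons: warns };
  return { result: "ok", reasons: [] };
}
```

A CI step would then map `block` to a non-zero exit code and surface `reasons` in the job log.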
|
||||
|
||||
---
|
||||
|
||||
## 8. UI implementation (Trust Algebra Studio)
|
||||
|
||||
### 8.1 Image detail page: “Unknowns” tab
|
||||
|
||||
Components:
|
||||
|
||||
1. **Header metrics ribbon**
|
||||
|
||||
* Unknowns unresolved, p90 UR, median age, weekly trend sparkline.
|
||||
* Fetch from `/images/{digest}/unknowns/summary`.
|
||||
|
||||
2. **Unknowns table**
|
||||
|
||||
* Columns:
|
||||
|
||||
* Status pill
|
||||
* UR (with color + tooltip showing `B`, `A`, `T`, `D_t`)
|
||||
* Classification type/reason
|
||||
* File path
|
||||
* Age
|
||||
* Last observed
|
||||
* Filters:
|
||||
|
||||
* Status, UR range, age range, reason, type.
|
||||
|
||||
3. **Row drawer / detail panel**
|
||||
|
||||
* Show:
|
||||
|
||||
* All core fields.
|
||||
* Evidence:
|
||||
|
||||
* origin (scanner, feed, runtime)
|
||||
* raw signals (entropy, sections, etc)
|
||||
* SBOM component link (if any)
|
||||
* Timeline (events list)
|
||||
* Actions:
|
||||
|
||||
* Change status (unresolved → suppressed/mitigated/confirmed).
|
||||
* Add note.
|
||||
* Set/extend suppression expiry.
|
||||
|
||||
### 8.2 Global “Unknowns board”
|
||||
|
||||
Goals:
|
||||
|
||||
* Portfolio view; triage across many images.
|
||||
|
||||
Features:
|
||||
|
||||
* Filters by:
|
||||
|
||||
* Team/application/service
|
||||
* Time range for first observed
|
||||
* UR bucket (0–0.3, 0.3–0.6, 0.6–1)
|
||||
* Cards/rows per image:
|
||||
|
||||
* Unknown counts, p90 UR, median age.
|
||||
* Trend of unknown count (last N weeks).
|
||||
* Click through to image‑detail tab.
|
||||
|
||||
### 8.3 “What‑if” slider (optional v1.1)
|
||||
|
||||
On an image or org-level:
|
||||
|
||||
* Slider(s) to visualize effect of:
|
||||
|
||||
* `k` / `lambda` change (decay speed).
|
||||
* Trust baseline changes (simulate better attestations).
|
||||
* Implement by calling a stateless endpoint:
|
||||
|
||||
* `POST /unknowns/what-if` with:
|
||||
|
||||
* Current unknowns list IDs
|
||||
* Proposed decay policy
|
||||
* Returns recalculated URs and hypothetical gate result (but does **not** persist).
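One question the what-if view can answer directly is how long an unknown has before decay alone pushes it over a gate. For the linear policy this follows from solving `B * (1 + A) * (1 + k * days) * (1 - T) >= threshold` for `days`. The function below is a sketch of that derivation; its name and signature are assumptions, and it covers only the linear case.

```typescript
// Days until UR_t first reaches `threshold` under linear decay,
// or null if the cap on D_t means it never gets there.
function daysUntilThreshold(
  B: number,
  A: number,
  T: number,
  k: number,
  cap: number,
  threshold: number
): number | null {
  const base = B * (1 + A) * (1 - T); // UR on day 0, before clamping
  if (base >= threshold) return 0;    // already at or over the gate
  if (base <= 0 || k <= 0) return null; // no risk to grow, or no decay
  const neededMultiplier = threshold / base;
  if (neededMultiplier > cap) return null; // D_t is capped below what is needed
  return Math.ceil((neededMultiplier - 1) / k);
}
```

For a hypothetical artifact with `B = 0.7`, `A = 0.2`, `T = 0.1`, the day-0 score is 0.756, so a 0.8 gate needs a multiplier of about 1.058: six days at `k = 0.01`, three days if the slider doubles `k` to 0.02.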
|
||||
|
||||
---
|
||||
|
||||
## 9. Observability & analytics
|
||||
|
||||
### 9.1 Metrics
|
||||
|
||||
Emit structured events/metrics (OpenTelemetry, etc.):
|
||||
|
||||
* Counters:
|
||||
|
||||
* `unknowns_ingested_total` (labels: `source`, `classification_type`, `reason`)
|
||||
* `unknowns_resolved_total` (labels: `status`)
|
||||
* Gauges:
|
||||
|
||||
* `unknowns_unresolved_count` per image/service.
|
||||
* `unknowns_p90_ur` per image/service.
|
||||
* `unknowns_median_age_days`.
|
||||
|
||||
### 9.2 Weekly report generator
|
||||
|
||||
Batch job:
|
||||
|
||||
1. Compute, per org or team:
|
||||
|
||||
* Total unknowns.
|
||||
* New unknowns this week.
|
||||
* Resolved unknowns this week.
|
||||
* Median age.
|
||||
* Top 10 images by:
|
||||
|
||||
* Highest p90 UR.
|
||||
* Largest number of long‑lived unknowns (> X days).
|
||||
2. Persist into analytics store (ClickHouse) + push into:
|
||||
|
||||
* Slack channel / email with a short plain‑text summary and link to UI.
|
||||
|
||||
---
|
||||
|
||||
## 10. Security & compliance
|
||||
|
||||
* Ensure all APIs require authentication & proper scopes:
|
||||
|
||||
* Scanner ingest: internal service token only.
|
||||
* UI APIs: user identity + RBAC (e.g., team can only see their images).
|
||||
* Audit log:
|
||||
|
||||
* `unknown_artifact_events` must be immutable and queryable by compliance teams.
|
||||
* PII:
|
||||
|
||||
* Avoid storing user PII in notes; if necessary, apply redaction.
|
||||
|
||||
---
|
||||
|
||||
## 11. Suggested delivery plan (sprints/epics)
|
||||
|
||||
### Sprint 1 – Foundations & ingest path
|
||||
|
||||
* [ ] DB migrations: `unknown_artifacts`, `unknown_artifact_events`, `decay_policies`.
|
||||
* [ ] Implement scoring library (`B`, `A`, `T`, `UR_t`, `D_t`).
|
||||
* [ ] Implement `/internal/unknowns/ingest` endpoint with idempotency.
|
||||
* [ ] Extend scanner to emit unknowns and integrate with ingest.
|
||||
* [ ] Basic `GET /unknowns?imageDigest=...` API.
|
||||
* [ ] Seed `decay:default:v1` policy.
|
||||
|
||||
**Exit criteria:** Unknowns created and UR computed from real scans; queryable via API.
|
||||
|
||||
---
|
||||
|
||||
### Sprint 2 – Decay, rollups, and CI hook
|
||||
|
||||
* [ ] Implement periodic job to recompute decay & UR.
|
||||
* [ ] Implement rollup job + `unknown_image_rollups` table.
|
||||
* [ ] Implement `GET /images/{digest}/unknowns/summary`.
|
||||
* [ ] Implement policy evaluation endpoint for CI.
|
||||
* [ ] Wire CI to block/warn based on policy.
|
||||
|
||||
**Exit criteria:** CI gate can fail a build due to high‑risk unknowns; rollups visible via API.
|
||||
|
||||
---
|
||||
|
||||
### Sprint 3 – UI (Unknowns tab + board)
|
||||
|
||||
* [ ] Image detail “Unknowns” tab:
|
||||
|
||||
* Metrics ribbon, table, filters.
|
||||
* Row drawer with evidence & history.
|
||||
* [ ] Global “Unknowns board” page.
|
||||
* [ ] Integrate with APIs.
|
||||
* [ ] Add basic “explainability tooltip” for UR.
|
||||
|
||||
**Exit criteria:** Security team can triage unknowns via UI; product teams can see their exposure.
|
||||
|
||||
---
|
||||
|
||||
### Sprint 4 – Suppression workflow & reporting
|
||||
|
||||
* [ ] Implement `PATCH /unknowns/{id}` + suppression rules & expiries.
|
||||
* [ ] Extend periodic jobs to auto‑expire suppressions.
|
||||
* [ ] Weekly unknowns report job → analytics + Slack/email.
|
||||
* [ ] Add “trend” sparklines and unknowns burn‑down in UI.
|
||||
|
||||
**Exit criteria:** Unknowns can be suppressed with justification; org gets weekly burn‑down trends.
|
||||
|
||||
---
|
||||
|
||||
If you’d like, I can next:
|
||||
|
||||
* Turn this into concrete tickets (Jira-style) with story points and acceptance criteria, or
|
||||
* Generate example migration scripts (SQL) and API contract files (OpenAPI snippet) that your devs can copy‑paste.
|
||||