Here’s a compact, low-friction way to tame “unknowns” in Stella Ops without boiling the ocean: two heuristics you can prototype this week—each yields one clear artifact you can show in the UI and wire into the next planning cycle. --- # 1) Decaying Confidence (Half-Life) for Unknown Reachability **Idea:** every “unknown” reachability/verdict starts with a confidence score that **decays over time** (exponential half-life). If no new evidence arrives, confidence naturally drifts toward “needs refresh,” preventing stale assumptions from lingering. **Why it helps (plain English):** unknowns don’t stay “probably fine” forever—this makes them self-expiring, so triage resurfaces them at the right time instead of only when something breaks. **Minimal data model (UnknownsRegistry):** ```json { "unknown_id": "URN:unknown:pkg/npm/lodash:4.17.21:CVE-2021-23337:reachability", "subject_ref": { "type": "package", "purl": "pkg:npm/lodash@4.17.21" }, "vuln_id": "CVE-2021-23337", "dimension": "reachability", "confidence": { "value": 0.78, "method": "half_life", "t0": "2025-11-29T12:00:00Z", "half_life_days": 14 }, "evidence": [{ "kind": "static_scan_hint", "hash": "…" }], "next_review_at": "2025-12-06T12:00:00Z", "status": "unknown" } ``` **Update rule (per tick or on read):** * `confidence_now = confidence_t0 * 0.5^(Δdays / half_life_days)` * When `confidence_now < threshold_low` → flag for human review (see Queue below). * When fresh evidence arrives → reset `t0`, optionally raise confidence. **One UI artifact:** A **“Confidence Decay Card”** on each unknown, showing: * sparkline of decay over time, * next review ETA, * button “Refresh with latest evidence” (re-run reachability probes). **One ops hook (planning):** Export a **daily CSV/JSON of unknowns whose confidence crossed threshold** to feed the triage board. --- # 2) Human-Review Queue for High-Impact Unknowns **Idea:** only a subset of unknowns deserve people time. Auto-rank them by potential blast radius + decayed confidence. **Triage score (simple, transparent):** `triage_score = impact_score * (1 - confidence_now)` * `impact_score` (0–1): runtime exposure, privilege, prevalence, SLA tier. * `confidence_now`: from heuristic #1. **Queue item schema (artifact to display & act on):** ```json { "queue_item_id": "TRIAGE:unknown:…", "unknown_id": "URN:unknown:…", "triage_score": 0.74, "impact_factors": { "runtime_presence": true, "privilege": "high", "fleet_prevalence": 0.62, "sla_tier": "gold" }, "confidence_now": 0.28, "assigned_to": "unassigned", "due_by": "2025-12-02T17:00:00Z", "actions": [ { "type": "collect_runtime_trace", "cost": "low" }, { "type": "symbolic_slice_probe", "cost": "medium" }, { "type": "vendor_VEX_request", "cost": "low" } ], "audit": [{ "at": "2025-11-29T12:05:00Z", "who": "system", "what": "enqueued: threshold_low crossed" }] } ``` **One UI artifact:** A **“High-Impact Unknowns” queue view** sorted by `triage_score`, showing: * pill tags for impact factors, * inline actions (Assign, Probe, Add Evidence, Mark Resolved), * SLO badge showing `due_by`. **One ops hook (planning):** Pull top N items by `triage_score` at sprint start. Each resolved item must attach new evidence or a documented “Not Affected” rationale so the next decay cycle begins from stronger assumptions. --- ## Wiring into Stella Ops quickly (dev notes) * **Storage:** add `UnknownsRegistry` collection/table; compute decay on read to avoid cron churn. * **Thresholds:** start with `half_life_days = 14`, `threshold_low = 0.35`; tune later. 
* **Impact scoring:** begin with simple weights in config (runtime_presence=0.4, privilege=0.3, prevalence=0.2, SLA=0.1).
* **APIs:**
  * `GET /unknowns?stale=true` (confidence < threshold)
  * `POST /triage/enqueue` (system-owned)
  * `POST /unknowns/{id}/evidence` (resets t0, recomputes next_review_at)
* **Events:** emit `UnknownConfidenceCrossedLow` → `TriageItemCreated`.

---

## What you’ll have after a 1–2 day spike

* A decay card on each unknown + a simple, sortable triage queue.
* A daily export artifact to drive planning.
* A clear, auditable path from “we’re unsure” → “we gathered evidence” → “we’re confident (for now).”

If you want, I can generate:

* the C# POCOs/EF mappings,
* a minimal Controller set,
* Angular components (card + queue table),
* and seed data + an evaluator that computes `confidence_now` and `triage_score` from config.

---

Cool, let’s turn those two sketchy heuristics into something you can actually ship and iterate on in Stella Ops. I’ll go deeper on:

1. Decaying confidence as a proper first‑class concept
2. The triage queue and workflow around “unknowns”
3. A lightweight “unknown budget” / guardrail layer
4. Concrete implementation sketches (data models, formulas, pseudo‑code)
5. How this feeds planning & metrics

---

## 1) Decaying Confidence: From Idea to Mechanism

### 1.1 What “confidence” actually means

To keep semantics crisp, define **confidence** as:

> “How fresh and well‑supported our knowledge is about this vulnerability in this subject along this dimension (reachability, exploitability, etc.).”

* `1.0` = Recently assessed with strong evidence
* `0.0` = We basically haven’t looked / our info is ancient

This works for **unknown**, **known‑affected**, and **known‑not‑affected**; decay is about **knowledge freshness**, not the verdict itself. For unknowns, confidence will usually be low and decaying → that’s what pushes them into the queue.

### 1.2 Data model v2 (UnknownsRegistry)

Extend the earlier object a bit:

```jsonc
{
  "unknown_id": "URN:unknown:pkg/npm/lodash:4.17.21:CVE-2021-23337:reachability",
  "subject_ref": {
    "type": "package",            // package | service | container | host | cluster
    "purl": "pkg:npm/lodash@4.17.21",
    "service_id": "checkout-api",
    "env": "prod"                 // prod | staging | dev
  },
  "vuln_id": "CVE-2021-23337",
  "dimension": "reachability",    // reachability | exploitability | fix_validity | other
  "state": "unknown",             // unknown | known_affected | known_not_affected | ignored
  "unknown_cause": "tooling_gap", // data_missing | vendor_silent | tooling_gap | conflicting_evidence
  "confidence": {
    "value": 0.62,                // computed on read
    "method": "half_life",
    "t0": "2025-11-29T12:00:00Z",
    "value_at_t0": 0.9,
    "half_life_days": 14,
    "threshold_low": 0.35,
    "threshold_high": 0.75
  },
  "impact": {
    "runtime_presence": true,
    "internet_exposed": true,
    "privilege_level": "high",
    "data_sensitivity": "pii",    // none | internal | pii | financial
    "fleet_prevalence": 0.62,     // fraction of services using this
    "sla_tier": "gold"            // bronze | silver | gold
  },
  "next_review_at": "2025-12-18T12:00:00Z", // precomputed: when confidence crosses threshold_low (§1.3)
  "owner": "team-checkout",
  "created_at": "2025-11-29T12:00:00Z",
  "updated_at": "2025-11-29T12:00:00Z",
  "evidence": [
    {
      "kind": "static_scan_hint",
      "summary": "No direct call from public handler to vulnerable sink found.",
      "created_at": "2025-11-29T12:00:00Z",
      "link": "https://stellaops/ui/evidence/123"
    }
  ]
}
```

Key points:

* `unknown_cause` helps you slice unknowns by “why do we not know?” (lack of data vs tooling vs vendor).
* `impact` is embedded here so triage scoring can be local without joining a ton of tables.
* `half_life_days` can be **per dimension & per environment** (see the lookup sketch below), e.g.:
  * prod + reachability → 7 days
  * staging + fix_validity → 30 days
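To make that last bullet concrete, here is a minimal sketch of a half‑life lookup; the `HalfLifePolicy` name, the override table, and the 14‑day default are illustrative assumptions, not part of the model above:

```csharp
using System.Collections.Generic;

public static class HalfLifePolicy
{
    // Overrides per (dimension, env); anything not listed falls back to the default.
    private static readonly Dictionary<(string Dimension, string Env), double> Overrides = new()
    {
        [("reachability", "prod")] = 7,      // prod reachability goes stale quickly
        [("fix_validity", "staging")] = 30,  // staging fix checks can age longer
    };

    public static double Resolve(string dimension, string env, double defaultDays = 14)
        => Overrides.TryGetValue((dimension, env), out var days) ? days : defaultDays;
}
```

One way to use it: stamp the resolved value into `half_life_days` at write time, so reads never need to consult the policy table.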
### 1.3 Decay math & scheduling

Use exponential decay:

```text
confidence(t) = value_at_t0 * 0.5^(Δdays / half_life_days)
```

Where:

* `Δdays = (now - t0) in days`

On write (when you update or create the record), you:

1. Compute `value_now` from any previous state.
2. Apply a bump/delta based on new evidence (bounded by 0..1).
3. Set `value_at_t0 = value_now_after_bump`, `t0 = now`.
4. Precompute `next_review_at` = when `confidence(t)` will cross `threshold_low`.

Pseudo‑code for step 4:

```csharp
double DaysUntilThreshold(double valueAtT0, double threshold, double halfLifeDays)
{
    if (valueAtT0 <= threshold) return 0;
    // threshold = valueAtT0 * 0.5^(Δ/halfLife)
    // Δ = halfLife * log(threshold/valueAtT0) / log(0.5)
    return halfLifeDays * Math.Log(threshold / valueAtT0) / Math.Log(0.5);
}
```

Then:

```csharp
var days = DaysUntilThreshold(valueAtT0, thresholdLow, halfLifeDays);
nextReviewAt = now.AddDays(days);
```

**Important:** this gives you a **cheap query** to build the queue:

```sql
SELECT *
FROM UnknownsRegistry
WHERE state = 'unknown'
  AND next_review_at <= now();
```

No cron‑based bulk recomputation necessary.

### 1.4 Events that bump confidence

Any new evidence should “refresh” knowledge and adjust confidence. Examples:

* **Runtime traces show the vulnerable function is never called in a hot path** → bump reachability confidence up moderately (e.g. +0.2, capped at 0.9).
* **A symbolic or fuzzing probe explicitly drives execution into the vulnerable code** → flip `state = known_affected`, set confidence close to 1.0 with a longer half‑life.
* **Vendor VEX: NOT AFFECTED** → flip `state = known_not_affected`, long half‑life (60–90 days), high confidence.
* **New major release, infra changes, or new internet exposure** → degrade confidence (e.g. −0.3) because the architecture changed.

Implement this as a simple rules table:

```jsonc
{
  "on_evidence": [
    {
      "when": { "kind": "runtime_trace", "result": "no_calls_observed" },
      "dimension": "reachability",
      "delta_confidence": 0.2,
      "half_life_days": 14
    },
    {
      "when": { "kind": "runtime_trace", "result": "calls_into_vuln" },
      "dimension": "reachability",
      "set_state": "known_affected",
      "set_confidence": 0.95,
      "half_life_days": 21
    },
    {
      "when": { "kind": "vendor_vex", "result": "not_affected" },
      "set_state": "known_not_affected",
      "set_confidence": 0.98,
      "half_life_days": 60
    }
  ]
}
```

### 1.5 UI for decaying confidence

On the **Unknown Detail page**, you can show:

* **Confidence chip**:
  * “Knowledge freshness: 0.28 (stale)” with a color gradient.
* **Decay sparkline**: small chart showing confidence over the last 30 days (computable from stored fields alone; see the sketch below).
* **Next review**: “Next review recommended by Dec 2, 2025 (in 3 days)”
* **Evidence stack**: timeline of evidence events with icons (static scan, runtime, vendor, etc.).
* **Actions area**: “Refresh now → Trigger runtime probe / request VEX / open Jira”.
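The sparkline is cheap because no history table is needed: the curve since the last reset is fully determined by the persisted `value_at_t0`, `t0`, and `half_life_days`. A minimal sketch follows; the `DecaySeries` helper is hypothetical, and points before `t0` are approximated as flat since earlier segments aren’t persisted:

```csharp
using System;
using System.Collections.Generic;

public static class DecaySeries
{
    // One point per day across [from, to], clamped to 0..1 for charting.
    public static IReadOnlyList<(DateTimeOffset At, double Confidence)> Build(
        double valueAtT0, DateTimeOffset t0, double halfLifeDays,
        DateTimeOffset from, DateTimeOffset to)
    {
        var points = new List<(DateTimeOffset, double)>();
        for (var at = from; at <= to; at = at.AddDays(1))
        {
            var deltaDays = (at - t0).TotalDays;
            var value = deltaDays <= 0
                ? valueAtT0 // flat approximation before the last evidence reset
                : valueAtT0 * Math.Pow(0.5, deltaDays / halfLifeDays);
            points.Add((at, Math.Clamp(value, 0.0, 1.0)));
        }
        return points;
    }
}
```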
All of that makes the heuristic feel concrete and gives engineers a mental model: “this is decaying; here’s when we revisit; here’s how to add evidence.”

---

## 2) Triage Queue for High‑Impact Unknowns: Making It Useful

The goal: **reduce an ocean of unknowns to a small, actionable queue** that:

* Is **ranked by risk**, not noise
* Has clear **owners and due dates**
* Plugs cleanly into teams’ existing planning

### 2.1 Impact scoring, more formally

Define a normalized **impact score** `I` between 0 and 1:

```text
I = w_env * EnvExposure
  + w_data * DataSensitivity
  + w_prevalence * Prevalence
  + w_sla * SlaCriticality
  + w_cvss * CvssSeverity
```

Where each factor is also 0–1:

* `EnvExposure`:
  * prod + internet_exposed → 1.0
  * prod + internal only → 0.7
  * non‑prod → 0.3
* `DataSensitivity`:
  * none → 0.0, internal → 0.3, pii → 0.7, financial/health → 1.0
* `Prevalence`:
  * fraction of services/assets affected (0..1)
* `SlaCriticality`:
  * bronze → 0.3, silver → 0.6, gold → 1.0
* `CvssSeverity`:
  * use CVSS normalized to 0..1 if you have it, otherwise approximate from “critical/high/med/low”.

Weights `w_*` are configurable, e.g.:

```text
w_env        = 0.3
w_data       = 0.25
w_prevalence = 0.15
w_sla        = 0.15
w_cvss       = 0.15
```

These can live in a tenant‑level config.

### 2.2 Triage score

You already had the core idea:

```text
triage_score = Impact * (1 - ConfidenceNow)
```

You can enrich this slightly with recency:

```text
RecencyBoost = min(1.2, 1.0 + DaysSinceCreated / 30 * 0.2)
triage_score = Impact * (1 - ConfidenceNow) * RecencyBoost
```

So very old unknowns with low confidence get a slight bump to avoid being buried forever.

### 2.3 Queue item lifecycle

Represent queue items as a simple workflow:

```jsonc
{
  "queue_item_id": "TRIAGE:unknown:…",
  "unknown_id": "URN:unknown:…",
  "triage_score": 0.81,
  "status": "open",              // open | in_progress | blocked | resolved | wont_fix
  "reason_blocked": null,
  "owner_team": "team-checkout",
  "assigned_to": "alice",
  "created_at": "2025-11-29T12:05:00Z",
  "due_by": "2025-12-02T17:00:00Z",
  "required_outcome": "add_evidence_or_verdict",
  // tasks that actually change state
  "suggested_actions": [
    { "type": "collect_runtime_trace", "cost": "low" },
    { "type": "symbolic_slice_probe", "cost": "medium" },
    { "type": "vendor_VEX_request", "cost": "low" }
  ],
  "audit": [
    {
      "at": "2025-11-29T12:05:00Z",
      "who": "system",
      "what": "enqueued: confidence below threshold_low; I=0.9, C=0.21"
    }
  ]
}
```

Rules (a minimal evaluator for these is sketched after the ops hooks below):

* A queue item is (re)created automatically when the unknown’s `next_review_at <= now` **and** impact is above a minimum threshold.
* When an engineer **adds evidence** or changes `state` on the underlying unknown, the system:
  * Recomputes confidence, impact, triage_score
  * Closes the queue item if confidence is now > `threshold_high` or state != unknown
* You can allow items to **re‑open** if confidence decays again later.

### 2.4 Queue UI & ops hooks

In UI, the **“High‑Impact Unknowns”** view shows:

Columns:

* Unknown (vuln + subject)
* State (always “unknown” here, but future‑proof)
* Impact badge (Low/Med/High/Critical)
* Confidence chip
* Triage score (sortable)
* Owner team
* Due by
* Quick actions

Interactions:

* Default filter: `impact >= High` AND `env = prod`
* Per‑team view: filter owner_team = “team‑X”
* Bulk ops: “Assign top 10 to me”, “Open Jira for selected”, etc.

Ops hooks:

* **Daily digest** to each team: “You have 5 high‑impact unknowns due this week.”
* **Planning export**: per sprint, each team looks at “Top N unknowns by triage_score” and picks some into the sprint.
* **SLO integration**: if a team’s “unknown budget” (see below) is overrun, they must schedule unknown work.
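As promised above, the §2.3 rules collapse into one small, order‑sensitive decision function. A minimal sketch; `QueueLifecycle` and `QueueDecision` are placeholder names, and the inputs mirror fields already present in the models:

```csharp
using System;

public enum QueueDecision { None, Enqueue, Close }

public static class QueueLifecycle
{
    public static QueueDecision Evaluate(
        bool stateIsUnknown, double impact, double confidenceNow,
        DateTimeOffset nextReviewAt, double thresholdHigh, double minImpactForQueue,
        bool hasOpenItem, DateTimeOffset now)
    {
        // Close first: a verdict, or confidence refreshed above threshold_high,
        // retires the open item.
        if (hasOpenItem && (!stateIsUnknown || confidenceNow > thresholdHigh))
            return QueueDecision.Close;

        // (Re)create only when the review is due AND the blast radius justifies people time.
        if (!hasOpenItem && stateIsUnknown
            && nextReviewAt <= now && impact >= minImpactForQueue)
            return QueueDecision.Enqueue;

        return QueueDecision.None;
    }
}
```

Running this on every evidence write plus the cheap `next_review_at` query from §1.3 keeps the queue consistent without a separate scheduler.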
### 2.5 Example: one unknown from signal to closure

1. New CVE hits; SBOM says `checkout-api` uses the affected library.
   * Unknown created with:
     * Impact ≈ 0.9 (prod, internet, PII, critical CVE)
     * Confidence = 0.4 (all we know is “it exists”).
   * `triage_score ≈ 0.9 * (1 - 0.4) = 0.54` → high enough to enqueue.
2. Engineer collects a runtime trace, sees no calls to the vulnerable path under normal traffic.
   * Evidence added, confidence bumped to 0.75, half‑life 14 days.
   * Queue item auto‑resolves if your `threshold_high` is 0.7.
3. Two months later, the architecture changes and the service gets a new public endpoint.
   * The deployment event triggers an automatic “degrade confidence” rule (−0.2), sets a new `t0` and a shorter half‑life.
   * `next_review_at` moves closer; the unknown re‑enters the queue later.

This gives you **continuously updating risk** without manual spreadsheets.

---

## 3) Unknown Budget & Guardrails (Optional but Powerful)

To connect this to leadership/SRE conversations, define an **“unknown budget”** per service/team:

> A target maximum risk mass of unknowns we’re willing to tolerate.

### 3.1 Per‑unknown “risk units”

For each unknown, define:

```text
risk_units = Impact * (1 - ConfidenceNow)
```

(It’s the triage score without the recency boost, aggregated instead of ranked.)

Per team or service, the current risk mass is:

```text
unknown_risk_mass = sum(risk_units for that team/service)
```

(A rollup sketch appears at the end of this message.) The budget is the cap you set on that mass as a **guardrail**, e.g.:

* Gold‑tier service: risk mass ≤ 5.0
* Silver: ≤ 15.0
* Bronze: ≤ 30.0

### 3.2 Guardrail behaviors

If a team’s risk mass exceeds its budget:

* Show warnings in the Stella Ops UI on the service details page.
* Add a banner in the high‑impact queue: “Unknown budget exceeded by 3.2 units.”
* Optional: feed into deployment checks:
  * Above 2× budget → require security approval before prod deploy.
  * Above 1× budget → must plan unknown work in next sprint.

This ties the heuristics to behavior change without being draconian.

---

## 4) Implementation Sketch (API & Code)

### 4.1 C# model sketch

```csharp
public enum UnknownState { Unknown, KnownAffected, KnownNotAffected, Ignored }

public sealed class UnknownRecord
{
    public string Id { get; set; } = default!;

    public string SubjectType { get; set; } = default!; // "package", "service", ...
    public string? Purl { get; set; }
    public string? ServiceId { get; set; }
    public string Env { get; set; } = "prod";

    public string? VulnId { get; set; } // CVE, GHSA, etc.
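    // --- Assessment dimension & verdict ---
    // Dimension is the knowledge axis being tracked; decay applies to
    // confidence (knowledge freshness), never to State itself (§1.1).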
public string Dimension { get; set; } = "reachability"; public UnknownState State { get; set; } = UnknownState.Unknown; public string UnknownCause { get; set; } = "data_missing"; // Confidence fields persisted public double ConfidenceValueAtT0 { get; set; } public DateTimeOffset ConfidenceT0 { get; set; } public double HalfLifeDays { get; set; } public double ThresholdLow { get; set; } public double ThresholdHigh { get; set; } public DateTimeOffset NextReviewAt { get; set; } // Impact factors public bool RuntimePresence { get; set; } public bool InternetExposed { get; set; } public string PrivilegeLevel { get; set; } = "low"; public string DataSensitivity { get; set; } = "none"; public double FleetPrevalence { get; set; } public string SlaTier { get; set; } = "bronze"; // Ownership & audit public string OwnerTeam { get; set; } = default!; public DateTimeOffset CreatedAt { get; set; } public DateTimeOffset UpdatedAt { get; set; } } ``` Helper to compute `ConfidenceNow`: ```csharp public static class ConfidenceCalculator { public static double ComputeNow(UnknownRecord r, DateTimeOffset now) { var deltaDays = (now - r.ConfidenceT0).TotalDays; if (deltaDays <= 0) return Clamp01(r.ConfidenceValueAtT0); var factor = Math.Pow(0.5, deltaDays / r.HalfLifeDays); return Clamp01(r.ConfidenceValueAtT0 * factor); } public static (double valueAtT0, DateTimeOffset t0, DateTimeOffset nextReviewAt) ApplyEvidence(UnknownRecord r, double deltaConfidence, double? newHalfLifeDays, DateTimeOffset now) { var current = ComputeNow(r, now); var updated = Clamp01(current + deltaConfidence); var halfLife = newHalfLifeDays ?? r.HalfLifeDays; var daysToThreshold = DaysUntilThreshold(updated, r.ThresholdLow, halfLife); var nextReview = now.AddDays(daysToThreshold); return (updated, now, nextReview); } private static double DaysUntilThreshold(double valueAtT0, double threshold, double halfLifeDays) { if (valueAtT0 <= threshold) return 0; return halfLifeDays * Math.Log(threshold / valueAtT0) / Math.Log(0.5); } private static double Clamp01(double v) => v < 0 ? 0 : (v > 1 ? 1 : v); } ``` Queue scoring: ```csharp public sealed class TriageConfig { public double WEnv { get; set; } = 0.3; public double WData { get; set; } = 0.25; public double WPrev { get; set; } = 0.15; public double WSla { get; set; } = 0.15; public double WCvss { get; set; } = 0.15; public double MinImpactForQueue { get; set; } = 0.4; public double MaxRecencyBoost { get; set; } = 1.2; } public static class TriageScorer { public static double ComputeImpact(UnknownRecord r, double cvssNorm, TriageConfig cfg) { var env = r.Env == "prod" ? (r.InternetExposed ? 
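            // §2.1 EnvExposure: prod + internet-exposed → 1.0; prod internal → 0.7; non-prod → 0.3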
            1.0 : 0.7) : 0.3;

        var data = r.DataSensitivity switch
        {
            "none" => 0.0,
            "internal" => 0.3,
            "pii" => 0.7,
            "financial" => 1.0,
            "health" => 1.0,
            _ => 0.3
        };

        var sla = r.SlaTier switch
        {
            "bronze" => 0.3,
            "silver" => 0.6,
            "gold" => 1.0,
            _ => 0.3
        };

        var prev = Math.Max(0, Math.Min(1, r.FleetPrevalence));

        return cfg.WEnv * env
             + cfg.WData * data
             + cfg.WPrev * prev
             + cfg.WSla * sla
             + cfg.WCvss * cvssNorm;
    }

    public static double ComputeTriageScore(
        UnknownRecord r,
        double cvssNorm,
        DateTimeOffset now,
        DateTimeOffset createdAt,
        TriageConfig cfg)
    {
        var impact = ComputeImpact(r, cvssNorm, cfg);
        var confidence = ConfidenceCalculator.ComputeNow(r, now);

        if (impact < cfg.MinImpactForQueue) return 0;

        var ageDays = (now - createdAt).TotalDays;
        var recencyBoost = Math.Min(cfg.MaxRecencyBoost, 1.0 + (ageDays / 30.0) * 0.2);

        return impact * (1 - confidence) * recencyBoost;
    }
}
```

This is all straightforward to wire into your existing C#/Angular stack.

---

## 5) How This Feeds Planning & Metrics

Once this is live, you get a bunch of useful knobs for product and leadership:

### 5.1 Per‑team dashboards

For each team/service, show:

* **Unknown count** (total & by dimension)
* **Unknown risk mass vs budget** (current vs target)
* **Distribution of confidence** (e.g., histogram buckets: 0–0.25, 0.25–0.5, etc.)
* **Average age of unknowns**
* **Queue throughput**:
  * # of unknowns investigated this sprint
  * Average time from `enqueued → evidence added / verdict`

These tell you if teams are actually burning down epistemic risk or just tagging things.

### 5.2 Process metrics to tune heuristics

Every quarter, look at:

* How many unknowns **re‑enter the queue** because decay hits the threshold?
* For unknowns that later become **known‑affected incidents**, what were their triage scores?
  * If many “incident‑causing unknowns” had low triage scores, adjust the weights.
* Are teams routinely ignoring certain impact factors (e.g., low data sensitivity)?
  * Maybe reduce the weight or adjust the scoring.

Because the heuristics are **explicit and simple**, you can iterate: tweak half‑lives and weights, observe the effect on queue size and incident correlation.

---

If you’d like, next step I can sketch:

* A REST API surface (`GET /unknowns`, `GET /unknowns/triage`, `POST /unknowns/{id}/evidence`)
* Or specific Angular components for the **Confidence Decay Card** and **High‑Impact Unknowns** table, wired to these models.
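P.S. One gap worth closing early: §3.1 defines the risk-mass rollup only in prose. Here is a minimal sketch, assuming the `UnknownRecord`, `TriageScorer`, and `ConfidenceCalculator` shapes above; the `UnknownBudget` name, the tier caps, the one-tier-per-team grouping, and the flat `cvssNorm = 0.5` placeholder are all illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnknownBudget
{
    // Illustrative caps; in practice these would live in tenant config (§3.1).
    private static readonly Dictionary<string, double> TierCaps = new()
    {
        ["gold"] = 5.0,
        ["silver"] = 15.0,
        ["bronze"] = 30.0,
    };

    public static IEnumerable<(string Team, double RiskMass, double Cap, bool OverBudget)>
        Rollup(IEnumerable<UnknownRecord> unknowns, TriageConfig cfg, DateTimeOffset now)
        => unknowns
            .Where(u => u.State == UnknownState.Unknown)
            .GroupBy(u => u.OwnerTeam)
            .Select(g =>
            {
                // risk_units = Impact * (1 - ConfidenceNow), summed into the team's risk mass
                var mass = g.Sum(u =>
                    TriageScorer.ComputeImpact(u, /* cvssNorm */ 0.5, cfg)
                    * (1 - ConfidenceCalculator.ComputeNow(u, now)));

                // Simplification: assume one SLA tier per team and take the cap from it.
                var cap = TierCaps.GetValueOrDefault(g.First().SlaTier, 30.0);

                return (Team: g.Key, RiskMass: mass, Cap: cap, OverBudget: mass > cap);
            });
}
```

A rollup like this can back both the per-team dashboard tiles in §5.1 and the over-budget banner in §3.2.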