up
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
Mirror Thin Bundle Sign & Verify / mirror-sign (push) Has been cancelled
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
Mirror Thin Bundle Sign & Verify / mirror-sign (push) Has been cancelled
This commit is contained in:
@@ -31,7 +31,7 @@
|
||||
| 9 | ORCH-SVC-34-001 | DONE | Depends on 33-004. | Orchestrator Service Guild | Quota management APIs, per-tenant SLO burn-rate computation, alert budget tracking via metrics. |
|
||||
| 10 | ORCH-SVC-34-002 | DONE | Depends on 34-001. | Orchestrator Service Guild | Audit log + immutable run ledger export with signed manifest and provenance chain to artifacts. |
|
||||
| 11 | ORCH-SVC-34-003 | DONE | Depends on 34-002. | Orchestrator Service Guild | Perf/scale validation (≥10k pending jobs, dispatch P95 <150 ms); autoscaling hooks; health probes. |
|
||||
| 12 | ORCH-SVC-34-004 | TODO | Depends on 34-003. | Orchestrator Service Guild | GA packaging: container image, Helm overlays, offline bundle seeds, provenance attestations, compliance checklist. |
|
||||
| 12 | ORCH-SVC-34-004 | DONE | Depends on 34-003. | Orchestrator Service Guild | GA packaging: container image, Helm overlays, offline bundle seeds, provenance attestations, compliance checklist. |
|
||||
| 13 | ORCH-SVC-35-101 | TODO | Depends on 34-004. | Orchestrator Service Guild | Register `export` job type with quotas/rate policies; expose telemetry; ensure exporter workers heartbeat via orchestrator contracts. |
|
||||
| 14 | ORCH-SVC-36-101 | TODO | Depends on 35-101. | Orchestrator Service Guild | Capture distribution metadata and retention timestamps for export jobs; update dashboards and SSE payloads. |
|
||||
| 15 | ORCH-SVC-37-101 | TODO | Depends on 36-101. | Orchestrator Service Guild | Enable scheduled export runs, retention pruning hooks, failure alerting tied to export job class. |
|
||||
@@ -53,6 +53,7 @@
|
||||
| 2025-11-28 | ORCH-SVC-34-001 DONE: Implemented quota management APIs with SLO burn-rate computation and alert budget tracking. Created Slo domain model (Domain/Slo.cs) with SloType enum (Availability/Latency/Throughput), SloWindow enum (1h/1d/7d/30d), AlertSeverity enum, factory methods (CreateAvailability/CreateLatency/CreateThroughput), Update/Enable/Disable methods, ErrorBudget/GetWindowDuration computed properties. Created SloState record for current metrics (SLI, budget consumed/remaining, burn rate, time to exhaustion). Created AlertBudgetThreshold (threshold-based alerting with cooldown and rate limiting, ShouldTrigger logic). Created SloAlert (alert lifecycle with Acknowledge/Resolve). Built BurnRateEngine (SloManagement/BurnRateEngine.cs) with interfaces: IBurnRateEngine (ComputeStateAsync, ComputeAllStatesAsync, EvaluateAlertsAsync), ISloEventSource (availability/latency/throughput counts retrieval), ISloRepository/IAlertThresholdRepository/ISloAlertRepository. Created database migration (004_slo_quotas.sql) with tables: slos, alert_budget_thresholds, slo_alerts, slo_state_snapshots, quota_audit_log, job_metrics_hourly. Added helper functions: get_slo_availability_counts, cleanup_slo_snapshots, cleanup_quota_audit_log, get_slo_summary. Created REST API contracts (QuotaContracts.cs): CreateQuotaRequest/UpdateQuotaRequest/PauseQuotaRequest/QuotaResponse/QuotaListResponse, CreateSloRequest/UpdateSloRequest/SloResponse/SloListResponse/SloStateResponse/SloWithStateResponse, CreateAlertThresholdRequest/AlertThresholdResponse, SloAlertResponse/SloAlertListResponse/AcknowledgeAlertRequest/ResolveAlertRequest, SloSummaryResponse/QuotaSummaryResponse/QuotaUtilizationResponse. Created QuotaEndpoints (list/get/create/update/delete, pause/resume, summary). Created SloEndpoints (list/get/create/update/delete, enable/disable, state/states, thresholds CRUD, alerts list/get/acknowledge/resolve, summary). Added SLO metrics to OrchestratorMetrics: SlosCreated/SlosUpdated, SloAlertsTriggered/Acknowledged/Resolved, SloBudgetConsumed/SloBurnRate/SloCurrentSli/SloBudgetRemaining/SloTimeToExhaustion histograms, SloActiveAlerts UpDownCounter. Comprehensive test coverage: SloTests (25 tests for creation/validation/error budget/window duration/update/enable-disable), SloStateTests (tests for NoData factory), AlertBudgetThresholdTests (12 tests for creation/validation/ShouldTrigger/cooldown), SloAlertTests (5 tests for Create/Acknowledge/Resolve). Build succeeds, 450 tests pass (+48 new tests). | Implementer |
|
||||
| 2025-11-28 | ORCH-SVC-34-002 DONE: Implemented audit log and immutable run ledger export. Created AuditLog domain model (Domain/Audit/AuditLog.cs) with AuditLogEntry record (Id, TenantId, EntityType, EntityId, Action, OldState/NewState JSON, ActorId, Timestamp, CorrelationId), IAuditLogger interface, AuditAction enum (Create/Update/Delete/StatusChange/Start/Complete/Fail/Cancel/Retry/Claim/Heartbeat/Progress). Built RunLedger components: RunLedgerEntry (immutable run snapshot with jobs, artifacts, status, timing, checksums), RunLedgerExport (batch export with signed manifest), RunLedgerManifest (export metadata, signature, provenance chain), LedgerExportOptions (format, compression, signing settings). Created IAuditLogRepository/IRunLedgerRepository interfaces. Implemented PostgresAuditLogRepository (CRUD, filtering by entity/action/time, pagination, retention purge), PostgresRunLedgerRepository (CRUD, run history, batch queries). Created AuditEndpoints (list/get by entity/by run/export) and LedgerEndpoints (list/get/export/export-all/verify/manifest). Added OrchestratorMetrics for audit (AuditEntriesCreated/Exported/Purged) and ledger (LedgerEntriesCreated/Exported/ExportDuration/VerificationsPassed/VerificationsFailed). Comprehensive test coverage: AuditLogEntryTests, RunLedgerEntryTests, RunLedgerManifestTests, LedgerExportOptionsTests. Build succeeds, 487 tests pass (+37 new tests). | Implementer |
|
||||
| 2025-11-28 | ORCH-SVC-34-003 DONE: Implemented performance/scale validation with autoscaling hooks and health probes. Created ScaleMetrics service (Core/Scale/ScaleMetrics.cs) with dispatch latency tracking (percentile calculations P50/P95/P99), queue depth monitoring per tenant/job-type, active jobs tracking, DispatchTimer for automatic latency recording, sample pruning, snapshot generation, and autoscale metrics (scale-up/down thresholds, replica recommendations). Built LoadShedder (Core/Scale/LoadShedder.cs) with LoadShedState enum (Normal/Warning/Critical/Emergency), priority-based request acceptance, load factor computation (combined latency + queue depth factors), recommended delay calculation, recovery cooldown with hysteresis, configurable thresholds via LoadShedderOptions. Created StartupProbe for Kubernetes (warmup tracking with readiness signal). Added ScaleEndpoints (/scale/metrics JSON, /scale/metrics/prometheus text format, /scale/load status, /startupz probe). Enhanced HealthEndpoints integration. Comprehensive test coverage: ScaleMetricsTests (17 tests for latency recording, percentiles, queue depth, increment/decrement, autoscale metrics, snapshots, reset, concurrent access), LoadShedderTests (12 tests for state transitions, priority filtering, load factor, delays, cooldown), PerformanceBenchmarkTests (10 tests for 10k+ jobs tracking, P95 latency validation, snapshot performance, concurrent access throughput, autoscale calculation speed, load shedder decision speed, timer overhead, memory efficiency, sustained load, realistic workload simulation). Build succeeds, 37 scale tests pass (487 total). | Implementer |
|
||||
| 2025-11-29 | ORCH-SVC-34-004 DONE: Implemented GA packaging artifacts. Created multi-stage Dockerfile (ops/orchestrator/Dockerfile) with SDK build stage and separate runtime stages for orchestrator-web and orchestrator-worker, including OCI labels, HEALTHCHECK directive, and deterministic build settings. Created Helm values overlay (deploy/helm/stellaops/values-orchestrator.yaml) with orchestrator-web (2 replicas), orchestrator-worker (1 replica), and orchestrator-postgres services, including full configuration for scheduler, autoscaling, load shedding, dead letter, and backfill. Created air-gap bundle script (ops/orchestrator/build-airgap-bundle.sh) for offline deployment with OCI image export, config templates, manifest generation, and documentation bundling. Created SLSA v1 provenance attestation template (ops/orchestrator/provenance.json) with build definition, resolved dependencies, and byproducts. Created GA compliance checklist (ops/orchestrator/GA_CHECKLIST.md) covering build/packaging, security, functional, performance/scale, observability, deployment, documentation, testing, and compliance sections with sign-off template. All YAML/JSON syntax validated, build succeeds. | Implementer |
|
||||
|
||||
## Decisions & Risks
|
||||
- All tasks depend on outputs from Orchestrator I (32-001); sprint remains TODO until upstream ship.
|
||||
|
||||
@@ -80,4 +80,5 @@
|
||||
| 2025-11-28 | CVSS-RECEIPT-190-005 DONE: Added `ReceiptBuilder` with deterministic input hashing, evidence validation (policy-driven), vector/scoring via CvssV4Engine, and persistence through repository abstraction. Added `CreateReceiptRequest`, `IReceiptRepository`, unit tests (`ReceiptBuilderTests`) with in-memory repo; all 37 tests passing. | Implementer |
|
||||
| 2025-11-28 | CVSS-DSSE-190-006 DONE: Integrated Attestor DSSE signing into receipt builder. Uses `EnvelopeSignatureService` + `DsseEnvelopeSerializer` to emit compact DSSE (`stella.ops/cvssReceipt@v1`) and stores base64 DSSE ref in `AttestationRefs`. Added signing test with Ed25519 fixture; total tests 38 passing. | Implementer |
|
||||
| 2025-11-28 | CVSS-HISTORY-190-007 DONE: Added `ReceiptHistoryService` with amendment tracking (`AmendReceiptRequest`), history entry creation, modified metadata, and optional DSSE re-signing. Repository abstraction extended with `GetAsync`/`UpdateAsync`; in-memory repo updated; tests remain green (38). | Implementer |
|
||||
| 2025-11-29 | CVSS-RECEIPT/DSSE/HISTORY tasks wired to PostgreSQL: added `policy.cvss_receipts` migration, `PostgresReceiptRepository`, DI registration, and integration test (`PostgresReceiptRepositoryTests`). Test run failed locally because Docker/Testcontainers not available; code compiles and unit tests still pass. | Implementer |
|
||||
| 2025-11-28 | Ran `dotnet test src/Policy/__Tests/StellaOps.Policy.Scoring.Tests` (Release); 35 tests passed. Adjusted MacroVector lookup for FIRST sample vectors; duplicate PackageReference warnings remain to be cleaned separately. | Implementer |
|
||||
|
||||
@@ -0,0 +1,602 @@
|
||||
Here’s a simple, low‑friction way to keep priorities fresh without constant manual grooming: **let confidence decay over time**.
|
||||
|
||||
%20=%20e^{-t/τ})
|
||||
|
||||
# Exponential confidence decay (what & why)
|
||||
|
||||
* **Idea:** Every item (task, lead, bug, doc, hypothesis) has a confidence score that **automatically shrinks with time** if you don’t touch it.
|
||||
* **Formula:** `confidence(t) = e^(−t/τ)` where `t` is days since last signal (edit, comment, commit, new data), and **τ (“tau”)** is the decay constant.
|
||||
* **Rule of thumb:** With **τ = 30 days**, at **t = 30** the confidence is **e^(−1) ≈ 0.37**—about a **63% drop**. This surfaces long‑ignored items *gradually*, not with harsh “stale/expired” flips.
|
||||
|
||||
# How to use it in practice
|
||||
|
||||
* **Signals that reset t → 0:** comment on the ticket, new benchmark, fresh log sample, doc update, CI run, new market news.
|
||||
* **Sort queues by:** `priority × confidence(t)` (or severity × confidence). Quiet items drift down; truly active ones stay up.
|
||||
* **Escalation bands:**
|
||||
|
||||
* `>0.6` = green (recently touched)
|
||||
* `0.3–0.6` = amber (review soon)
|
||||
* `<0.3` = red (poke or close)
|
||||
|
||||
# Quick presets
|
||||
|
||||
* **Fast‑moving queues (incidents, hot leads):** τ = **7–14** days
|
||||
* **Engineering tasks / product docs:** τ = **30** days
|
||||
* **Research bets / roadmaps:** τ = **60–90** days
|
||||
|
||||
# For your world (Stella Ops + ops/dev work)
|
||||
|
||||
* **Vuln tickets:** `risk_score = CVSS × reachability × e^(−t/30)`
|
||||
* **Roadmap epics:** `value_score = impact × e^(−t/60)` to re‑rank quarterly.
|
||||
* **Docs:** show a badge “freshness: 42%” derived from last edit age to nudge updates.
|
||||
|
||||
# Minimal implementation sketch
|
||||
|
||||
* Store per‑item: `last_signal_at`, `base_priority`.
|
||||
* Compute on read:
|
||||
|
||||
```
|
||||
days = (now - last_signal_at).days
|
||||
conf = exp(-days / tau)
|
||||
score = base_priority * conf
|
||||
```
|
||||
* Recompute in your API layer or materialize nightly; no cron spam needed.
|
||||
|
||||
If you want, I can draft a tiny C# helper (and SQL snippet) you can drop into your issue service to add `confidence(t)` and color bands to your lists.
|
||||
Perfect, let’s turn the idea into something your devs can actually build.
|
||||
|
||||
Below is an implementation plan you can drop into a ticketing/PRD — with clear phases, data model changes, APIs, and some sample code (C# + SQL). I’ll also sprinkle in Stella Ops–specific notes.
|
||||
|
||||
---
|
||||
|
||||
## 0. Scope & Objectives
|
||||
|
||||
**Goal:** Introduce `confidence(t)` as an automatic freshness factor that decays with time and is used to rank and highlight work.
|
||||
|
||||
We’ll apply it to:
|
||||
|
||||
* Vulnerabilities (Stella Ops)
|
||||
* General issues / tasks / epics
|
||||
* (Optional) Docs, leads, hypotheses later
|
||||
|
||||
**Core behavior:**
|
||||
|
||||
* Each item has:
|
||||
|
||||
* A base priority / risk (from severity, business impact, etc.)
|
||||
* A timestamp of last signal (meaningful activity)
|
||||
* A decay rate τ (tau) in days
|
||||
* Effective priority = `base_priority × confidence(t)`
|
||||
* `confidence(t) = exp(− t / τ)` where `t` = days since last_signal
|
||||
|
||||
---
|
||||
|
||||
## 1. Data Model Changes
|
||||
|
||||
### 1.1. Add fields to core “work item” tables
|
||||
|
||||
For each relevant table (`Issues`, `Vulnerabilities`, `Epics`, …):
|
||||
|
||||
**New columns:**
|
||||
|
||||
* `base_priority` (FLOAT or INT)
|
||||
|
||||
* Example: 1–100, or derived from severity.
|
||||
* `last_signal_at` (DATETIME, NOT NULL, default = `created_at`)
|
||||
* `tau_days` (FLOAT, nullable, falls back to type default)
|
||||
* (Optional) `confidence_score_cached` (FLOAT, for materialized score)
|
||||
* (Optional) `is_confidence_frozen` (BOOL, default FALSE)
|
||||
For pinned items that should not decay.
|
||||
|
||||
**Example Postgres migration (Issues):**
|
||||
|
||||
```sql
|
||||
ALTER TABLE issues
|
||||
ADD COLUMN base_priority DOUBLE PRECISION,
|
||||
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
ADD COLUMN tau_days DOUBLE PRECISION,
|
||||
ADD COLUMN confidence_cached DOUBLE PRECISION,
|
||||
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
|
||||
```
|
||||
|
||||
For Stella Ops:
|
||||
|
||||
```sql
|
||||
ALTER TABLE vulnerabilities
|
||||
ADD COLUMN base_risk DOUBLE PRECISION,
|
||||
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
ADD COLUMN tau_days DOUBLE PRECISION,
|
||||
ADD COLUMN confidence_cached DOUBLE PRECISION,
|
||||
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
|
||||
```
|
||||
|
||||
### 1.2. Add a config table for τ per entity type
|
||||
|
||||
```sql
|
||||
CREATE TABLE confidence_decay_config (
|
||||
id SERIAL PRIMARY KEY,
|
||||
entity_type TEXT NOT NULL, -- 'issue', 'vulnerability', 'epic', 'doc'
|
||||
tau_days_default DOUBLE PRECISION NOT NULL,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||
);
|
||||
|
||||
INSERT INTO confidence_decay_config (entity_type, tau_days_default) VALUES
|
||||
('incident', 7),
|
||||
('vulnerability', 30),
|
||||
('issue', 30),
|
||||
('epic', 60),
|
||||
('doc', 90);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Define “signal” events & instrumentation
|
||||
|
||||
We need a standardized way to say: “this item got activity → reset last_signal_at”.
|
||||
|
||||
### 2.1. Signals that should reset `last_signal_at`
|
||||
|
||||
For **issues / epics:**
|
||||
|
||||
* New comment
|
||||
* Status change (e.g., Open → In Progress)
|
||||
* Field change that matters (severity, owner, milestone)
|
||||
* Attachment added
|
||||
* Link to PR added or updated
|
||||
* New CI failure linked
|
||||
|
||||
For **vulnerabilities (Stella Ops):**
|
||||
|
||||
* New scanner result attached or status updated (e.g., “Verified”, “False Positive”)
|
||||
* New evidence (PoC, exploit notes)
|
||||
* SLA override change
|
||||
* Assignment / ownership change
|
||||
* Integration events (e.g., PR merge that references the vuln)
|
||||
|
||||
For **docs (if you do it):**
|
||||
|
||||
* Any edit
|
||||
* Comment/annotation
|
||||
|
||||
### 2.2. Implement a shared helper to record a signal
|
||||
|
||||
**Service-level helper (pseudocode / C#-ish):**
|
||||
|
||||
```csharp
|
||||
public interface IConfidenceSignalService
|
||||
{
|
||||
Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null);
|
||||
}
|
||||
|
||||
public class ConfidenceSignalService : IConfidenceSignalService
|
||||
{
|
||||
private readonly IWorkItemRepository _repo;
|
||||
private readonly IConfidenceConfigService _config;
|
||||
|
||||
public async Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null)
|
||||
{
|
||||
var now = signalTimeUtc ?? DateTime.UtcNow;
|
||||
var item = await _repo.GetByIdAsync(type, itemId);
|
||||
if (item == null) return;
|
||||
|
||||
item.LastSignalAt = now;
|
||||
|
||||
if (item.TauDays == null)
|
||||
{
|
||||
item.TauDays = await _config.GetDefaultTauAsync(type);
|
||||
}
|
||||
|
||||
await _repo.UpdateAsync(item);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 2.3. Wire signals into existing flows
|
||||
|
||||
Create small tasks for devs like:
|
||||
|
||||
* **ISS-01:** Call `RecordSignalAsync` on:
|
||||
|
||||
* New issue comment handler
|
||||
* Issue status update handler
|
||||
* Issue field update handler (severity/priority/owner)
|
||||
* **VULN-01:** Call `RecordSignalAsync` when:
|
||||
|
||||
* New scanner result ingested for a vuln
|
||||
* Vulnerability status, SLA, or owner changes
|
||||
* New exploit evidence is attached
|
||||
|
||||
---
|
||||
|
||||
## 3. Confidence & scoring calculation
|
||||
|
||||
### 3.1. Shared confidence function
|
||||
|
||||
Definition:
|
||||
|
||||
```csharp
|
||||
public static class ConfidenceMath
|
||||
{
|
||||
// t = days since last signal
|
||||
public static double ConfidenceScore(DateTime lastSignalAtUtc, double tauDays, DateTime? nowUtc = null)
|
||||
{
|
||||
var now = nowUtc ?? DateTime.UtcNow;
|
||||
var tDays = (now - lastSignalAtUtc).TotalDays;
|
||||
|
||||
if (tDays <= 0) return 1.0;
|
||||
if (tauDays <= 0) return 1.0; // guard / fallback
|
||||
|
||||
var score = Math.Exp(-tDays / tauDays);
|
||||
|
||||
// Optional: never drop below a tiny floor, so items never "disappear"
|
||||
const double floor = 0.01;
|
||||
return Math.Max(score, floor);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2. Effective priority formulas
|
||||
|
||||
**Generic issues / tasks:**
|
||||
|
||||
```csharp
|
||||
double effectiveScore = issue.BasePriority * ConfidenceMath.ConfidenceScore(issue.LastSignalAt, issue.TauDays ?? defaultTau);
|
||||
```
|
||||
|
||||
**Vulnerabilities (Stella Ops):**
|
||||
|
||||
Let’s define:
|
||||
|
||||
* `severity_weight`: map CVSS or severity string to numeric (e.g. Critical=100, High=80, Medium=50, Low=20).
|
||||
* `reachability`: 0–1 (e.g. from your reachability analysis).
|
||||
* `exploitability`: 0–1 (optional, based on known exploits).
|
||||
* `confidence`: as above.
|
||||
|
||||
```csharp
|
||||
double baseRisk = severityWeight * reachability * exploitability; // or simpler: severityWeight * reachability
|
||||
double conf = ConfidenceMath.ConfidenceScore(vuln.LastSignalAt, vuln.TauDays ?? defaultTau);
|
||||
double effectiveRisk = baseRisk * conf;
|
||||
```
|
||||
|
||||
Store `baseRisk` → `vulnerabilities.base_risk`, and compute `effectiveRisk` on the fly or via job.
|
||||
|
||||
### 3.3. SQL implementation (optional for server-side sorting)
|
||||
|
||||
**Postgres example:**
|
||||
|
||||
```sql
|
||||
-- t_days = age in days
|
||||
-- tau = tau_days
|
||||
-- score = exp(-t_days / tau)
|
||||
|
||||
SELECT
|
||||
i.*,
|
||||
i.base_priority *
|
||||
GREATEST(
|
||||
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
|
||||
0.01
|
||||
) AS effective_priority
|
||||
FROM issues i
|
||||
ORDER BY effective_priority DESC;
|
||||
```
|
||||
|
||||
You can wrap that in a view:
|
||||
|
||||
```sql
|
||||
CREATE VIEW issues_with_confidence AS
|
||||
SELECT
|
||||
i.*,
|
||||
GREATEST(
|
||||
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
|
||||
0.01
|
||||
) AS confidence,
|
||||
i.base_priority *
|
||||
GREATEST(
|
||||
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
|
||||
0.01
|
||||
) AS effective_priority
|
||||
FROM issues i;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Caching & performance
|
||||
|
||||
You have two options:
|
||||
|
||||
### 4.1. Compute on read (simplest to start)
|
||||
|
||||
* Use the helper function in your service layer or a DB view.
|
||||
* Pros:
|
||||
|
||||
* No jobs, always fresh.
|
||||
* Cons:
|
||||
|
||||
* Slight CPU cost on heavy lists.
|
||||
|
||||
**Plan:** Start with this. If you see perf issues, move to 4.2.
|
||||
|
||||
### 4.2. Periodic materialization job (optional later)
|
||||
|
||||
Add a scheduled job (e.g. hourly) that:
|
||||
|
||||
1. Selects all active items.
|
||||
2. Computes `confidence_score` and `effective_priority`.
|
||||
3. Writes to `confidence_cached` and `effective_priority_cached` (if you add such a column).
|
||||
|
||||
Service then sorts by cached values.
|
||||
|
||||
---
|
||||
|
||||
## 5. Backfill & migration
|
||||
|
||||
### 5.1. Initial backfill script
|
||||
|
||||
For existing records:
|
||||
|
||||
* If `last_signal_at` is NULL → set to `created_at`.
|
||||
* Derive `base_priority` / `base_risk` from existing severity fields.
|
||||
* Set `tau_days` from config.
|
||||
|
||||
**Example:**
|
||||
|
||||
```sql
|
||||
UPDATE issues
|
||||
SET last_signal_at = created_at
|
||||
WHERE last_signal_at IS NULL;
|
||||
|
||||
UPDATE issues
|
||||
SET base_priority = CASE severity
|
||||
WHEN 'critical' THEN 100
|
||||
WHEN 'high' THEN 80
|
||||
WHEN 'medium' THEN 50
|
||||
WHEN 'low' THEN 20
|
||||
ELSE 10
|
||||
END
|
||||
WHERE base_priority IS NULL;
|
||||
|
||||
UPDATE issues i
|
||||
SET tau_days = c.tau_days_default
|
||||
FROM confidence_decay_config c
|
||||
WHERE c.entity_type = 'issue'
|
||||
AND i.tau_days IS NULL;
|
||||
```
|
||||
|
||||
Do similarly for `vulnerabilities` using severity / CVSS.
|
||||
|
||||
### 5.2. Sanity checks
|
||||
|
||||
Add a small script/test to verify:
|
||||
|
||||
* Newly created items → `confidence ≈ 1.0`.
|
||||
* 30-day-old items with τ=30 → `confidence ≈ 0.37`.
|
||||
* Ordering changes when you edit/comment on items.
|
||||
|
||||
---
|
||||
|
||||
## 6. API & Query Layer
|
||||
|
||||
### 6.1. New sorting options
|
||||
|
||||
Update list APIs:
|
||||
|
||||
* Accept parameter: `sort=effective_priority` or `sort=confidence`.
|
||||
* Default sort for some views:
|
||||
|
||||
* Vulnerabilities backlog: `sort=effective_risk` (risk × confidence).
|
||||
* Issues backlog: `sort=effective_priority`.
|
||||
|
||||
**Example REST API contract:**
|
||||
|
||||
`GET /api/issues?sort=effective_priority&state=open`
|
||||
|
||||
**Response fields (additions):**
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "ISS-123",
|
||||
"title": "Fix login bug",
|
||||
"base_priority": 80,
|
||||
"last_signal_at": "2025-11-01T10:00:00Z",
|
||||
"tau_days": 30,
|
||||
"confidence": 0.63,
|
||||
"effective_priority": 50.4,
|
||||
"confidence_band": "amber"
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2. Confidence banding (for UI)
|
||||
|
||||
Define bands server-side (easy to change):
|
||||
|
||||
* Green: `confidence >= 0.6`
|
||||
* Amber: `0.3 ≤ confidence < 0.6`
|
||||
* Red: `confidence < 0.3`
|
||||
|
||||
You can compute on server:
|
||||
|
||||
```csharp
|
||||
string ConfidenceBand(double confidence) =>
|
||||
confidence >= 0.6 ? "green"
|
||||
: confidence >= 0.3 ? "amber"
|
||||
: "red";
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. UI / UX changes
|
||||
|
||||
### 7.1. List views (issues / vulns / epics)
|
||||
|
||||
For each item row:
|
||||
|
||||
* Show a small freshness pill:
|
||||
|
||||
* Text: `Active`, `Review soon`, `Stale`
|
||||
* Derived from confidence band.
|
||||
* Tooltip:
|
||||
|
||||
* “Confidence 78%. Last activity 3 days ago. τ = 30 days.”
|
||||
|
||||
* Sort default: by `effective_priority` / `effective_risk`.
|
||||
|
||||
* Filters:
|
||||
|
||||
* `Freshness: [All | Active | Review soon | Stale]`
|
||||
* Optionally: “Show stale only” toggle.
|
||||
|
||||
**Example labels:**
|
||||
|
||||
* Green: “Active (confidence 82%)”
|
||||
* Amber: “Review soon (confidence 45%)”
|
||||
* Red: “Stale (confidence 18%)”
|
||||
|
||||
### 7.2. Detail views
|
||||
|
||||
On an issue / vuln page:
|
||||
|
||||
* Add a “Confidence” section:
|
||||
|
||||
* “Confidence: **52%**”
|
||||
* “Last signal: **12 days ago**”
|
||||
* “Decay τ: **30 days**”
|
||||
* “Effective priority: **Base 80 × 0.52 = 42**”
|
||||
|
||||
* (Optional) small mini-chart (text-only or simple bar) showing approximate decay, but not necessary for first iteration.
|
||||
|
||||
### 7.3. Admin / settings UI
|
||||
|
||||
Add an internal settings page:
|
||||
|
||||
* Table of entity types with editable τ:
|
||||
|
||||
| Entity type | τ (days) | Notes |
|
||||
| ------------- | -------- | ---------------------------- |
|
||||
| Incident | 7 | Fast-moving |
|
||||
| Vulnerability | 30 | Standard risk review cadence |
|
||||
| Issue | 30 | Sprint-level decay |
|
||||
| Epic | 60 | Quarterly |
|
||||
| Doc | 90 | Slow decay |
|
||||
|
||||
* Optionally: toggle to pin item (`is_confidence_frozen`) from UI.
|
||||
|
||||
---
|
||||
|
||||
## 8. Stella Ops–specific behavior
|
||||
|
||||
For vulnerabilities:
|
||||
|
||||
### 8.1. Base risk calculation
|
||||
|
||||
Ingested fields you likely already have:
|
||||
|
||||
* `cvss_score` or `severity`
|
||||
* `reachable` (true/false or numeric)
|
||||
* (Optional) `exploit_available` (bool) or exploitability score
|
||||
* `asset_criticality` (1–5)
|
||||
|
||||
Define `base_risk` as:
|
||||
|
||||
```text
|
||||
severity_weight = f(cvss_score or severity)
|
||||
reachability = reachable ? 1.0 : 0.5 -- example
|
||||
exploitability = exploit_available ? 1.0 : 0.7
|
||||
asset_factor = 0.5 + 0.1 * asset_criticality -- 1 → 1.0, 5 → 1.5
|
||||
|
||||
base_risk = severity_weight * reachability * exploitability * asset_factor
|
||||
```
|
||||
|
||||
Store `base_risk` on vuln row.
|
||||
|
||||
Then:
|
||||
|
||||
```text
|
||||
effective_risk = base_risk * confidence(t)
|
||||
```
|
||||
|
||||
Use `effective_risk` for backlog ordering and SLAs dashboards.
|
||||
|
||||
### 8.2. Signals for vulns
|
||||
|
||||
Make sure these all call `RecordSignalAsync(Vulnerability, vulnId)`:
|
||||
|
||||
* New scan result for same vuln (re-detected).
|
||||
* Change status to “In Progress”, “Ready for Deploy”, “Verified Fixed”, etc.
|
||||
* Assigning an owner.
|
||||
* Attaching PoC / exploit details.
|
||||
|
||||
### 8.3. Vuln UI copy ideas
|
||||
|
||||
* Pill text:
|
||||
|
||||
* “Risk: 850 (confidence 68%)”
|
||||
* “Last analyst activity 11 days ago”
|
||||
|
||||
* In backlog view: show **Effective Risk** as main sort, with a smaller subtext “Base 1200 × Confidence 71%”.
|
||||
|
||||
---
|
||||
|
||||
## 9. Rollout plan
|
||||
|
||||
### Phase 1 – Infrastructure (backend-only)
|
||||
|
||||
* [ ] DB migrations & config table
|
||||
* [ ] Implement `ConfidenceMath` and helper functions
|
||||
* [ ] Implement `IConfidenceSignalService`
|
||||
* [ ] Wire signals into key flows (comments, state changes, scanner ingestion)
|
||||
* [ ] Add `confidence` and `effective_priority/risk` to API responses
|
||||
* [ ] Backfill script + dry run in staging
|
||||
|
||||
### Phase 2 – Internal UI & feature flag
|
||||
|
||||
* [ ] Add optional sorting by effective score to internal/staff views
|
||||
* [ ] Add confidence pill (hidden behind feature flag `confidence_decay_v1`)
|
||||
* [ ] Dogfood internally:
|
||||
|
||||
* Do items bubble up/down as expected?
|
||||
* Are any items “disappearing” because decay is too aggressive?
|
||||
|
||||
### Phase 3 – Parameter tuning
|
||||
|
||||
* [ ] Adjust τ per type based on feedback:
|
||||
|
||||
* If things decay too fast → increase τ
|
||||
* If queues rarely change → decrease τ
|
||||
* [ ] Decide on confidence floor (0.01? 0.05?) so nothing goes to literal 0.
|
||||
|
||||
### Phase 4 – General release
|
||||
|
||||
* [ ] Make effective score the default sort for key views:
|
||||
|
||||
* Vulnerabilities backlog
|
||||
* Issues backlog
|
||||
* [ ] Document behavior for users (help center / inline tooltip)
|
||||
* [ ] Add admin UI to tweak τ per entity type.
|
||||
|
||||
---
|
||||
|
||||
## 10. Edge cases & safeguards
|
||||
|
||||
* **New items**
|
||||
|
||||
* `last_signal_at = created_at`, confidence = 1.0.
|
||||
* **Pinned items**
|
||||
|
||||
* If `is_confidence_frozen = true` → treat confidence as 1.0.
|
||||
* **Items without τ**
|
||||
|
||||
* Always fallback to entity type default.
|
||||
* **Timezones**
|
||||
|
||||
* Always store & compute in UTC.
|
||||
* **Very old items**
|
||||
|
||||
* Floor the confidence so they’re still visible when explicitly searched.
|
||||
|
||||
---
|
||||
|
||||
If you want, I can turn this into:
|
||||
|
||||
* A short **technical design doc** (with sections: Problem, Proposal, Alternatives, Rollout).
|
||||
* Or a **set of Jira tickets** grouped by backend / frontend / infra that your team can pick up directly.
|
||||
Reference in New Issue
Block a user