SPRINT_3600_0001_0001 - Reachability Drift Detection Master Plan

2025-12-18 00:02:31 +02:00
parent 8bbfe4d2d2
commit dee252940b
13 changed files with 6099 additions and 1651 deletions


@@ -0,0 +1,334 @@
# Stella Ops Triage API Contract v1
Base path: `/api/triage/v1`
This contract is served by `scanner.webservice` (or a dedicated triage facade that reads scanner-owned tables).
All risk/lattice outputs originate from `scanner.webservice`.
Key requirements:
- Deterministic outputs (policyId + policyVersion + inputsHash).
- Proof-linking (chips reference evidenceIds).
- `concelier` and `excititor` preserve prune source: API surfaces source chains via `sourceRefs`.
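The determinism requirement above can be illustrated with a minimal Python sketch. It assumes `inputsHash` is a SHA-256 over a canonical JSON encoding of the inputs; the contract does not specify the hash algorithm or the canonicalization, and `inputs_hash` and `output_key` are hypothetical names.

```python
import hashlib
import json


def inputs_hash(inputs: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot change the result.

    Assumption: SHA-256 over sorted-key, compact JSON. The contract only says
    hashes are hex strings; the real canonicalization may differ.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def output_key(policy_id: str, policy_version: str, inputs: dict) -> tuple:
    """Identity of a deterministic output: same key, same verdict."""
    return (policy_id, policy_version, inputs_hash(inputs))
```

Two evaluations that share the same key are expected to produce the same verdict, which is what makes cached or replayed results comparable.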
## 0. Conventions
### 0.1 Identifiers
- `caseId` == `findingId` (UUID). A case is a finding scoped to an asset/environment.
- Hashes are hex strings.
### 0.2 Caching
- GET endpoints SHOULD return `ETag`.
- Clients SHOULD send `If-None-Match`.
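As a rough illustration of the caching convention, the following Python sketch shows how a handler might derive a strong `ETag` from the response body and answer a conditional request. The function names and the choice of a truncated SHA-256 are assumptions, not part of the contract, and weak validators (`W/...`) are not handled.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the body (assumption: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [c.strip() for c in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The client already holds this representation: no body is sent.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```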
### 0.3 Errors
Standard error envelope:
```json
{
"error": {
"code": "string",
"message": "string",
"details": { "any": "json" },
"traceId": "string"
}
}
```
Common codes:
* `not_found`
* `validation_error`
* `conflict`
* `unauthorized`
* `forbidden`
* `rate_limited`
## 1. Findings Table
### 1.1 List findings
`GET /findings`
Query params:
* `showMuted` (bool, default false)
* `lane` (optional, enum)
* `search` (optional string; searches asset, purl, cveId)
* `page` (int, default 1)
* `pageSize` (int, default 50; max 200)
* `sort` (optional: `updatedAt`, `score`, `lane`)
* `order` (optional: `asc|desc`)
Response 200:
```json
{
"page": 1,
"pageSize": 50,
"total": 12345,
"mutedCounts": { "reach": 1904, "vex": 513, "compensated": 18 },
"rows": [
{
"id": "uuid",
"lane": "BLOCKED",
"verdict": "BLOCK",
"score": 87,
"reachable": "YES",
"vex": "affected",
"exploit": "YES",
"asset": "prod/api-gateway:1.2.3",
"updatedAt": "2025-12-16T01:02:03Z"
}
]
}
```
## 2. Case Narrative
### 2.1 Get case header
`GET /cases/{caseId}`
Response 200:
```json
{
"id": "uuid",
"verdict": "BLOCK",
"lane": "BLOCKED",
"score": 87,
"policyId": "prod-strict",
"policyVersion": "2025.12.14",
"inputsHash": "hex",
"why": "Reachable path observed; exploit signal present; prod-strict blocks.",
"chips": [
{ "key": "reachability", "label": "Reachability", "value": "Reachable (92%)", "evidenceIds": ["uuid"] },
{ "key": "vex", "label": "VEX", "value": "affected", "evidenceIds": ["uuid"] },
{ "key": "gate", "label": "Gate", "value": "BLOCKED by prod-strict", "evidenceIds": ["uuid"] }
],
"sourceRefs": [
{
"domain": "concelier",
"kind": "cve_record",
"ref": "concelier:osv:...",
"pruned": false
},
{
"domain": "excititor",
"kind": "effective_vex",
"ref": "excititor:openvex:...",
"pruned": false
}
],
"updatedAt": "2025-12-16T01:02:03Z"
}
```
Notes:
* `sourceRefs` provides preserved provenance chains (including pruned markers when applicable).
## 3. Evidence
### 3.1 List evidence for case
`GET /cases/{caseId}/evidence`
Response 200:
```json
{
"caseId": "uuid",
"items": [
{
"id": "uuid",
"type": "VEX_DOC",
"title": "Vendor OpenVEX assertion",
"issuer": "vendor.example",
"signed": true,
"signedBy": "CN=Vendor VEX Signer",
"contentHash": "hex",
"createdAt": "2025-12-15T22:10:00Z",
"previewUrl": "/api/triage/v1/evidence/uuid/preview",
"rawUrl": "/api/triage/v1/evidence/uuid/raw"
}
]
}
```
### 3.2 Get raw evidence object
`GET /evidence/{evidenceId}/raw`
Returns:
* `application/json` for JSON evidence
* `application/octet-stream` for binary
* MUST include `Content-SHA256` header (hex) when possible.
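A client can use the `Content-SHA256` header to check that the raw evidence it downloaded was not altered in transit. The Python sketch below assumes the header carries the hex SHA-256 of the exact response bytes; `verify_content_sha256` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_content_sha256(body: bytes, header_value: str) -> bool:
    """Compare the body digest with the Content-SHA256 header (hex, case-insensitive)."""
    expected = header_value.strip().lower().encode("utf-8")
    actual = hashlib.sha256(body).hexdigest().encode("utf-8")
    # Constant-time comparison; not strictly required here but harmless.
    return hmac.compare_digest(actual, expected)
```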
### 3.3 Preview evidence object
`GET /evidence/{evidenceId}/preview`
Returns a compact representation safe for UI preview.
## 4. Decisions
### 4.1 Create decision
`POST /decisions`
Request body:
```json
{
"caseId": "uuid",
"kind": "MUTE_REACH",
"reasonCode": "NON_REACHABLE",
"note": "No entry path in this env; reviewed runtime traces.",
"ttl": "2026-01-16T00:00:00Z"
}
```
Response 201:
```json
{
"decision": {
"id": "uuid",
"kind": "MUTE_REACH",
"reasonCode": "NON_REACHABLE",
"note": "No entry path in this env; reviewed runtime traces.",
"ttl": "2026-01-16T00:00:00Z",
"actor": { "subject": "user:abc", "display": "Vlad" },
"createdAt": "2025-12-16T01:10:00Z",
"signatureRef": "dsse:rekor:uuid"
}
}
```
Rules:
* Server signs decisions (DSSE) and persists signature reference.
* Creating a decision MUST create a `Snapshot` with trigger `DECISION`.
### 4.2 Revoke decision
`POST /decisions/{decisionId}/revoke`
Body (optional):
```json
{ "reason": "Mistake; reachability now observed." }
```
Response 200:
```json
{ "revokedAt": "2025-12-16T02:00:00Z", "signatureRef": "dsse:rekor:uuid" }
```
## 5. Snapshots & Smart-Diff
### 5.1 List snapshots
`GET /cases/{caseId}/snapshots`
Response 200:
```json
{
"caseId": "uuid",
"items": [
{
"id": "uuid",
"trigger": "POLICY_UPDATE",
"changedAt": "2025-12-16T00:00:00Z",
"fromInputsHash": "hex",
"toInputsHash": "hex",
"summary": "Policy version changed; gate threshold crossed."
}
]
}
```
### 5.2 Smart-Diff between two snapshots
`GET /cases/{caseId}/smart-diff?from={inputsHashA}&to={inputsHashB}`
Response 200:
```json
{
"fromInputsHash": "hex",
"toInputsHash": "hex",
"inputsChanged": [
{ "key": "policyVersion", "before": "2025.12.14", "after": "2025.12.16", "evidenceIds": ["uuid"] }
],
"outputsChanged": [
{ "key": "verdict", "before": "SHIP", "after": "BLOCK", "evidenceIds": ["uuid"] }
]
}
```
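The `inputsChanged` and `outputsChanged` arrays share one shape, so a single helper can build both. This Python sketch assumes each snapshot is available as a flat dictionary of fields; `diff_fields` is a hypothetical name, and attaching `evidenceIds` is left out because the contract does not say how they are resolved.

```python
def diff_fields(before: dict, after: dict) -> list:
    """List the keys whose values differ, in the shape used by the Smart-Diff response."""
    changes = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes.append({"key": key, "before": old, "after": new})
    return changes
```

Sorting the keys keeps the output order stable, which matters if the diff itself is hashed or cached.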
## 6. Export Evidence Bundle
### 6.1 Start export
`POST /cases/{caseId}/export`
Response 202:
```json
{
"exportId": "uuid",
"status": "QUEUED"
}
```
### 6.2 Poll export
`GET /exports/{exportId}`
Response 200:
```json
{
"exportId": "uuid",
"status": "READY",
"downloadUrl": "/api/triage/v1/exports/uuid/download"
}
```
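The start-then-poll flow in 6.1 and 6.2 could be driven by a small client loop like the Python sketch below. It assumes the caller supplies `fetch_status`, a function that performs `GET /exports/{exportId}` and returns the decoded JSON body; only the `QUEUED` and `READY` statuses shown above are known, so any other status is simply treated as "not ready yet".

```python
import time


def wait_for_export(fetch_status, export_id, timeout_s=300.0, interval_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the export is READY and return its downloadUrl.

    fetch_status, sleep and clock are injected so the loop can be tested
    without a network or real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        body = fetch_status(export_id)
        if body.get("status") == "READY":
            return body["downloadUrl"]
        if clock() >= deadline:
            raise TimeoutError(f"export {export_id} not ready after {timeout_s}s")
        sleep(interval_s)
```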
### 6.3 Download bundle
`GET /exports/{exportId}/download`
Returns:
* `application/zip`
* DSSE envelope embedded (or alongside in zip)
* bundle contains replay manifest, artifacts, risk result, snapshots
## 7. Events (Notify.WebService integration)
These are emitted by `notify.webservice` when scanner outputs change.
* `first_signal`
  * fired on first actionable detection for an asset/environment
* `risk_changed`
  * fired when verdict/lane changes or thresholds crossed
* `gate_blocked`
  * fired when CI gate blocks
Event payload includes:
* caseId
* old/new verdict/lane/score (for changed events)
* inputsHash
* links to `/cases/{caseId}`
---
**Document Version**: 1.0
**Target Platform**: .NET 10, PostgreSQL >= 16

docs/db/triage_schema.sql Normal file

@@ -0,0 +1,249 @@
-- Stella Ops Triage Schema (PostgreSQL)
-- System of record: PostgreSQL
-- Ephemeral acceleration: Valkey (not represented here)
BEGIN;
-- Extensions
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Enums
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'triage_lane') THEN
CREATE TYPE triage_lane AS ENUM (
'ACTIVE',
'BLOCKED',
'NEEDS_EXCEPTION',
'MUTED_REACH',
'MUTED_VEX',
'COMPENSATED'
);
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'triage_verdict') THEN
CREATE TYPE triage_verdict AS ENUM ('SHIP', 'BLOCK', 'EXCEPTION');
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'triage_reachability') THEN
CREATE TYPE triage_reachability AS ENUM ('YES', 'NO', 'UNKNOWN');
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'triage_vex_status') THEN
CREATE TYPE triage_vex_status AS ENUM ('affected', 'not_affected', 'under_investigation', 'unknown');
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'triage_decision_kind') THEN
CREATE TYPE triage_decision_kind AS ENUM ('MUTE_REACH', 'MUTE_VEX', 'ACK', 'EXCEPTION');
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'triage_snapshot_trigger') THEN
CREATE TYPE triage_snapshot_trigger AS ENUM (
'FEED_UPDATE',
'VEX_UPDATE',
'SBOM_UPDATE',
'RUNTIME_TRACE',
'POLICY_UPDATE',
'DECISION',
'RESCAN'
);
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_type WHERE typname = 'triage_evidence_type') THEN
CREATE TYPE triage_evidence_type AS ENUM (
'SBOM_SLICE',
'VEX_DOC',
'PROVENANCE',
'CALLSTACK_SLICE',
'REACHABILITY_PROOF',
'REPLAY_MANIFEST',
'POLICY',
'SCAN_LOG',
'OTHER'
);
END IF;
END $$;
-- Core: finding (caseId == findingId)
CREATE TABLE IF NOT EXISTS triage_finding (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
asset_id uuid NOT NULL,
environment_id uuid NULL,
asset_label text NOT NULL, -- e.g. "prod/api-gateway:1.2.3"
purl text NOT NULL, -- package-url
cve_id text NULL,
rule_id text NULL,
first_seen_at timestamptz NOT NULL DEFAULT now(),
last_seen_at timestamptz NOT NULL DEFAULT now(),
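-- NULLS NOT DISTINCT (PostgreSQL 15+): rows with NULL environment/CVE/rule still deduplicate
UNIQUE NULLS NOT DISTINCT (asset_id, environment_id, purl, cve_id, rule_id)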
);
CREATE INDEX IF NOT EXISTS ix_triage_finding_last_seen ON triage_finding (last_seen_at DESC);
CREATE INDEX IF NOT EXISTS ix_triage_finding_asset_label ON triage_finding (asset_label);
CREATE INDEX IF NOT EXISTS ix_triage_finding_purl ON triage_finding (purl);
CREATE INDEX IF NOT EXISTS ix_triage_finding_cve ON triage_finding (cve_id);
-- Effective VEX (post-merge), with preserved provenance pointers
CREATE TABLE IF NOT EXISTS triage_effective_vex (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
finding_id uuid NOT NULL REFERENCES triage_finding(id) ON DELETE CASCADE,
status triage_vex_status NOT NULL,
source_domain text NOT NULL, -- "excititor"
source_ref text NOT NULL, -- stable ref string (preserve prune source)
pruned_sources jsonb NULL, -- array of pruned items with reasons (optional)
dsse_envelope_hash text NULL,
signature_ref text NULL, -- rekor/ledger ref
issuer text NULL,
valid_from timestamptz NOT NULL DEFAULT now(),
valid_to timestamptz NULL,
collected_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS ix_triage_effective_vex_finding ON triage_effective_vex (finding_id, collected_at DESC);
-- Reachability results
CREATE TABLE IF NOT EXISTS triage_reachability_result (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
finding_id uuid NOT NULL REFERENCES triage_finding(id) ON DELETE CASCADE,
reachable triage_reachability NOT NULL,
confidence smallint NOT NULL CHECK (confidence >= 0 AND confidence <= 100),
static_proof_ref text NULL, -- evidence ref (callgraph slice / CFG slice)
runtime_proof_ref text NULL, -- evidence ref (runtime hits)
inputs_hash text NOT NULL, -- hash of inputs used to compute reachability
computed_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS ix_triage_reachability_finding ON triage_reachability_result (finding_id, computed_at DESC);
-- Risk/lattice result (scanner.webservice output)
CREATE TABLE IF NOT EXISTS triage_risk_result (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
finding_id uuid NOT NULL REFERENCES triage_finding(id) ON DELETE CASCADE,
policy_id text NOT NULL,
policy_version text NOT NULL,
inputs_hash text NOT NULL,
score int NOT NULL CHECK (score >= 0 AND score <= 100),
verdict triage_verdict NOT NULL,
lane triage_lane NOT NULL,
why text NOT NULL, -- short narrative
explanation jsonb NULL, -- structured lattice explanation for UI diffing
computed_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (finding_id, policy_id, policy_version, inputs_hash)
);
CREATE INDEX IF NOT EXISTS ix_triage_risk_finding ON triage_risk_result (finding_id, computed_at DESC);
CREATE INDEX IF NOT EXISTS ix_triage_risk_lane ON triage_risk_result (lane, computed_at DESC);
-- Signed Decisions (mute/ack/exception), reversible by revoke
CREATE TABLE IF NOT EXISTS triage_decision (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
finding_id uuid NOT NULL REFERENCES triage_finding(id) ON DELETE CASCADE,
kind triage_decision_kind NOT NULL,
reason_code text NOT NULL,
note text NULL,
policy_ref text NULL, -- optional: policy that allowed decision
ttl timestamptz NULL,
actor_subject text NOT NULL, -- Authority subject (sub)
actor_display text NULL,
signature_ref text NULL, -- DSSE signature reference
dsse_hash text NULL,
created_at timestamptz NOT NULL DEFAULT now(),
revoked_at timestamptz NULL,
revoke_reason text NULL,
revoke_signature_ref text NULL,
revoke_dsse_hash text NULL
);
CREATE INDEX IF NOT EXISTS ix_triage_decision_finding ON triage_decision (finding_id, created_at DESC);
CREATE INDEX IF NOT EXISTS ix_triage_decision_kind ON triage_decision (kind, created_at DESC);
CREATE INDEX IF NOT EXISTS ix_triage_decision_active ON triage_decision (finding_id) WHERE revoked_at IS NULL;
-- Evidence artifacts (hash-addressed, signed)
CREATE TABLE IF NOT EXISTS triage_evidence_artifact (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
finding_id uuid NOT NULL REFERENCES triage_finding(id) ON DELETE CASCADE,
type triage_evidence_type NOT NULL,
title text NOT NULL,
issuer text NULL,
signed boolean NOT NULL DEFAULT false,
signed_by text NULL,
content_hash text NOT NULL,
signature_ref text NULL,
media_type text NULL,
uri text NOT NULL, -- object store / file path / inline ref
size_bytes bigint NULL,
metadata jsonb NULL,
created_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (finding_id, type, content_hash)
);
CREATE INDEX IF NOT EXISTS ix_triage_evidence_finding ON triage_evidence_artifact (finding_id, created_at DESC);
CREATE INDEX IF NOT EXISTS ix_triage_evidence_type ON triage_evidence_artifact (type, created_at DESC);
-- Snapshots for Smart-Diff (immutable records of input/output changes)
CREATE TABLE IF NOT EXISTS triage_snapshot (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
finding_id uuid NOT NULL REFERENCES triage_finding(id) ON DELETE CASCADE,
trigger triage_snapshot_trigger NOT NULL,
from_inputs_hash text NULL,
to_inputs_hash text NOT NULL,
summary text NOT NULL,
diff_json jsonb NULL, -- optional: precomputed diff
created_at timestamptz NOT NULL DEFAULT now(),
UNIQUE (finding_id, to_inputs_hash, created_at)
);
CREATE INDEX IF NOT EXISTS ix_triage_snapshot_finding ON triage_snapshot (finding_id, created_at DESC);
CREATE INDEX IF NOT EXISTS ix_triage_snapshot_trigger ON triage_snapshot (trigger, created_at DESC);
-- Current-case view: latest risk + latest reachability + latest effective VEX
CREATE OR REPLACE VIEW v_triage_case_current AS
WITH latest_risk AS (
SELECT DISTINCT ON (finding_id)
finding_id, policy_id, policy_version, inputs_hash, score, verdict, lane, why, computed_at
FROM triage_risk_result
ORDER BY finding_id, computed_at DESC
),
latest_reach AS (
SELECT DISTINCT ON (finding_id)
finding_id, reachable, confidence, static_proof_ref, runtime_proof_ref, computed_at
FROM triage_reachability_result
ORDER BY finding_id, computed_at DESC
),
latest_vex AS (
SELECT DISTINCT ON (finding_id)
finding_id, status, issuer, signature_ref, source_domain, source_ref, collected_at
FROM triage_effective_vex
ORDER BY finding_id, collected_at DESC
)
SELECT
f.id AS case_id,
f.asset_id,
f.environment_id,
f.asset_label,
f.purl,
f.cve_id,
f.rule_id,
f.first_seen_at,
f.last_seen_at,
r.policy_id,
r.policy_version,
r.inputs_hash,
r.score,
r.verdict,
r.lane,
r.why,
r.computed_at AS risk_computed_at,
coalesce(re.reachable, 'UNKNOWN'::triage_reachability) AS reachable,
re.confidence AS reach_confidence,
v.status AS vex_status,
v.issuer AS vex_issuer,
v.signature_ref AS vex_signature_ref,
v.source_domain AS vex_source_domain,
v.source_ref AS vex_source_ref
FROM triage_finding f
LEFT JOIN latest_risk r ON r.finding_id = f.id
LEFT JOIN latest_reach re ON re.finding_id = f.id
LEFT JOIN latest_vex v ON v.finding_id = f.id;
COMMIT;


@@ -0,0 +1,663 @@
# Performance Testing Pipeline for Queue-Based Workflows
> **Note**: This document was originally created as part of advisory analysis. It provides a comprehensive playbook for HTTP → Valkey → Worker performance testing.
---
## What we're measuring (plain English)
* **TTFB/TTFS (HTTP):** time the gateway spends accepting the request + queuing the job.
* **Valkey latency:** enqueue (`LPUSH`/`XADD`), pop/claim (`BRPOP`/`XREADGROUP`), and round-trip.
* **Worker service time:** time to pick up, process, and ack.
* **Queueing delay:** time spent waiting in the queue (arrival → start of worker).
These four add up to the "hop latency" users feel when the system is under load.
---
## Minimal tracing you can add today
Emit these IDs/headers end-to-end:
* `x-stella-corr-id` (uuid)
* `x-stella-enq-ts` (gateway enqueue ts, ns)
* `x-stella-claim-ts` (worker claim ts, ns)
* `x-stella-done-ts` (worker done ts, ns)
From these, compute:
* `queue_delay = claim_ts - enq_ts`
* `service_time = done_ts - claim_ts`
* `http_ttfs = gateway_first_byte_ts - http_request_start_ts`
* `hop_latency = done_ts - enq_ts` (or return-path if synchronous)
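Clock-sync tip: use monotonic clocks for durations measured on a single host and convert to ns; never subtract a monotonic reading from a wall-clock one. Timestamps compared across hosts (`enq_ts` vs `claim_ts`) need synchronized wall clocks, because monotonic readings from different machines are not comparable.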
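The derived durations above are plain subtractions, but an ordering check helps catch clock problems early. This is a minimal Python sketch; `HopTimings` and `decompose` are hypothetical names, and the three timestamps are assumed to be nanoseconds on comparable clocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopTimings:
    queue_delay_ns: int
    service_time_ns: int
    hop_latency_ns: int


def decompose(enq_ts_ns: int, claim_ts_ns: int, done_ts_ns: int) -> HopTimings:
    """Split one job's hop latency into queue delay and service time."""
    if not (enq_ts_ns <= claim_ts_ns <= done_ts_ns):
        # Out-of-order timestamps usually mean the clocks are not comparable.
        raise ValueError("timestamps out of order; check clock synchronization")
    return HopTimings(
        queue_delay_ns=claim_ts_ns - enq_ts_ns,
        service_time_ns=done_ts_ns - claim_ts_ns,
        hop_latency_ns=done_ts_ns - enq_ts_ns,
    )
```

By construction, `hop_latency_ns` equals `queue_delay_ns + service_time_ns` for every job.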
---
## Valkey commands (safe, BSD-licensed Valkey)
Use **Valkey Streams + Consumer Groups** for fairness and metrics:
* Enqueue: `XADD jobs * corr-id <uuid> enq-ts <ns> payload <...>`
* Claim: `XREADGROUP GROUP workers w1 COUNT 1 BLOCK 1000 STREAMS jobs >`
* Ack: `XACK jobs workers <id>`
Add a small Lua script that records the enqueue timestamp together with the payload in one atomic call:
```lua
-- KEYS[1]=stream
-- ARGV[1]=enq_ts_ns, ARGV[2]=corr_id, ARGV[3]=payload
return redis.call('XADD', KEYS[1], '*',
'corr', ARGV[2], 'enq', ARGV[1], 'p', ARGV[3])
```
---
## Load shapes to test (find the envelope)
1. **Open-loop (arrival-rate controlled):** 50 → 10k req/min in steps; constant rate per step. Reveals queueing onset.
2. **Burst:** 0 → N in short spikes (e.g., 5k in 10s) to see saturation and drain time.
3. **Step-up/down:** double every 2 min until SLO breach; then halve down.
4. **Long tail soak:** run at 70-80% of max for 1h; watch p95-p99.9 drift.
Target outputs per step: **p50/p90/p95/p99** for `queue_delay`, `service_time`, `hop_latency`, plus **throughput** and **error rate**.
---
## k6 script (HTTP client pressure)
```javascript
// save as hop-test.js
import http from 'k6/http';
import { check } from 'k6';
// uuidv4 comes from the k6 jslib utilities, so the script does not depend on a global `crypto` object.
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

export const options = {
  scenarios: {
    step_load: {
      // Open-loop: iterations start at the target rate regardless of response times.
      executor: 'ramping-arrival-rate',
      startRate: 20, timeUnit: '1s',
      preAllocatedVUs: 200, maxVUs: 5000,
      stages: [
        { target: 50, duration: '1m' },
        { target: 100, duration: '1m' },
        { target: 200, duration: '1m' },
        { target: 400, duration: '1m' },
        { target: 800, duration: '1m' },
      ],
    },
  },
  thresholds: {
    'http_req_failed': ['rate<0.01'],
    'http_req_duration{phase:hop}': ['p(95)<500'],
  },
};

export default function () {
  const corr = uuidv4();
  const res = http.post(
    __ENV.GW_URL,
    JSON.stringify({ data: 'ping', corr }),
    {
      headers: { 'Content-Type': 'application/json', 'x-stella-corr-id': corr },
      tags: { phase: 'hop' },
    }
  );
  check(res, { 'status 200/202': (r) => r.status === 200 || r.status === 202 });
  // No sleep(): with arrival-rate executors the executor controls pacing.
}
```
Run: `GW_URL=https://gateway.example/hop k6 run hop-test.js`
---
## Worker hooks (.NET 10 sketch)
```csharp
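// Sketch only: `log` stands in for whatever logging/metrics scope the worker uses.
// Stopwatch ticks are monotonic but host-local: subtract them only from values taken
// on the same host (done - claim), never from the gateway's enqueue timestamp.

// At claim
long claimNs = Stopwatch.GetTimestamp().ToNanoseconds();
log.AddTag("x-stella-claim-ts", claimNs);

// After processing
long doneNs = Stopwatch.GetTimestamp().ToNanoseconds();
log.AddTag("x-stella-done-ts", doneNs);

// Include corr-id and stream entry id in logs/metrics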
```
Helper:
```csharp
using System.Diagnostics;

public static class MonoTime {
    // Nanoseconds per Stopwatch tick on this host.
    static readonly double _nsPerTick = 1_000_000_000d / Stopwatch.Frequency;
    public static long ToNanoseconds(this long ticks) => (long)(ticks * _nsPerTick);
}
```
---
## Prometheus metrics to expose
* `valkey_enqueue_ns` (histogram)
* `valkey_claim_block_ms` (gauge)
* `worker_service_ns` (histogram, labels: worker_type, route)
* `queue_depth` (gauge via `XLEN` or `XINFO STREAM`)
* `enqueue_rate`, `dequeue_rate` (counters)
Example recording rules:
```yaml
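# Note: valkey_enqueue_ns times the enqueue call itself, so it only approximates queue delay.
# A histogram built from claim_ts - enq_ts is needed to measure true queue delay.
- record: hop:queue_delay_p95
  expr: histogram_quantile(0.95, sum(rate(valkey_enqueue_ns_bucket[1m])) by (le))
- record: hop:service_time_p95
  expr: histogram_quantile(0.95, sum(rate(worker_service_ns_bucket[1m])) by (le))
# Adding two p95 values gives a rough budget, not the p95 of the combined latency.
- record: hop:latency_budget_p95
  expr: hop:queue_delay_p95 + hop:service_time_p95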
```
---
## Autoscaling signals (HPA/KEDA friendly)
* **Primary:** queue depth & its derivative (d/dt).
* **Secondary:** p95 `queue_delay` and worker CPU.
* **Safety:** max in-flight per worker; backpressure HTTP 429 when `queue_depth > D` or `p95_queue_delay > SLO*0.8`.
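The safety rule above translates directly into a predicate the gateway could evaluate before enqueueing. The Python sketch below is illustrative; `should_reject` and its parameter names are assumptions, with `max_depth` standing for `D`.

```python
def should_reject(queue_depth: int, p95_queue_delay_ms: float,
                  max_depth: int, slo_ms: float, slo_fraction: float = 0.8) -> bool:
    """True when the gateway should answer HTTP 429 instead of enqueueing."""
    return queue_depth > max_depth or p95_queue_delay_ms > slo_ms * slo_fraction
```

With a 500 ms SLO, for example, the gateway would start shedding load once the p95 queue delay passes 400 ms, before the SLO itself is breached.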
---
## Plot the "envelope" (what you'll look at)
* X-axis: **offered load** (req/s).
* Y-axis: **p95 hop latency** (ms).
* Overlay: p99 (dashed), **SLO line** (e.g., 500 ms), and **capacity knee** (where p95 sharply rises).
* Add secondary panel: **queue depth** vs load.
---
# Performance Test Guidelines
## HTTP → Valkey → Worker pipeline
## 1) Objectives and scope
### Primary objectives
Your performance tests MUST answer these questions with evidence:
1. **Capacity knee**: At what offered load does **queue delay** start growing sharply?
2. **User-impact envelope**: What are p50/p95/p99 **hop latency** curves vs offered load?
3. **Decomposition**: How much of hop latency is:
   * gateway enqueue time
   * Valkey enqueue/claim RTT
   * queue wait time
   * worker service time
4. **Scaling behavior**: How do these change with worker replica counts (N workers)?
5. **Stability**: Under sustained load, do latencies drift (GC, memory, fragmentation, background jobs)?
### Non-goals (explicitly out of scope unless you add them later)
* Micro-optimizing single function runtime
* Synthetic "max QPS" records without a representative payload
* Tests that don't collect segment metrics (end-to-end only) for anything beyond basic smoke
---
## 2) Definitions and required metrics
### Required latency definitions (standardize these names)
Agents MUST compute and report these per request/job:
* **`t_http_accept`**: time from client send → gateway accepts request
* **`t_enqueue`**: time spent in gateway to enqueue into Valkey (server-side)
* **`t_valkey_rtt_enq`**: client-observed RTT for enqueue command(s)
* **`t_queue_delay`**: `claim_ts - enq_ts`
* **`t_service`**: `done_ts - claim_ts`
* **`t_hop`**: `done_ts - enq_ts` (this is the "true pipeline hop" latency)
* Optional but recommended:
  * **`t_ack`**: time to ack completion (Valkey ack RTT)
  * **`t_http_response`**: request start → gateway response sent (TTFB/TTFS)
### Required percentiles and aggregations
Per scenario step (e.g., each offered load plateau), agents MUST output:
* p50 / p90 / p95 / p99 / p99.9 for: `t_hop`, `t_queue_delay`, `t_service`, `t_enqueue`
* Throughput: offered rps and achieved rps
* Error rate: HTTP failures, enqueue failures, worker failures
* Queue depth and backlog drain time
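The per-step percentiles can be computed offline from raw samples. The Python sketch below uses the nearest-rank method, which is one of several common percentile definitions; the guidelines do not prescribe one, so treat the choice as an assumption. `percentile` and `summarize` are hypothetical names.

```python
import math


def percentile(samples, p: float):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples) -> dict:
    """Required percentiles for one metric on one load plateau."""
    return {f"p{p:g}": percentile(samples, p) for p in (50, 90, 95, 99, 99.9)}
```

Note that p99.9 is only meaningful when a plateau collects well over a thousand samples; with fewer, it collapses to the maximum.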
### Required system-level telemetry (minimum)
Agents MUST collect these time series during tests:
* **Worker**: CPU, memory, GC pauses (if .NET), threadpool saturation indicators
* **Valkey**: ops/sec, connected clients, blocked clients, memory used, evictions, slowlog count
* **Gateway**: CPU/mem, request rate, response codes, request duration histogram
---
## 3) Environment and test hygiene requirements
### Environment requirements
Agents SHOULD run tests in an environment that matches production in:
* container CPU/memory limits
* number of nodes, network topology
* Valkey topology (single, cluster, sentinel, etc.)
* worker replica autoscaling rules (or deliberately disabled)
If exact parity isn't possible, agents MUST record all known differences in the report.
### Test hygiene (non-negotiable)
Agents MUST:
1. **Start from empty queues** (no backlog).
2. **Disable client retries** (or explicitly run two variants: retries off / retries on).
3. **Warm up** before measuring (e.g., 60s warm-up minimum).
4. **Hold steady plateaus** long enough to stabilize (usually 2-5 minutes per step).
5. **Cool down** and verify backlog drains (queue depth returns to baseline).
6. Record exact versions/SHAs of gateway/worker and Valkey config.
### Load generator hygiene
Agents MUST ensure the load generator is not the bottleneck:
* CPU < ~70% during test
* no local socket exhaustion
* enough VUs/connections
* if needed, distributed load generation
---
## 4) Instrumentation spec (agents implement this first)
### Correlation and timestamps
Agents MUST propagate an end-to-end correlation ID and timestamps.
**Required fields**
* `corr_id` (UUID)
* `enq_ts_ns` (set at enqueue, monotonic or consistent clock)
* `claim_ts_ns` (set by worker when job is claimed)
* `done_ts_ns` (set by worker when job processing ends)
**Where these live**
* HTTP request header: `x-stella-corr-id: <uuid>`
* Valkey job payload fields: `corr`, `enq`, and optionally payload size/type
* Worker logs/metrics: include `corr_id`, job id, `claim_ts_ns`, `done_ts_ns`
### Clock requirements
Agents MUST use a consistent timing source:
* Prefer monotonic timers for durations (Stopwatch / monotonic clock)
* If timestamps cross machines, ensure they're comparable:
  * either rely on synchronized clocks (NTP) **and** monitor drift
  * or compute durations using monotonic tick deltas within the same host and transmit durations (less ideal for queue delay)
**Practical recommendation**: use wall-clock ns for cross-host timestamps with NTP + drift checks, and also record per-host monotonic durations for sanity.
### Valkey queue semantics (recommended)
Agents SHOULD use **Streams + Consumer Groups** for stable claim semantics and good observability:
* Enqueue: `XADD jobs * corr <uuid> enq <ns> payload <...>`
* Claim: `XREADGROUP GROUP workers <consumer> COUNT 1 BLOCK 1000 STREAMS jobs >`
* Ack: `XACK jobs workers <id>`
Agents MUST record stream length (`XLEN`) or consumer group lag (`XINFO GROUPS`) as queue depth/lag.
### Metrics exposure
Agents MUST publish Prometheus (or equivalent) histograms:
* `gateway_enqueue_seconds` (or ns) histogram
* `valkey_enqueue_rtt_seconds` histogram
* `worker_service_seconds` histogram
* `queue_delay_seconds` histogram (derived from timestamps; can be computed in worker or offline)
* `hop_latency_seconds` histogram
---
## 5) Workload modeling and test data
Agents MUST define a workload model before running capacity tests:
1. **Endpoint(s)**: list exact gateway routes under test
2. **Payload types**: small/typical/large
3. **Mix**: e.g., 70/25/5 by payload size
4. **Idempotency rules**: ensure repeated jobs don't corrupt state
5. **Data reset strategy**: how test data is cleaned or isolated per run
Agents SHOULD test at least:
* Typical payload (p50)
* Large payload (p95)
* Worst-case allowed payload (bounded by your API limits)
---
## 6) Scenario suite your agents MUST implement
Each scenario MUST be defined as code/config (not manual).
### Scenario A — Smoke (fast sanity)
**Goal**: verify instrumentation + basic correctness
**Load**: low (e.g., 1-5 rps), 2 minutes
**Pass**:
* 0 backlog after run
* error rate < 0.1%
* metrics present for all segments
### Scenario B — Baseline (repeatable reference point)
**Goal**: establish a stable baseline for regression tracking
**Load**: fixed moderate load (e.g., 30-50% of expected capacity), 10 minutes
**Pass**:
* p95 `t_hop` within baseline ± tolerance (set after first runs)
* no upward drift in p95 across time (trend line ~flat)
### Scenario C — Capacity ramp (open-loop)
**Goal**: find the knee where queueing begins
**Method**: open-loop arrival-rate ramp with plateaus
Example stages (edit to fit your system):
* 50 rps for 2m
* 100 rps for 2m
* 200 rps for 2m
* 400 rps for 2m
* until SLO breach or errors spike
**MUST**:
* warm-up stage before first plateau
* record per-plateau summary
**Stop conditions** (any triggers stop):
* error rate > 1%
* queue depth grows without bound over an entire plateau
* p95 `t_hop` exceeds SLO for 2 consecutive plateaus
### Scenario D — Stress (push past capacity)
**Goal**: characterize failure mode and recovery
**Load**: 120-200% of knee load, 5-10 minutes
**Pass** (for resilience):
* system does not crash permanently
* once load stops, backlog drains within target time (define it)
### Scenario E — Burst / spike
**Goal**: see how quickly queue grows and drains
**Load shape**:
* baseline low load
* sudden burst (e.g., 10× for 10-30s)
* return to baseline
**Report**:
* peak queue depth
* time to drain to baseline
* p99 `t_hop` during burst
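Two of the report items, peak queue depth and time to drain, can be read off a queue-depth time series. The Python sketch below assumes samples are `(seconds, depth)` pairs sorted by time and that "drained" means depth at or below a chosen baseline; `burst_report` is a hypothetical name.

```python
def burst_report(samples, baseline_depth: int, burst_end_s: float) -> dict:
    """Peak depth over the run and seconds from burst end until depth is back at baseline."""
    peak = max(depth for _, depth in samples)
    drain_seconds = None  # stays None if the backlog never drains within the samples
    for t, depth in samples:
        if t >= burst_end_s and depth <= baseline_depth:
            drain_seconds = t - burst_end_s
            break
    return {"peak_queue_depth": peak, "drain_seconds": drain_seconds}
```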
### Scenario F — Soak (long-running)
**Goal**: detect drift (leaks, fragmentation, GC patterns)
**Load**: 70-85% of knee, 60-180 minutes
**Pass**:
* p95 does not trend upward beyond threshold
* memory remains bounded
* no rising error rate
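One simple way to quantify "does not trend upward" is to compare the start and end of the soak. The Python sketch below assumes p95 is sampled at regular intervals and measures drift as the relative change between the mean of the first and last few samples; this windowed comparison is an illustrative choice, not something the guidelines mandate, and `p95_drift` is a hypothetical name.

```python
from statistics import mean


def p95_drift(p95_series, window: int = 5) -> float:
    """Relative change in p95 between the first and last `window` samples of a soak run."""
    if len(p95_series) < 2 * window:
        raise ValueError("series too short for the chosen window")
    start = mean(p95_series[:window])
    end = mean(p95_series[-window:])
    return (end - start) / start
```

A result of `0.10` means p95 ended the run about 10% higher than it started.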
### Scenario G — Scaling curve (worker replica sweep)
**Goal**: turn results into scaling rules
**Method**:
* Repeat Scenario C with worker replicas = 1, 2, 4, 8…
**Deliverable**:
* plot of knee load vs worker count
* p95 `t_service` vs worker count (should remain similar; queue delay should drop)
---
## 7) Execution protocol (runbook)
Agents MUST run every scenario using the same disciplined flow:
### Pre-run checklist
* confirm system versions/SHAs
* confirm autoscaling mode:
  * **Off** for baseline capacity characterization
  * **On** for validating autoscaling policies
* clear queues and consumer group pending entries
* restart or at least record "time since deploy" for services (cold vs warm)
### During run
* ensure load is truly open-loop when required (arrival-rate based)
* continuously record:
  * offered vs achieved rate
  * queue depth
  * CPU/mem for gateway/worker/Valkey
### Post-run
* stop load
* wait until backlog drains (or record that it doesn't)
* export:
  * k6/runner raw output
  * Prometheus time series snapshot
  * sampled logs with corr_id fields
* generate a summary report automatically (no hand calculations)
---
## 8) Analysis rules (how agents compute "the envelope")
Agents MUST generate at minimum two plots per run:
1. **Latency envelope**: offered load (x-axis) vs p95 `t_hop` (y-axis)
   * overlay p99 (and SLO line)
2. **Queue behavior**: offered load vs queue depth (or lag), plus drain time
### How to identify the "knee"
Agents SHOULD mark the knee as the first plateau where:
* queue depth grows monotonically within the plateau, **or**
* p95 `t_queue_delay` increases by > X% step-to-step (e.g., 50-100%)
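The two knee criteria can be applied mechanically to per-plateau summaries. The Python sketch below assumes each plateau is a dictionary with `rps`, `p95_queue_delay_ms` and `queue_depth_samples`; those field names and `find_knee` are hypothetical, and "grows monotonically" is approximated as strictly increasing depth samples.

```python
def find_knee(plateaus, growth_threshold: float = 0.5):
    """Return the rps of the first plateau that meets either knee criterion, else None."""
    prev_p95 = None
    for p in plateaus:
        depths = p["queue_depth_samples"]
        # Criterion 1: queue depth keeps growing throughout the plateau.
        growing = len(depths) >= 2 and all(b > a for a, b in zip(depths, depths[1:]))
        # Criterion 2: p95 queue delay jumps by more than the threshold step-to-step.
        jumped = (prev_p95 is not None and prev_p95 > 0
                  and (p["p95_queue_delay_ms"] - prev_p95) / prev_p95 > growth_threshold)
        if growing or jumped:
            return p["rps"]
        prev_p95 = p["p95_queue_delay_ms"]
    return None
```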
### Convert results into scaling guidance
Agents SHOULD compute:
* `capacity_per_worker ≈ 1 / mean(t_service)` (jobs/sec per worker)
* recommended replicas for offered load λ at target utilization U:
  * `workers_needed = ceil(λ * mean(t_service) / U)`
  * choose U ~ 0.6-0.75 for headroom
This should be reported alongside the measured envelope.
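As a worked example of the formulas above, the Python sketch below computes both quantities. It assumes `t_service` is expressed in seconds and the offered load in jobs per second; the function names are hypothetical.

```python
import math


def capacity_per_worker(mean_service_s: float) -> float:
    """Jobs per second one worker can sustain at 100% utilization."""
    return 1.0 / mean_service_s


def workers_needed(arrival_rps: float, mean_service_s: float,
                   target_utilization: float = 0.7) -> int:
    """Replicas required to serve arrival_rps while staying at or below the target utilization."""
    return math.ceil(arrival_rps * mean_service_s / target_utilization)
```

For an offered load of 400 jobs/s, a mean service time of 20 ms and U = 0.7, this gives `ceil(400 * 0.02 / 0.7) = ceil(11.43) = 12` workers, each with a theoretical capacity of 50 jobs/s.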
---
## 9) Pass/fail criteria and regression gates
Agents MUST define gates in configuration, not in someone's head.
Suggested gating structure:
* **Smoke gate**: error rate < 0.1%, backlog drains
* **Baseline gate**: p95 `t_hop` regression < 10% (tune after you have history)
* **Capacity gate**: knee load regression < 10% (optional but very valuable)
* **Soak gate**: p95 drift over time < 15% and no memory runaway
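Keeping gates in configuration means the pass/fail decision reduces to comparing a run summary against thresholds. The Python sketch below uses the suggested values; the dictionary keys, `GATES` and `evaluate_gates` are hypothetical, and the "backlog drains" and "no memory runaway" checks are omitted for brevity.

```python
# Thresholds taken from the suggested gating structure; tune once history exists.
GATES = {
    "smoke_max_error_rate": 0.001,
    "baseline_max_regression": 0.10,
    "capacity_max_regression": 0.10,
    "soak_max_drift": 0.15,
}


def relative_change(current: float, baseline: float) -> float:
    return (current - baseline) / baseline


def evaluate_gates(current: dict, baseline: dict, gates: dict = GATES) -> list:
    """Return the names of the gates that fail; an empty list means the run passes."""
    failures = []
    if current["error_rate"] >= gates["smoke_max_error_rate"]:
        failures.append("smoke")
    if relative_change(current["p95_hop_ms"], baseline["p95_hop_ms"]) >= gates["baseline_max_regression"]:
        failures.append("baseline")
    # For capacity a lower knee is the regression, so the sign is flipped.
    if -relative_change(current["knee_rps"], baseline["knee_rps"]) >= gates["capacity_max_regression"]:
        failures.append("capacity")
    if current["soak_p95_drift"] >= gates["soak_max_drift"]:
        failures.append("soak")
    return failures
```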
---
## 10) Common pitfalls (agents must avoid)
1. **Closed-loop tests used for capacity**
   Closed-loop ("N concurrent users") self-throttles and can hide queueing onset. Use open-loop arrival rate for capacity.
2. **Ignoring queue depth**
   A system can look "healthy" in request latency while silently building backlog.
3. **Measuring only gateway latency**
   You must measure enqueue → claim → done to see the real hop.
4. **Load generator bottleneck**
   If the generator saturates, you'll under-estimate capacity.
5. **Retries enabled by default**
   Retries can inflate load and hide root causes; run with retries off first.
6. **Not controlling warm vs cold**
   Cold caches vs warmed services produce different envelopes; record the condition.
---
# Agent implementation checklist (deliverables)
Assign these as concrete tasks to your agents.
## Agent 1 — Observability & tracing
MUST deliver:
* correlation id propagation gateway → Valkey → worker
* timestamps `enq/claim/done`
* Prometheus histograms for enqueue, service, hop
* queue depth metric (`XLEN` / `XINFO` lag)
## Agent 2 — Load test harness
MUST deliver:
* test runner scripts (k6 or equivalent) for scenarios A-G
* test config file (YAML/JSON) controlling:
  * stages (rates/durations)
  * payload mix
  * headers (corr-id)
* reproducible seeds and version stamping
## Agent 3 — Result collector and analyzer
MUST deliver:
* a pipeline that merges:
  * load generator output
  * hop timing data (from logs or a completion stream)
  * Prometheus snapshots
* automatic summary + plots:
  * latency envelope
  * queue depth/drain
* CSV/JSON exports for long-term tracking
## Agent 4 — Reporting and dashboards
MUST deliver:
* a standard report template that includes:
  * environment details
  * scenario details
  * key charts
  * knee estimate
  * scaling recommendation
* Grafana dashboard with the required panels
## Agent 5 — CI / release integration
SHOULD deliver:
* PR-level smoke test (Scenario A)
* nightly baseline (Scenario B)
* weekly capacity sweep (Scenario C + scaling curve)
---
## Template: scenario spec (agents can copy/paste)
```yaml
test_run:
  system_under_test:
    gateway_sha: "<git sha>"
    worker_sha: "<git sha>"
    valkey_version: "<version>"
  environment:
    cluster: "<name>"
    workers: 4
    autoscaling: "off" # off|on
  workload:
    endpoint: "/hop"
    payload_profile: "p50"
    mix:
      p50: 0.7
      p95: 0.25
      max: 0.05
  scenario:
    name: "capacity_ramp"
    mode: "open_loop"
    warmup_seconds: 60
    stages:
      - rps: 50
        duration_seconds: 120
      - rps: 100
        duration_seconds: 120
      - rps: 200
        duration_seconds: 120
      - rps: 400
        duration_seconds: 120
  gates:
    max_error_rate: 0.01
    slo_ms_p95_hop: 500
    backlog_must_drain_seconds: 300
  outputs:
    artifacts_dir: "./artifacts/<timestamp>/"
```
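As a quick sanity check on the template, the sketch below works out how long the ramp runs and how many requests it offers. It assumes the warmup precedes the stages and carries no counted traffic; the stage values are copied from the template rather than loaded from YAML, and `plan_totals` is a hypothetical helper.

```python
# Values copied from the scenario template; a real harness would load the file.
WARMUP_SECONDS = 60
STAGES = [(50, 120), (100, 120), (200, 120), (400, 120)]  # (rps, duration_seconds)


def plan_totals(warmup_seconds, stages):
    """Return (total_seconds, total_requests) for an open-loop ramp.

    Warmup time is counted, warmup traffic is not (an assumption).
    """
    total_seconds = warmup_seconds + sum(duration for _, duration in stages)
    total_requests = sum(rps * duration for rps, duration in stages)
    return total_seconds, total_requests


print(plan_totals(WARMUP_SECONDS, STAGES))  # (540, 90000)
```

With `max_error_rate: 0.01`, a run of 90,000 offered requests tolerates at most 900 errors before the gate fails.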
---
## Sample folder layout
```
perf/
  docker-compose.yml
  prometheus/
    prometheus.yml
  k6/
    lib.js
    smoke.js
    capacity_ramp.js
    burst.js
    soak.js
    stress.js
    scaling_curve.sh
  tools/
    analyze.py
  src/
    Perf.Gateway/
    Perf.Worker/
```
---
**Document Version**: 1.0
**Archived From**: docs/product-advisories/unprocessed/16-Dec-2025 - Reimagining Proof-Linked UX in Security Workflows.md
**Archive Reason**: Wrong content was pasted; this performance testing content preserved for future use.


@@ -0,0 +1,365 @@
# SPRINT_3600_0001_0001 - Reachability Drift Detection Master Plan
**Status:** TODO
**Priority:** P0 - CRITICAL
**Module:** Scanner, Signals, Web
**Working Directory:** `src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/`
**Estimated Effort:** X-Large (3 sub-sprints)
**Dependencies:** SPRINT_3500 (Smart-Diff) - COMPLETE
---
## Topic & Scope
Implementation of Reachability Drift Detection as specified in `docs/product-advisories/17-Dec-2025 - Reachability Drift Detection.md`. This extends Smart-Diff to detect when vulnerable code paths become reachable/unreachable between container image versions, with causal attribution and UI visualization.
**Business Value:**
- Transform from "all vulnerabilities" to "material reachability changes"
- Reduce alert fatigue by 90%+ through meaningful drift detection
- Enable causal attribution ("guard removed in AuthFilter.cs:42")
- Provide actionable UI for security review
---
## Dependencies & Concurrency
**Internal Dependencies:**
- `SPRINT_3500` (Smart-Diff) - COMPLETE - Provides MaterialRiskChangeDetector, VexCandidateEmitter
- `StellaOps.Signals.Contracts` - Provides CallPath, ReachabilitySignal models
- `StellaOps.Scanner.SmartDiff` - Provides detection infrastructure
- `vex.graph_nodes/edges` - Existing graph storage schema
**Concurrency:**
- Sprint 3600.2 (Call Graph) must complete before 3600.3 (Drift Detection)
- Sprint 3600.4 (UI) can start in parallel once 3600.3 API contracts are defined
---
## Documentation Prerequisites
Before starting implementation, read:
- `docs/product-advisories/17-Dec-2025 - Reachability Drift Detection.md`
- `docs/product-advisories/14-Dec-2025 - Smart-Diff Technical Reference.md`
- `docs/product-advisories/14-Dec-2025 - Reachability Analysis Technical Reference.md`
- `docs/modules/scanner/architecture.md`
- `docs/reachability/lattice.md`
- `bench/reachability-benchmark/README.md`
---
## Wave Coordination
```
SPRINT_3600_0002 (Call Graph Infrastructure)
        │
        ▼
SPRINT_3600_0003 (Drift Detection Engine)
        │
        ├──────────────────────┐
        ▼                      ▼
SPRINT_3600_0004 (UI)    API Integration
        │                      │
        └──────────────┬───────┘
                       ▼
               Integration Tests
```
---
## Wave Detail Snapshots
### Wave 1: Call Graph Infrastructure (SPRINT_3600_0002_0001)
- .NET call graph extraction via Roslyn
- Node.js call graph extraction via AST parsing
- Entrypoint discovery for ASP.NET Core, Express, Fastify
- Sink taxonomy implementation
- Call graph storage and caching
### Wave 2: Drift Detection Engine (SPRINT_3600_0003_0001)
- Code change facts extraction (AST-level)
- Cross-scan graph comparison
- Drift cause attribution
- Path compression for storage
- API endpoints
### Wave 3: UI and Evidence Chain (SPRINT_3600_0004_0001)
- Angular Path Viewer component
- Risk Drift Card component
- Evidence chain linking
- DSSE attestation for drift results
- CLI output enhancements
---
## Interlocks
1. **Schema Versioning**: New tables must be versioned migrations (006_reachability_drift_tables.sql)
2. **Determinism**: Call graph extraction must be deterministic (stable node IDs)
3. **Benchmark Alignment**: Must pass `bench/reachability-benchmark` cases
4. **Smart-Diff Compat**: Must integrate with existing MaterialRiskChangeDetector
---
## Upcoming Checkpoints
- TBD
---
## Action Tracker
| Date (UTC) | Action | Owner | Notes |
|---|---|---|---|
| 2025-12-17 | Created master sprint from advisory analysis | Agent | Initial planning |
---
## 1. EXECUTIVE SUMMARY
Reachability Drift Detection extends Smart-Diff to track **function-level reachability changes** between scans. Instead of reporting all vulnerabilities, it identifies:
1. **New reachable paths** - Vulnerable sinks that became reachable
2. **Mitigated paths** - Vulnerable sinks that became unreachable
3. **Causal attribution** - Why the change occurred (guard removed, new route, etc.)
### Technical Approach
| Phase | Component | Description |
|-------|-----------|-------------|
| Extract | Call Graph Extractor | Per-language AST analysis |
| Model | Entrypoint Discovery | HTTP handlers, CLI commands, jobs |
| Diff | Code Change Facts | AST-level symbol changes |
| Analyze | Reachability BFS | Multi-source traversal from entrypoints |
| Compare | Drift Detector | Graph N vs N-1 comparison |
| Attribute | Cause Explainer | Link drift to code changes |
| Present | Path Viewer | Angular UI component |
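The "Analyze" phase can be illustrated with a language-neutral sketch of multi-source BFS. This is not the planned C# implementation: the adjacency-dict input and the function name are assumptions. Neighbors and entrypoints are visited in sorted order so that the chosen shortest path is stable across runs.

```python
from collections import deque


def reachable_sinks(edges, entrypoints, sinks):
    """Multi-source BFS from all entrypoints.

    edges: dict mapping caller node ID -> iterable of callee node IDs.
    Returns {sink_id: shortest path as a list of node IDs} for reachable sinks.
    """
    # parent doubles as the visited set; entrypoints have no parent.
    parent = {ep: None for ep in sorted(entrypoints)}
    queue = deque(sorted(entrypoints))
    while queue:
        node = queue.popleft()
        for callee in sorted(edges.get(node, ())):
            if callee not in parent:
                parent[callee] = node
                queue.append(callee)

    paths = {}
    for sink in sorted(sinks):
        if sink in parent:
            path, cur = [], sink
            while cur is not None:  # walk back to the entrypoint
                path.append(cur)
                cur = parent[cur]
            paths[sink] = path[::-1]
    return paths
```

Seeding the queue with every entrypoint at once costs one traversal of the graph, rather than one traversal per entrypoint.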
---
## 2. ARCHITECTURE OVERVIEW
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                       REACHABILITY DRIFT ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                 │
│  │  Scan T-1    │     │   Scan T     │     │  Call Graph  │                 │
│  │  (Baseline)  │────►│  (Current)   │────►│  Extractor   │                 │
│  └──────────────┘     └──────────────┘     └──────────────┘                 │
│         │                    │                    │                         │
│         ▼                    ▼                    ▼                         │
│  ┌──────────────────────────────────────────────────────────┐               │
│  │                     GRAPH EXTRACTION                     │               │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐          │               │
│  │  │ .NET/Roslyn│  │ Node/AST   │  │ Go/SSA     │          │               │
│  │  └────────────┘  └────────────┘  └────────────┘          │               │
│  └──────────────────────────────────────────────────────────┘               │
│                               │                                             │
│                               ▼                                             │
│  ┌──────────────────────────────────────────────────────────┐               │
│  │                  REACHABILITY ANALYSIS                   │               │
│  │  ┌──────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐     │               │
│  │  │Entrypoint│  │BFS/DFS  │  │  Sink   │  │Reachable│     │               │
│  │  │Discovery │  │Traversal│  │Detection│  │   Set   │     │               │
│  │  └──────────┘  └─────────┘  └─────────┘  └─────────┘     │               │
│  └──────────────────────────────────────────────────────────┘               │
│                               │                                             │
│                               ▼                                             │
│  ┌──────────────────────────────────────────────────────────┐               │
│  │                     DRIFT DETECTION                      │               │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐          │               │
│  │  │Code Change │  │Graph Diff  │  │   Cause    │          │               │
│  │  │   Facts    │  │ Comparison │  │ Attribution│          │               │
│  │  └────────────┘  └────────────┘  └────────────┘          │               │
│  └──────────────────────────────────────────────────────────┘               │
│                               │                                             │
│                               ▼                                             │
│  ┌──────────────────────────────────────────────────────────┐               │
│  │                    OUTPUT GENERATION                     │               │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐          │               │
│  │  │ Path Viewer│  │   SARIF    │  │    DSSE    │          │               │
│  │  │     UI     │  │   Output   │  │ Attestation│          │               │
│  │  └────────────┘  └────────────┘  └────────────┘          │               │
│  └──────────────────────────────────────────────────────────┘               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 3. SUB-SPRINT STRUCTURE
| Sprint | ID | Topic | Status | Priority | Dependencies |
|--------|-----|-------|--------|----------|--------------|
| 1 | SPRINT_3600_0002_0001 | Call Graph Infrastructure | TODO | P0 | Master |
| 2 | SPRINT_3600_0003_0001 | Drift Detection Engine | TODO | P0 | Sprint 1 |
| 3 | SPRINT_3600_0004_0001 | UI and Evidence Chain | TODO | P1 | Sprint 2 |
### Sprint Dependency Graph
```
SPRINT_3600_0002 (Call Graph)
        │
        ├──────────────────────┐
        ▼                      │
SPRINT_3600_0003 (Detection)   │
        │                      │
        ├──────────────────────┤
        ▼                      ▼
SPRINT_3600_0004 (UI)     Integration
```
---
## 4. GAP ANALYSIS SUMMARY
### 4.1 Existing Infrastructure (Leverage Points)
| Component | Location | Status |
|-----------|----------|--------|
| MaterialRiskChangeDetector | `Scanner.SmartDiff.Detection` | COMPLETE |
| VexCandidateEmitter | `Scanner.SmartDiff.Detection` | COMPLETE |
| ReachabilityGateBridge | `Scanner.SmartDiff.Detection` | COMPLETE |
| CallPath model | `Signals.Contracts.Evidence` | COMPLETE |
| ReachabilityLatticeState | `Signals.Contracts.Evidence` | COMPLETE |
| vex.graph_nodes/edges | Database | COMPLETE |
| scanner.material_risk_changes | Database | COMPLETE |
| FN-Drift tracking | `Scanner.Core.Drift` | COMPLETE |
| Reachability benchmark | `bench/reachability-benchmark` | COMPLETE |
| Language analyzers | `Scanner.Analyzers.Lang.*` | PARTIAL |
### 4.2 Missing Components (Implementation Required)
| Component | Sprint | Priority |
|-----------|--------|----------|
| CallGraphExtractor.DotNet (Roslyn) | 3600.2 | P0 |
| CallGraphExtractor.Node (AST) | 3600.2 | P0 |
| EntrypointDiscovery.AspNetCore | 3600.2 | P0 |
| EntrypointDiscovery.Express | 3600.2 | P0 |
| SinkDetector (taxonomy) | 3600.2 | P0 |
| scanner.code_changes table | 3600.3 | P0 |
| scanner.call_graph_snapshots table | 3600.2 | P0 |
| CodeChangeFact extractor | 3600.3 | P0 |
| DriftCauseExplainer | 3600.3 | P0 |
| PathCompressor | 3600.3 | P1 |
| PathViewerComponent (Angular) | 3600.4 | P1 |
| RiskDriftCardComponent (Angular) | 3600.4 | P1 |
| DSSE attestation for drift | 3600.4 | P1 |
---
## 5. MODULE OWNERSHIP
| Module | Owner Role | Sprints |
|--------|------------|---------|
| Scanner | Scanner Guild | 3600.2, 3600.3 |
| Signals | Signals Guild | 3600.2 (contracts) |
| Web | Frontend Guild | 3600.4 |
| Attestor | Attestor Guild | 3600.4 (DSSE) |
---
## Delivery Tracker
| # | Task ID | Sprint | Status | Description |
|---|---------|--------|--------|-------------|
| 1 | RDRIFT-MASTER-0001 | 3600 | DOING | Coordinate all sub-sprints |
| 2 | RDRIFT-MASTER-0002 | 3600 | TODO | Create integration test suite |
| 3 | RDRIFT-MASTER-0003 | 3600 | TODO | Update Scanner AGENTS.md |
| 4 | RDRIFT-MASTER-0004 | 3600 | TODO | Update Web AGENTS.md |
| 5 | RDRIFT-MASTER-0005 | 3600 | TODO | Validate benchmark cases pass |
| 6 | RDRIFT-MASTER-0006 | 3600 | TODO | Document air-gap workflows |
---
## 6. SUCCESS CRITERIA
### 6.1 Functional Requirements
- [ ] .NET call graph extraction via Roslyn
- [ ] Node.js call graph extraction via AST
- [ ] ASP.NET Core entrypoint discovery
- [ ] Express/Fastify entrypoint discovery
- [ ] Sink taxonomy (9 categories)
- [ ] Code change facts extraction
- [ ] Cross-scan drift detection
- [ ] Causal attribution
- [ ] Path Viewer UI
- [ ] DSSE attestation
### 6.2 Determinism Requirements
- [ ] Same inputs produce identical call graph hash
- [ ] Node IDs stable across extractions
- [ ] Drift detection order-independent
- [ ] Path compression reversible
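One way to approach the first two determinism items is to hash a canonical encoding of the graph, sketched below. The encoding (sorted node IDs and sorted edge pairs, serialized as compact JSON) and the function name are assumptions for illustration, not the format the extractor will necessarily use.

```python
import hashlib
import json


def call_graph_hash(nodes, edges):
    """SHA-256 over a canonical encoding, independent of input order.

    nodes: iterable of node IDs; edges: iterable of (caller, callee) pairs.
    """
    canonical = {
        "nodes": sorted(set(nodes)),
        "edges": sorted({(caller, callee) for caller, callee in edges}),
    }
    # Compact separators and sorted keys keep the byte stream stable.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The hash is only as stable as the node IDs themselves, so IDs would need to derive from symbol identity rather than from extraction order or memory addresses.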
### 6.3 Test Requirements
- [ ] Unit tests for each extractor
- [ ] Integration tests with benchmark cases
- [ ] Golden fixtures for drift detection
- [ ] UI component tests
### 6.4 Performance Requirements
- [ ] Call graph extraction < 60s for 100K LOC
- [ ] Drift comparison < 5s per image pair
- [ ] Path storage < 10KB per compressed path
---
## Decisions & Risks
### 7.1 Architectural Decisions
| ID | Decision | Rationale |
|----|----------|-----------|
| RDRIFT-DEC-001 | Use scan_id not commit_sha | StellaOps is image-centric |
| RDRIFT-DEC-002 | Store graphs in CAS, metadata in Postgres | Separate large blobs from metadata |
| RDRIFT-DEC-003 | Start with .NET + Node only | Highest ROI languages |
| RDRIFT-DEC-004 | Extend existing schema, don't duplicate | Leverage vex.graph_* tables |
### 7.2 Risks & Mitigations
| ID | Risk | Likelihood | Impact | Mitigation |
|----|------|------------|--------|------------|
| RDRIFT-RISK-001 | Roslyn memory pressure on large solutions | Medium | High | Incremental analysis, streaming |
| RDRIFT-RISK-002 | Call graph over-approximation | Medium | Medium | Conservative static analysis |
| RDRIFT-RISK-003 | UI performance with large paths | Low | Medium | Path compression, lazy loading |
| RDRIFT-RISK-004 | False positive drift detection | Medium | Medium | Confidence scoring, review workflow |
---
## 8. DEPENDENCIES
### 8.1 Internal Dependencies
- `StellaOps.Scanner.SmartDiff` - Detection infrastructure
- `StellaOps.Signals.Contracts` - CallPath models
- `StellaOps.Attestor.ProofChain` - DSSE attestations
- `StellaOps.Scanner.Analyzers.Lang.*` - Language parsers
### 8.2 External Dependencies
- Microsoft.CodeAnalysis (Roslyn) - .NET analysis
- @babel/parser, @babel/traverse - Node.js analysis
- golang.org/x/tools/go/ssa - Go analysis (future)
---
## Execution Log
| Date (UTC) | Update | Owner |
|---|---|---|
| 2025-12-17 | Created master sprint from advisory analysis | Agent |
---
## 9. REFERENCES
- **Source Advisory**: `docs/product-advisories/17-Dec-2025 - Reachability Drift Detection.md`
- **Smart-Diff Reference**: `docs/product-advisories/14-Dec-2025 - Smart-Diff Technical Reference.md`
- **Reachability Reference**: `docs/product-advisories/14-Dec-2025 - Reachability Analysis Technical Reference.md`
- **Benchmark**: `bench/reachability-benchmark/README.md`


@@ -0,0 +1,949 @@
# SPRINT_3600_0003_0001 - Drift Detection Engine
**Status:** TODO
**Priority:** P0 - CRITICAL
**Module:** Scanner
**Working Directory:** `src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/`
**Estimated Effort:** Medium
**Dependencies:** SPRINT_3600_0002_0001 (Call Graph Infrastructure)
---
## Topic & Scope
Implement the drift detection engine that compares call graphs between scans to identify reachability changes. This sprint covers:
- Code change facts extraction (AST-level)
- Cross-scan graph comparison
- Drift cause attribution
- Path compression for storage
- API endpoints for drift results
---
## Documentation Prerequisites
- `docs/product-advisories/17-Dec-2025 - Reachability Drift Detection.md`
- `docs/implplan/SPRINT_3600_0002_0001_call_graph_infrastructure.md`
- `src/Scanner/__Libraries/StellaOps.Scanner.SmartDiff/AGENTS.md`
---
## Wave Coordination
Single wave with sequential tasks:
1. Code change models and extraction
2. Cross-scan comparison engine
3. Cause attribution
4. Path compression
5. API integration
---
## Interlocks
- Depends on CallGraphSnapshot model from Sprint 3600.2
- Must integrate with existing MaterialRiskChangeDetector
- Must extend scanner.material_risk_changes table
---
## Action Tracker
| Date (UTC) | Action | Owner | Notes |
|---|---|---|---|
| 2025-12-17 | Created sprint from master plan | Agent | Initial |
---
## 1. OBJECTIVE
Build the drift detection engine:
1. **Code Change Facts** - Extract AST-level changes between scans
2. **Graph Comparison** - Detect reachability flips
3. **Cause Attribution** - Explain why drift occurred
4. **Path Compression** - Efficient storage for UI display
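At its core, the "reachability flip" in step 2 is a set difference between the reachable sinks of the two scans. The sketch below is a language-neutral illustration with a hypothetical function name; the direction strings match the `DriftDirection` wire names used in this sprint, and results are sorted so output order does not depend on input order.

```python
def detect_drift(base_reachable, head_reachable):
    """Return (sink_id, direction) pairs for sinks whose reachability flipped."""
    base, head = set(base_reachable), set(head_reachable)
    drift = [(sink, "became_reachable") for sink in sorted(head - base)]
    drift += [(sink, "became_unreachable") for sink in sorted(base - head)]
    return drift
```

For example, with base `{"s1", "s2"}` and head `{"s2", "s3"}`, `s3` became reachable and `s1` became unreachable, while `s2` produces no drift.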
---
## 2. TECHNICAL DESIGN
### 2.1 Code Change Facts Model
```csharp
// File: src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Models/CodeChangeFact.cs
namespace StellaOps.Scanner.ReachabilityDrift;
using System.Text.Json;
using System.Text.Json.Serialization;
/// <summary>
/// Represents an AST-level code change fact.
/// </summary>
public sealed record CodeChangeFact
{
[JsonPropertyName("id")]
public required Guid Id { get; init; }
[JsonPropertyName("scanId")]
public required string ScanId { get; init; }
[JsonPropertyName("baseScanId")]
public required string BaseScanId { get; init; }
[JsonPropertyName("file")]
public required string File { get; init; }
[JsonPropertyName("symbol")]
public required string Symbol { get; init; }
[JsonPropertyName("kind")]
public required CodeChangeKind Kind { get; init; }
[JsonPropertyName("details")]
public JsonDocument? Details { get; init; }
[JsonPropertyName("detectedAt")]
public required DateTimeOffset DetectedAt { get; init; }
}
/// <summary>
/// Types of code changes relevant to reachability.
/// </summary>
[JsonConverter(typeof(JsonStringEnumConverter<CodeChangeKind>))]
public enum CodeChangeKind
{
/// <summary>Symbol added (new function/method).</summary>
[JsonStringEnumMemberName("added")]
Added,
/// <summary>Symbol removed.</summary>
[JsonStringEnumMemberName("removed")]
Removed,
/// <summary>Function signature changed (parameters, return type).</summary>
[JsonStringEnumMemberName("signature_changed")]
SignatureChanged,
/// <summary>Guard condition around call modified.</summary>
[JsonStringEnumMemberName("guard_changed")]
GuardChanged,
/// <summary>Callee package/version changed.</summary>
[JsonStringEnumMemberName("dependency_changed")]
DependencyChanged,
/// <summary>Visibility changed (public<->internal<->private).</summary>
[JsonStringEnumMemberName("visibility_changed")]
VisibilityChanged
}
```
### 2.2 Drift Result Model
```csharp
// File: src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Models/ReachabilityDriftResult.cs
namespace StellaOps.Scanner.ReachabilityDrift;
using System.Collections.Immutable;
using System.Text.Json.Serialization;
/// <summary>
/// Result of reachability drift detection between two scans.
/// </summary>
public sealed record ReachabilityDriftResult
{
[JsonPropertyName("baseScanId")]
public required string BaseScanId { get; init; }
[JsonPropertyName("headScanId")]
public required string HeadScanId { get; init; }
[JsonPropertyName("detectedAt")]
public required DateTimeOffset DetectedAt { get; init; }
[JsonPropertyName("newlyReachable")]
public required ImmutableArray<DriftedSink> NewlyReachable { get; init; }
[JsonPropertyName("newlyUnreachable")]
public required ImmutableArray<DriftedSink> NewlyUnreachable { get; init; }
[JsonPropertyName("totalDriftCount")]
public int TotalDriftCount => NewlyReachable.Length + NewlyUnreachable.Length;
[JsonPropertyName("hasMaterialDrift")]
public bool HasMaterialDrift => TotalDriftCount > 0;
}
/// <summary>
/// A sink that changed reachability status.
/// </summary>
public sealed record DriftedSink
{
[JsonPropertyName("sinkNodeId")]
public required string SinkNodeId { get; init; }
[JsonPropertyName("symbol")]
public required string Symbol { get; init; }
[JsonPropertyName("sinkCategory")]
public required SinkCategory SinkCategory { get; init; }
[JsonPropertyName("direction")]
public required DriftDirection Direction { get; init; }
[JsonPropertyName("cause")]
public required DriftCause Cause { get; init; }
[JsonPropertyName("path")]
public required CompressedPath Path { get; init; }
[JsonPropertyName("associatedVulns")]
public ImmutableArray<AssociatedVuln> AssociatedVulns { get; init; } = [];
}
/// <summary>
/// Direction of reachability drift.
/// </summary>
[JsonConverter(typeof(JsonStringEnumConverter<DriftDirection>))]
public enum DriftDirection
{
[JsonStringEnumMemberName("became_reachable")]
BecameReachable,
[JsonStringEnumMemberName("became_unreachable")]
BecameUnreachable
}
/// <summary>
/// Cause of the drift, linked to code changes.
/// </summary>
public sealed record DriftCause
{
[JsonPropertyName("kind")]
public required DriftCauseKind Kind { get; init; }
[JsonPropertyName("description")]
public required string Description { get; init; }
[JsonPropertyName("changedSymbol")]
public string? ChangedSymbol { get; init; }
[JsonPropertyName("changedFile")]
public string? ChangedFile { get; init; }
[JsonPropertyName("changedLine")]
public int? ChangedLine { get; init; }
[JsonPropertyName("codeChangeId")]
public Guid? CodeChangeId { get; init; }
public static DriftCause GuardRemoved(string symbol, string file, int line) =>
new()
{
Kind = DriftCauseKind.GuardRemoved,
Description = $"Guard condition removed in {symbol}",
ChangedSymbol = symbol,
ChangedFile = file,
ChangedLine = line
};
public static DriftCause NewPublicRoute(string symbol) =>
new()
{
Kind = DriftCauseKind.NewPublicRoute,
Description = $"New public entrypoint: {symbol}",
ChangedSymbol = symbol
};
public static DriftCause VisibilityEscalated(string symbol) =>
new()
{
Kind = DriftCauseKind.VisibilityEscalated,
Description = $"Visibility escalated to public: {symbol}",
ChangedSymbol = symbol
};
public static DriftCause DependencyUpgraded(string package, string fromVersion, string toVersion) =>
new()
{
Kind = DriftCauseKind.DependencyUpgraded,
Description = $"Dependency upgraded: {package} {fromVersion} -> {toVersion}"
};
public static DriftCause GuardAdded(string symbol) =>
new()
{
Kind = DriftCauseKind.GuardAdded,
Description = $"Guard condition added in {symbol}",
ChangedSymbol = symbol
};
public static DriftCause SymbolRemoved(string symbol) =>
new()
{
Kind = DriftCauseKind.SymbolRemoved,
Description = $"Symbol removed: {symbol}",
ChangedSymbol = symbol
};
public static DriftCause Unknown() =>
new()
{
Kind = DriftCauseKind.Unknown,
Description = "Cause could not be determined"
};
}
[JsonConverter(typeof(JsonStringEnumConverter<DriftCauseKind>))]
public enum DriftCauseKind
{
[JsonStringEnumMemberName("guard_removed")]
GuardRemoved,
[JsonStringEnumMemberName("guard_added")]
GuardAdded,
[JsonStringEnumMemberName("new_public_route")]
NewPublicRoute,
[JsonStringEnumMemberName("visibility_escalated")]
VisibilityEscalated,
[JsonStringEnumMemberName("dependency_upgraded")]
DependencyUpgraded,
[JsonStringEnumMemberName("symbol_removed")]
SymbolRemoved,
[JsonStringEnumMemberName("unknown")]
Unknown
}
/// <summary>
/// Vulnerability associated with a sink.
/// </summary>
public sealed record AssociatedVuln
{
[JsonPropertyName("cveId")]
public required string CveId { get; init; }
[JsonPropertyName("epss")]
public double? Epss { get; init; }
[JsonPropertyName("cvss")]
public double? Cvss { get; init; }
[JsonPropertyName("vexStatus")]
public string? VexStatus { get; init; }
[JsonPropertyName("packagePurl")]
public string? PackagePurl { get; init; }
}
```
### 2.3 Compressed Path Model
```csharp
// File: src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Models/CompressedPath.cs
namespace StellaOps.Scanner.ReachabilityDrift;
using System.Collections.Immutable;
using System.Text.Json.Serialization;
/// <summary>
/// Compressed representation of a call path for storage and UI.
/// </summary>
public sealed record CompressedPath
{
[JsonPropertyName("entrypoint")]
public required PathNode Entrypoint { get; init; }
[JsonPropertyName("sink")]
public required PathNode Sink { get; init; }
[JsonPropertyName("intermediateCount")]
public required int IntermediateCount { get; init; }
[JsonPropertyName("keyNodes")]
public required ImmutableArray<PathNode> KeyNodes { get; init; }
[JsonPropertyName("fullPath")]
public ImmutableArray<string>? FullPath { get; init; } // Node IDs for expansion
}
/// <summary>
/// Node in a compressed path.
/// </summary>
public sealed record PathNode
{
[JsonPropertyName("nodeId")]
public required string NodeId { get; init; }
[JsonPropertyName("symbol")]
public required string Symbol { get; init; }
[JsonPropertyName("file")]
public string? File { get; init; }
[JsonPropertyName("line")]
public int? Line { get; init; }
[JsonPropertyName("package")]
public string? Package { get; init; }
[JsonPropertyName("isChanged")]
public bool IsChanged { get; init; }
[JsonPropertyName("changeKind")]
public CodeChangeKind? ChangeKind { get; init; }
}
```
### 2.4 Drift Detector Service
```csharp
// File: src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Services/ReachabilityDriftDetector.cs
namespace StellaOps.Scanner.ReachabilityDrift.Services;

using System.Collections.Immutable; // required for ToImmutableArray()
using StellaOps.Scanner.CallGraph;
using StellaOps.Scanner.CallGraph.Analysis;

/// <summary>
/// Detects reachability drift between two scan snapshots.
/// </summary>
public sealed class ReachabilityDriftDetector
{
    private readonly ReachabilityAnalyzer _reachabilityAnalyzer = new();
    private readonly DriftCauseExplainer _causeExplainer = new();
    private readonly PathCompressor _pathCompressor = new();

    /// <summary>
    /// Compares two call graph snapshots and returns drift results.
    /// </summary>
    public ReachabilityDriftResult Detect(
        CallGraphSnapshot baseGraph,
        CallGraphSnapshot headGraph,
        IReadOnlyList<CodeChangeFact> codeChanges)
    {
        // Compute reachability for both graphs
        var baseReachability = _reachabilityAnalyzer.Analyze(baseGraph);
        var headReachability = _reachabilityAnalyzer.Analyze(headGraph);

        var newlyReachable = new List<DriftedSink>();
        var newlyUnreachable = new List<DriftedSink>();

        // Find sinks that became reachable (reachable in head, not in base)
        foreach (var sinkId in headGraph.SinkIds)
        {
            var wasReachable = baseReachability.ReachableSinks.Contains(sinkId);
            var isReachable = headReachability.ReachableSinks.Contains(sinkId);
            if (!wasReachable && isReachable)
            {
                var sink = headGraph.Nodes.First(n => n.NodeId == sinkId);
                var path = headReachability.ShortestPaths.TryGetValue(sinkId, out var p) ? p : [];
                var cause = _causeExplainer.Explain(baseGraph, headGraph, sinkId, path, codeChanges);
                newlyReachable.Add(new DriftedSink
                {
                    SinkNodeId = sinkId,
                    Symbol = sink.Symbol,
                    SinkCategory = sink.SinkCategory ?? SinkCategory.CmdExec,
                    Direction = DriftDirection.BecameReachable,
                    Cause = cause,
                    Path = _pathCompressor.Compress(path, headGraph, codeChanges)
                });
            }
        }

        // Find sinks that became unreachable (reachable in base, not in head)
        foreach (var sinkId in baseGraph.SinkIds)
        {
            var wasReachable = baseReachability.ReachableSinks.Contains(sinkId);
            var isReachable = headReachability.ReachableSinks.Contains(sinkId);
            if (wasReachable && !isReachable)
            {
                var sink = baseGraph.Nodes.First(n => n.NodeId == sinkId);
                var path = baseReachability.ShortestPaths.TryGetValue(sinkId, out var p) ? p : [];
                var cause = _causeExplainer.ExplainUnreachable(baseGraph, headGraph, sinkId, path, codeChanges);
                newlyUnreachable.Add(new DriftedSink
                {
                    SinkNodeId = sinkId,
                    Symbol = sink.Symbol,
                    SinkCategory = sink.SinkCategory ?? SinkCategory.CmdExec,
                    Direction = DriftDirection.BecameUnreachable,
                    Cause = cause,
                    Path = _pathCompressor.Compress(path, baseGraph, codeChanges)
                });
            }
        }

        return new ReachabilityDriftResult
        {
            BaseScanId = baseGraph.ScanId,
            HeadScanId = headGraph.ScanId,
            DetectedAt = DateTimeOffset.UtcNow,
            NewlyReachable = newlyReachable.ToImmutableArray(),
            NewlyUnreachable = newlyUnreachable.ToImmutableArray()
        };
    }
}
```
### 2.5 Drift Cause Explainer
```csharp
// File: src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Services/DriftCauseExplainer.cs
namespace StellaOps.Scanner.ReachabilityDrift.Services;

using System.Collections.Immutable; // required for ImmutableArray<T>
using StellaOps.Scanner.CallGraph;

/// <summary>
/// Explains why a reachability drift occurred.
/// </summary>
public sealed class DriftCauseExplainer
{
    /// <summary>
    /// Explains why a sink became reachable.
    /// </summary>
    public DriftCause Explain(
        CallGraphSnapshot baseGraph,
        CallGraphSnapshot headGraph,
        string sinkNodeId,
        ImmutableArray<string> path,
        IReadOnlyList<CodeChangeFact> codeChanges)
    {
        if (path.IsDefaultOrEmpty)
            return DriftCause.Unknown();

        // Check each node on path for code changes
        foreach (var nodeId in path)
        {
            var headNode = headGraph.Nodes.FirstOrDefault(n => n.NodeId == nodeId);
            if (headNode is null) continue;

            var change = codeChanges.FirstOrDefault(c =>
                c.Symbol == headNode.Symbol ||
                c.Symbol == ExtractTypeName(headNode.Symbol));
            if (change is not null)
            {
                return change.Kind switch
                {
                    CodeChangeKind.GuardChanged => DriftCause.GuardRemoved(
                        headNode.Symbol, headNode.File, headNode.Line),
                    CodeChangeKind.Added => DriftCause.NewPublicRoute(headNode.Symbol),
                    CodeChangeKind.VisibilityChanged => DriftCause.VisibilityEscalated(headNode.Symbol),
                    CodeChangeKind.DependencyChanged => ExplainDependencyChange(change),
                    _ => DriftCause.Unknown()
                };
            }
        }

        // Check if entrypoint is new
        var entrypoint = path.FirstOrDefault();
        if (entrypoint is not null)
        {
            var baseHasEntrypoint = baseGraph.EntrypointIds.Contains(entrypoint);
            var headHasEntrypoint = headGraph.EntrypointIds.Contains(entrypoint);
            if (!baseHasEntrypoint && headHasEntrypoint)
            {
                var epNode = headGraph.Nodes.First(n => n.NodeId == entrypoint);
                return DriftCause.NewPublicRoute(epNode.Symbol);
            }
        }

        return DriftCause.Unknown();
    }

    /// <summary>
    /// Explains why a sink became unreachable.
    /// </summary>
    public DriftCause ExplainUnreachable(
        CallGraphSnapshot baseGraph,
        CallGraphSnapshot headGraph,
        string sinkNodeId,
        ImmutableArray<string> basePath,
        IReadOnlyList<CodeChangeFact> codeChanges)
    {
        // Enumerating a default ImmutableArray throws, so bail out early
        if (basePath.IsDefaultOrEmpty)
            return DriftCause.Unknown();

        // Check if any node on path was removed
        foreach (var nodeId in basePath)
        {
            var existsInHead = headGraph.Nodes.Any(n => n.NodeId == nodeId);
            if (!existsInHead)
            {
                var baseNode = baseGraph.Nodes.First(n => n.NodeId == nodeId);
                return DriftCause.SymbolRemoved(baseNode.Symbol);
            }
        }

        // Check for guard changes on symbols that lie on the base path;
        // a guard change elsewhere in the codebase is not evidence for this sink
        foreach (var nodeId in basePath)
        {
            var baseNode = baseGraph.Nodes.FirstOrDefault(n => n.NodeId == nodeId);
            if (baseNode is null) continue;

            var change = codeChanges.FirstOrDefault(c =>
                c.Kind == CodeChangeKind.GuardChanged &&
                c.Symbol == baseNode.Symbol);
            if (change is not null)
            {
                return DriftCause.GuardAdded(change.Symbol);
            }
        }

        return DriftCause.Unknown();
    }

    // Extracts the simple type name from "Namespace.Type.Method"
    private static string ExtractTypeName(string symbol)
    {
        var lastDot = symbol.LastIndexOf('.');
        if (lastDot > 0)
        {
            var beforeMethod = symbol[..lastDot];
            var typeEnd = beforeMethod.LastIndexOf('.');
            return typeEnd > 0 ? beforeMethod[(typeEnd + 1)..] : beforeMethod;
        }
        return symbol;
    }

    private static DriftCause ExplainDependencyChange(CodeChangeFact change)
    {
        if (change.Details is not null)
        {
            var details = change.Details.RootElement;
            var package = details.TryGetProperty("package", out var p) ? p.GetString() : "unknown";
            var from = details.TryGetProperty("fromVersion", out var f) ? f.GetString() : "?";
            var to = details.TryGetProperty("toVersion", out var t) ? t.GetString() : "?";
            return DriftCause.DependencyUpgraded(package ?? "unknown", from ?? "?", to ?? "?");
        }
        return DriftCause.Unknown();
    }
}
```
### 2.6 Path Compressor
```csharp
// File: src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Services/PathCompressor.cs
namespace StellaOps.Scanner.ReachabilityDrift.Services;

using System.Collections.Immutable; // required for ImmutableArray<T>
using StellaOps.Scanner.CallGraph;

/// <summary>
/// Compresses call paths for efficient storage and UI display.
/// </summary>
public sealed class PathCompressor
{
    private const int MaxKeyNodes = 5;

    /// <summary>
    /// Compresses a full path to key nodes only.
    /// </summary>
    public CompressedPath Compress(
        ImmutableArray<string> fullPath,
        CallGraphSnapshot graph,
        IReadOnlyList<CodeChangeFact> codeChanges)
    {
        if (fullPath.IsDefaultOrEmpty)
        {
            return new CompressedPath
            {
                Entrypoint = new PathNode { NodeId = "unknown", Symbol = "unknown" },
                Sink = new PathNode { NodeId = "unknown", Symbol = "unknown" },
                IntermediateCount = 0,
                KeyNodes = []
            };
        }

        var entrypointNode = graph.Nodes.FirstOrDefault(n => n.NodeId == fullPath[0]);
        var sinkNode = graph.Nodes.FirstOrDefault(n => n.NodeId == fullPath[^1]);

        // Identify key intermediate nodes (changed, or flagged as entrypoint/sink)
        var keyNodes = new List<PathNode>();
        var changedSymbols = codeChanges.Select(c => c.Symbol).ToHashSet();
        for (var i = 1; i < fullPath.Length - 1 && keyNodes.Count < MaxKeyNodes; i++)
        {
            var nodeId = fullPath[i];
            var node = graph.Nodes.FirstOrDefault(n => n.NodeId == nodeId);
            if (node is null) continue;

            var isChanged = changedSymbols.Contains(node.Symbol);
            var change = codeChanges.FirstOrDefault(c => c.Symbol == node.Symbol);
            if (isChanged || node.IsEntrypoint || node.IsSink)
            {
                keyNodes.Add(new PathNode
                {
                    NodeId = node.NodeId,
                    Symbol = node.Symbol,
                    File = node.File,
                    Line = node.Line,
                    Package = node.Package,
                    IsChanged = isChanged,
                    ChangeKind = change?.Kind
                });
            }
        }

        return new CompressedPath
        {
            Entrypoint = CreatePathNode(entrypointNode, changedSymbols, codeChanges),
            Sink = CreatePathNode(sinkNode, changedSymbols, codeChanges),
            // A single-node path (entrypoint is the sink) has no intermediates
            IntermediateCount = Math.Max(0, fullPath.Length - 2),
            KeyNodes = keyNodes.ToImmutableArray(),
            FullPath = fullPath // Optionally include for expansion
        };
    }

    private static PathNode CreatePathNode(
        CallGraphNode? node,
        HashSet<string> changedSymbols,
        IReadOnlyList<CodeChangeFact> codeChanges)
    {
        if (node is null)
        {
            return new PathNode { NodeId = "unknown", Symbol = "unknown" };
        }

        var isChanged = changedSymbols.Contains(node.Symbol);
        var change = codeChanges.FirstOrDefault(c => c.Symbol == node.Symbol);
        return new PathNode
        {
            NodeId = node.NodeId,
            Symbol = node.Symbol,
            File = node.File,
            Line = node.Line,
            Package = node.Package,
            IsChanged = isChanged,
            ChangeKind = change?.Kind
        };
    }
}
```
### 2.7 Database Schema Extensions
```sql
-- File: src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/007_drift_detection_tables.sql
-- Sprint: SPRINT_3600_0003_0001
-- Description: Drift detection engine tables
-- Code change facts from AST-level analysis
CREATE TABLE IF NOT EXISTS scanner.code_changes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id TEXT NOT NULL,
base_scan_id TEXT NOT NULL,
file TEXT NOT NULL,
symbol TEXT NOT NULL,
change_kind TEXT NOT NULL,
details JSONB,
detected_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT code_changes_unique UNIQUE (tenant_id, scan_id, base_scan_id, file, symbol)
);
CREATE INDEX IF NOT EXISTS idx_code_changes_scan ON scanner.code_changes(scan_id);
CREATE INDEX IF NOT EXISTS idx_code_changes_symbol ON scanner.code_changes(symbol);
CREATE INDEX IF NOT EXISTS idx_code_changes_kind ON scanner.code_changes(change_kind);
-- Extend material_risk_changes with drift-specific columns
ALTER TABLE scanner.material_risk_changes
ADD COLUMN IF NOT EXISTS cause TEXT,
ADD COLUMN IF NOT EXISTS cause_kind TEXT,
ADD COLUMN IF NOT EXISTS path_nodes JSONB,
ADD COLUMN IF NOT EXISTS base_scan_id TEXT,
ADD COLUMN IF NOT EXISTS associated_vulns JSONB;
CREATE INDEX IF NOT EXISTS idx_material_risk_changes_cause
ON scanner.material_risk_changes(cause_kind)
WHERE cause_kind IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_material_risk_changes_base_scan
ON scanner.material_risk_changes(base_scan_id)
WHERE base_scan_id IS NOT NULL;
-- Reachability drift results (aggregate per scan pair)
CREATE TABLE IF NOT EXISTS scanner.reachability_drift_results (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
base_scan_id TEXT NOT NULL,
head_scan_id TEXT NOT NULL,
newly_reachable_count INT NOT NULL DEFAULT 0,
newly_unreachable_count INT NOT NULL DEFAULT 0,
detected_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
result_digest TEXT NOT NULL, -- Hash for dedup
CONSTRAINT reachability_drift_unique UNIQUE (tenant_id, base_scan_id, head_scan_id)
);
CREATE INDEX IF NOT EXISTS idx_drift_results_head_scan
ON scanner.reachability_drift_results(head_scan_id);
-- Drifted sinks (individual sink drift records)
CREATE TABLE IF NOT EXISTS scanner.drifted_sinks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
drift_result_id UUID NOT NULL REFERENCES scanner.reachability_drift_results(id),
sink_node_id TEXT NOT NULL,
symbol TEXT NOT NULL,
sink_category TEXT NOT NULL,
direction TEXT NOT NULL, -- became_reachable|became_unreachable
cause_kind TEXT NOT NULL,
cause_description TEXT NOT NULL,
cause_symbol TEXT,
cause_file TEXT,
cause_line INT,
code_change_id UUID REFERENCES scanner.code_changes(id),
compressed_path JSONB NOT NULL,
associated_vulns JSONB,
CONSTRAINT drifted_sinks_unique UNIQUE (drift_result_id, sink_node_id)
);
CREATE INDEX IF NOT EXISTS idx_drifted_sinks_drift_result
ON scanner.drifted_sinks(drift_result_id);
CREATE INDEX IF NOT EXISTS idx_drifted_sinks_direction
ON scanner.drifted_sinks(direction);
CREATE INDEX IF NOT EXISTS idx_drifted_sinks_category
ON scanner.drifted_sinks(sink_category);
-- Enable RLS
ALTER TABLE scanner.code_changes ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.reachability_drift_results ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.drifted_sinks ENABLE ROW LEVEL SECURITY;
DROP POLICY IF EXISTS code_changes_tenant_isolation ON scanner.code_changes;
CREATE POLICY code_changes_tenant_isolation ON scanner.code_changes
USING (tenant_id = scanner.current_tenant_id());
DROP POLICY IF EXISTS drift_results_tenant_isolation ON scanner.reachability_drift_results;
CREATE POLICY drift_results_tenant_isolation ON scanner.reachability_drift_results
USING (tenant_id = scanner.current_tenant_id());
DROP POLICY IF EXISTS drifted_sinks_tenant_isolation ON scanner.drifted_sinks;
CREATE POLICY drifted_sinks_tenant_isolation ON scanner.drifted_sinks
-- drifted_sinks carries its own tenant_id, so compare it to the session tenant directly
USING (tenant_id = scanner.current_tenant_id());
COMMENT ON TABLE scanner.code_changes IS 'AST-level code change facts for drift analysis';
COMMENT ON TABLE scanner.reachability_drift_results IS 'Aggregate drift results per scan pair';
COMMENT ON TABLE scanner.drifted_sinks IS 'Individual drifted sink records with causes and paths';
```
---
## Delivery Tracker
| # | Task ID | Status | Description | Notes |
|---|---------|--------|-------------|-------|
| 1 | DRIFT-001 | TODO | Create CodeChangeFact model | With all change kinds |
| 2 | DRIFT-002 | TODO | Create CodeChangeKind enum | 6 types |
| 3 | DRIFT-003 | TODO | Create ReachabilityDriftResult model | Aggregate result |
| 4 | DRIFT-004 | TODO | Create DriftedSink model | With cause and path |
| 5 | DRIFT-005 | TODO | Create DriftDirection enum | 2 directions |
| 6 | DRIFT-006 | TODO | Create DriftCause model | With factory methods |
| 7 | DRIFT-007 | TODO | Create DriftCauseKind enum | 7 kinds |
| 8 | DRIFT-008 | TODO | Create CompressedPath model | For UI display |
| 9 | DRIFT-009 | TODO | Create PathNode model | With change flags |
| 10 | DRIFT-010 | TODO | Implement ReachabilityDriftDetector | Core detection |
| 11 | DRIFT-011 | TODO | Implement DriftCauseExplainer | Cause attribution |
| 12 | DRIFT-012 | TODO | Implement ExplainUnreachable method | Reverse direction |
| 13 | DRIFT-013 | TODO | Implement PathCompressor | Key node selection |
| 14 | DRIFT-014 | TODO | Create Postgres migration 007 | code_changes, drift tables |
| 15 | DRIFT-015 | TODO | Implement ICodeChangeRepository | Storage contract |
| 16 | DRIFT-016 | TODO | Implement PostgresCodeChangeRepository | With Dapper |
| 17 | DRIFT-017 | TODO | Implement IDriftResultRepository | Storage contract |
| 18 | DRIFT-018 | TODO | Implement PostgresDriftResultRepository | With Dapper |
| 19 | DRIFT-019 | TODO | Unit tests for ReachabilityDriftDetector | Various scenarios |
| 20 | DRIFT-020 | TODO | Unit tests for DriftCauseExplainer | All cause kinds |
| 21 | DRIFT-021 | TODO | Unit tests for PathCompressor | Compression logic |
| 22 | DRIFT-022 | TODO | Integration tests with benchmark cases | End-to-end |
| 23 | DRIFT-023 | TODO | Golden fixtures for drift detection | Determinism |
| 24 | DRIFT-024 | TODO | API endpoint GET /scans/{id}/drift | Drift results |
| 25 | DRIFT-025 | TODO | API endpoint GET /drift/{id}/sinks | Individual sinks |
| 26 | DRIFT-026 | TODO | Integrate with MaterialRiskChangeDetector | Extend R1 rule |
---
## 3. ACCEPTANCE CRITERIA
### 3.1 Code Change Detection
- [ ] Detects added symbols
- [ ] Detects removed symbols
- [ ] Detects signature changes
- [ ] Detects guard changes
- [ ] Detects dependency changes
- [ ] Detects visibility changes
### 3.2 Drift Detection
- [ ] Correctly identifies newly reachable sinks
- [ ] Correctly identifies newly unreachable sinks
- [ ] Handles graphs with different node sets
- [ ] Handles cyclic graphs
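
The drift criteria above can be illustrated with a minimal TypeScript sketch. It assumes drift is computed as a two-way set difference over the sinks reachable from entrypoints; the `CallGraph` shape and the function names are hypothetical and are not the `ReachabilityDriftDetector` API.

```typescript
// Illustrative shapes only; the real CallGraphSnapshot model is not shown in this section.
interface CallGraph {
  entrypoints: string[];
  sinks: string[];
  edges: Map<string, string[]>; // caller node id -> callee node ids
}

// Breadth-first search from every entrypoint. Each node enters the queue at most
// once, so the walk terminates even when the graph contains cycles.
function reachableSinks(graph: CallGraph): Set<string> {
  const visited = new Set<string>(graph.entrypoints);
  const queue: string[] = [...graph.entrypoints];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of graph.edges.get(current) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return new Set(graph.sinks.filter((sink) => visited.has(sink)));
}

// Drift is the set difference in both directions. A sink that exists in only one
// graph is simply absent from the other side's reachable set.
function diffReachability(base: CallGraph, head: CallGraph) {
  const before = reachableSinks(base);
  const after = reachableSinks(head);
  return {
    newlyReachable: [...after].filter((s) => !before.has(s)).sort(),
    newlyUnreachable: [...before].filter((s) => !after.has(s)).sort(),
  };
}

const baseGraph: CallGraph = {
  entrypoints: ["api.get"],
  sinks: ["sql.exec", "fs.write"],
  edges: new Map<string, string[]>([
    ["api.get", ["svc.load"]],
    ["svc.load", ["api.get", "fs.write"]], // back edge forms a cycle
  ]),
};
const headGraph: CallGraph = {
  entrypoints: ["api.get"],
  sinks: ["sql.exec"], // fs.write no longer exists in the head graph
  edges: new Map<string, string[]>([
    ["api.get", ["svc.load"]],
    ["svc.load", ["api.get", "sql.exec"]],
  ]),
};

const drift = diffReachability(baseGraph, headGraph);
console.log(drift);
```

Sorting both result lists keeps the output stable across runs, which matters if golden fixtures are compared byte for byte.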
### 3.3 Cause Attribution
- [ ] Attributes guard removal causes
- [ ] Attributes new route causes
- [ ] Attributes visibility escalation causes
- [ ] Attributes dependency upgrade causes
- [ ] Provides unknown cause for undetectable cases
### 3.4 Path Compression
- [ ] Selects appropriate key nodes
- [ ] Marks changed nodes correctly
- [ ] Preserves entrypoint and sink
- [ ] Limits key nodes to max count
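
As a rough illustration of the compression criteria, the sketch below keeps the first and last node, counts the intermediates, and retains at most `maxKeyNodes` changed intermediates. It is a simplified TypeScript approximation: node lookup against the graph and the entrypoint/sink flags are omitted, and all names are hypothetical.

```typescript
interface KeyNode {
  nodeId: string;
  isChanged: boolean;
}

interface CompressedRoute {
  entrypoint: string;
  sink: string;
  intermediateCount: number;
  keyNodes: KeyNode[];
}

// Simplified: only "changed" marks a key node here; the C# version also checks
// entrypoint and sink flags on intermediate nodes.
function compressPath(
  fullPath: string[],
  changed: Set<string>,
  maxKeyNodes = 5,
): CompressedRoute {
  if (fullPath.length === 0) {
    return { entrypoint: "unknown", sink: "unknown", intermediateCount: 0, keyNodes: [] };
  }
  const keyNodes = fullPath
    .slice(1, -1) // intermediates only; the endpoints are stored separately
    .filter((id) => changed.has(id))
    .slice(0, maxKeyNodes) // enforce the key-node cap
    .map((id) => ({ nodeId: id, isChanged: true }));
  return {
    entrypoint: fullPath[0],
    sink: fullPath[fullPath.length - 1],
    intermediateCount: Math.max(fullPath.length - 2, 0),
    keyNodes,
  };
}

const longPath = ["entry", "n1", "n2", "n3", "n4", "n5", "n6", "n7", "sink"];
const changedIds = new Set<string>(["n1", "n2", "n3", "n4", "n5", "n6"]);
const compressed = compressPath(longPath, changedIds);
console.log(compressed.intermediateCount, compressed.keyNodes.map((n) => n.nodeId));
```

With six changed intermediates and a cap of five, the sixth changed node is dropped from `keyNodes` but still counted in `intermediateCount`.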
### 3.5 Integration
- [ ] Integrates with MaterialRiskChangeDetector
- [ ] Extends material_risk_changes table correctly
- [ ] API endpoints return correct data
---
## Decisions & Risks
| ID | Decision | Rationale |
|----|----------|-----------|
| DRIFT-DEC-001 | Extend existing tables, don't duplicate | Leverage scanner.material_risk_changes |
| DRIFT-DEC-002 | Store full path optionally | Enable UI expansion without re-computation |
| DRIFT-DEC-003 | Limit key nodes to 5 | Balance detail vs. storage |

| ID | Risk | Mitigation |
|----|------|------------|
| DRIFT-RISK-001 | Cause attribution false positives | Conservative matching, show "unknown" |
| DRIFT-RISK-002 | Large path storage | Compression, CAS for full paths |
| DRIFT-RISK-003 | Performance on large graphs | Caching, pre-computed reachability |
---
## Execution Log
| Date (UTC) | Update | Owner |
|---|---|---|
| 2025-12-17 | Created sprint from master plan | Agent |
---
## References
- **Master Sprint**: `SPRINT_3600_0001_0001_reachability_drift_master.md`
- **Call Graph Sprint**: `SPRINT_3600_0002_0001_call_graph_infrastructure.md`
- **Advisory**: `17-Dec-2025 - Reachability Drift Detection.md`

@@ -0,0 +1,886 @@
# SPRINT_3600_0004_0001 - UI and Evidence Chain
**Status:** TODO
**Priority:** P1 - HIGH
**Module:** Web, Attestor
**Working Directory:** `src/Web/StellaOps.Web/`, `src/Attestor/`
**Estimated Effort:** Medium
**Dependencies:** SPRINT_3600_0003_0001 (Drift Detection Engine)

---
## Topic & Scope
Implement the UI components and evidence chain integration for reachability drift. This sprint covers:
- Angular Path Viewer component
- Risk Drift Card component
- DSSE attestation for drift results
- CLI output enhancements
- SARIF integration
---
## Documentation Prerequisites
- `docs/product-advisories/17-Dec-2025 - Reachability Drift Detection.md`
- `docs/implplan/SPRINT_3600_0003_0001_drift_detection_engine.md`
- `docs/modules/attestor/architecture.md`
- `src/Web/StellaOps.Web/README.md`
---
## Wave Coordination
Parallel tracks:
- Track A: Angular UI components
- Track B: DSSE attestation
- Track C: CLI enhancements
---
## Interlocks
- Depends on drift detection API from Sprint 3600.3
- Must align with existing Console design patterns
- Must use existing Attestor infrastructure
---
## Action Tracker
| Date (UTC) | Action | Owner | Notes |
|---|---|---|---|
| 2025-12-17 | Created sprint from master plan | Agent | Initial |
---
## 1. OBJECTIVE
Build the user-facing components:
1. **Path Viewer** - Interactive call path visualization
2. **Risk Drift Card** - Summary view for PRs/scans
3. **Evidence Chain** - DSSE attestation linking
4. **CLI Output** - Enhanced drift reporting
---
## 2. TECHNICAL DESIGN
### 2.1 Angular Path Viewer Component
```typescript
// File: src/Web/StellaOps.Web/src/app/components/path-viewer/path-viewer.component.ts
import { Component, Input, Output, EventEmitter } from '@angular/core';
import { CommonModule } from '@angular/common';
export interface PathNode {
nodeId: string;
symbol: string;
file?: string;
line?: number;
package?: string;
isChanged: boolean;
changeKind?: string;
}
export interface CompressedPath {
entrypoint: PathNode;
sink: PathNode;
intermediateCount: number;
keyNodes: PathNode[];
fullPath?: string[];
}
@Component({
selector: 'app-path-viewer',
standalone: true,
imports: [CommonModule],
template: `
<div class="path-viewer">
<div class="path-header">
<span class="path-title">{{ title }}</span>
<button
*ngIf="collapsible"
class="btn-collapse"
(click)="toggleCollapse()">
{{ collapsed ? 'Expand' : 'Collapse' }}
</button>
</div>
<div class="path-content" *ngIf="!collapsed">
<!-- Entrypoint -->
<div class="path-node entrypoint">
<span class="node-icon">○</span>
<div class="node-details">
<span class="node-symbol">{{ path.entrypoint.symbol }}</span>
<span class="node-location" *ngIf="path.entrypoint.file">
{{ path.entrypoint.file }}:{{ path.entrypoint.line }}
</span>
<span class="node-badge entrypoint-badge">ENTRYPOINT</span>
</div>
</div>
<!-- Connector -->
<div class="path-connector"></div>
<!-- Key intermediate nodes -->
<ng-container *ngFor="let node of path.keyNodes; let i = index">
<div
class="path-node"
[class.changed]="node.isChanged">
<span class="node-icon" [class.changed-icon]="node.isChanged">
{{ node.isChanged ? '●' : '○' }}
</span>
<div class="node-details">
<span class="node-symbol">{{ node.symbol }}</span>
<span class="node-location" *ngIf="node.file">
{{ node.file }}:{{ node.line }}
</span>
<span
class="node-badge change-badge"
*ngIf="node.isChanged">
{{ formatChangeKind(node.changeKind) }}
</span>
</div>
</div>
<div class="path-connector"></div>
</ng-container>
<!-- Collapsed indicator -->
<div
class="path-collapsed-indicator"
*ngIf="path.intermediateCount > path.keyNodes.length">
<span>... {{ path.intermediateCount - path.keyNodes.length }} more nodes ...</span>
<button class="btn-expand" (click)="requestFullPath()">
Show full path
</button>
</div>
<!-- Sink -->
<div class="path-node sink">
<span class="node-icon sink-icon">◆</span>
<div class="node-details">
<span class="node-symbol">{{ path.sink.symbol }}</span>
<span class="node-location" *ngIf="path.sink.file">
{{ path.sink.file }}:{{ path.sink.line }}
</span>
<span class="node-badge sink-badge">VULNERABLE SINK</span>
<span class="node-badge package-badge" *ngIf="path.sink.package">
{{ path.sink.package }}
</span>
</div>
</div>
</div>
<div class="path-legend" *ngIf="showLegend && !collapsed">
<span><span class="legend-icon">○</span> Node</span>
<span><span class="legend-icon changed-icon">●</span> Changed</span>
<span><span class="legend-icon sink-icon">◆</span> Sink</span>
<span><span class="legend-line">─</span> Call</span>
</div>
</div>
`,
styleUrls: ['./path-viewer.component.scss']
})
export class PathViewerComponent {
@Input() path!: CompressedPath;
@Input() title = 'Call Path';
@Input() collapsible = true;
@Input() showLegend = true;
@Input() collapsed = false;
@Output() expandPath = new EventEmitter<string[]>();
toggleCollapse(): void {
this.collapsed = !this.collapsed;
}
requestFullPath(): void {
if (this.path.fullPath) {
this.expandPath.emit(this.path.fullPath);
}
}
formatChangeKind(kind?: string): string {
if (!kind) return 'Changed';
return kind
.replace(/_/g, ' ')
.replace(/\b\w/g, c => c.toUpperCase());
}
}
```
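
The two pieces of pure logic in the viewer, badge text formatting and the "more nodes" count, can be exercised without Angular. The sketch below copies `formatChangeKind` as a free function and adds a hypothetical `hiddenNodeCount` helper; the change-kind strings are assumed examples, not values confirmed by the backend.

```typescript
// Same transformation as PathViewerComponent.formatChangeKind, as a free function.
function formatChangeKind(kind?: string): string {
  if (!kind) return "Changed";
  return kind
    .replace(/_/g, " ") // snake_case -> words
    .replace(/\b\w/g, (c) => c.toUpperCase()); // capitalize each word
}

// Hypothetical helper: how many intermediates the "... N more nodes ..." row hides.
function hiddenNodeCount(intermediateCount: number, keyNodeCount: number): number {
  return Math.max(intermediateCount - keyNodeCount, 0);
}

const guardLabel = formatChangeKind("guard_removed");
const fallbackLabel = formatChangeKind(undefined);
const hidden = hiddenNodeCount(7, 2);
console.log(guardLabel, fallbackLabel, hidden);
```

Keeping this logic in plain functions makes it straightforward to cover in unit tests without rendering the template.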
### 2.2 Risk Drift Card Component
```typescript
// File: src/Web/StellaOps.Web/src/app/components/risk-drift-card/risk-drift-card.component.ts
import { Component, Input, Output, EventEmitter } from '@angular/core';
import { CommonModule } from '@angular/common';
import { PathViewerComponent, CompressedPath } from '../path-viewer/path-viewer.component';
export interface DriftedSink {
sinkNodeId: string;
symbol: string;
sinkCategory: string;
direction: 'became_reachable' | 'became_unreachable';
cause: DriftCause;
path: CompressedPath;
associatedVulns: AssociatedVuln[];
}
export interface DriftCause {
kind: string;
description: string;
changedSymbol?: string;
changedFile?: string;
changedLine?: number;
}
export interface AssociatedVuln {
cveId: string;
epss?: number;
cvss?: number;
vexStatus?: string;
packagePurl?: string;
}
export interface DriftResult {
baseScanId: string;
headScanId: string;
newlyReachable: DriftedSink[];
newlyUnreachable: DriftedSink[];
}
@Component({
selector: 'app-risk-drift-card',
standalone: true,
imports: [CommonModule, PathViewerComponent],
template: `
<div class="risk-drift-card">
<div class="card-header">
<h3>Risk Drift</h3>
<button class="btn-collapse" (click)="toggleExpand()">
{{ expanded ? '▲' : '▼' }}
</button>
</div>
<div class="card-summary">
<span class="badge new-reachable" *ngIf="result.newlyReachable.length > 0">
+{{ result.newlyReachable.length }} new reachable paths
</span>
<span class="badge mitigated" *ngIf="result.newlyUnreachable.length > 0">
-{{ result.newlyUnreachable.length }} mitigated paths
</span>
<span class="badge no-drift" *ngIf="!hasDrift">
No material drift detected
</span>
</div>
<div class="card-content" *ngIf="expanded">
<!-- Newly Reachable Section -->
<div class="drift-section new-reachable" *ngIf="result.newlyReachable.length > 0">
<h4>New Reachable Paths</h4>
<div
class="drifted-sink"
*ngFor="let sink of result.newlyReachable">
<div class="sink-header">
<span class="sink-route">
{{ formatRoute(sink) }}
</span>
<div class="sink-badges">
<span
class="vuln-badge"
*ngFor="let vuln of sink.associatedVulns">
{{ vuln.cveId }}
<span class="epss" *ngIf="vuln.epss">(EPSS {{ vuln.epss | number:'1.2-2' }})</span>
</span>
<span class="vex-badge" *ngIf="sink.associatedVulns[0]?.vexStatus">
VEX: {{ sink.associatedVulns[0].vexStatus }}
</span>
</div>
</div>
<div class="sink-cause">
<strong>Cause:</strong> {{ sink.cause.description }}
</div>
<app-path-viewer
[path]="sink.path"
[title]="''"
[showLegend]="false"
[collapsed]="true">
</app-path-viewer>
<div class="sink-actions">
<button class="btn-action" (click)="viewPath.emit(sink)">
View Path
</button>
<button class="btn-action" (click)="quarantine.emit(sink)">
Quarantine Route
</button>
<button class="btn-action" (click)="pinVersion.emit(sink)">
Pin Version
</button>
<button class="btn-action secondary" (click)="addException.emit(sink)">
Add Exception
</button>
</div>
</div>
</div>
<!-- Mitigated Section -->
<div class="drift-section mitigated" *ngIf="result.newlyUnreachable.length > 0">
<h4>Mitigated Paths</h4>
<div
class="drifted-sink mitigated"
*ngFor="let sink of result.newlyUnreachable">
<div class="sink-header">
<span class="sink-route">
{{ formatRoute(sink) }}
</span>
<div class="sink-badges">
<span
class="vuln-badge resolved"
*ngFor="let vuln of sink.associatedVulns">
{{ vuln.cveId }}
</span>
</div>
</div>
<div class="sink-cause">
<strong>Reason:</strong> {{ sink.cause.description }}
</div>
</div>
</div>
</div>
</div>
`,
styleUrls: ['./risk-drift-card.component.scss']
})
export class RiskDriftCardComponent {
@Input() result!: DriftResult;
@Input() expanded = true;
@Output() viewPath = new EventEmitter<DriftedSink>();
@Output() quarantine = new EventEmitter<DriftedSink>();
@Output() pinVersion = new EventEmitter<DriftedSink>();
@Output() addException = new EventEmitter<DriftedSink>();
get hasDrift(): boolean {
return this.result.newlyReachable.length > 0 ||
this.result.newlyUnreachable.length > 0;
}
toggleExpand(): void {
this.expanded = !this.expanded;
}
formatRoute(sink: DriftedSink): string {
const entrypoint = sink.path.entrypoint.symbol;
const sinkSymbol = sink.path.sink.symbol;
const intermediateCount = sink.path.intermediateCount;
if (intermediateCount <= 2) {
return `${entrypoint} → ${sinkSymbol}`;
}
return `${entrypoint} → ... → ${sinkSymbol}`;
}
}
```
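
The card's summary badges and route labels reduce to small pure functions. The following sketch mirrors the template conditions under the assumption that short routes render with a single arrow; the function names are illustrative and are not part of the component.

```typescript
// Mirrors the three summary badges in the card template.
function summaryBadges(newlyReachable: number, newlyUnreachable: number): string[] {
  const badges: string[] = [];
  if (newlyReachable > 0) badges.push(`+${newlyReachable} new reachable paths`);
  if (newlyUnreachable > 0) badges.push(`-${newlyUnreachable} mitigated paths`);
  if (badges.length === 0) badges.push("No material drift detected");
  return badges;
}

// Short routes show both ends directly; longer ones elide the middle.
function formatRoute(entrypoint: string, sink: string, intermediateCount: number): string {
  return intermediateCount <= 2
    ? `${entrypoint} → ${sink}`
    : `${entrypoint} → ... → ${sink}`;
}

const badges = summaryBadges(2, 0);
const shortRoute = formatRoute("OrdersController.Get", "SqlCommand.Execute", 1);
const longRoute = formatRoute("OrdersController.Get", "SqlCommand.Execute", 6);
console.log(badges, shortRoute, longRoute);
```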
### 2.3 DSSE Predicate for Drift
```csharp
// File: src/Attestor/StellaOps.Attestor.Types/Predicates/ReachabilityDriftPredicate.cs
namespace StellaOps.Attestor.Types.Predicates;
using System.Collections.Immutable;
using System.Text.Json.Serialization;
/// <summary>
/// DSSE predicate for reachability drift attestation.
/// predicateType: stellaops.dev/predicates/reachability-drift@v1
/// </summary>
public sealed record ReachabilityDriftPredicate
{
public const string PredicateType = "stellaops.dev/predicates/reachability-drift@v1";
[JsonPropertyName("baseImage")]
public required ImageReference BaseImage { get; init; }
[JsonPropertyName("targetImage")]
public required ImageReference TargetImage { get; init; }
[JsonPropertyName("baseScanId")]
public required string BaseScanId { get; init; }
[JsonPropertyName("headScanId")]
public required string HeadScanId { get; init; }
[JsonPropertyName("drift")]
public required DriftSummary Drift { get; init; }
[JsonPropertyName("analysis")]
public required AnalysisMetadata Analysis { get; init; }
}
public sealed record ImageReference
{
[JsonPropertyName("name")]
public required string Name { get; init; }
[JsonPropertyName("digest")]
public required string Digest { get; init; }
}
public sealed record DriftSummary
{
[JsonPropertyName("newlyReachableCount")]
public required int NewlyReachableCount { get; init; }
[JsonPropertyName("newlyUnreachableCount")]
public required int NewlyUnreachableCount { get; init; }
[JsonPropertyName("newlyReachable")]
public required ImmutableArray<DriftedSinkSummary> NewlyReachable { get; init; }
[JsonPropertyName("newlyUnreachable")]
public required ImmutableArray<DriftedSinkSummary> NewlyUnreachable { get; init; }
}
public sealed record DriftedSinkSummary
{
[JsonPropertyName("sinkNodeId")]
public required string SinkNodeId { get; init; }
[JsonPropertyName("symbol")]
public required string Symbol { get; init; }
[JsonPropertyName("sinkCategory")]
public required string SinkCategory { get; init; }
[JsonPropertyName("causeKind")]
public required string CauseKind { get; init; }
[JsonPropertyName("causeDescription")]
public required string CauseDescription { get; init; }
[JsonPropertyName("associatedCves")]
public ImmutableArray<string> AssociatedCves { get; init; } = [];
}
public sealed record AnalysisMetadata
{
[JsonPropertyName("analyzedAt")]
public required DateTimeOffset AnalyzedAt { get; init; }
[JsonPropertyName("scanner")]
public required ScannerInfo Scanner { get; init; }
[JsonPropertyName("baseGraphDigest")]
public required string BaseGraphDigest { get; init; }
[JsonPropertyName("headGraphDigest")]
public required string HeadGraphDigest { get; init; }
}
public sealed record ScannerInfo
{
[JsonPropertyName("name")]
public required string Name { get; init; }
[JsonPropertyName("version")]
public required string Version { get; init; }
[JsonPropertyName("ruleset")]
public string? Ruleset { get; init; }
}
```
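
DSSE envelopes commonly carry an in-toto Statement as their payload. The sketch below shows how the predicate might be wrapped, assuming the target image is the attestation subject and that digests use the `<algorithm>:<hex>` form. Both are assumptions about the Attestor integration, and `toStatement` is a hypothetical helper.

```typescript
interface ImageReference {
  name: string;
  digest: string; // assumed "<algorithm>:<hex>", e.g. "sha256:..."
}

interface InTotoStatement<P> {
  _type: string;
  subject: { name: string; digest: Record<string, string> }[];
  predicateType: string;
  predicate: P;
}

// Wraps a predicate in an in-toto v1 Statement with the target image as subject.
function toStatement<P>(
  target: ImageReference,
  predicateType: string,
  predicate: P,
): InTotoStatement<P> {
  const sep = target.digest.indexOf(":");
  if (sep <= 0) {
    throw new Error(`expected "<algorithm>:<hex>" digest, got "${target.digest}"`);
  }
  const algorithm = target.digest.slice(0, sep);
  const hex = target.digest.slice(sep + 1);
  return {
    _type: "https://in-toto.io/Statement/v1",
    subject: [{ name: target.name, digest: { [algorithm]: hex } }],
    predicateType,
    predicate,
  };
}

const statement = toStatement(
  { name: "registry.example/app", digest: "sha256:" + "ab".repeat(32) },
  "stellaops.dev/predicates/reachability-drift@v1",
  { baseScanId: "scan-base", headScanId: "scan-head" }, // abbreviated predicate body
);
console.log(JSON.stringify(statement, null, 2));
```

Signing and envelope construction are left out here; they belong to the existing Attestor infrastructure.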
### 2.4 CLI Output Enhancement
```csharp
// File: src/Cli/StellaOps.Cli/Commands/DriftCommand.cs
namespace StellaOps.Cli.Commands;
using System.CommandLine;
using System.Text.Json;
using Spectre.Console;
public class DriftCommand : Command
{
public DriftCommand() : base("drift", "Detect reachability drift between image versions")
{
var baseOption = new Option<string>("--base", "Base image reference") { IsRequired = true };
var targetOption = new Option<string>("--target", "Target image reference") { IsRequired = true };
var formatOption = new Option<string>("--format", () => "table", "Output format (table|json|sarif)");
var verboseOption = new Option<bool>("--verbose", () => false, "Show detailed path information");
AddOption(baseOption);
AddOption(targetOption);
AddOption(formatOption);
AddOption(verboseOption);
this.SetHandler(ExecuteAsync, baseOption, targetOption, formatOption, verboseOption);
}
private async Task ExecuteAsync(string baseImage, string targetImage, string format, bool verbose)
{
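// Keep stdout machine-readable: print the banner only for table output.
if (format.ToLowerInvariant() is not ("json" or "sarif"))
{
AnsiConsole.MarkupLine($"[bold]Analyzing drift:[/] {Markup.Escape(baseImage)} → {Markup.Escape(targetImage)}");
}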
// TODO: Call drift detection service
var result = await DetectDriftAsync(baseImage, targetImage);
switch (format.ToLowerInvariant())
{
case "json":
OutputJson(result);
break;
case "sarif":
OutputSarif(result);
break;
default:
OutputTable(result, verbose);
break;
}
// Exit code based on drift
Environment.ExitCode = result.TotalDriftCount switch
{
0 => 0, // No drift
> 0 when result.NewlyReachable.Length > 0 => 1, // New reachable (info)
_ => 0 // Only mitigated
};
}
private void OutputTable(ReachabilityDriftResult result, bool verbose)
{
if (result.NewlyReachable.Length > 0)
{
AnsiConsole.MarkupLine("\n[red bold]NEW REACHABLE PATHS[/]");
var table = new Table();
table.AddColumn("Sink");
table.AddColumn("Category");
table.AddColumn("Cause");
if (verbose)
{
table.AddColumn("Path");
}
table.AddColumn("CVEs");
foreach (var sink in result.NewlyReachable)
{
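// Table cells are parsed as Spectre markup, so escape values that may contain '[' or ']'.
var row = new List<string>
{
Markup.Escape(sink.Symbol),
Markup.Escape(sink.SinkCategory.ToString()),
Markup.Escape(sink.Cause.Description)
};
if (verbose)
{
row.Add(Markup.Escape($"{sink.Path.Entrypoint.Symbol} → ... → {sink.Path.Sink.Symbol}"));
}
row.Add(Markup.Escape(string.Join(", ", sink.AssociatedVulns.Select(v => v.CveId))));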
table.AddRow(row.ToArray());
}
AnsiConsole.Write(table);
}
if (result.NewlyUnreachable.Length > 0)
{
AnsiConsole.MarkupLine("\n[green bold]MITIGATED PATHS[/]");
var table = new Table();
table.AddColumn("Sink");
table.AddColumn("Category");
table.AddColumn("Reason");
foreach (var sink in result.NewlyUnreachable)
{
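// Escape values so symbols containing '[' or ']' are not parsed as markup.
table.AddRow(
Markup.Escape(sink.Symbol),
Markup.Escape(sink.SinkCategory.ToString()),
Markup.Escape(sink.Cause.Description));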
}
AnsiConsole.Write(table);
}
if (result.TotalDriftCount == 0)
{
AnsiConsole.MarkupLine("\n[green]No material reachability drift detected.[/]");
}
AnsiConsole.MarkupLine($"\n[bold]Summary:[/] +{result.NewlyReachable.Length} reachable, -{result.NewlyUnreachable.Length} mitigated");
}
private void OutputJson(ReachabilityDriftResult result)
{
var json = JsonSerializer.Serialize(result, new JsonSerializerOptions
{
WriteIndented = true,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});
Console.WriteLine(json);
}
private void OutputSarif(ReachabilityDriftResult result)
{
// Generate SARIF 2.1.0 output
// TODO: Implement SARIF generation
throw new NotImplementedException("SARIF output to be implemented");
}
private Task<ReachabilityDriftResult> DetectDriftAsync(string baseImage, string targetImage)
{
// TODO: Implement actual drift detection
throw new NotImplementedException();
}
}
```
### 2.5 SARIF Integration
```csharp
// File: src/Scanner/__Libraries/StellaOps.Scanner.ReachabilityDrift/Output/DriftSarifGenerator.cs
namespace StellaOps.Scanner.ReachabilityDrift.Output;
using System.Text.Json;
/// <summary>
/// Generates SARIF 2.1.0 output for drift results.
/// </summary>
public sealed class DriftSarifGenerator
{
private const string ToolName = "StellaOps.ReachabilityDrift";
private const string ToolVersion = "1.0.0";
public JsonDocument Generate(ReachabilityDriftResult result)
{
var rules = new List<object>();
var results = new List<object>();
// Add rules for each drift type
rules.Add(new
{
id = "RDRIFT001",
name = "NewlyReachableSink",
shortDescription = new { text = "Vulnerable sink became reachable" },
fullDescription = new { text = "A vulnerable code sink became reachable from application entrypoints due to code changes." },
defaultConfiguration = new { level = "error" }
});
rules.Add(new
{
id = "RDRIFT002",
name = "MitigatedSink",
shortDescription = new { text = "Vulnerable sink became unreachable" },
fullDescription = new { text = "A vulnerable code sink is no longer reachable from application entrypoints." },
defaultConfiguration = new { level = "note" }
});
// Add results for newly reachable sinks
foreach (var sink in result.NewlyReachable)
{
results.Add(new
{
ruleId = "RDRIFT001",
level = "error",
message = new
{
text = $"Sink {sink.Symbol} became reachable. Cause: {sink.Cause.Description}"
},
locations = sink.Cause.ChangedFile is not null ? new[]
{
new
{
physicalLocation = new
{
artifactLocation = new { uri = sink.Cause.ChangedFile },
region = new { startLine = sink.Cause.ChangedLine ?? 1 }
}
}
} : Array.Empty<object>(),
properties = new
{
sinkCategory = sink.SinkCategory.ToString(),
causeKind = sink.Cause.Kind.ToString(),
associatedVulns = sink.AssociatedVulns.Select(v => v.CveId).ToArray()
}
});
}
// Add results for mitigated sinks
foreach (var sink in result.NewlyUnreachable)
{
results.Add(new
{
ruleId = "RDRIFT002",
level = "note",
message = new
{
text = $"Sink {sink.Symbol} is no longer reachable. Reason: {sink.Cause.Description}"
},
properties = new
{
sinkCategory = sink.SinkCategory.ToString(),
causeKind = sink.Cause.Kind.ToString()
}
});
}
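// "$schema" is not a valid C# identifier, so the top-level log is a dictionary.
// Dictionary keys are written as-is; nested anonymous objects still use the camelCase policy.
var sarif = new Dictionary<string, object>
{
["$schema"] = "https://json.schemastore.org/sarif-2.1.0.json",
["version"] = "2.1.0",
["runs"] = new[]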
{
new
{
tool = new
{
driver = new
{
name = ToolName,
version = ToolVersion,
informationUri = "https://stellaops.dev/docs/reachability-drift",
rules = rules.ToArray()
}
},
results = results.ToArray(),
invocations = new[]
{
new
{
executionSuccessful = true,
endTimeUtc = result.DetectedAt.UtcDateTime.ToString("o")
}
}
}
}
};
var json = JsonSerializer.Serialize(sarif, new JsonSerializerOptions
{
WriteIndented = true,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});
return JsonDocument.Parse(json);
}
}
```
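
For reference, the shape of the SARIF log that consumers expect can be sketched in TypeScript. This is a reduced illustration covering only newly reachable sinks. The `SinkFinding` type and `toSarifLog` function are hypothetical; the property names (`$schema`, `runs`, `tool.driver`, `results`, `physicalLocation`) come from SARIF 2.1.0.

```typescript
interface SinkFinding {
  symbol: string;
  causeDescription: string;
  changedFile?: string;
  changedLine?: number;
}

// Builds a minimal SARIF 2.1.0 log with one rule and one result per finding.
function toSarifLog(newlyReachable: SinkFinding[]) {
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: {
          driver: {
            name: "StellaOps.ReachabilityDrift",
            rules: [{ id: "RDRIFT001", name: "NewlyReachableSink" }],
          },
        },
        results: newlyReachable.map((sink) => ({
          ruleId: "RDRIFT001",
          level: "error",
          message: {
            text: `Sink ${sink.symbol} became reachable. Cause: ${sink.causeDescription}`,
          },
          // Emit a location only when the cause points at a concrete file.
          locations: sink.changedFile
            ? [
                {
                  physicalLocation: {
                    artifactLocation: { uri: sink.changedFile },
                    region: { startLine: sink.changedLine ?? 1 },
                  },
                },
              ]
            : [],
        })),
      },
    ],
  };
}

const sarifLog = toSarifLog([
  { symbol: "SqlCommand.Execute", causeDescription: "guard removed", changedFile: "src/Orders.cs", changedLine: 42 },
  { symbol: "File.WriteAllText", causeDescription: "unknown" },
]);
console.log(JSON.stringify(sarifLog, null, 2));
```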
---
## Delivery Tracker
| # | Task ID | Status | Description | Notes |
|---|---------|--------|-------------|-------|
| 1 | UI-001 | TODO | Create PathNode TypeScript interface | Angular model |
| 2 | UI-002 | TODO | Create CompressedPath TypeScript interface | Angular model |
| 3 | UI-003 | TODO | Create PathViewerComponent | Core visualization |
| 4 | UI-004 | TODO | Style PathViewerComponent | SCSS styling |
| 5 | UI-005 | TODO | Create DriftedSink TypeScript interface | Angular model |
| 6 | UI-006 | TODO | Create DriftResult TypeScript interface | Angular model |
| 7 | UI-007 | TODO | Create RiskDriftCardComponent | Summary card |
| 8 | UI-008 | TODO | Style RiskDriftCardComponent | SCSS styling |
| 9 | UI-009 | TODO | Create drift API service | Angular HTTP service |
| 10 | UI-010 | TODO | Integrate PathViewer into scan details | Page integration |
| 11 | UI-011 | TODO | Integrate RiskDriftCard into PR view | Page integration |
| 12 | UI-012 | TODO | Unit tests for PathViewerComponent | Jest tests |
| 13 | UI-013 | TODO | Unit tests for RiskDriftCardComponent | Jest tests |
| 14 | UI-014 | TODO | Create ReachabilityDriftPredicate model | DSSE predicate |
| 15 | UI-015 | TODO | Register predicate in Attestor | Type registration |
| 16 | UI-016 | TODO | Implement drift attestation service | DSSE signing |
| 17 | UI-017 | TODO | Add attestation to drift API | API integration |
| 18 | UI-018 | TODO | Unit tests for attestation | Predicate validation |
| 19 | UI-019 | TODO | Create DriftCommand for CLI | CLI command |
| 20 | UI-020 | TODO | Implement table output | Spectre.Console |
| 21 | UI-021 | TODO | Implement JSON output | JSON serialization |
| 22 | UI-022 | TODO | Create DriftSarifGenerator | SARIF 2.1.0 |
| 23 | UI-023 | TODO | Implement SARIF output for CLI | CLI integration |
| 24 | UI-024 | TODO | Update CLI documentation | docs/cli/ |
| 25 | UI-025 | TODO | Integration tests for CLI | End-to-end |
---
## 3. ACCEPTANCE CRITERIA
### 3.1 Path Viewer Component
- [ ] Displays entrypoint and sink nodes
- [ ] Shows key intermediate nodes
- [ ] Highlights changed nodes
- [ ] Supports collapse/expand
- [ ] Shows legend
- [ ] Handles paths of various lengths
### 3.2 Risk Drift Card Component
- [ ] Shows summary badges
- [ ] Lists newly reachable paths
- [ ] Lists mitigated paths
- [ ] Shows associated vulnerabilities
- [ ] Provides action buttons
- [ ] Supports expand/collapse
### 3.3 DSSE Attestation
- [ ] Generates valid predicate
- [ ] Signs with DSSE envelope
- [ ] Includes graph digests
- [ ] Includes all drift details
- [ ] Passes schema validation
### 3.4 CLI Output
- [ ] Table output is readable
- [ ] JSON output is valid
- [ ] SARIF output passes schema validation
- [ ] Exit codes are correct
- [ ] Verbose mode shows paths
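
The exit-code rule implied by the `DriftCommand` switch can be stated as a small truth table. The sketch below restates that rule in TypeScript for illustration; `driftExitCode` is a hypothetical name, and the mapping assumes only newly reachable sinks should produce a non-zero exit.

```typescript
interface DriftCounts {
  newlyReachable: number;
  newlyUnreachable: number;
}

// Only newly reachable sinks fail the run; mitigated-only or empty drift exits 0.
function driftExitCode(counts: DriftCounts): number {
  return counts.newlyReachable > 0 ? 1 : 0;
}

const exitCodes = [
  driftExitCode({ newlyReachable: 0, newlyUnreachable: 0 }), // no drift
  driftExitCode({ newlyReachable: 0, newlyUnreachable: 3 }), // only mitigated
  driftExitCode({ newlyReachable: 2, newlyUnreachable: 1 }), // new reachable paths
];
console.log(exitCodes);
```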
---
## Decisions & Risks
| ID | Decision | Rationale |
|----|----------|-----------|
| UI-DEC-001 | Standalone Angular components | Reusability across pages |
| UI-DEC-002 | SARIF rule IDs prefixed with RDRIFT | Distinguish from other SARIF sources |
| UI-DEC-003 | CLI uses Spectre.Console | Consistent with existing CLI style |

| ID | Risk | Mitigation |
|----|------|------------|
| UI-RISK-001 | Large paths slow UI | Lazy loading, pagination |
| UI-RISK-002 | SARIF compatibility issues | Test against multiple consumers |
| UI-RISK-003 | Attestation size limits | Summary only, link to full data |
---
## Execution Log
| Date (UTC) | Update | Owner |
|---|---|---|
| 2025-12-17 | Created sprint from master plan | Agent |
---
## References
- **Master Sprint**: `SPRINT_3600_0001_0001_reachability_drift_master.md`
- **Drift Detection Sprint**: `SPRINT_3600_0003_0001_drift_detection_engine.md`
- **Advisory**: `17-Dec-2025 - Reachability Drift Detection.md`
- **Angular Style Guide**: https://angular.io/guide/styleguide
- **SARIF 2.1.0 Spec**: https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html

@@ -0,0 +1,241 @@
# SPRINT_3700_0001_0001_triage_db_schema
**Epic:** Triage Infrastructure
**Module:** Scanner
**Working Directory:** `src/Scanner/__Libraries/StellaOps.Scanner.Triage/`
**Status:** TODO
**Created:** 2025-12-17
**Target Completion:** TBD
**Depends On:** None
---
## 1. Overview
Implement the PostgreSQL database schema for the Narrative-First Triage UX system, including all tables, enums, indexes, and views required to support the triage workflow.
### 1.1 Deliverables
1. PostgreSQL migration script (`triage_schema.sql`)
2. EF Core entities for all triage tables
3. `TriageDbContext` with proper configuration
4. Integration tests using Testcontainers
5. Performance validation for indexed queries
### 1.2 Dependencies
- PostgreSQL >= 16
- EF Core 9.0
- `StellaOps.Infrastructure.Postgres` for base patterns
---
## 2. Delivery Tracker
| ID | Task | Owner | Status | Notes |
|----|------|-------|--------|-------|
| T1 | Create migration script from `docs/db/triage_schema.sql` | — | TODO | |
| T2 | Create PostgreSQL enums (7 types) | — | TODO | See schema |
| T3 | Create `TriageFinding` entity | — | TODO | |
| T4 | Create `TriageEffectiveVex` entity | — | TODO | |
| T5 | Create `TriageReachabilityResult` entity | — | TODO | |
| T6 | Create `TriageRiskResult` entity | — | TODO | |
| T7 | Create `TriageDecision` entity | — | TODO | |
| T8 | Create `TriageEvidenceArtifact` entity | — | TODO | |
| T9 | Create `TriageSnapshot` entity | — | TODO | |
| T10 | Create `TriageDbContext` with Fluent API | — | TODO | |
| T11 | Implement `v_triage_case_current` view mapping | — | TODO | |
| T12 | Add performance indexes | — | TODO | |
| T13 | Write integration tests with Testcontainers | — | TODO | |
| T14 | Validate query performance (explain analyze) | — | TODO | |
---
## 3. Task Details
### T1: Create migration script
**Location:** `src/Scanner/__Libraries/StellaOps.Scanner.Triage/Migrations/`
Use the schema from `docs/db/triage_schema.sql` as the authoritative source. Create an EF Core migration that matches.
### T2-T9: Entity Classes
Create entities in `src/Scanner/__Libraries/StellaOps.Scanner.Triage/Entities/`
```csharp
// Example structure
namespace StellaOps.Scanner.Triage.Entities;

public enum TriageLane
{
    Active,
    Blocked,
    NeedsException,
    MutedReach,
    MutedVex,
    Compensated
}

public enum TriageVerdict
{
    Ship,
    Block,
    Exception
}

public sealed record TriageFinding
{
    public Guid Id { get; init; }
    public Guid AssetId { get; init; }
    public Guid? EnvironmentId { get; init; }
    public required string AssetLabel { get; init; }
    public required string Purl { get; init; }
    public string? CveId { get; init; }
    public string? RuleId { get; init; }
    public DateTimeOffset FirstSeenAt { get; init; }
    public DateTimeOffset LastSeenAt { get; init; }
}
```
### T10: DbContext Configuration
```csharp
using Microsoft.EntityFrameworkCore;

public sealed class TriageDbContext : DbContext
{
    // Options constructor so DI and tests can supply the Npgsql connection.
    public TriageDbContext(DbContextOptions<TriageDbContext> options) : base(options) { }

    public DbSet<TriageFinding> Findings => Set<TriageFinding>();
    public DbSet<TriageEffectiveVex> EffectiveVex => Set<TriageEffectiveVex>();
    public DbSet<TriageReachabilityResult> ReachabilityResults => Set<TriageReachabilityResult>();
    public DbSet<TriageRiskResult> RiskResults => Set<TriageRiskResult>();
    public DbSet<TriageDecision> Decisions => Set<TriageDecision>();
    public DbSet<TriageEvidenceArtifact> EvidenceArtifacts => Set<TriageEvidenceArtifact>();
    public DbSet<TriageSnapshot> Snapshots => Set<TriageSnapshot>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Configure PostgreSQL enums. The first positional parameter of
        // HasPostgresEnum<TEnum> is the schema, so pass the type name via `name:`.
        modelBuilder.HasPostgresEnum<TriageLane>(name: "triage_lane");
        modelBuilder.HasPostgresEnum<TriageVerdict>(name: "triage_verdict");
        // ... more enums

        // Configure entities
        modelBuilder.Entity<TriageFinding>(entity =>
        {
            entity.ToTable("triage_finding");
            entity.HasKey(e => e.Id);
            entity.HasIndex(e => e.LastSeenAt).IsDescending();
            // ... more configuration
        });
    }
}
```
### T11: View Mapping
Map the `v_triage_case_current` view as a keyless entity:
```csharp
[Keyless]
public sealed record TriageCaseCurrent
{
    public Guid CaseId { get; init; }
    public Guid AssetId { get; init; }
    // ... all view columns
}

// In DbContext
modelBuilder.Entity<TriageCaseCurrent>()
    .ToView("v_triage_case_current")
    .HasNoKey();
```
### T13: Integration Tests
```csharp
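public class TriageSchemaTests : IAsyncLifetime
{
    private readonly PostgreSqlContainer _postgres = new PostgreSqlBuilder()
        .WithImage("postgres:16-alpine")
        .Build();

    // xUnit v2 lifecycle hooks: start the container before tests, dispose it after.
    public Task InitializeAsync() => _postgres.StartAsync();
    public Task DisposeAsync() => _postgres.DisposeAsync().AsTask();

    // Assumes TriageDbContext exposes a DbContextOptions<TriageDbContext> constructor.
    private TriageDbContext CreateContext() =>
        new(new DbContextOptionsBuilder<TriageDbContext>()
            .UseNpgsql(_postgres.GetConnectionString())
            .Options);

    [Fact]
    public async Task Schema_Creates_Successfully()
    {
        await using var context = CreateContext();
        // Apply migrations instead of EnsureCreated: EnsureCreated builds the schema
        // from the model only and does not create views such as v_triage_case_current.
        await context.Database.MigrateAsync();

        // Verify tables exist
        var tables = await context.Database.SqlQuery<string>(
            $"SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
            .ToListAsync();

        Assert.Contains("triage_finding", tables);
        Assert.Contains("triage_decision", tables);
        // ... more assertions
    }

    [Fact]
    public async Task View_Returns_Correct_Columns()
    {
        await using var context = CreateContext();
        await context.Database.MigrateAsync();

        // Insert test data
        var finding = new TriageFinding { /* ... */ };
        context.Findings.Add(finding);
        await context.SaveChangesAsync();

        // Query view
        var cases = await context.Set<TriageCaseCurrent>().ToListAsync();
        Assert.Single(cases);
    }
}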
```
---
## 4. Decisions & Risks
### 4.1 Decisions
| Decision | Rationale | Date |
|----------|-----------|------|
| Use PostgreSQL enums | Type safety, smaller storage | 2025-12-17 |
| Use `DISTINCT ON` in view | Efficient "latest" queries | 2025-12-17 |
| Store explanation as JSONB | Flexible schema for lattice output | 2025-12-17 |
### 4.2 Risks
| Risk | Impact | Mitigation |
|------|--------|------------|
| Enum changes require migration | Medium | Use versioned enums, add-only pattern |
| View performance on large datasets | High | Monitor, add materialized view if needed |
---
## 5. Acceptance Criteria (Sprint)
- [ ] All 8 tables created with correct constraints
- [ ] All 7 enums registered in PostgreSQL
- [ ] View `v_triage_case_current` returns correct data
- [ ] Indexes created and verified with EXPLAIN ANALYZE
- [ ] Integration tests pass with Testcontainers
- [ ] No circular dependencies in foreign keys
- [ ] Migration is idempotent (can run multiple times)
---
## 6. Execution Log
| Date | Update | Owner |
|------|--------|-------|
| 2025-12-17 | Sprint file created | Claude |
---
## 7. Reference Files
- Schema definition: `docs/db/triage_schema.sql`
- UX Guide: `docs/ux/TRIAGE_UX_GUIDE.md`
- API Contract: `docs/api/triage.contract.v1.md`
- Advisory: `docs/product-advisories/unprocessed/16-Dec-2025 - Reimagining Proof-Linked UX in Security Workflows.md`


@@ -0,0 +1,395 @@
# Reachability Drift Detection
**Date**: 2025-12-17
**Status**: ANALYZED - Ready for Implementation Planning
**Related Advisories**:
- 14-Dec-2025 - Smart-Diff Technical Reference
- 14-Dec-2025 - Reachability Analysis Technical Reference
---
## 1. EXECUTIVE SUMMARY
This advisory proposes extending StellaOps' Smart-Diff capabilities to detect **reachability drift** - changes in whether vulnerable code paths are reachable from application entry points between container image versions.
**Core Insight**: Raw diffs don't equal risk. Most changed lines don't matter for exploitability. Reachability drift detection fuses **call-stack reachability graphs** with **Smart-Diff metadata** to flag only paths that went from **unreachable to reachable** (or vice-versa), tied to **SBOM components** and **VEX statements**.
---
## 2. GAP ANALYSIS vs EXISTING INFRASTRUCTURE
### 2.1 What Already Exists (Leverage Points)
| Component | Location | Status |
|-----------|----------|--------|
| `MaterialRiskChangeDetector` | `Scanner.SmartDiff.Detection` | DONE - R1-R4 rules |
| `VexCandidateEmitter` | `Scanner.SmartDiff.Detection` | DONE - Absent API detection |
| `ReachabilityGateBridge` | `Scanner.SmartDiff.Detection` | DONE - Lattice to 3-bit |
| `ReachabilitySignal` | `Signals.Contracts` | DONE - Call path model |
| `ReachabilityLatticeState` | `Signals.Contracts.Evidence` | DONE - 5-state enum |
| `CallPath`, `CallPathNode` | `Signals.Contracts.Evidence` | DONE - Path representation |
| `ReachabilityEvidenceChain` | `Signals.Contracts.Evidence` | DONE - Proof chain |
| `vex.graph_nodes/edges` | DB Schema | DONE - Graph storage |
| `scanner.risk_state_snapshots` | DB Schema | DONE - State storage |
| `scanner.material_risk_changes` | DB Schema | DONE - Change storage |
| `FnDriftCalculator` | `Scanner.Core.Drift` | DONE - Classification drift |
| `SarifOutputGenerator` | `Scanner.SmartDiff.Output` | DONE - CI output |
| Reachability Benchmark | `bench/reachability-benchmark/` | DONE - Ground truth cases |
| Language Analyzers | `Scanner.Analyzers.Lang.*` | PARTIAL - Package detection, limited call graph |
### 2.2 What's Missing (New Implementation Required)
| Component | Advisory Ref | Gap Description |
|-----------|-------------|-----------------|
| **Call Graph Extractor (.NET)** | §7 C# Roslyn | No MSBuildWorkspace/Roslyn analysis exists |
| **Call Graph Extractor (Go)** | §7 Go SSA | No golang.org/x/tools/go/ssa integration |
| **Call Graph Extractor (Java)** | §7 | No Soot/WALA integration |
| **Call Graph Extractor (Node)** | §7 | No @babel/traverse integration |
| **`scanner.code_changes` table** | §4 Smart-Diff | AST-level diff facts not persisted |
| **Drift Cause Explainer** | §6 Timeline | No causal attribution on path nodes |
| **Path Viewer UI** | §UX | No Angular component for call path visualization |
| **Cross-scan Function-level Drift** | §6 | State drift exists, function-level doesn't |
| **Entrypoint Discovery (per-framework)** | §3 | Limited beyond package.json/manifest parsing |
### 2.3 Terminology Mapping
| Advisory Term | StellaOps Equivalent | Notes |
|--------------|---------------------|-------|
| `commit_sha` | `scan_id` | StellaOps is image-centric, not commit-centric |
| `call_node` | `vex.graph_nodes` | Existing schema; extend it, don't duplicate it |
| `call_edge` | `vex.graph_edges` | Existing schema |
| `reachability_drift` | `scanner.material_risk_changes` | Add `cause`, `path_nodes` columns |
| Risk Drift | Material Risk Change | Existing term is more precise |
| Router, Signals | Signals module only | Router module is not implemented |
---
## 3. RECOMMENDED IMPLEMENTATION PATH
### 3.1 What to Ship (Delta from Current State)
```
NEW TABLES:
├── scanner.code_changes # AST-level diff facts
└── scanner.call_graph_snapshots # Per-scan call graph cache
NEW COLUMNS:
├── scanner.material_risk_changes.cause # TEXT - "guard_removed", "new_route", etc.
├── scanner.material_risk_changes.path_nodes # JSONB - Compressed path representation
└── scanner.material_risk_changes.base_scan_id # TEXT - For cross-scan comparison (same type as scan_id in §4.4)
NEW SERVICES:
├── CallGraphExtractor.DotNet # Roslyn-based for .NET projects
├── CallGraphExtractor.Node # AST-based for Node.js
├── DriftCauseExplainer # Attribute causes to code changes
└── PathCompressor # Compress paths for storage/UI
NEW UI:
└── PathViewerComponent # Angular component for call path visualization
```
### 3.2 What NOT to Ship (Avoid Duplication)
- **Don't create `call_node`/`call_edge` tables** - Use existing `vex.graph_nodes`/`vex.graph_edges`
- **Don't add `commit_sha` columns** - Use `scan_id` consistently
- **Don't build React components** - Angular v17 is the stack
### 3.3 Use Valkey for Graph Caching
Valkey is already integrated in `Router.Gateway.RateLimit`. Use it for:
- **Call graph snapshot caching** - Fast cross-instance lookups
- **Reachability result caching** - Avoid recomputation
- **Key pattern**: `stella:callgraph:{scan_id}:{lang}:{digest}`
```yaml
# Configuration pattern (align with existing Router rate limiting)
reachability:
  valkey_connection: "localhost:6379"
  valkey_bucket: "stella-reachability"
  cache_ttl_hours: 24
  circuit_breaker:
    failure_threshold: 5
    timeout_seconds: 30
```
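
The key pattern above can be sketched as a small helper. This is only an illustration: the function names `callGraphCacheKey` and `cacheTtlSeconds` are hypothetical, and the advisory does not say how Valkey keys are actually assembled in the codebase.

```typescript
// Hypothetical helper (not an existing StellaOps API): builds the Valkey key
// following the pattern stella:callgraph:{scan_id}:{lang}:{digest}.
export function callGraphCacheKey(scanId: string, lang: string, digest: string): string {
  // Empty components would produce ambiguous keys such as "stella:callgraph:::".
  for (const part of [scanId, lang, digest]) {
    if (part.length === 0) {
      throw new Error("cache key components must be non-empty");
    }
  }
  return `stella:callgraph:${scanId}:${lang}:${digest}`;
}

// Converts the `cache_ttl_hours` setting into the seconds value a cache SET would take.
export function cacheTtlSeconds(cacheTtlHours: number): number {
  return Math.round(cacheTtlHours * 3600);
}
```

Including the graph digest in the key means a changed graph for the same scan and language never reads a stale entry; the TTL then only bounds memory use.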
---
## 4. TECHNICAL DESIGN
### 4.1 Call Graph Extraction Model
```csharp
using System.Collections.Immutable;

/// <summary>
/// Per-scan call graph snapshot for drift comparison.
/// </summary>
public sealed record CallGraphSnapshot
{
    public required string ScanId { get; init; }
    public required string GraphDigest { get; init; } // Content hash
    public required string Language { get; init; }
    public required DateTimeOffset ExtractedAt { get; init; }
    public required ImmutableArray<CallGraphNode> Nodes { get; init; }
    public required ImmutableArray<CallGraphEdge> Edges { get; init; }
    public required ImmutableArray<string> EntrypointIds { get; init; }
}

public sealed record CallGraphNode
{
    public required string NodeId { get; init; } // Stable identifier
    public required string Symbol { get; init; } // Fully qualified name
    public required string File { get; init; }
    public required int Line { get; init; }
    public required string Package { get; init; }
    public required string Visibility { get; init; } // public/internal/private
    public required bool IsEntrypoint { get; init; }
    public required bool IsSink { get; init; }
    public string? SinkCategory { get; init; } // CMD_EXEC, SQL_RAW, etc.
}

public sealed record CallGraphEdge
{
    public required string SourceId { get; init; }
    public required string TargetId { get; init; }
    public required string CallKind { get; init; } // direct/virtual/delegate
}
```
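
To make the drift comparison concrete, here is a minimal TypeScript sketch of how two snapshots could be compared. It assumes node IDs are stable across scans (as the `NodeId` comment suggests) and uses trimmed-down, hypothetical shapes (`Graph`, `GraphNode`, `GraphEdge`) rather than the full C# records; it is not the engine's actual algorithm.

```typescript
// Trimmed-down, hypothetical mirrors of CallGraphSnapshot / CallGraphNode / CallGraphEdge.
export interface GraphNode { nodeId: string; isSink: boolean }
export interface GraphEdge { sourceId: string; targetId: string }
export interface Graph {
  nodes: GraphNode[];
  edges: GraphEdge[];
  entrypointIds: string[];
}

// Breadth-first search from all entrypoints; returns the IDs of sinks that are reachable.
export function reachableSinks(graph: Graph): Set<string> {
  const adjacency = new Map<string, string[]>();
  for (const edge of graph.edges) {
    const targets = adjacency.get(edge.sourceId);
    if (targets) targets.push(edge.targetId);
    else adjacency.set(edge.sourceId, [edge.targetId]);
  }
  const sinkIds = new Set(graph.nodes.filter(n => n.isSink).map(n => n.nodeId));
  const visited = new Set<string>(graph.entrypointIds);
  const queue = graph.entrypointIds.slice();
  const found = new Set<string>();
  // The queue grows while we iterate; array iteration picks up appended items.
  for (const id of queue) {
    if (sinkIds.has(id)) found.add(id);
    for (const next of adjacency.get(id) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return found;
}

export interface ReachabilityDrift { newlyReachable: string[]; mitigated: string[] }

// Drift is the set difference of reachable sinks between the base and head scans.
export function diffReachability(base: Graph, head: Graph): ReachabilityDrift {
  const before = reachableSinks(base);
  const after = reachableSinks(head);
  return {
    newlyReachable: Array.from(after).filter(id => !before.has(id)).sort(),
    mitigated: Array.from(before).filter(id => !after.has(id)).sort(),
  };
}
```

Sorting the output keeps results deterministic regardless of traversal order, in line with the deterministic-output requirement elsewhere in these documents.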
### 4.2 Code Change Facts Model
```csharp
using System.Text.Json;

/// <summary>
/// AST-level code change facts from Smart-Diff.
/// </summary>
public sealed record CodeChangeFact
{
    public required string ScanId { get; init; }
    public required string File { get; init; }
    public required string Symbol { get; init; }
    public required CodeChangeKind Kind { get; init; }
    public required JsonDocument Details { get; init; }
}

public enum CodeChangeKind
{
    Added,
    Removed,
    SignatureChanged,
    GuardChanged,      // Boolean condition around call modified
    DependencyChanged, // Callee package/version changed
    VisibilityChanged  // public<->internal<->private
}
```
### 4.3 Drift Cause Attribution
```csharp
/// <summary>
/// Explains why a reachability flip occurred.
/// </summary>
public sealed class DriftCauseExplainer
{
    // ShortestPath and DriftCause are not shown here and are assumed to be defined
    // elsewhere. baseGraph is accepted but not consulted in this sketch.
    public DriftCause Explain(
        CallGraphSnapshot baseGraph,
        CallGraphSnapshot headGraph,
        string sinkSymbol,
        IReadOnlyList<CodeChangeFact> codeChanges)
    {
        // Find shortest path to sink in head graph
        var path = ShortestPath(headGraph.EntrypointIds, sinkSymbol, headGraph);
        if (path is null)
            return DriftCause.Unknown;

        // Check each node on path for code changes; the first changed node
        // (closest to the entrypoint) determines the reported cause.
        foreach (var nodeId in path.NodeIds)
        {
            var node = headGraph.Nodes.First(n => n.NodeId == nodeId);
            var change = codeChanges.FirstOrDefault(c => c.Symbol == node.Symbol);
            if (change is not null)
            {
                return change.Kind switch
                {
                    CodeChangeKind.GuardChanged => DriftCause.GuardRemoved(node.Symbol, node.File, node.Line),
                    CodeChangeKind.Added => DriftCause.NewPublicRoute(node.Symbol),
                    CodeChangeKind.VisibilityChanged => DriftCause.VisibilityEscalated(node.Symbol),
                    CodeChangeKind.DependencyChanged => DriftCause.DepUpgraded(change.Details),
                    _ => DriftCause.CodeModified(node.Symbol)
                };
            }
        }

        return DriftCause.Unknown;
    }
}
```
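
The explainer relies on a `ShortestPath` helper that the advisory does not define. One plausible shape, sketched in TypeScript under the assumption that the sink is addressed by node ID rather than by symbol, is a breadth-first search that records parent pointers. The `Edge` type and `shortestPath` name are illustrative only.

```typescript
export interface Edge { sourceId: string; targetId: string }

// Returns node IDs from an entrypoint to the sink (both inclusive),
// or null when the sink is unreachable. BFS guarantees the fewest hops.
export function shortestPath(
  entrypointIds: string[],
  sinkId: string,
  edges: Edge[]
): string[] | null {
  const adjacency = new Map<string, string[]>();
  for (const edge of edges) {
    const targets = adjacency.get(edge.sourceId) ?? [];
    targets.push(edge.targetId);
    adjacency.set(edge.sourceId, targets);
  }

  // parent.get(x) is the node from which x was first discovered; entrypoints map to null.
  const parent = new Map<string, string | null>();
  for (const id of entrypointIds) parent.set(id, null);

  const queue = entrypointIds.slice();
  for (const current of queue) {
    if (current === sinkId) {
      // Walk parent pointers back to the entrypoint, then reverse.
      const path: string[] = [];
      let cursor: string | null = current;
      while (cursor !== null) {
        path.push(cursor);
        cursor = parent.get(cursor) ?? null;
      }
      return path.reverse();
    }
    for (const next of adjacency.get(current) ?? []) {
      if (!parent.has(next)) {
        parent.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}
```

Using the shortest path keeps the attributed cause close to the entrypoint and the rendered path short, at the cost of ignoring longer alternative paths that may also have changed.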
### 4.4 Database Schema Extensions
```sql
-- New table: Code change facts from AST-level Smart-Diff
CREATE TABLE scanner.code_changes (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    scan_id TEXT NOT NULL,
    file TEXT NOT NULL,
    symbol TEXT NOT NULL,
    change_kind TEXT NOT NULL, -- added|removed|signature|guard|dep|visibility
    details JSONB,
    detected_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT code_changes_unique UNIQUE (tenant_id, scan_id, file, symbol)
);

CREATE INDEX idx_code_changes_scan ON scanner.code_changes(scan_id);
CREATE INDEX idx_code_changes_symbol ON scanner.code_changes(symbol);

-- New table: Per-scan call graph snapshots (compressed)
CREATE TABLE scanner.call_graph_snapshots (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    scan_id TEXT NOT NULL,
    language TEXT NOT NULL,
    graph_digest TEXT NOT NULL, -- Content hash for dedup
    node_count INT NOT NULL,
    edge_count INT NOT NULL,
    entrypoint_count INT NOT NULL,
    extracted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    cas_uri TEXT NOT NULL, -- Reference to CAS for full graph
    CONSTRAINT call_graph_snapshots_unique UNIQUE (tenant_id, scan_id, language)
);

CREATE INDEX idx_call_graph_snapshots_digest ON scanner.call_graph_snapshots(graph_digest);

-- Extend existing material_risk_changes table
ALTER TABLE scanner.material_risk_changes
    ADD COLUMN IF NOT EXISTS cause TEXT,
    ADD COLUMN IF NOT EXISTS path_nodes JSONB,
    ADD COLUMN IF NOT EXISTS base_scan_id TEXT;

CREATE INDEX IF NOT EXISTS idx_material_risk_changes_cause
    ON scanner.material_risk_changes(cause) WHERE cause IS NOT NULL;
```
---
## 5. UI DESIGN
### 5.1 Risk Drift Card (PR/Commit View)
```
┌─────────────────────────────────────────────────────────────────────┐
│ RISK DRIFT ▼ │
├─────────────────────────────────────────────────────────────────────┤
│ +3 new reachable paths -2 mitigated paths │
│ │
│ ┌─ NEW REACHABLE ──────────────────────────────────────────────┐ │
│ │ POST /payments → PaymentsController.Capture → ... → │ │
│ │ crypto.Verify(legacy) │ │
│ │ │ │
│ │ [pkg:payments@1.8.2] [CVE-2024-1234] [EPSS 0.72] [VEX:affected]│ │
│ │ │ │
│ │ Cause: guard removed in AuthFilter.cs:42 │ │
│ │ │ │
│ │ [View Path] [Quarantine Route] [Pin Version] [Add Exception] │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ MITIGATED ──────────────────────────────────────────────────┐ │
│ │ GET /admin → AdminController.Execute → ... → cmd.Run │ │
│ │ │ │
│ │ [pkg:admin@2.0.0] [CVE-2024-5678] [VEX:not_affected] │ │
│ │ │ │
│ │ Reason: Vulnerable API removed in upgrade │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
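
The card shows each path in elided form (`entrypoint → first hop → ... → sink`). A display-only sketch of that elision follows; `compressPath` and its defaults are hypothetical, and the advisory's `PathCompressor` (which also targets storage) may work quite differently.

```typescript
// Hypothetical display helper: keeps the first `keepHead` and last `keepTail`
// symbols of a call path and replaces the middle with "...".
export function compressPath(symbols: string[], keepHead = 2, keepTail = 1): string {
  const separator = " → ";
  // Short paths are shown in full; eliding them would hide nothing useful.
  if (symbols.length <= keepHead + keepTail) {
    return symbols.join(separator);
  }
  const head = symbols.slice(0, keepHead);
  const tail = symbols.slice(symbols.length - keepTail);
  return head.concat(["..."], tail).join(separator);
}
```

Keeping the entrypoint and the sink visible preserves the two facts a reviewer needs first: where the request enters and which vulnerable call it reaches.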
### 5.2 Path Viewer Component
```
┌─────────────────────────────────────────────────────────────────────┐
│ CALL PATH: POST /payments → crypto.Verify(legacy) [Collapse] │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ○ POST /payments [ENTRYPOINT] │
│ │ PaymentsController.cs:45 │
│ │ │
│ ├──○ PaymentsController.Capture() │
│ │ │ PaymentsController.cs:89 │
│ │ │ │
│ │ ├──○ PaymentService.ProcessPayment() │
│ │ │ │ PaymentService.cs:156 │
│ │ │ │ │
│ │ │ ├──● CryptoHelper.Verify() ← GUARD REMOVED │
│ │ │ │ │ CryptoHelper.cs:42 [Changed: AuthFilter removed] │
│ │ │ │ │ │
│ │ │ │ └──◆ crypto.Verify(legacy) [VULNERABLE SINK] │
│ │ │ │ pkg:crypto@1.2.3 │
│ │ │ │ CVE-2024-1234 (CVSS 9.8) │
│ │
│ Legend: ○ Node ● Changed ◆ Sink ─ Call │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 6. POLICY INTEGRATION
### 6.1 CI Gate Behavior
```yaml
# Policy wiring for drift detection
smart_diff:
  gates:
    # Fail PR when new reachable paths to affected sinks
    - condition: "delta_reachable > 0 AND vex_status IN ['affected', 'under_investigation']"
      action: block
      message: "New reachable paths to vulnerable sinks detected"

    # Warn when new paths to any sink
    - condition: "delta_reachable > 0"
      action: warn
      message: "New reachable paths detected - review recommended"

    # Auto-mitigate when VEX confirms not_affected
    - condition: "vex_status == 'not_affected' AND vex_justification IN ['component_not_present', 'fix_applied']"
      action: allow
      auto_mitigate: true
```
### 6.2 Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success, no material drift |
| 1 | Success, material drift found (info) |
| 2 | Success, hardening regression detected |
| 3 | Success, new KEV reachable |
| 10+ | Errors |
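
The table does not say which code wins when several conditions hold at once. The sketch below assumes the highest-numbered (most severe) condition takes precedence; that ordering, the `DriftSummary` shape and the `exitCodeFor` name are all assumptions for illustration.

```typescript
// Hypothetical summary of one drift run; field names are illustrative.
export interface DriftSummary {
  materialDrift: boolean;
  hardeningRegression: boolean;
  newKevReachable: boolean;
}

// Maps a successful run to exit codes 0-3. Error codes (10+) are out of scope here.
// Assumption: the most severe condition wins when several apply.
export function exitCodeFor(summary: DriftSummary): 0 | 1 | 2 | 3 {
  if (summary.newKevReachable) return 3;
  if (summary.hardeningRegression) return 2;
  if (summary.materialDrift) return 1;
  return 0;
}
```

A CI job can then treat any code from 1 to 3 as a completed analysis with findings and reserve 10 and above for tool failures.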
---
## 7. SPRINT STRUCTURE
### 7.1 Master Sprint: SPRINT_3600_0001_0001
**Topic**: Reachability Drift Detection
**Dependencies**: SPRINT_3500 (Smart-Diff) - COMPLETE
### 7.2 Sub-Sprints
| ID | Topic | Priority | Effort | Dependencies |
|----|-------|----------|--------|--------------|
| SPRINT_3600_0002_0001 | Call Graph Infrastructure | P0 | Large | Master |
| SPRINT_3600_0003_0001 | Drift Detection Engine | P0 | Medium | 3600.2 |
| SPRINT_3600_0004_0001 | UI and Evidence Chain | P1 | Medium | 3600.3 |
---
## 8. REFERENCES
- `docs/product-advisories/14-Dec-2025 - Smart-Diff Technical Reference.md`
- `docs/product-advisories/14-Dec-2025 - Reachability Analysis Technical Reference.md`
- `docs/implplan/SPRINT_3500_0001_0001_smart_diff_master.md`
- `docs/reachability/lattice.md`
- `bench/reachability-benchmark/README.md`


@@ -0,0 +1,400 @@
# Stella Ops Triage UI Reducer Spec (Pure State + Explicit Commands)
## 0. Purpose
Define a deterministic, testable UI state machine for the triage UI.
- State transitions are pure functions.
- Side effects are emitted as explicit Commands.
- Enables UI "replay" for debugging (aligns with Stella's deterministic ethos).
Target stack: Angular 17 + TypeScript.
## 1. Core Concepts
- Action: user/system event (route change, button click, HTTP success).
- State: all data required to render triage surfaces.
- Command: side-effect request (HTTP, download, navigation).
Reducer signature:
```ts
type ReduceResult = { state: TriageState; cmd: Command };
function reduce(state: TriageState, action: Action): ReduceResult;
```
## 2. State Model
```ts
export type Lane =
  | "ACTIVE"
  | "BLOCKED"
  | "NEEDS_EXCEPTION"
  | "MUTED_REACH"
  | "MUTED_VEX"
  | "COMPENSATED";

export type Verdict = "SHIP" | "BLOCK" | "EXCEPTION";

export interface MutedCounts {
  reach: number;
  vex: number;
  compensated: number;
}

export interface FindingRow {
  id: string; // caseId == findingId
  lane: Lane;
  verdict: Verdict;
  score: number;
  reachable: "YES" | "NO" | "UNKNOWN";
  vex: "affected" | "not_affected" | "under_investigation" | "unknown";
  exploit: "YES" | "NO" | "UNKNOWN";
  asset: string;
  updatedAt: string; // ISO
}

export interface CaseHeader {
  id: string;
  verdict: Verdict;
  lane: Lane;
  score: number;
  policyId: string;
  policyVersion: string;
  inputsHash: string;
  why: string; // short narrative
  chips: Array<{ key: string; label: string; value: string; evidenceIds?: string[] }>;
}

export type EvidenceType =
  | "SBOM_SLICE"
  | "VEX_DOC"
  | "PROVENANCE"
  | "CALLSTACK_SLICE"
  | "REACHABILITY_PROOF"
  | "REPLAY_MANIFEST"
  | "POLICY"
  | "SCAN_LOG"
  | "OTHER";

export interface EvidenceItem {
  id: string;
  type: EvidenceType;
  title: string;
  issuer?: string;
  signed: boolean;
  signedBy?: string;
  contentHash: string;
  createdAt: string;
  previewUrl?: string;
  rawUrl: string;
}

export type DecisionKind = "MUTE_REACH" | "MUTE_VEX" | "ACK" | "EXCEPTION";

export interface DecisionItem {
  id: string;
  kind: DecisionKind;
  reasonCode: string;
  note?: string;
  ttl?: string;
  actor: { subject: string; display?: string };
  createdAt: string;
  revokedAt?: string;
  signatureRef?: string;
}

export type SnapshotTrigger =
  | "FEED_UPDATE"
  | "VEX_UPDATE"
  | "SBOM_UPDATE"
  | "RUNTIME_TRACE"
  | "POLICY_UPDATE"
  | "DECISION"
  | "RESCAN";

export interface SnapshotItem {
  id: string;
  trigger: SnapshotTrigger;
  changedAt: string;
  fromInputsHash: string;
  toInputsHash: string;
  summary: string;
}

export interface SmartDiff {
  fromInputsHash: string;
  toInputsHash: string;
  inputsChanged: Array<{ key: string; before?: string; after?: string; evidenceIds?: string[] }>;
  outputsChanged: Array<{ key: string; before?: string; after?: string; evidenceIds?: string[] }>;
}

export interface TriageState {
  route: { page: "TABLE" | "CASE"; caseId?: string };
  filters: {
    showMuted: boolean;
    lane?: Lane;
    search?: string;
    page: number;
    pageSize: number;
  };
  table: {
    loading: boolean;
    rows: FindingRow[];
    mutedCounts?: MutedCounts;
    error?: string;
    etag?: string;
  };
  caseView: {
    loading: boolean;
    header?: CaseHeader;
    evidenceLoading: boolean;
    evidence?: EvidenceItem[];
    decisionsLoading: boolean;
    decisions?: DecisionItem[];
    snapshotsLoading: boolean;
    snapshots?: SnapshotItem[];
    diffLoading: boolean;
    activeDiff?: SmartDiff;
    error?: string;
    etag?: string;
  };
  ui: {
    decisionDrawerOpen: boolean;
    diffPanelOpen: boolean;
    toast?: { kind: "success" | "error" | "info"; message: string };
  };
}
```
## 3. Commands
```ts
export type Command =
  | { type: "NONE" }
  | { type: "HTTP_GET"; url: string; headers?: Record<string, string>; onSuccess: Action; onError: Action }
  | { type: "HTTP_POST"; url: string; body: unknown; headers?: Record<string, string>; onSuccess: Action; onError: Action }
  | { type: "HTTP_DELETE"; url: string; headers?: Record<string, string>; onSuccess: Action; onError: Action }
  | { type: "DOWNLOAD"; url: string }
  | { type: "NAVIGATE"; route: TriageState["route"] };
```
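
The `onSuccess` and `onError` fields act as action templates: the reducer cannot know the response, so some integration layer has to fill in the payload once the HTTP call settles. The spec does not define that layer. The sketch below shows one possible pure completion step, assuming the transport has already shaped the response body into the action's payload fields; `ActionTemplate`, `HttpCommand`, `completeSuccess` and `completeError` are hypothetical names.

```typescript
// Loosely typed stand-ins for the spec's Action and HTTP Command variants.
export type ActionTemplate = { type: string } & Record<string, unknown>;

export interface HttpCommand {
  type: "HTTP_GET" | "HTTP_POST" | "HTTP_DELETE";
  url: string;
  onSuccess: ActionTemplate;
  onError: ActionTemplate;
}

// Merges response data over the success template. The template's `type` always
// wins, so a response field named "type" cannot redirect the action.
export function completeSuccess(
  cmd: HttpCommand,
  payload: Record<string, unknown>,
  etag?: string
): ActionTemplate {
  const action: ActionTemplate = { ...cmd.onSuccess, ...payload, type: cmd.onSuccess.type };
  if (etag !== undefined) action.etag = etag;
  return action;
}

// Fills the error template with a human-readable message.
export function completeError(cmd: HttpCommand, error: unknown): ActionTemplate {
  const message = error instanceof Error ? error.message : String(error);
  return { ...cmd.onError, error: message };
}
```

Keeping completion pure means the only impure code left is the transport call itself, so reducer and completion logic can both be covered by plain unit tests.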
## 4. Actions
```ts
export type Action =
  // routing
  | { type: "ROUTE_TABLE" }
  | { type: "ROUTE_CASE"; caseId: string }

  // table
  | { type: "TABLE_LOAD" }
  | { type: "TABLE_LOAD_OK"; rows: FindingRow[]; mutedCounts: MutedCounts; etag?: string }
  | { type: "TABLE_LOAD_ERR"; error: string }
  | { type: "FILTER_SET_SEARCH"; search?: string }
  | { type: "FILTER_SET_LANE"; lane?: Lane }
  | { type: "FILTER_TOGGLE_SHOW_MUTED" }
  | { type: "FILTER_SET_PAGE"; page: number }
  | { type: "FILTER_SET_PAGE_SIZE"; pageSize: number }

  // case header
  | { type: "CASE_LOAD"; caseId: string }
  | { type: "CASE_LOAD_OK"; header: CaseHeader; etag?: string }
  | { type: "CASE_LOAD_ERR"; error: string }

  // evidence
  | { type: "EVIDENCE_LOAD"; caseId: string }
  | { type: "EVIDENCE_LOAD_OK"; evidence: EvidenceItem[] }
  | { type: "EVIDENCE_LOAD_ERR"; error: string }

  // decisions
  | { type: "DECISIONS_LOAD"; caseId: string }
  | { type: "DECISIONS_LOAD_OK"; decisions: DecisionItem[] }
  | { type: "DECISIONS_LOAD_ERR"; error: string }
  | { type: "DECISION_DRAWER_OPEN"; open: boolean }
  | { type: "DECISION_CREATE"; caseId: string; kind: DecisionKind; reasonCode: string; note?: string; ttl?: string }
  | { type: "DECISION_CREATE_OK"; decision: DecisionItem }
  | { type: "DECISION_CREATE_ERR"; error: string }
  | { type: "DECISION_REVOKE"; caseId: string; decisionId: string }
  | { type: "DECISION_REVOKE_OK"; decisionId: string }
  | { type: "DECISION_REVOKE_ERR"; error: string }

  // snapshots + smart diff
  | { type: "SNAPSHOTS_LOAD"; caseId: string }
  | { type: "SNAPSHOTS_LOAD_OK"; snapshots: SnapshotItem[] }
  | { type: "SNAPSHOTS_LOAD_ERR"; error: string }
  | { type: "DIFF_OPEN"; open: boolean }
  | { type: "DIFF_LOAD"; caseId: string; fromInputsHash: string; toInputsHash: string }
  | { type: "DIFF_LOAD_OK"; diff: SmartDiff }
  | { type: "DIFF_LOAD_ERR"; error: string }

  // export bundle
  | { type: "BUNDLE_EXPORT"; caseId: string }
  | { type: "BUNDLE_EXPORT_OK"; downloadUrl: string }
  | { type: "BUNDLE_EXPORT_ERR"; error: string };
```
## 5. Reducer Invariants
* Pure: no I/O in reducer.
* Any mutation of gating/visibility must originate from:
  * `CASE_LOAD_OK` (new computed risk)
  * `DECISION_CREATE_OK` / `DECISION_REVOKE_OK`
* Evidence is loaded lazily; header is loaded first.
* "Show muted" affects only table filtering, never deletes data.
## 6. Reducer Implementation (Reference)
```ts
export function reduce(state: TriageState, action: Action): { state: TriageState; cmd: Command } {
  // The `onSuccess`/`onError` values below are action templates. Their payloads
  // (`undefined as any`, `[]`, `""`) are placeholders that the effect layer is
  // expected to replace with data from the HTTP response.
  switch (action.type) {
    case "ROUTE_TABLE":
      return {
        state: { ...state, route: { page: "TABLE" } },
        cmd: { type: "NAVIGATE", route: { page: "TABLE" } }
      };

    case "ROUTE_CASE":
      return {
        state: {
          ...state,
          route: { page: "CASE", caseId: action.caseId },
          caseView: { ...state.caseView, loading: true, error: undefined }
        },
        cmd: {
          type: "HTTP_GET",
          url: `/api/triage/v1/cases/${encodeURIComponent(action.caseId)}`,
          headers: state.caseView.etag ? { "If-None-Match": state.caseView.etag } : undefined,
          onSuccess: { type: "CASE_LOAD_OK", header: undefined as any },
          onError: { type: "CASE_LOAD_ERR", error: "" }
        }
      };

    case "TABLE_LOAD":
      return {
        state: { ...state, table: { ...state.table, loading: true, error: undefined } },
        cmd: {
          type: "HTTP_GET",
          url: `/api/triage/v1/findings?showMuted=${state.filters.showMuted}&page=${state.filters.page}&pageSize=${state.filters.pageSize}`
            + (state.filters.lane ? `&lane=${state.filters.lane}` : "")
            + (state.filters.search ? `&search=${encodeURIComponent(state.filters.search)}` : ""),
          headers: state.table.etag ? { "If-None-Match": state.table.etag } : undefined,
          onSuccess: { type: "TABLE_LOAD_OK", rows: [], mutedCounts: { reach: 0, vex: 0, compensated: 0 } },
          onError: { type: "TABLE_LOAD_ERR", error: "" }
        }
      };

    case "TABLE_LOAD_OK":
      return {
        state: { ...state, table: { ...state.table, loading: false, rows: action.rows, mutedCounts: action.mutedCounts, etag: action.etag } },
        cmd: { type: "NONE" }
      };

    case "TABLE_LOAD_ERR":
      return {
        state: { ...state, table: { ...state.table, loading: false, error: action.error } },
        cmd: { type: "NONE" }
      };

    case "CASE_LOAD_OK": {
      const header = action.header;
      return {
        state: {
          ...state,
          caseView: {
            ...state.caseView,
            loading: false,
            header,
            etag: action.etag,
            evidenceLoading: true,
            decisionsLoading: true,
            snapshotsLoading: true
          }
        },
        cmd: {
          type: "HTTP_GET",
          url: `/api/triage/v1/cases/${encodeURIComponent(header.id)}/evidence`,
          onSuccess: { type: "EVIDENCE_LOAD_OK", evidence: [] },
          onError: { type: "EVIDENCE_LOAD_ERR", error: "" }
        }
      };
    }

    case "CASE_LOAD_ERR":
      // Clear the loading flag so the view does not stay in a loading state after a failure.
      return {
        state: { ...state, caseView: { ...state.caseView, loading: false, error: action.error } },
        cmd: { type: "NONE" }
      };

    case "EVIDENCE_LOAD_OK":
      return {
        state: { ...state, caseView: { ...state.caseView, evidenceLoading: false, evidence: action.evidence } },
        cmd: { type: "NONE" }
      };

    case "EVIDENCE_LOAD_ERR":
      return {
        state: { ...state, caseView: { ...state.caseView, evidenceLoading: false, error: action.error } },
        cmd: { type: "NONE" }
      };

    case "DECISION_DRAWER_OPEN":
      return { state: { ...state, ui: { ...state.ui, decisionDrawerOpen: action.open } }, cmd: { type: "NONE" } };

    case "DECISION_CREATE":
      return {
        state: state,
        cmd: {
          type: "HTTP_POST",
          url: `/api/triage/v1/decisions`,
          body: { caseId: action.caseId, kind: action.kind, reasonCode: action.reasonCode, note: action.note, ttl: action.ttl },
          onSuccess: { type: "DECISION_CREATE_OK", decision: undefined as any },
          onError: { type: "DECISION_CREATE_ERR", error: "" }
        }
      };

    case "DECISION_CREATE_OK":
      return {
        state: {
          ...state,
          ui: { ...state.ui, decisionDrawerOpen: false, toast: { kind: "success", message: "Decision applied. Undo available in History." } }
        },
        // after decision, refresh header + snapshots (re-compute may occur server-side)
        cmd: { type: "HTTP_GET", url: `/api/triage/v1/cases/${encodeURIComponent(state.route.caseId!)}`, onSuccess: { type: "CASE_LOAD_OK", header: undefined as any }, onError: { type: "CASE_LOAD_ERR", error: "" } }
      };

    case "BUNDLE_EXPORT":
      return {
        state,
        cmd: {
          type: "HTTP_POST",
          url: `/api/triage/v1/cases/${encodeURIComponent(action.caseId)}/export`,
          body: {},
          onSuccess: { type: "BUNDLE_EXPORT_OK", downloadUrl: "" },
          onError: { type: "BUNDLE_EXPORT_ERR", error: "" }
        }
      };

    case "BUNDLE_EXPORT_OK":
      return {
        state: { ...state, ui: { ...state.ui, toast: { kind: "success", message: "Evidence bundle ready." } } },
        cmd: { type: "DOWNLOAD", url: action.downloadUrl }
      };

    default:
      return { state, cmd: { type: "NONE" } };
  }
}
```
## 7. Unit Testing Requirements
Minimum tests:
* Reducer purity: no global mutation.
* TABLE_LOAD produces correct URL for filters.
* ROUTE_CASE triggers case header load.
* CASE_LOAD_OK triggers EVIDENCE load (and separately decisions/snapshots in your integration layer).
* DECISION_CREATE_OK closes drawer and refreshes case header.
* BUNDLE_EXPORT_OK emits DOWNLOAD.
Recommended: golden-state snapshots to ensure backwards compatibility when the state model evolves.
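
As an illustration of the "TABLE_LOAD produces correct URL for filters" requirement, the URL construction can be pulled out into a pure function and asserted directly. This is a sketch, not part of the spec: `findingsUrl` and `FindingsFilters` are hypothetical names, and unlike the reference reducer it percent-encodes every value rather than only `search`.

```typescript
// Mirrors TriageState["filters"]; `lane` is typed as string to keep the sketch standalone.
export interface FindingsFilters {
  showMuted: boolean;
  lane?: string;
  search?: string;
  page: number;
  pageSize: number;
}

// Builds the findings list URL in the same parameter order as the reference reducer.
export function findingsUrl(filters: FindingsFilters): string {
  const params: Array<[string, string]> = [
    ["showMuted", String(filters.showMuted)],
    ["page", String(filters.page)],
    ["pageSize", String(filters.pageSize)],
  ];
  // Optional filters are appended only when set, matching the reducer's behaviour.
  if (filters.lane) params.push(["lane", filters.lane]);
  if (filters.search) params.push(["search", filters.search]);
  const query = params.map(([key, value]) => `${key}=${encodeURIComponent(value)}`).join("&");
  return `/api/triage/v1/findings?${query}`;
}
```

A test can then compare the returned string against a literal for each filter combination, with no HTTP mocking required.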
---
**Document Version**: 1.0
**Target Platform**: Angular v17 + TypeScript

236
docs/ux/TRIAGE_UX_GUIDE.md Normal file

@@ -0,0 +1,236 @@
# Stella Ops Triage UX Guide (Narrative-First + Proof-Linked)
## 0. Scope
This guide specifies the user experience for Stella Ops triage and evidence workflows:
- Narrative-first case view that answers DevOps' three questions quickly.
- Proof-linked evidence surfaces (SBOM/VEX/provenance/reachability/replay).
- Quiet-by-default noise controls with reversible, signed decisions.
- Smart-Diff history that explains meaningful risk changes.
Architecture constraints:
- Lattice/risk evaluation executes in `scanner.webservice`.
- `concelier` and `excititor` must **preserve prune source** (every merged/pruned datum remains traceable to origin).
## 1. UX Contract
Every triage surface must answer, in order:
1) Can I ship this?
2) If not, what exactly blocks me?
3) What's the minimum safe change to unblock?
Everything else is secondary and should be progressively disclosed.
## 2. Primary Objects in the UX
- Finding/Case: a specific vuln/rule tied to an asset (image/artifact/environment).
- Risk Result: deterministic lattice output (score/verdict/lane), computed by `scanner.webservice`.
- Evidence Artifact: signed, hash-addressed proof objects (SBOM slice, VEX doc, provenance, reachability slice, replay manifest).
- Decision: reversible user/system action that changes visibility/gating (mute/ack/exception) and is always signed/auditable.
- Snapshot: immutable record of inputs/outputs hashes enabling Smart-Diff.
## 3. Global UX Principles
### 3.1 Narrative-first, list-second
Default view is a "Case" narrative header + evidence rail. Lists exist for scanning and sorting, but not as the primary cognitive surface.
### 3.2 Time-to-evidence (TTFS) target
From pipeline alert click → human-readable verdict + first evidence link:
- p95 ≤ 30 seconds (including auth and initial fetch).
- "Evidence" is always one click away (no deep tab chains).
### 3.3 Proof-linking is mandatory
Any chip/badge that asserts a fact must link to the exact evidence object(s) that justify it.
Examples:
- "Reachable: Yes" → call-stack slice (and/or runtime hit record)
- "VEX: not_affected" → effective VEX assertion + signature details
- "Blocked by Policy Gate X" → policy artifact + lattice explanation
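
The proof-linking rule can be checked mechanically. The sketch below assumes a hypothetical `Chip` shape with an `evidenceIds` array (the API contract states that chips reference evidenceIds, but the exact field layout here is an assumption) and reports any chip that asserts a fact without linked evidence.

```typescript
// Hypothetical shape: a chip asserts a fact and must point at the evidence behind it.
interface Chip {
  label: string;         // e.g. "Reachable: Yes"
  evidenceIds: string[]; // IDs of the evidence objects that justify the label
}

// Return the labels of chips that violate the proof-linking rule (no evidence attached).
function findUnlinkedChips(chips: Chip[]): string[] {
  return chips
    .filter((chip) => chip.evidenceIds.length === 0)
    .map((chip) => chip.label);
}

// Illustrative data only; the evidence IDs are made up.
const chips: Chip[] = [
  { label: "Reachable: Yes", evidenceIds: ["ev-callstack-1"] },
  { label: "VEX: not_affected", evidenceIds: ["ev-vex-7", "ev-sig-7"] },
  { label: "Blocked by Policy Gate X", evidenceIds: [] },
];

const unlinked = findUnlinkedChips(chips);
```

A check like this could run in a component test so that a chip without evidence never reaches production rendering.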
### 3.4 Quiet by default, never silent
Muted lanes are hidden by default but surfaced with counts and a toggle.
Muting never deletes; it creates a signed Decision with TTL/reason and is reversible.
### 3.5 Deterministic and replayable
Users must be able to export an evidence bundle containing:
- scan replay manifest (feeds/rules/policies/hashes)
- signed artifacts
- outputs (risk result, snapshots)
so that auditors can replay the evaluation and obtain identical results.
## 4. Information Architecture
### 4.1 Screens
1) Findings Table (global)
   - Purpose: scan, sort, filter, jump into cases
   - Default: muted lanes hidden
   - Banner: shows count of auto-muted by policy with "Show" toggle
2) Case View (single-page narrative)
   - Purpose: decision making + proof review
   - Above fold: verdict + chips + deterministic score
   - Right rail: evidence list
   - Tabs (max 3):
     - Evidence (default)
     - Reachability & Impact
     - History (Smart-Diff)
3) Export / Verify Bundle
   - Purpose: offline/audit verification
   - Async export job, then download DSSE-signed zip
   - Verification UI: signature status, hash tree, issuer chain
### 4.2 Lanes (visibility buckets)
Lanes are a UX categorization derived from deterministic risk + decisions:
- ACTIVE
- BLOCKED
- NEEDS_EXCEPTION
- MUTED_REACH (non-reachable)
- MUTED_VEX (effective VEX says not_affected)
- COMPENSATED (controls satisfy policy)
Default: show ACTIVE/BLOCKED/NEEDS_EXCEPTION.
Muted lanes appear behind a toggle and via the banner counts.
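
The default visibility rule and the banner counts can be sketched as follows. The lane names come from the list above; the `Row` shape and the client-side filtering are assumptions for illustration, since the API contract applies `showMuted` and returns `mutedCounts` server-side.

```typescript
type Lane =
  | "ACTIVE" | "BLOCKED" | "NEEDS_EXCEPTION"
  | "MUTED_REACH" | "MUTED_VEX" | "COMPENSATED";

// Hypothetical minimal row; real rows carry more fields.
interface Row { id: string; lane: Lane; }

// Lanes shown when the "show muted" toggle is off.
const DEFAULT_VISIBLE: ReadonlySet<Lane> = new Set<Lane>(["ACTIVE", "BLOCKED", "NEEDS_EXCEPTION"]);

function visibleRows(rows: Row[], showMuted: boolean): Row[] {
  return showMuted ? rows : rows.filter((row) => DEFAULT_VISIBLE.has(row.lane));
}

// Counts for the banner, so hidden findings are never silent.
function mutedCounts(rows: Row[]): { reach: number; vex: number; compensated: number } {
  const counts = { reach: 0, vex: 0, compensated: 0 };
  for (const row of rows) {
    if (row.lane === "MUTED_REACH") counts.reach += 1;
    else if (row.lane === "MUTED_VEX") counts.vex += 1;
    else if (row.lane === "COMPENSATED") counts.compensated += 1;
  }
  return counts;
}

// Illustrative data only.
const rows: Row[] = [
  { id: "a", lane: "ACTIVE" },
  { id: "b", lane: "MUTED_REACH" },
  { id: "c", lane: "BLOCKED" },
  { id: "d", lane: "MUTED_VEX" },
  { id: "e", lane: "COMPENSATED" },
];
```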
## 5. Case View Layout (Required)
### 5.1 Top Bar
- Asset name / Image tag / Environment
- Last evaluated time
- Policy profile name (e.g., "Strict CI Gate")
### 5.2 Verdict Banner (Above fold)
Large, unambiguous verdict:
- SHIP
- BLOCKED
- NEEDS EXCEPTION
Below verdict:
- One-line "why" summary (max 140 chars), e.g.:
- "Reachable path observed; exploit signal present; Policy 'prod-strict' blocks."
### 5.3 Chips (Each chip is clickable)
Minimum set:
- Reachability: Reachable / Not reachable / Unknown (with confidence)
- Effective VEX: affected / not_affected / under_investigation
- Exploit signal: yes/no + source indicator
- Exposure: internet-exposed yes/no (if available)
- Asset tier: tier label
- Gate: allow/block/exception-needed (policy gate name)
Chip click behavior:
- Opens evidence panel anchored to the proof objects
- Shows source chain (concelier/excititor preserved sources)
### 5.4 Evidence Rail (Always visible right side)
List of evidence artifacts with:
- Type icon
- Title
- Issuer
- Signed/verified indicator
- Content hash (short)
- Created timestamp
Actions per item:
- Preview
- Copy hash
- Open raw
- "Show in bundle" marker
### 5.5 Actions Footer (Only primary actions)
- Create work item
- Acknowledge / Mute (opens Decision drawer)
- Propose exception (Decision with TTL + approver chain)
- Export evidence bundle
No more than 4 primary buttons. Secondary actions go into a kebab menu.
## 6. Decision Flows (Mute/Ack/Exception)
### 6.1 Decision Drawer (common UI)
Fields:
- Decision kind: Mute reach / Mute VEX / Acknowledge / Exception
- Reason code (dropdown) + free-text note
- TTL (required for exceptions; optional for mutes)
- Policy ref (auto-filled; editable only by admins)
- "Sign and apply" (server-side DSSE signing; user identity included)
On submit:
- Create Decision (signed)
- Re-evaluate lane/verdict if applicable
- Create Snapshot ("DECISION" trigger)
- Show toast with undo link
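
Before the drawer submits, the TTL rule (required for exceptions, optional for mutes) can be validated client-side. This is a sketch under stated assumptions: the `DecisionDraft` shape, the kind identifiers, the `ttlDays` unit, and the "TTL must be positive" rule are all hypothetical and not taken from the API contract.

```typescript
// Hypothetical identifiers for the four decision kinds in the drawer.
type DecisionKind = "MUTE_REACH" | "MUTE_VEX" | "ACK" | "EXCEPTION";

// Hypothetical draft shape; the real request payload may differ.
interface DecisionDraft {
  kind: DecisionKind;
  reasonCode: string;
  note?: string;
  ttlDays?: number; // assumption: TTL expressed in whole days
}

// Return a list of validation messages; an empty list means the draft may be submitted.
function validateDraft(draft: DecisionDraft): string[] {
  const errors: string[] = [];
  if (draft.kind === "EXCEPTION" && draft.ttlDays === undefined) {
    errors.push("TTL is required for exceptions.");
  }
  // Assumption: a provided TTL must be a positive number of days.
  if (draft.ttlDays !== undefined && draft.ttlDays <= 0) {
    errors.push("TTL must be positive.");
  }
  return errors;
}

const exceptionWithoutTtl = validateDraft({ kind: "EXCEPTION", reasonCode: "accepted_risk" });
const muteWithoutTtl = validateDraft({ kind: "MUTE_REACH", reasonCode: "not_reachable" });
```

Client-side validation only improves feedback; the server remains the authority because signing happens server-side.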
### 6.2 Undo
Undo is implemented as "revoke decision" (signed revoke record or revocation fields).
Decisions are never deleted.
## 7. Smart-Diff UX
### 7.1 Timeline
Chronological snapshots:
- when (timestamp)
- trigger (feed/vex/sbom/policy/runtime/decision/rescan)
- summary (short)
### 7.2 Diff panel
Two-column diff:
- Inputs changed (with proof links): VEX assertion changed, policy version changed, runtime trace arrived, etc.
- Outputs changed: lane, verdict, score, gates
### 7.3 Meaningful change definition
The UI only highlights "meaningful" changes:
- verdict change
- lane change
- score crosses a policy threshold
- reachability state changes
- effective VEX status changes
Other changes remain available in an expandable "details" section.
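
The meaningful-change rules can be expressed as a small comparison between two snapshots. The `SnapshotOutputs` shape and the `thresholds` parameter are hypothetical, and the sketch assumes a score "crosses" a threshold when it moves from below the threshold to at-or-above it, or the reverse.

```typescript
// Hypothetical subset of snapshot outputs relevant to highlighting.
interface SnapshotOutputs {
  verdict: string;
  lane: string;
  score: number;
  reachable: string;
  vex: string;
}

// True when prev and next sit on opposite sides of at least one policy threshold.
function crossesThreshold(prev: number, next: number, thresholds: number[]): boolean {
  return thresholds.some((t) => (prev < t) !== (next < t));
}

// Return the names of the meaningful changes between two snapshots, in the order listed above.
function meaningfulChanges(prev: SnapshotOutputs, next: SnapshotOutputs, thresholds: number[]): string[] {
  const changes: string[] = [];
  if (prev.verdict !== next.verdict) changes.push("verdict");
  if (prev.lane !== next.lane) changes.push("lane");
  if (crossesThreshold(prev.score, next.score, thresholds)) changes.push("score");
  if (prev.reachable !== next.reachable) changes.push("reachability");
  if (prev.vex !== next.vex) changes.push("vex");
  return changes;
}

// Illustrative snapshots only.
const older: SnapshotOutputs = { verdict: "SHIP", lane: "ACTIVE", score: 65, reachable: "UNKNOWN", vex: "under_investigation" };
const newer: SnapshotOutputs = { verdict: "BLOCK", lane: "BLOCKED", score: 87, reachable: "YES", vex: "affected" };
const highlighted = meaningfulChanges(older, newer, [70]);
```

An empty result means the diff belongs in the expandable details rather than the highlighted timeline.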
## 8. Performance & UI Engineering Requirements
- Findings table uses virtual scroll and server-side pagination.
- Case view loads in 2 steps:
1) Header narrative (small payload)
2) Evidence list + snapshots (lazy)
- Evidence previews are lazy-loaded and cancellable.
- Use ETag/If-None-Match for case and evidence list endpoints.
- UI must remain usable under high latency (air-gapped / offline kits):
- show cached last-known verdict with clear "stale" marker
- allow exporting bundles from cached artifacts when permissible
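
The ETag and stale-marker requirements above can be sketched as two pure functions that sit around whatever HTTP client the UI uses. `CachedEntry`, `HttpResult`, and `CachedView` are hypothetical types; treating every non-200/304 outcome as "fall back to cache" is a simplifying assumption (a real client would likely handle `not_found` and auth errors separately).

```typescript
// Hypothetical cache entry: last successful body plus its ETag.
interface CachedEntry<T> { etag: string; body: T; }

// Hypothetical, transport-agnostic summary of an HTTP response.
interface HttpResult<T> { status: number; body?: T; }

// What the UI renders: a body plus whether it must show the "stale" marker.
interface CachedView<T> { body: T; stale: boolean; }

// Headers for a conditional GET: send If-None-Match only when something is cached.
function conditionalHeaders(cached?: CachedEntry<unknown>): Record<string, string> {
  return cached ? { "If-None-Match": cached.etag } : {};
}

// Decide what to render from the cache and the outcome of the request.
function resolveView<T>(
  cached: CachedEntry<T> | undefined,
  res: HttpResult<T> | "network_error",
): CachedView<T> | null {
  if (res !== "network_error") {
    // 304 Not Modified: the cached body is still current.
    if (res.status === 304 && cached) return { body: cached.body, stale: false };
    // 200 OK: use the fresh body.
    if (res.status === 200 && res.body !== undefined) return { body: res.body, stale: false };
  }
  // Anything else: show the last-known value, clearly marked stale, if there is one.
  return cached ? { body: cached.body, stale: true } : null;
}

// Illustrative cache content only.
const cachedHeader: CachedEntry<string> = { etag: "\"abc123\"", body: "BLOCKED" };
```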
## 9. Accessibility & Operator Usability
- Keyboard navigation: table rows, chips, evidence list
- High contrast mode supported
- All status is conveyed by text + shape (not color only)
- Copy-to-clipboard for hashes, purls, CVE IDs
## 10. Telemetry (Must instrument)
- TTFS: notification click → verdict banner rendered
- Time-to-proof: click chip → proof preview shown
- Mute reversal rate (share of auto-muted findings that later become actionable)
- Bundle export success/latency
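
To compare collected TTFS samples against the p95 ≤ 30 seconds target from section 3.2, a percentile has to be computed somewhere in the telemetry pipeline. The sketch below uses the nearest-rank method, which is one common definition and an assumption here; the guide does not prescribe a method, and `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // copy, then sort ascending numerically
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Illustrative TTFS samples in seconds (1, 2, ..., 20), not real measurements.
const ttfsSeconds = Array.from({ length: 20 }, (_, i) => i + 1);
const ttfsP95 = percentile(ttfsSeconds, 95);
const meetsTarget = ttfsP95 <= 30;
```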
## 11. Responsibilities by Service
- `scanner.webservice`:
- produces reachability results, risk results, snapshots
- stores/serves case narrative header, evidence indexes, Smart-Diff
- `concelier`:
- aggregates vuln feeds and preserves per-source provenance ("preserve prune source")
- `excititor`:
- merges VEX and preserves original assertion sources ("preserve prune source")
- `notify.webservice`:
- emits first_signal / risk_changed / gate_blocked
- `scheduler.webservice`:
- re-evaluates existing images on feed/policy updates, triggers snapshots
---
**Document Version**: 1.0
**Target Platform**: .NET 10, PostgreSQL >= 16, Angular v17