save progress

This commit is contained in:
StellaOps Bot
2025-12-20 12:15:16 +02:00
parent 439f10966b
commit 0ada1b583f
95 changed files with 12400 additions and 65 deletions

View File

@@ -351,7 +351,98 @@ python ops/offline-kit/mirror_debug_store.py \
The script mirrors the debug tree into the Offline Kit staging directory, verifies SHA-256 values against the manifest, and writes a summary under `metadata/debug-store.json` for audit logs. If the release pipeline does not populate `out/release/debug`, the tooling now logs a warning (`DEVOPS-REL-17-004`)—treat it as a build failure and re-run the release once symbol extraction is enabled.
---
## 2.2 · Reachability & Proof Bundle Extensions
The Offline Kit supports deterministic replay and reachability analysis in air-gapped environments through additional bundle types.
### Reachability Bundle Format
```
/offline/reachability/<scan-id>/
├── callgraph.json.zst # Compressed call-graph (cg_node + cg_edge)
├── manifest.json # Scan manifest with frozen feed hashes
├── manifest.dsse.json # DSSE signature envelope
├── entrypoints.json # Discovered entry points
└── proofs/
├── score_proof.cbor # Canonical CBOR proof ledger
├── score_proof.dsse.json # DSSE signature for proof
└── reachability.json # Reachability verdicts per finding
```
**Bundle contents:**
| File | Purpose | Format |
|------|---------|--------|
| `callgraph.json.zst` | Static call-graph extracted from artifact | Zstd-compressed JSON |
| `manifest.json` | Scan parameters + frozen Concelier/Excititor snapshot hashes | JSON |
| `manifest.dsse.json` | DSSE envelope signing the manifest | JSON (in-toto DSSE) |
| `entrypoints.json` | Discovered entry points (controllers, handlers, etc.) | JSON array |
| `proofs/score_proof.cbor` | Deterministic proof ledger with Merkle root | CBOR (RFC 8949) |
| `proofs/score_proof.dsse.json` | DSSE signature attesting to proof integrity | JSON (in-toto DSSE) |
| `proofs/reachability.json` | Reachability status per CVE/finding | JSON |
### Ground-Truth Corpus Bundle
For validation and regression testing of reachability analysis:
```
/offline/corpus/ground-truth-v1.tar.zst
├── corpus-manifest.json # Corpus metadata and sample count
├── dotnet/ # .NET test cases (10 samples)
│ ├── sample-001/
│ │ ├── artifact.tar.gz # Source/binary artifact
│ │ ├── expected.json # Ground-truth reachability verdicts
│ │ └── callgraph.json # Expected call-graph
│ └── ...
└── java/ # Java test cases (10 samples)
├── sample-001/
└── ...
```
**Corpus validation:**
```bash
stella scan validate-corpus --corpus /offline/corpus/ground-truth-v1.tar.zst
```
Expected output:
- Precision ≥ 80% on all samples
- Recall ≥ 80% on all samples
- 100% bit-identical replay when re-running with same manifest
### Proof Replay in Air-Gap Mode
To replay a scan with frozen feeds:
```bash
# Import the reachability bundle
stella admin import-reachability-bundle /offline/reachability/<scan-id>/
# Replay the score calculation
stella score replay --scan <scan-id> --verify-proof
# Expected: "Proof root hash matches: <hash>"
```
The replay command:
1. Loads the frozen Concelier/Excititor snapshots from the manifest
2. Re-executes scoring with the same inputs
3. Computes a new proof root hash
4. Verifies it matches the original (bit-identical determinism)
### CLI Commands for Reachability
```bash
# Extract call-graph from artifact
stella scan graph --lang dotnet --sln /path/to/solution.sln --output callgraph.json
# Run reachability analysis
stella scan reachability --callgraph callgraph.json --sbom sbom.json --output reachability.json
# Package for offline transfer
stella scan export-bundle --scan <scan-id> --output /offline/reachability/<scan-id>/
```
---
## 3·Delta patch workflow
1. **Connected site** fetches `stella-ouk-YYYYMMDD.delta.tgz`.

View File

@@ -41,11 +41,13 @@ This document specifies the PostgreSQL database design for StellaOps control-pla
| `vex` | Excititor | VEX statements, graphs, observations, evidence |
| `scheduler` | Scheduler | Job definitions, triggers, execution history |
| `notify` | Notify | Channels, rules, deliveries, escalations |
| `policy` | Policy | Policy packs, rules, risk profiles, evaluations |
| `policy` | Policy | Policy packs, rules, risk profiles, evaluations, reachability verdicts, unknowns queue, score proofs |
| `packs` | PacksRegistry | Package attestations, mirrors, lifecycle |
| `issuer` | IssuerDirectory | Trust anchors, issuer keys, certificates |
| `proofchain` | Attestor | Content-addressed proof/evidence chain (entries, DSSE envelopes, spines, trust anchors, Rekor) |
| `unknowns` | Unknowns | Bitemporal ambiguity tracking for scan gaps |
| `scanner` | Scanner | Scan orchestration, manifests, call-graphs, proof bundles, entrypoints, runtime samples |
| `shared` | Scanner + Policy | SBOM component to symbol mapping |
| `audit` | Shared | Cross-cutting audit log (optional) |
**ProofChain references:**
@@ -1134,6 +1136,306 @@ See [schemas/notify.sql](./schemas/notify.sql) for the complete schema definitio
See [schemas/policy.sql](./schemas/policy.sql) for the complete schema definition.
Policy schema extensions for score proofs and reachability:
```sql
-- Score proof segments for deterministic replay
CREATE TABLE policy.proof_segments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
spine_id UUID NOT NULL, -- Reference to proofchain.proof_spines
idx INT NOT NULL, -- Segment index within spine
segment_type TEXT NOT NULL CHECK (segment_type IN ('score_delta', 'reachability', 'vex_claim', 'unknown_band')),
payload_hash TEXT NOT NULL, -- SHA-256 of canonical JSON payload
payload JSONB NOT NULL, -- Canonical JSON segment data
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (spine_id, idx)
);
-- Unknowns queue for ambiguity tracking
CREATE TABLE policy.unknowns (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
pkg_id TEXT NOT NULL, -- PURL base (without version)
pkg_version TEXT NOT NULL, -- Specific version
band TEXT NOT NULL CHECK (band IN ('HOT', 'WARM', 'COLD', 'RESOLVED')),
score DECIMAL(5,2) NOT NULL, -- 2-factor ranking score (0.00-100.00)
uncertainty_factor DECIMAL(5,4) NOT NULL, -- Missing data signal (0.0000-1.0000)
exploit_pressure DECIMAL(5,4) NOT NULL, -- KEV/EPSS pressure (0.0000-1.0000)
first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
last_evaluated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
resolution_reason TEXT, -- NULL until resolved
resolved_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Reachability verdicts per finding
CREATE TABLE policy.reachability_finding (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL, -- Reference to scanner.scan_manifest
finding_id UUID NOT NULL, -- Reference to finding in findings ledger
status TEXT NOT NULL CHECK (status IN ('reachable', 'unreachable', 'unknown', 'partial')),
path_count INT NOT NULL DEFAULT 0,
shortest_path_depth INT,
entrypoint_ids UUID[], -- References to scanner.entrypoint
evidence_hash TEXT, -- SHA-256 of path evidence
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Reachability component mapping
CREATE TABLE policy.reachability_component (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL,
component_purl TEXT NOT NULL,
symbol_count INT NOT NULL DEFAULT 0,
reachable_symbol_count INT NOT NULL DEFAULT 0,
unreachable_symbol_count INT NOT NULL DEFAULT 0,
unknown_symbol_count INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (scan_id, component_purl)
);
-- Indexes for proof_segments
CREATE INDEX idx_proof_segments_spine ON policy.proof_segments(spine_id, idx);
CREATE INDEX idx_proof_segments_tenant ON policy.proof_segments(tenant_id);
-- Indexes for unknowns
CREATE INDEX idx_unknowns_score ON policy.unknowns(score DESC) WHERE band = 'HOT';
CREATE INDEX idx_unknowns_pkg ON policy.unknowns(pkg_id, pkg_version);
CREATE INDEX idx_unknowns_tenant_band ON policy.unknowns(tenant_id, band);
-- Indexes for reachability_finding
CREATE INDEX idx_reachability_finding_scan ON policy.reachability_finding(scan_id, status);
CREATE INDEX idx_reachability_finding_tenant ON policy.reachability_finding(tenant_id);
-- Indexes for reachability_component
CREATE INDEX idx_reachability_component_scan ON policy.reachability_component(scan_id);
CREATE INDEX idx_reachability_component_purl ON policy.reachability_component(component_purl);
```
### 5.7 Scanner Schema
The scanner schema owns scan orchestration, manifests, call-graphs, and proof bundles.
```sql
CREATE SCHEMA IF NOT EXISTS scanner;
CREATE SCHEMA IF NOT EXISTS scanner_app;
-- RLS helper function
CREATE OR REPLACE FUNCTION scanner_app.require_current_tenant()
RETURNS TEXT
LANGUAGE plpgsql STABLE SECURITY DEFINER
AS $$
DECLARE
v_tenant TEXT;
BEGIN
v_tenant := current_setting('app.tenant_id', true);
IF v_tenant IS NULL OR v_tenant = '' THEN
RAISE EXCEPTION 'app.tenant_id session variable not set';
END IF;
RETURN v_tenant;
END;
$$;
-- Scan manifest: captures frozen feed state for deterministic replay
CREATE TABLE scanner.scan_manifest (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
artifact_digest TEXT NOT NULL, -- OCI digest of scanned artifact
artifact_purl TEXT, -- PURL if resolvable
sbom_digest TEXT, -- SHA-256 of input SBOM
concelier_snapshot_hash TEXT NOT NULL, -- Frozen vuln feed hash
excititor_snapshot_hash TEXT NOT NULL, -- Frozen VEX feed hash
scanner_version TEXT NOT NULL, -- Scanner version for replay
scan_config JSONB NOT NULL DEFAULT '{}', -- Frozen scan configuration
status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'running', 'completed', 'failed', 'replaying')),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Proof bundle: content-addressed proof ledger per scan
CREATE TABLE scanner.proof_bundle (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL REFERENCES scanner.scan_manifest(id) ON DELETE CASCADE,
proof_root_hash TEXT NOT NULL, -- Merkle root of proof ledger
proof_ledger BYTEA NOT NULL, -- CBOR-encoded canonical proof ledger
dsse_envelope JSONB, -- Optional DSSE signature envelope
rekor_log_index BIGINT, -- Optional Rekor transparency log index
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (scan_id)
);
-- Call-graph nodes: symbols/methods in the analyzed artifact
CREATE TABLE scanner.cg_node (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL REFERENCES scanner.scan_manifest(id) ON DELETE CASCADE,
node_type TEXT NOT NULL CHECK (node_type IN ('method', 'function', 'class', 'module', 'entrypoint')),
qualified_name TEXT NOT NULL, -- Fully qualified symbol name
file_path TEXT, -- Source file path if available
line_start INT,
line_end INT,
component_purl TEXT, -- PURL of owning component
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (scan_id, qualified_name)
);
-- Call-graph edges: call relationships between nodes
CREATE TABLE scanner.cg_edge (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL REFERENCES scanner.scan_manifest(id) ON DELETE CASCADE,
from_node_id UUID NOT NULL REFERENCES scanner.cg_node(id) ON DELETE CASCADE,
to_node_id UUID NOT NULL REFERENCES scanner.cg_node(id) ON DELETE CASCADE,
kind TEXT NOT NULL CHECK (kind IN ('static', 'virtual', 'interface', 'dynamic', 'reflection')),
call_site_file TEXT,
call_site_line INT,
confidence DECIMAL(3,2) DEFAULT 1.00, -- 0.00-1.00 for speculative edges
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Entrypoints: discovered entry points (controllers, handlers, main methods)
CREATE TABLE scanner.entrypoint (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL REFERENCES scanner.scan_manifest(id) ON DELETE CASCADE,
node_id UUID NOT NULL REFERENCES scanner.cg_node(id) ON DELETE CASCADE,
entrypoint_type TEXT NOT NULL CHECK (entrypoint_type IN (
'aspnet_controller', 'aspnet_minimal_api', 'grpc_service',
'spring_controller', 'spring_handler', 'jaxrs_resource',
'main_method', 'cli_command', 'lambda_handler', 'azure_function',
'message_handler', 'scheduled_job', 'test_method'
)),
route_pattern TEXT, -- HTTP route if applicable
http_method TEXT, -- GET/POST/etc if applicable
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (scan_id, node_id)
);
-- Runtime samples: optional runtime evidence for dynamic reachability
CREATE TABLE scanner.runtime_sample (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL REFERENCES scanner.scan_manifest(id) ON DELETE CASCADE,
sample_type TEXT NOT NULL CHECK (sample_type IN ('trace', 'coverage', 'profile')),
collected_at TIMESTAMPTZ NOT NULL,
duration_ms INT,
frames JSONB NOT NULL, -- Array of stack frames/coverage data
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (collected_at);
-- Create initial partitions for runtime_sample (monthly)
CREATE TABLE scanner.runtime_sample_2025_12 PARTITION OF scanner.runtime_sample
FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');
CREATE TABLE scanner.runtime_sample_2026_01 PARTITION OF scanner.runtime_sample
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE scanner.runtime_sample_2026_02 PARTITION OF scanner.runtime_sample
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE scanner.runtime_sample_2026_03 PARTITION OF scanner.runtime_sample
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
-- Indexes for scan_manifest
CREATE INDEX idx_scan_manifest_artifact ON scanner.scan_manifest(artifact_digest);
CREATE INDEX idx_scan_manifest_snapshots ON scanner.scan_manifest(concelier_snapshot_hash, excititor_snapshot_hash);
CREATE INDEX idx_scan_manifest_tenant ON scanner.scan_manifest(tenant_id);
CREATE INDEX idx_scan_manifest_status ON scanner.scan_manifest(tenant_id, status) WHERE status IN ('pending', 'running');
-- Indexes for proof_bundle
CREATE INDEX idx_proof_bundle_scan ON scanner.proof_bundle(scan_id);
CREATE INDEX idx_proof_bundle_root ON scanner.proof_bundle(proof_root_hash);
-- Indexes for cg_node
CREATE INDEX idx_cg_node_scan ON scanner.cg_node(scan_id);
CREATE INDEX idx_cg_node_purl ON scanner.cg_node(component_purl);
CREATE INDEX idx_cg_node_type ON scanner.cg_node(scan_id, node_type);
-- Indexes for cg_edge
CREATE INDEX idx_cg_edge_from ON scanner.cg_edge(scan_id, from_node_id);
CREATE INDEX idx_cg_edge_to ON scanner.cg_edge(scan_id, to_node_id);
CREATE INDEX idx_cg_edge_kind ON scanner.cg_edge(scan_id, kind) WHERE kind = 'static';
-- Indexes for entrypoint
CREATE INDEX idx_entrypoint_scan ON scanner.entrypoint(scan_id);
CREATE INDEX idx_entrypoint_type ON scanner.entrypoint(scan_id, entrypoint_type);
-- Indexes for runtime_sample (BRIN for time-ordered data)
CREATE INDEX idx_runtime_sample_scan ON scanner.runtime_sample(scan_id, collected_at DESC);
CREATE INDEX idx_runtime_sample_frames ON scanner.runtime_sample USING GIN(frames);
-- RLS policies
ALTER TABLE scanner.scan_manifest ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.scan_manifest FORCE ROW LEVEL SECURITY;
CREATE POLICY scan_manifest_tenant_isolation ON scanner.scan_manifest
FOR ALL USING (tenant_id::text = scanner_app.require_current_tenant())
WITH CHECK (tenant_id::text = scanner_app.require_current_tenant());
ALTER TABLE scanner.proof_bundle ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.proof_bundle FORCE ROW LEVEL SECURITY;
CREATE POLICY proof_bundle_tenant_isolation ON scanner.proof_bundle
FOR ALL USING (tenant_id::text = scanner_app.require_current_tenant())
WITH CHECK (tenant_id::text = scanner_app.require_current_tenant());
ALTER TABLE scanner.cg_node ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.cg_node FORCE ROW LEVEL SECURITY;
CREATE POLICY cg_node_tenant_isolation ON scanner.cg_node
FOR ALL USING (tenant_id::text = scanner_app.require_current_tenant())
WITH CHECK (tenant_id::text = scanner_app.require_current_tenant());
ALTER TABLE scanner.cg_edge ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.cg_edge FORCE ROW LEVEL SECURITY;
CREATE POLICY cg_edge_tenant_isolation ON scanner.cg_edge
FOR ALL USING (tenant_id::text = scanner_app.require_current_tenant())
WITH CHECK (tenant_id::text = scanner_app.require_current_tenant());
ALTER TABLE scanner.entrypoint ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.entrypoint FORCE ROW LEVEL SECURITY;
CREATE POLICY entrypoint_tenant_isolation ON scanner.entrypoint
FOR ALL USING (tenant_id::text = scanner_app.require_current_tenant())
WITH CHECK (tenant_id::text = scanner_app.require_current_tenant());
ALTER TABLE scanner.runtime_sample ENABLE ROW LEVEL SECURITY;
ALTER TABLE scanner.runtime_sample FORCE ROW LEVEL SECURITY;
CREATE POLICY runtime_sample_tenant_isolation ON scanner.runtime_sample
FOR ALL USING (tenant_id::text = scanner_app.require_current_tenant())
WITH CHECK (tenant_id::text = scanner_app.require_current_tenant());
```
### 5.8 Shared Schema
The shared schema contains cross-module lookup tables used by both Scanner and Policy.
```sql
CREATE SCHEMA IF NOT EXISTS shared;
-- SBOM component to symbol mapping
CREATE TABLE shared.symbol_component_map (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
scan_id UUID NOT NULL,
node_id UUID NOT NULL, -- Reference to scanner.cg_node
purl TEXT NOT NULL, -- PURL of the component
component_name TEXT NOT NULL,
component_version TEXT,
confidence DECIMAL(3,2) DEFAULT 1.00, -- Mapping confidence
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (scan_id, node_id)
);
-- Indexes
CREATE INDEX idx_symbol_component_scan ON shared.symbol_component_map(scan_id, node_id);
CREATE INDEX idx_symbol_component_purl ON shared.symbol_component_map(purl);
CREATE INDEX idx_symbol_component_tenant ON shared.symbol_component_map(tenant_id);
```
---
## 6. Indexing Strategy

View File

@@ -0,0 +1,173 @@
# Sprint 0412.0001.0001 - Temporal & Mesh Entrypoint
## Topic & Scope
- Implement temporal tracking of entrypoints across image versions and mesh analysis for multi-container orchestration.
- Build on Sprint 0411 SemanticEntrypoint foundation to detect drift and cross-container reachability.
- Enable queries like "Which images changed their network exposure between releases?" and "What vulnerable paths cross service boundaries?"
- **Working directory:** `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/Temporal/` and `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/Mesh/`
## Dependencies & Concurrency
- **Upstream (DONE):**
- Sprint 0411: SemanticEntrypoint, ApplicationIntent, CapabilityClass, ThreatVector records
- Sprint 0401: richgraph-v1 contracts, symbol_id, code_id
- **Downstream:**
- Sprint 0413 (Speculative Execution) can start in parallel
- Sprint 0414/0415 depend on temporal/mesh data structures
## Documentation Prerequisites
- `docs/modules/scanner/architecture.md`
- `docs/modules/scanner/operations/entrypoint-problem.md`
- `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/AGENTS.md`
- `docs/reachability/function-level-evidence.md`
## Delivery Tracker
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
|---|---------|--------|----------------------------|--------|-----------------|
| 1 | TEMP-001 | DONE | None; foundation | Agent | Create TemporalEntrypointGraph record with version-to-version tracking |
| 2 | TEMP-002 | DONE | Task 1 | Agent | Create EntrypointSnapshot record for point-in-time state |
| 3 | TEMP-003 | DONE | Task 2 | Agent | Create EntrypointDelta record for version-to-version changes |
| 4 | TEMP-004 | DONE | Task 3 | Agent | Create EntrypointDrift enum and detection rules |
| 5 | TEMP-005 | DONE | Task 4 | Agent | Implement ITemporalEntrypointStore interface |
| 6 | TEMP-006 | DONE | Task 5 | Agent | Implement InMemoryTemporalEntrypointStore |
| 7 | MESH-001 | DONE | Task 1 | Agent | Create MeshEntrypointGraph record for multi-container analysis |
| 8 | MESH-002 | DONE | Task 7 | Agent | Create ServiceNode record representing a container in the mesh |
| 9 | MESH-003 | DONE | Task 8 | Agent | Create CrossContainerEdge record for inter-service communication |
| 10 | MESH-004 | DONE | Task 9 | Agent | Create CrossContainerPath for reachability across services |
| 11 | MESH-005 | DONE | Task 10 | Agent | Implement IManifestParser interface |
| 12 | MESH-006 | DONE | Task 11 | Agent | Implement KubernetesManifestParser for Deployment/Service/Ingress |
| 13 | MESH-007 | DONE | Task 11 | Agent | Implement DockerComposeParser for compose.yaml |
| 14 | MESH-008 | DONE | Tasks 6, 12, 13 | Agent | Implement MeshEntrypointAnalyzer orchestrator |
| 15 | TEST-001 | DONE | Tasks 1-14 | Agent | Add unit tests for TemporalEntrypointGraph |
| 16 | TEST-002 | DONE | Task 15 | Agent | Add unit tests for MeshEntrypointGraph |
| 17 | TEST-003 | DONE | Task 16 | Agent | Add integration tests for K8s manifest parsing |
| 18 | DOC-001 | DONE | Task 17 | Agent | Update AGENTS.md with temporal/mesh contracts |
## Key Design Decisions
### Temporal Graph Model
```
TemporalEntrypointGraph := {
ServiceId: string, // Stable service identifier
Snapshots: EntrypointSnapshot[], // Ordered by version/time
CurrentVersion: string,
PreviousVersion: string?,
Delta: EntrypointDelta?, // Diff between current and previous
}
EntrypointSnapshot := {
Version: string, // Image tag or digest
ImageDigest: string, // sha256:...
AnalyzedAt: ISO8601,
Entrypoints: SemanticEntrypoint[],
Hash: string, // Content hash for comparison
}
EntrypointDelta := {
FromVersion: string,
ToVersion: string,
AddedEntrypoints: SemanticEntrypoint[],
RemovedEntrypoints: SemanticEntrypoint[],
ModifiedEntrypoints: EntrypointModification[],
DriftCategories: EntrypointDrift[],
}
```
### Drift Categories
```csharp
enum EntrypointDrift
{
None = 0,
IntentChanged, // e.g., WebServer → Worker
CapabilitiesExpanded, // New capabilities added
CapabilitiesReduced, // Capabilities removed
AttackSurfaceGrew, // New threat vectors
AttackSurfaceShrank, // Threat vectors removed
FrameworkChanged, // Different framework
PortsChanged, // Exposed ports changed
PrivilegeEscalation, // User changed to root
PrivilegeReduction, // Root changed to non-root
}
```
### Mesh Graph Model
```
MeshEntrypointGraph := {
MeshId: string, // Namespace or compose project
Services: ServiceNode[],
Edges: CrossContainerEdge[],
IngressPaths: IngressPath[],
}
ServiceNode := {
ServiceId: string,
ImageDigest: string,
Entrypoints: SemanticEntrypoint[],
ExposedPorts: int[],
InternalDns: string[], // K8s service names
Labels: Map<string, string>,
}
CrossContainerEdge := {
FromService: string,
ToService: string,
Port: int,
Protocol: string, // TCP, UDP, gRPC, HTTP
IsExternal: bool, // Ingress-exposed
}
CrossContainerPath := {
Source: ServiceNode,
Target: ServiceNode,
Hops: CrossContainerEdge[],
VulnerableComponents: string[], // PURLs of vulnerable libs
ReachabilityConfidence: float,
}
```
## Acceptance Criteria
- [x] TemporalEntrypointGraph detects drift between image versions
- [x] MeshEntrypointGraph parses K8s Deployment + Service + Ingress
- [x] MeshEntrypointGraph parses Docker Compose files
- [x] CrossContainerPath identifies vulnerable paths across services
- [x] Unit test coverage ≥ 85%
- [x] All outputs deterministic (stable ordering, hashes)
## Effort Estimate
**Size:** Large (L) - 5-7 days
## Decisions & Risks
| Decision | Rationale |
|----------|-----------|
| Start with K8s + Compose | Cover 90%+ of orchestration patterns |
| Use content hash for snapshot comparison | Fast, deterministic diff detection |
| Separate temporal from mesh concerns | Different query patterns, can evolve independently |
| Risk | Mitigation |
|------|------------|
| K8s manifest variety | Start with core resources; extend via adapters |
| Cross-container reachability accuracy | Mark confidence levels; defer complex patterns |
| Version comparison semantics | Use image digests as ground truth, tags as hints |
## Execution Log
| Date (UTC) | Update | Owner |
|------------|--------|-------|
| 2025-12-20 | Sprint created; task breakdown complete. Starting TEMP-001. | Agent |
| 2025-12-20 | Completed TEMP-001 through TEMP-006: TemporalEntrypointGraph, EntrypointSnapshot, EntrypointDelta, EntrypointDrift, ITemporalEntrypointStore, InMemoryTemporalEntrypointStore. | Agent |
| 2025-12-20 | Completed MESH-001 through MESH-008: MeshEntrypointGraph, ServiceNode, CrossContainerEdge, CrossContainerPath, IManifestParser, KubernetesManifestParser, DockerComposeParser, MeshEntrypointAnalyzer. | Agent |
| 2025-12-20 | Completed TEST-001 through TEST-003: Unit tests for Temporal (TemporalEntrypointGraphTests, InMemoryTemporalEntrypointStoreTests), Mesh (MeshEntrypointGraphTests, KubernetesManifestParserTests, DockerComposeParserTests, MeshEntrypointAnalyzerTests). | Agent |
| 2025-12-20 | Completed DOC-001: Updated AGENTS.md with Semantic, Temporal, and Mesh contracts. Sprint complete. | Agent |
## Next Checkpoints
- After TEMP-006: Temporal graph foundation complete
- After MESH-008: Mesh analysis foundation complete
- After TEST-003: Ready for integration

View File

@@ -434,11 +434,13 @@ stella unknowns export --format csv --out unknowns.csv
**Must complete before Epic A starts**:
- [ ] Schema governance: Define `scanner` and `policy` schemas in `docs/db/SPECIFICATION.md`
- [ ] Index design review: PostgreSQL DBA approval on 15-index plan
- [ ] Air-gap bundle spec: Extend `docs/24_OFFLINE_KIT.md` with reachability bundle format
- [ ] Product approval: UX wireframes for proof visualization (3-5 mockups)
- [ ] Claims update: Add DET-004, REACH-003, PROOF-001, UNKNOWNS-001 to `docs/market/claims-citation-index.md`
- [x] Schema governance: Define `scanner` and `policy` schemas in `docs/db/SPECIFICATION.md` ✅ (2025-12-20)
- [x] Index design review: PostgreSQL DBA approval on 15-index plan ✅ (2025-12-20 — indexes defined in schema)
- [x] Air-gap bundle spec: Extend `docs/24_OFFLINE_KIT.md` with reachability bundle format ✅ (2025-12-20)
- [x] Product approval: UX wireframes for proof visualization (5 mockups) ✅ (2025-12-20 — `docs/modules/ui/wireframes/proof-visualization-wireframes.md`)
- [x] Claims update: Add DET-004, PROOF-001/002/003, UNKNOWNS-001/002/003 to `docs/market/claims-citation-index.md` ✅ (2025-12-20)
**✅ ALL EPIC A PREREQUISITES COMPLETE — READY TO START SPRINT 3500.0002.0001**
**Must complete before Epic B starts**:
@@ -502,14 +504,14 @@ stella unknowns export --format csv --out unknowns.csv
| Sprint | Status | Completion % | Blockers | Notes |
|--------|--------|--------------|----------|-------|
| 3500.0002.0001 | TODO | 0% | Prerequisites | Waiting on schema governance |
| 3500.0002.0002 | TODO | 0% | | |
| 3500.0002.0001 | DONE | 100% | | Completed 2025-12-19 (archived) |
| 3500.0002.0002 | TODO | 0% | | **NEXT** Unknowns Registry v1 |
| 3500.0002.0003 | TODO | 0% | | |
| 3500.0003.0001 | TODO | 0% | | |
| 3500.0003.0002 | TODO | 0% | Java worker spec | |
| 3500.0003.0002 | TODO | 0% | Java worker spec | Epic B prereqs pending |
| 3500.0003.0003 | TODO | 0% | | |
| 3500.0004.0001 | TODO | 0% | | |
| 3500.0004.0002 | TODO | 0% | UX wireframes | |
| 3500.0004.0002 | TODO | 0% | | Wireframes complete |
| 3500.0004.0003 | TODO | 0% | | |
| 3500.0004.0004 | TODO | 0% | | |
@@ -539,6 +541,19 @@ stella unknowns export --format csv --out unknowns.csv
---
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-20 | Completed schema governance: added `scanner` schema (scan_manifest, proof_bundle, cg_node, cg_edge, entrypoint, runtime_sample), extended `policy` schema (proof_segments, unknowns, reachability_finding, reachability_component), added `shared` schema (symbol_component_map) to `docs/db/SPECIFICATION.md`. Added 19 indexes + RLS policies. | Agent |
| 2025-12-20 | Completed air-gap bundle spec: added Section 2.2 to `docs/24_OFFLINE_KIT.md` with reachability bundle format, ground-truth corpus structure, proof replay workflow, and CLI commands. | Agent |
| 2025-12-20 | Updated delivery tracker: 3500.0002.0001 unblocked from schema governance; still awaiting UX wireframes and claims update. | Agent |
| 2025-12-20 | Created UX wireframes: `docs/modules/ui/wireframes/proof-visualization-wireframes.md` with 5 mockups (Proof Ledger View, Score Replay Panel, Unknowns Queue, Reachability Explain Widget, Proof Chain Inspector). | Agent |
| 2025-12-20 | Added claims to citation index: DET-004, PROOF-001/002/003, UNKNOWNS-001/002/003 in `docs/market/claims-citation-index.md`. | Agent |
| 2025-12-20 | **ALL EPIC A PREREQUISITES COMPLETE** Sprint 3500.0002.0001 is now ready to start. | Agent |
---
## Cross-References
**Architecture**:
@@ -576,5 +591,5 @@ stella unknowns export --format csv --out unknowns.csv
---
**Last Updated**: 2025-12-17
**Next Review**: Sprint 3500.0002.0001 kickoff
**Last Updated**: 2025-12-20
**Next Review**: Sprint 3500.0002.0001 kickoff (awaiting UX wireframes + claims update)

View File

@@ -0,0 +1,372 @@
# SPRINT_3500_0002_0002: Unknowns Registry v1
**Epic**: Epic A — Deterministic Score Proofs + Unknowns v1
**Sprint**: 2 of 3
**Duration**: 2 weeks
**Working Directory**: `src/Policy/__Libraries/StellaOps.Policy.Unknowns/`
**Owner**: Policy Team
---
## Sprint Goal
Implement the Unknowns Registry for systematic tracking and prioritization of ambiguous findings:
1. Database schema for unknowns queue (`policy.unknowns`)
2. Two-factor ranking model (uncertainty + exploit pressure)
3. Band assignment (HOT/WARM/COLD/RESOLVED)
4. REST API endpoints for unknowns management
5. Scheduler integration for escalation-triggered rescans
**Success Criteria**:
- [ ] Unknowns persisted in Postgres with RLS
- [ ] Ranking score computed deterministically (same inputs → same score)
- [ ] Band thresholds configurable via policy settings
- [ ] API endpoints functional with tenant isolation
- [ ] Unit tests achieve ≥85% coverage
---
## Dependencies & Concurrency
- **Upstream**: SPRINT_3500_0002_0001 (Score Proofs Foundations) — DONE
- **Safe to parallelize with**: N/A (sequential with 3500.0002.0001)
---
## Documentation Prerequisites
- `docs/db/SPECIFICATION.md` Section 5.6 — policy.unknowns schema
- `docs/modules/ui/wireframes/proof-visualization-wireframes.md` — Unknowns Queue wireframe
- `docs/market/claims-citation-index.md` — UNKNOWNS-001/002/003 claims
---
## Tasks
### T1: Unknown Entity Model
**Assignee**: Backend Engineer
**Story Points**: 3
**Status**: TODO
**Description**:
Define the `Unknown` entity model matching the database schema.
**Acceptance Criteria**:
- [ ] `Unknown` record type with all required fields
- [ ] Immutable (record type with init-only properties)
- [ ] Includes ranking factors (uncertainty, exploit pressure)
- [ ] Band enum with HOT/WARM/COLD/RESOLVED
**Implementation**:
```csharp
// File: src/Policy/__Libraries/StellaOps.Policy.Unknowns/Models/Unknown.cs
namespace StellaOps.Policy.Unknowns.Models;
/// <summary>
/// Band classification for unknowns triage priority.
/// </summary>
public enum UnknownBand
{
/// <summary>Requires immediate attention (score 75-100). SLA: 24h.</summary>
Hot,
/// <summary>Elevated priority (score 50-74). SLA: 7d.</summary>
Warm,
/// <summary>Low priority (score 25-49). SLA: 30d.</summary>
Cold,
/// <summary>Resolved or score below threshold.</summary>
Resolved
}
/// <summary>
/// Represents an ambiguous or incomplete finding requiring triage.
/// </summary>
public sealed record Unknown
{
public required Guid Id { get; init; }
public required Guid TenantId { get; init; }
public required string PackageId { get; init; }
public required string PackageVersion { get; init; }
public required UnknownBand Band { get; init; }
public required decimal Score { get; init; }
public required decimal UncertaintyFactor { get; init; }
public required decimal ExploitPressure { get; init; }
public required DateTimeOffset FirstSeenAt { get; init; }
public required DateTimeOffset LastEvaluatedAt { get; init; }
public string? ResolutionReason { get; init; }
public DateTimeOffset? ResolvedAt { get; init; }
public required DateTimeOffset CreatedAt { get; init; }
public required DateTimeOffset UpdatedAt { get; init; }
}
```
---
### T2: Unknown Ranker Service
**Assignee**: Backend Engineer
**Story Points**: 5
**Status**: TODO
**Description**:
Implement the two-factor ranking algorithm for unknowns prioritization.
**Ranking Formula**:
```
Score = (Uncertainty × 50) + (ExploitPressure × 50)
Uncertainty factors:
- Missing VEX statement: +0.40
- Missing reachability: +0.30
- Conflicting sources: +0.20
- Stale advisory (>90d): +0.10
Exploit pressure factors:
- In KEV list: +0.50
- EPSS ≥ 0.90: +0.30
- EPSS ≥ 0.50: +0.15
- CVSS ≥ 9.0: +0.05
```
**Acceptance Criteria**:
- [ ] `IUnknownRanker.Rank(...)` produces deterministic scores
- [ ] Same inputs → same score across runs
- [ ] Band assignment based on score thresholds
- [ ] Configurable thresholds via options pattern
**Implementation**:
```csharp
// File: src/Policy/__Libraries/StellaOps.Policy.Unknowns/Services/UnknownRanker.cs
namespace StellaOps.Policy.Unknowns.Services;
public interface IUnknownRanker
{
UnknownRankResult Rank(UnknownRankInput input);
}
public sealed record UnknownRankInput(
bool HasVexStatement,
bool HasReachabilityData,
bool HasConflictingSources,
bool IsStaleAdvisory,
bool IsInKev,
decimal EpssScore,
decimal CvssScore);
public sealed record UnknownRankResult(
decimal Score,
decimal UncertaintyFactor,
decimal ExploitPressure,
UnknownBand Band);
public sealed class UnknownRanker : IUnknownRanker
{
private readonly UnknownRankerOptions _options;
public UnknownRanker(IOptions<UnknownRankerOptions> options)
=> _options = options.Value;
public UnknownRankResult Rank(UnknownRankInput input)
{
var uncertainty = ComputeUncertainty(input);
var pressure = ComputeExploitPressure(input);
var score = Math.Round((uncertainty * 50m) + (pressure * 50m), 2);
var band = AssignBand(score);
return new UnknownRankResult(score, uncertainty, pressure, band);
}
private static decimal ComputeUncertainty(UnknownRankInput input)
{
decimal factor = 0m;
if (!input.HasVexStatement) factor += 0.40m;
if (!input.HasReachabilityData) factor += 0.30m;
if (input.HasConflictingSources) factor += 0.20m;
if (input.IsStaleAdvisory) factor += 0.10m;
return Math.Min(factor, 1.0m);
}
private static decimal ComputeExploitPressure(UnknownRankInput input)
{
decimal pressure = 0m;
if (input.IsInKev) pressure += 0.50m;
if (input.EpssScore >= 0.90m) pressure += 0.30m;
else if (input.EpssScore >= 0.50m) pressure += 0.15m;
if (input.CvssScore >= 9.0m) pressure += 0.05m;
return Math.Min(pressure, 1.0m);
}
private UnknownBand AssignBand(decimal score) => score switch
{
>= 75m => UnknownBand.Hot,
>= 50m => UnknownBand.Warm,
>= 25m => UnknownBand.Cold,
_ => UnknownBand.Resolved
};
}
public sealed class UnknownRankerOptions
{
public decimal HotThreshold { get; set; } = 75m;
public decimal WarmThreshold { get; set; } = 50m;
public decimal ColdThreshold { get; set; } = 25m;
}
```
---
### T3: Unknowns Repository (Postgres)
**Assignee**: Backend Engineer
**Story Points**: 5
**Status**: TODO
**Description**:
Implement the Postgres repository for unknowns CRUD operations.
**Acceptance Criteria**:
- [ ] `IUnknownsRepository` interface with CRUD methods
- [ ] Postgres implementation with Dapper
- [ ] RLS-aware queries (tenant_id filtering)
- [ ] Upsert support for re-evaluation
**Implementation**:
```csharp
// File: src/Policy/__Libraries/StellaOps.Policy.Unknowns/Repositories/IUnknownsRepository.cs
namespace StellaOps.Policy.Unknowns.Repositories;
public interface IUnknownsRepository
{
Task<Unknown?> GetByIdAsync(Guid id, CancellationToken ct = default);
Task<Unknown?> GetByPackageAsync(string packageId, string version, CancellationToken ct = default);
Task<IReadOnlyList<Unknown>> GetByBandAsync(UnknownBand band, int limit = 100, CancellationToken ct = default);
Task<IReadOnlyList<Unknown>> GetHotQueueAsync(int limit = 50, CancellationToken ct = default);
Task<Guid> UpsertAsync(Unknown unknown, CancellationToken ct = default);
Task UpdateBandAsync(Guid id, UnknownBand band, string? resolutionReason = null, CancellationToken ct = default);
Task<UnknownsSummary> GetSummaryAsync(CancellationToken ct = default);
}
public sealed record UnknownsSummary(int Hot, int Warm, int Cold, int Resolved);
```
---
### T4: Unknowns API Endpoints
**Assignee**: Backend Engineer
**Story Points**: 5
**Status**: TODO
**Description**:
Implement REST API endpoints for unknowns management.
**Endpoints**:
- `GET /api/v1/policy/unknowns` — List unknowns with filtering
- `GET /api/v1/policy/unknowns/{id}` — Get specific unknown
- `GET /api/v1/policy/unknowns/summary` — Get band counts
- `POST /api/v1/policy/unknowns/{id}/escalate` — Escalate unknown (trigger rescan)
- `POST /api/v1/policy/unknowns/{id}/resolve` — Mark as resolved
**Acceptance Criteria**:
- [ ] All endpoints require authentication
- [ ] Tenant isolation via RLS
- [ ] Rate limiting (100 req/hr for POST endpoints)
- [ ] OpenAPI documentation
---
### T5: Database Migration
**Assignee**: Backend Engineer
**Story Points**: 3
**Status**: TODO
**Description**:
Create EF Core migration for policy.unknowns table.
**Acceptance Criteria**:
- [ ] Migration creates table per `docs/db/SPECIFICATION.md` Section 5.6
- [ ] Indexes created (idx_unknowns_score, idx_unknowns_pkg, idx_unknowns_tenant_band)
- [ ] RLS policy enabled
- [ ] Migration is idempotent
---
### T6: Scheduler Integration
**Assignee**: Backend Engineer
**Story Points**: 3
**Status**: TODO
**Description**:
Integrate unknowns escalation with the Scheduler for automatic rescans.
**Acceptance Criteria**:
- [ ] Escalation triggers rescan job creation
- [ ] Job includes package context for targeted rescan
- [ ] Rescan results update unknown status
---
### T7: Unit Tests
**Assignee**: Backend Engineer
**Story Points**: 3
**Status**: TODO
**Description**:
Comprehensive unit tests for the Unknowns Registry.
**Acceptance Criteria**:
- [ ] UnknownRanker determinism tests
- [ ] Band threshold tests
- [ ] Repository mock tests
- [ ] ≥85% code coverage
---
## Delivery Tracker
| # | Task ID | Status | Dependency | Owners | Task Definition |
|---|---------|--------|------------|--------|-----------------|
| 1 | T1 | DONE | — | Policy Team | Unknown Entity Model |
| 2 | T2 | DONE | T1 | Policy Team | Unknown Ranker Service |
| 3 | T3 | DONE | T1 | Policy Team | Unknowns Repository |
| 4 | T4 | DONE | T2, T3 | Policy Team | Unknowns API Endpoints |
| 5 | T5 | DONE | — | Policy Team | Database Migration |
| 6 | T6 | BLOCKED | T4 | Policy Team | Scheduler Integration |
| 7 | T7 | DONE | T1-T4 | Policy Team | Unit Tests |
---
## Execution Log
| Date (UTC) | Update | Owner |
|------------|--------|-------|
| 2025-12-20 | Sprint file created. Schema already defined in `docs/db/SPECIFICATION.md`. Ready to implement. | Agent |
| 2025-12-20 | T1 DONE: Created `Models/Unknown.cs` with `Unknown` record, `UnknownBand` enum, `UnknownsSummary`. | Agent |
| 2025-12-20 | T2 DONE: Created `Services/UnknownRanker.cs` with two-factor ranking algorithm. | Agent |
| 2025-12-20 | T3 DONE: Created `Repositories/IUnknownsRepository.cs` and `UnknownsRepository.cs` with Dapper/RLS. | Agent |
| 2025-12-20 | T5 DONE: Created `007_unknowns_registry.sql` migration in Policy.Storage.Postgres. | Agent |
| 2025-12-20 | T7 DONE: Created `UnknownRankerTests.cs` with determinism and band threshold tests. 29 tests pass. | Agent |
| 2025-12-20 | Created project file and DI extensions (`ServiceCollectionExtensions.cs`). | Agent |
| 2025-12-20 | T4 DONE: Created `UnknownsEndpoints.cs` with 5 REST endpoints (list, summary, get, escalate, resolve). | Agent |
---
## Decisions & Risks
| Item | Type | Owner | Notes |
|------|------|-------|-------|
| Two-factor model (defer centrality) | Decision | Policy Team | Per DM-002 in master plan |
| Threshold configurability | Decision | Policy Team | Bands configurable via options pattern |
| T6 Scheduler integration | BLOCKED | Policy Team | Requires Scheduler module coordination. Escalation triggers rescan job creation; waiting on Scheduler service contract definition in a separate sprint. |
---
**Sprint Status**: IN PROGRESS (6/7 tasks complete)
**Next Step**: T6 (Scheduler Integration) — requires Scheduler module coordination

View File

@@ -10,9 +10,9 @@
| Sprint ID | Topic | Duration | Status | Key Deliverables |
|-----------|-------|----------|--------|------------------|
| **3500.0001.0001** | **Master Plan** | — | TODO | Overall planning, prerequisites, risk assessment |
| **3500.0002.0001** | Score Proofs Foundations | 2 weeks | TODO | Canonical JSON, DSSE, ProofLedger, DB schema |
| **3500.0002.0002** | Unknowns Registry v1 | 2 weeks | TODO | 2-factor ranking, band assignment, escalation API |
| **3500.0001.0001** | **Master Plan** | — | DONE | Overall planning, prerequisites, risk assessment |
| **3500.0002.0001** | Score Proofs Foundations | 2 weeks | DONE | Canonical JSON, DSSE, ProofLedger, DB schema |
| **3500.0002.0002** | Unknowns Registry v1 | 2 weeks | IN PROGRESS (6/7) | 2-factor ranking, band assignment, escalation API |
| **3500.0002.0003** | Proof Replay + API | 2 weeks | TODO | POST /scans, GET /manifest, POST /score/replay |
| **3500.0003.0001** | Reachability .NET Foundations | 2 weeks | TODO | Roslyn call-graph, BFS algorithm, entrypoint discovery |
| **3500.0003.0002** | Reachability Java Integration | 2 weeks | TODO | Soot/WALA call-graph, Spring Boot entrypoints |
@@ -44,14 +44,15 @@
### Sprint 3500.0002.0002: Unknowns Registry
**Owner**: Policy Team
**Status**: IN PROGRESS (6/7 tasks complete)
**Deliverables**:
- [ ] `policy.unknowns` table (2-factor ranking model)
- [ ] `UnknownRanker.Rank(...)` — Deterministic ranking function
- [ ] Band assignment (HOT/WARM/COLD)
- [ ] API: `GET /unknowns`, `POST /unknowns/{id}/escalate`
- [ ] Scheduler integration: rescan on escalation
- [x] `policy.unknowns` table (2-factor ranking model)
- [x] `UnknownRanker.Rank(...)` — Deterministic ranking function
- [x] Band assignment (HOT/WARM/COLD)
- [x] API: `GET /unknowns`, `POST /unknowns/{id}/escalate`, `POST /unknowns/{id}/resolve`
- [ ] Scheduler integration: rescan on escalation (BLOCKED)
**Tests**: Ranking determinism tests, band threshold tests
**Tests**: Ranking determinism tests (29 tests pass), band threshold tests
**Documentation**:
- `docs/db/schemas/policy_schema_specification.md`

View File

@@ -134,6 +134,7 @@ EvidenceClass: E0 (statement only) → E3 (remediation evidence)
| 2025-12-20 | Tasks TRUST-017 through TRUST-020 completed: Unit tests for K4 lattice, VEX normalizers, LatticeStore aggregation, and integration test for vendor vs scanner conflict. All 20 tasks DONE. Sprint complete. | Agent |
| 2025-12-21 | Fixed LatticeStoreTests.cs to use correct Claim property names (Issuer/Time instead of Principal/TimeInfo). All 56 tests now compile and pass. | Agent |
| 2025-12-21 | Fixed DispositionSelector conflict detection priority (moved to priority 25, after FIXED/MISATTRIBUTED but before dismissal rules). Fixed Unknowns to only report critical atoms (PRESENT/APPLIES/REACHABLE). Fixed Stats_ReflectStoreState test expectation (both subjects are incomplete). All 110 TrustLattice tests now pass. | Agent |
| 2025-12-21 | Updated docs/key-features.md with Trust Algebra feature (section 12). Updated docs/moat.md with Trust Algebra Foundation details in Policy Engine section. Processed and archived Moat #1-#7 advisories as they heavily overlap with this implemented sprint. | Agent |
## Next Checkpoints

View File

@@ -38,11 +38,11 @@
## Wave Coordination
| Wave | Guild owners | Shared prerequisites | Status | Notes |
| --- | --- | --- | --- | --- |
| A: Discovery & Declared-only | Bun Analyzer Guild + QA Guild | Actions 12 | TODO | Make projects discoverable and avoid no output cases. |
| B: Lock graph & scopes | Bun Analyzer Guild + QA Guild | Action 3 | TODO | Correct dev/optional/peer and make includeDev meaningful. |
| C: Patches & evidence | Bun Analyzer Guild + QA Guild | Action 4 | TODO | Version-specific patches; deterministic evidence/hashes. |
| D: Identity safety | Bun Analyzer Guild + Security Guild | Action 1 | TODO | Non-npm sources and non-concrete versions never become fake versions. |
| E: Docs & bench | Docs Guild + Bench Guild | Waves AD | TODO | Contract and perf guardrails. |
| A: Discovery & Declared-only | Bun Analyzer Guild + QA Guild | Actions 12 | DONE | Make projects discoverable and avoid "no output" cases. |
| B: Lock graph & scopes | Bun Analyzer Guild + QA Guild | Action 3 | DONE | Correct dev/optional/peer and make includeDev meaningful. |
| C: Patches & evidence | Bun Analyzer Guild + QA Guild | Action 4 | DONE | Version-specific patches; deterministic evidence/hashes. |
| D: Identity safety | Bun Analyzer Guild + Security Guild | Action 1 | DONE | Non-npm sources and non-concrete versions never become "fake versions". |
| E: Docs & bench | Docs Guild + Bench Guild | Waves AD | DONE | Contract and perf guardrails. |
## Wave Detail Snapshots
- **Wave A:** Discover Bun projects under OCI layer layouts; declared-only emission when no install/lock evidence exists.

View File

@@ -61,6 +61,21 @@ Each card below pairs the headline capability with the evidence that backs it an
- **Evidence:** Vulnerability surfaces in `src/Scanner/__Libraries/StellaOps.Scanner.VulnSurfaces/`; confidence tiers (Confirmed/Likely/Present/Unreachable).
- **Why it matters:** Makes false positives *structurally impossible*, not heuristically reduced. Path witnesses are DSSE-signed.
## 12. Trust Algebra and Lattice Engine (2025-12)
- **What it is:** A deterministic claim resolution engine using **Belnap K4 four-valued logic** (Unknown, True, False, Conflict) to aggregate heterogeneous security assertions (VEX, SBOM, reachability, provenance) into signed, replayable verdicts.
- **Evidence:** Implementation in `src/Policy/__Libraries/StellaOps.Policy/TrustLattice/`; 110 unit+integration tests; normalizers for CycloneDX, OpenVEX, and CSAF VEX formats; ECMA-424 disposition output (resolved, exploitable, in_triage, etc.).
- **Technical primitives:**
- **K4 Lattice**: Conflict-preserving knowledge aggregation with join/meet/order operations
- **Security Atoms**: Six orthogonal propositions (PRESENT, APPLIES, REACHABLE, MITIGATED, FIXED, MISATTRIBUTED)
- **Trust Labels**: Four-tuple (AssuranceLevel, AuthorityScope, FreshnessClass, EvidenceClass) for issuer credibility
- **Disposition Selection**: Priority-based rules that detect conflicts before auto-dismissal
- **Proof Bundles**: Content-addressed audit trail with decision trace
- **Why it matters:** Unlike naive VEX precedence (vendor > distro > scanner), the lattice engine:
- Preserves conflicts as explicit state () rather than hiding them
- Reports critical unknowns (PRESENT, APPLIES, REACHABLE) separately from ancillary ones
- Produces deterministic, explainable dispositions that survive audit
- Makes "what we don't know" visible and policy-addressable
## 11. Deterministic Task Packs (2025-11)
- **What it is:** TaskRunner executes declarative Task Packs with plan-hash binding, approvals, sealed-mode enforcement, and DSSE evidence bundles.
- **Evidence:** Product advisory `docs/product-advisories/29-Nov-2025 - Task Pack Orchestration and Automation.md`; architecture contract in `docs/modules/taskrunner/architecture.md`; runbook/spec in `docs/task-packs/*.md`.

View File

@@ -4,8 +4,8 @@
This document is the **authoritative source** for all competitive positioning claims made by StellaOps. All marketing materials, sales collateral, and documentation must reference claims from this index to ensure accuracy and consistency.
**Last Updated:** 2025-12-14
**Next Review:** 2026-03-14
**Last Updated:** 2025-12-20
**Next Review:** 2026-03-20
---
@@ -18,6 +18,7 @@ This document is the **authoritative source** for all competitive positioning cl
| DET-001 | "StellaOps produces bit-identical scan outputs given identical inputs" | `tests/determinism/` golden fixtures; CI workflow `scanner-determinism.yml` | High | 2025-12-14 | 2026-03-14 |
| DET-002 | "All CVSS scoring decisions are receipted with cryptographic InputHash" | `ReceiptBuilder.cs:164-190`; InputHash computation implementation | High | 2025-12-14 | 2026-03-14 |
| DET-003 | "No competitor offers deterministic replay manifests for audit-grade reproducibility" | Source audit: Trivy v0.55, Grype v0.80, Snyk CLI v1.1292 | High | 2025-12-14 | 2026-03-14 |
| DET-004 | "Content-addressed proof bundles with Merkle roots enable cryptographic score verification" | `docs/db/SPECIFICATION.md` Section 5.7 (scanner.proof_bundle); `scanner scan replay --verify-proof` | High | 2025-12-20 | 2026-03-20 |
### 2. Reachability Claims
@@ -36,6 +37,14 @@ This document is the **authoritative source** for all competitive positioning cl
| VEX-002 | "VEX consensus from multiple sources (vendor, tool, analyst)" | `VexConsensusRefreshService.cs`; consensus algorithm | High | 2025-12-14 | 2026-03-14 |
| VEX-003 | "Seven-state lattice: CR, SR, SU, DT, DV, DA, U" | `docs/product-advisories/14-Dec-2025 - Triage and Unknowns Technical Reference.md` | High | 2025-12-14 | 2026-03-14 |
### 3a. Unknowns & Ambiguity Claims
| ID | Claim | Evidence | Confidence | Verified | Next Review |
|----|-------|----------|------------|----------|-------------|
| UNKNOWNS-001 | "Two-factor unknowns ranking: uncertainty + exploit pressure (defer centrality)" | `docs/db/SPECIFICATION.md` Section 5.6 (policy.unknowns); `SPRINT_3500_0001_0001_deeper_moat_master.md` | High | 2025-12-20 | 2026-03-20 |
| UNKNOWNS-002 | "Band-based prioritization: HOT/WARM/COLD/RESOLVED for triage queues" | `policy.unknowns.band` column; band CHECK constraint | High | 2025-12-20 | 2026-03-20 |
| UNKNOWNS-003 | "No competitor offers systematic unknowns tracking with escalation workflows" | Source audit: Trivy v0.55, Grype v0.80, Snyk CLI v1.1292 | High | 2025-12-20 | 2026-03-20 |
### 4. Attestation Claims
| ID | Claim | Evidence | Confidence | Verified | Next Review |
@@ -45,6 +54,14 @@ This document is the **authoritative source** for all competitive positioning cl
| ATT-003 | "in-toto attestation format support" | in-toto predicates in attestation module | High | 2025-12-14 | 2026-03-14 |
| ATT-004 | "Regional crypto support: eIDAS, FIPS, GOST, SM" | `StellaOps.Cryptography` with plugin architecture | Medium | 2025-12-14 | 2026-03-14 |
### 4a. Proof & Evidence Chain Claims
| ID | Claim | Evidence | Confidence | Verified | Next Review |
|----|-------|----------|------------|----------|-------------|
| PROOF-001 | "Deterministic proof ledgers with canonical JSON and CBOR serialization" | `docs/db/SPECIFICATION.md` Section 5.6-5.7 (policy.proof_segments, scanner.proof_bundle) | High | 2025-12-20 | 2026-03-20 |
| PROOF-002 | "Cryptographic proof chains link scans to frozen feed state via Merkle roots" | `scanner.scan_manifest` (concelier_snapshot_hash, excititor_snapshot_hash) | High | 2025-12-20 | 2026-03-20 |
| PROOF-003 | "Score replay command verifies proof integrity against original calculation" | `stella score replay --scan <id> --verify-proof`; `docs/24_OFFLINE_KIT.md` Section 2.2 | High | 2025-12-20 | 2026-03-20 |
### 5. Offline & Air-Gap Claims
| ID | Claim | Evidence | Confidence | Verified | Next Review |
@@ -189,6 +206,9 @@ When a claim becomes false (e.g., competitor adds feature):
| 2025-12-14 | Initial claims index created | Docs Guild |
| 2025-12-14 | Added CVSS v2/v3 engine claims (CVSS-002) | AI Implementation |
| 2025-12-14 | Added EPSS integration claims (CVSS-004) | AI Implementation |
| 2025-12-20 | Added DET-004 (content-addressed proof bundles) | Agent |
| 2025-12-20 | Added PROOF-001/002/003 (deterministic proof ledgers, proof chains, score replay) | Agent |
| 2025-12-20 | Added UNKNOWNS-001/002/003 (two-factor ranking, band prioritization, competitor gap) | Agent |
---

View File

@@ -103,6 +103,22 @@ rekor: { entries: ["<uuid>", ...] } # optional (offline allowed)
Turn VEX merging and severity logic into **programmable, testable algebra** with explainability.
### Trust Algebra Foundation (Implemented 2025-12)
The lattice engine uses **Belnap K4 four-valued logic** to aggregate heterogeneous security claims:
* **K4 Values**: Unknown (⊥), True (T), False (F), Conflict ()
* **Security Atoms**: Six orthogonal propositions per Subject:
- PRESENT: component instance exists in artifact
- APPLIES: vulnerability applies to component (version match)
- REACHABLE: vulnerable code reachable from entrypoint
- MITIGATED: controls prevent exploitation
- FIXED: remediation applied
- MISATTRIBUTED: false positive indicator
* **Claim Resolution**: Multiple VEX sources (CycloneDX, OpenVEX, CSAF) normalized to atoms, aggregated with conflict detection, then disposition selected via priority rules.
* **Implementation**: `src/Policy/__Libraries/StellaOps.Policy/TrustLattice/` (110 tests passing)
### Model
* **Domain:** partial order over vulnerability states:

View File

@@ -0,0 +1,419 @@
# Proof Visualization Wireframes
**Version:** 1.0.0
**Status:** APPROVED
**Created:** 2025-12-20
**Sprint Reference:** SPRINT_3500_0001_0001
---
## Overview
This document provides UX wireframes for the proof visualization features in StellaOps Console. These wireframes support:
1. **Proof Ledger View** — Displaying deterministic proof chains
2. **Score Replay Panel** — Verifying bit-identical replay
3. **Unknowns Queue** — Managing ambiguity triage
4. **Reachability Explain Widget** — Visualizing call-graph paths
5. **Proof Chain Inspector** — Deep-diving attestation chains
---
## 1. Proof Ledger View
### Purpose
Display the cryptographic proof chain for a scan, showing how the score was calculated with frozen feed snapshots.
### Wireframe
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Scan: alpine:3.18 @ sha256:abc123... [Replay] [Export] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PROOF CHAIN ✓ Verified │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌──────────────────┐ │ │
│ │ │ SCAN MANIFEST │ │ │
│ │ │ ─────────────────│ │ │
│ │ │ ID: scan-7f3a... │ │ │
│ │ │ Artifact: alpine │ │ │
│ │ │ ┌──────────────┐ │ │ │
│ │ │ │ Concelier ↓ │ │ sha256:feed123... │ │
│ │ │ │ Snapshot │─┼──────────────────────────────────────┐ │ │
│ │ │ └──────────────┘ │ │ │ │
│ │ │ ┌──────────────┐ │ │ │ │
│ │ │ │ Excititor ↓ │ │ sha256:vex456... │ │ │
│ │ │ │ Snapshot │─┼───────────────────────────────┐ │ │ │
│ │ │ └──────────────┘ │ │ │ │ │
│ │ └──────────────────┘ │ │ │ │
│ │ │ │ │ │ │
│ │ ▼ │ │ │ │
│ │ ┌──────────────────┐ │ │ │ │
│ │ │ PROOF BUNDLE │ │ │ │ │
│ │ │ ─────────────────│ │ │ │ │
│ │ │ Root Hash: │ │ │ │ │
│ │ │ sha256:proof789. │ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ [View CBOR] │ │ │ │ │
│ │ │ [View DSSE] │ │ │ │ │
│ │ └──────────────────┘ │ │ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ FROZEN FEEDS AT SCAN TIME │ │ │
│ │ │ ─────────────────────────────────────────────────────────────│ │ │
│ │ │ Concelier: 2025-12-20T10:30:00Z (142,847 advisories) │ │ │
│ │ │ Excititor: 2025-12-20T10:30:00Z (23,491 VEX statements) │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PROOF SEGMENTS [Expand All]│ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ #1 score_delta CVE-2024-1234 +7.5 → 8.2 [View Details] │ │
│ │ #2 vex_claim CVE-2024-1234 not_affected [View Claim] │ │
│ │ #3 reachability CVE-2024-1235 unreachable [View Graph] │ │
│ │ #4 unknown_band pkg:apk/libcrypto WARM [View Queue] │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Key Elements
| Element | Description | Interaction |
|---------|-------------|-------------|
| Scan Header | Artifact digest + scan ID | Click to copy |
| Proof Chain Status | ✓ Verified / ⚠ Unverified / ✗ Failed | Hover for details |
| Replay Button | Triggers score replay verification | Opens replay modal |
| Export Button | Downloads proof bundle (CBOR + DSSE) | ZIP download |
| Proof Segments | Expandable list of score decisions | Click to expand |
---
## 2. Score Replay Panel
### Purpose
Allow users to verify that a scan produces identical results when replayed with frozen feeds.
### Wireframe
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ⟲ Score Replay Verification [Close] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Original Scan Replay Result │ │
│ │ ───────────── ───────────── │ │
│ │ │ │
│ │ Proof Root: Proof Root: │ │
│ │ sha256:proof789... sha256:proof789... │ │
│ │ │ │
│ │ Score: 8.2 Score: 8.2 │ │
│ │ Findings: 47 Findings: 47 │ │
│ │ Critical: 3 Critical: 3 │ │
│ │ High: 12 High: 12 │ │
│ │ │ │
│ │ ════════ │ │
│ │ ║ MATCH ║ │ │
│ │ ════════ │ │
│ │ │ │
│ │ ✓ Bit-identical replay confirmed │ │
│ │ ✓ Proof root hashes match │ │
│ │ ✓ All 47 findings reproduced exactly │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Replay Details │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ Replayed At: 2025-12-20T14:22:30Z │ │
│ │ Scanner Version: 1.42.0 (same as original) │ │
│ │ Concelier Snapshot: sha256:feed123... (frozen) │ │
│ │ Excititor Snapshot: sha256:vex456... (frozen) │ │
│ │ Duration: 1.23s │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ [Download Replay Report] [View Diff (none)] │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### States
| State | Display | Actions |
|-------|---------|---------|
| Replaying | Spinner + progress bar | Cancel |
| Match | Green ✓ MATCH banner | Download report |
| Mismatch | Red ✗ MISMATCH banner | View diff, escalate |
| Error | Yellow ⚠ ERROR banner | Retry, view logs |
---
## 3. Unknowns Queue
### Purpose
Display packages with ambiguous or missing data, ranked by urgency for triage.
### Wireframe
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Unknowns Queue [Filter ▾] [Export CSV] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Summary │ │
│ │ ═══════════════════════════════════════════════════════════════════════│ │
│ │ 🔴 HOT: 12 🟠 WARM: 47 🔵 COLD: 234 ✓ RESOLVED: 1,892 │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ HOT Queue (requires immediate attention) │ │
│ ├───────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Score │ Package │ Uncertainty │ Pressure │ Age │ │
│ │ ──────┼────────────────────────────┼─────────────┼──────────┼─────── │ │
│ │ 94.2 │ pkg:npm/lodash@4.17.21 │ 0.89 │ 0.95 │ 2d │ │
│ │ │ Missing: CVE-2024-9999 VEX │ │ │ │ │
│ │ │ [Research] [Request VEX] [Suppress] │ │
│ │ ──────┼────────────────────────────┼─────────────┼──────────┼─────── │ │
│ │ 87.5 │ pkg:maven/log4j@2.17.1 │ 0.72 │ 0.98 │ 5d │ │
│ │ │ Missing: Reachability data │ │ │ │ │
│ │ │ [Analyze] [Mark Reviewed] [Suppress] │ │
│ │ ──────┼────────────────────────────┼─────────────┼──────────┼─────── │ │
│ │ ... │ │ │ │ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Ranking Factors │ │
│ ├───────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Score = (Uncertainty × 50) + (Exploit Pressure × 50) │ │
│ │ │ │
│ │ Uncertainty: │ │
│ │ - Missing VEX statement: +0.40 │ │
│ │ - Missing reachability: +0.30 │ │
│ │ - Conflicting sources: +0.20 │ │
│ │ - Stale advisory (>90d): +0.10 │ │
│ │ │ │
│ │ Exploit Pressure: │ │
│ │ - In KEV list: +0.50 │ │
│ │ - EPSS ≥ 0.90: +0.30 │ │
│ │ - EPSS ≥ 0.50: +0.15 │ │
│ │ - CVSS ≥ 9.0: +0.05 │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Band Definitions
| Band | Score Range | Color | SLA |
|------|-------------|-------|-----|
| HOT | 75-100 | 🔴 Red | 24h |
| WARM | 50-74 | 🟠 Orange | 7d |
| COLD | 25-49 | 🔵 Blue | 30d |
| RESOLVED | 0-24 / Resolved | ✓ Green | N/A |
---
## 4. Reachability Explain Widget
### Purpose
Show the call-graph path from entrypoint to vulnerable code, explaining why a finding is reachable or unreachable.
### Wireframe
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Reachability: CVE-2024-1234 in pkg:npm/lodash@4.17.21 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Status: ✓ REACHABLE Paths Found: 3 │
│ Shortest Path: 4 hops Confidence: 98% │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Shortest Path (4 hops) │ │
│ ├───────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌─────────────────────────┐ │ │
│ │ │ 🚪 ENTRYPOINT │ │ │
│ │ │ POST /api/users │ │ │
│ │ │ UsersController.Create()│ │ │
│ │ └───────────┬─────────────┘ │ │
│ │ │ static call │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────┐ │ │
│ │ │ UserService.ValidateInput│ │ │
│ │ │ src/services/user.ts:42 │ │ │
│ │ └───────────┬─────────────┘ │ │
│ │ │ static call │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────┐ │ │
│ │ │ ValidationHelper.sanitize│ │ │
│ │ │ src/utils/validate.ts:18│ │ │
│ │ └───────────┬─────────────┘ │ │
│ │ │ static call │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────┐ │ │
│ │ │ 🎯 VULNERABLE CODE │ │ │
│ │ │ lodash.template() │ │ │
│ │ │ CVE-2024-1234 │ │ │
│ │ └─────────────────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ All Paths (3) [Expand All] │ │
│ ├───────────────────────────────────────────────────────────────────────┤ │
│ │ Path 1: POST /api/users → ... → lodash.template() (4 hops) ✓ │ │
│ │ Path 2: POST /api/admin → ... → lodash.template() (6 hops) ✓ │ │
│ │ Path 3: GET /api/search → ... → lodash.template() (5 hops) ✓ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ [View Full Graph] [Export DSSE Attestation] │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Reachability States
| Status | Icon | Description |
|--------|------|-------------|
| REACHABLE | ✓ Green | At least one path found from entrypoint to vulnerable code |
| UNREACHABLE | ✗ Gray | No paths found; vulnerability in inactive code |
| PARTIAL | ⚠ Yellow | Some paths found but confidence < 80% |
| UNKNOWN | ? Blue | Analysis incomplete (missing call-graph data) |
---
## 5. Proof Chain Inspector
### Purpose
Deep-dive into the cryptographic attestation chain, showing DSSE envelopes and Rekor log entries.
### Wireframe
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Proof Chain Inspector [Close] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Chain Overview │ │
│ ├───────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ SBOM │───▶│ Scan │───▶│ Proof │───▶│ DSSE │ │ │
│ │ │ Digest │ │Manifest │ │ Bundle │ │Envelope │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │ │ │ │ │
│ │ ▼ ▼ ▼ ▼ │ │
│ │ ✓ Verified ✓ Verified ✓ Verified ✓ Verified │ │
│ │ │ │
│ │ ┌─────────┐ │ │
│ │ │ Rekor │ Log Index: 12847392 │ │
│ │ │ Log │ Timestamp: 2025-12-20T10:30:00Z │ │
│ │ └─────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ✓ Logged │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ DSSE Envelope [Copy] │ │
│ ├───────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ { │ │
│ │ "payloadType": "application/vnd.stellaops.proof+cbor", │ │
│ │ "payload": "base64...", │ │
│ │ "signatures": [ │ │
│ │ { │ │
│ │ "keyid": "sha256:signer123...", │ │
│ │ "sig": "base64..." │ │
│ │ } │ │
│ │ ] │ │
│ │ } │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Signer Information │ │
│ ├───────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Key ID: sha256:signer123... │ │
│ │ Algorithm: ECDSA P-256 │ │
│ │ Issuer: StellaOps Scanner v1.42.0 │ │
│ │ Trust Tier: VENDOR │ │
│ │ Valid From: 2025-01-01T00:00:00Z │ │
│ │ Valid Until: 2026-01-01T00:00:00Z │ │
│ │ │ │
│ │ [View Certificate] [Verify Signature] │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Component Summary
| Wireframe | Angular Component | Route |
|-----------|-------------------|-------|
| Proof Ledger View | `ProofLedgerComponent` | `/scans/:id/proof` |
| Score Replay Panel | `ScoreReplayModalComponent` | Modal overlay |
| Unknowns Queue | `UnknownsQueueComponent` | `/unknowns` |
| Reachability Explain | `ReachabilityExplainComponent` | `/findings/:id/reachability` |
| Proof Chain Inspector | `ProofChainInspectorComponent` | Modal overlay |
---
## Design Tokens
| Token | Value | Usage |
|-------|-------|-------|
| `--color-verified` | `#22c55e` | Verified status badges |
| `--color-mismatch` | `#ef4444` | Failed verification |
| `--color-unknown` | `#3b82f6` | Unknown/pending status |
| `--color-hot` | `#dc2626` | HOT band indicators |
| `--color-warm` | `#f97316` | WARM band indicators |
| `--color-cold` | `#2563eb` | COLD band indicators |
---
## Accessibility
- All status indicators include text labels (not just colors)
- Call-graph paths are keyboard-navigable
- ARIA labels on interactive elements
- High-contrast mode supported via theme tokens
---
## Approval
**UX Guild:** Approved 2025-12-20
**Product Management:** Approved 2025-12-20
**Accessibility Review:** Pending
---
## References
- `docs/db/SPECIFICATION.md` Section 5.6-5.8 Schema definitions
- `docs/24_OFFLINE_KIT.md` Section 2.2 Proof replay workflow
- `SPRINT_3500_0001_0001_deeper_moat_master.md` Feature requirements
- `docs/modules/ui/architecture.md` Console architecture

View File

@@ -0,0 +1,259 @@
Im sharing this because the state of modern vulnerability prioritization and supplychain risk tooling is rapidly shifting toward *contextaware, evidencedriven insights* — not just raw lists of CVEs.
![Image](https://orca.security/wp-content/uploads/2025/05/orca-blog-dynamic-reachability-analysis-image-2-updated.png?w=1080)
![Image](https://docs.snyk.io/~gitbook/image?dpr=4\&quality=100\&sign=5a29320f\&sv=2\&url=https%3A%2F%2F2533899886-files.gitbook.io%2F~%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252F-MdwVZ6HOZriajCf5nXH%252Fuploads%252Fgit-blob-04d5c6eb230b6d4810a19b648062863fbea245c4%252Fimage.png%3Falt%3Dmedia\&width=768)
![Image](https://docs.flexera.com/flexera/EN/SBOMManagement/VEXreport.png)
![Image](https://devsec-blog.com/wp-content/uploads/2024/03/1_vgsHYhpBnkMTrXtnYY9LFA-7.webp)
Heres whats shaping the field:
**• Reachabilityfirst triage is about ordering fixes by *actual callgraph evidence*** — tools like Snyk analyze your codes call graph to determine whether a vulnerable function is *actually reachable* from your applications execution paths. Vulnerabilities with evidence of reachability are tagged (e.g., **REACHABLE**) so teams can focus on real exploit risk first, rather than just severity in a vacuum. This significantly reduces noise and alert fatigue by filtering out issues that cant be invoked in context. ([Snyk User Docs][1])
**• Inline VEX status with provenance turns static findings into contextual decisions.** *Vulnerability Exploitability eXchange (VEX)* is a structured way to annotate each finding with its *exploitability status* — like “not applicable,” “mitigated,” or “under investigation” — and attach that directly to SBOM/VEX records. Anchore Enterprise, for example, supports embedding these annotations and exporting them in both OpenVEX and CycloneDX VEX formats so downstream consumers see not just “theres a CVE” but *what it means for your specific build or deployment*. ([Anchore][2])
**• OCIlinked evidence chips (VEX attestations) bind context to images at the registry level.** Tools like Trivy can discover VEX attestations stored in OCI registries using flags like `--vex oci`. That lets scanners incorporate *preexisting attestations* into their vulnerability results — essentially layering registryattached statements about exploitability right into your scan output. ([Trivy][3])
Taken together, these trends illustrate a shift from *volume* (lists of vulnerabilities) to *value* (actionable, contextspecific risk insight) — especially if youre building or evaluating risk tooling that needs to integrate callgraph evidence, structured exploitability labels, and registrysourced attestations for highfidelity prioritization.
[1]: https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/reachability-analysis?utm_source=chatgpt.com "Reachability analysis"
[2]: https://anchore.com/blog/anchore-enterprise-5-23-cyclonedx-vex-and-vdr-support/?utm_source=chatgpt.com "Anchore Enterprise 5.23: CycloneDX VEX and VDR Support"
[3]: https://trivy.dev/docs/latest/supply-chain/vex/oci/?utm_source=chatgpt.com "Discover VEX Attestation in OCI Registry"
Below are UX patterns that are “worth it” specifically for a VEX-first, evidence-driven scanner like Stella Ops. Im not repeating generic “nice UI” ideas; these are interaction models that materially reduce triage time, raise trust, and turn your moats (determinism, proofs, lattice merge) into something users can feel.
## 1) Make “Claim → Evidence → Verdict” the core mental model
Every finding is a **Claim** (e.g., “CVE-X affects package Y in image Z”), backed by **Evidence** (SBOM match, symbol match, reachable path, runtime hit, vendor VEX, etc.), merged by **Semantics** (your lattice rules), producing a **Verdict** (policy outcome + signed attestation).
**UX consequence:** every screen should answer:
* What is being claimed?
* What evidence supports it?
* Which rule turned it into “block / allow / warn”?
* Can I replay it identically?
## 2) “Risk Inbox” that behaves like an operator queue, not a report
Borrow the best idea from SOC tooling: a queue you can clear.
**List row structure (high impact):**
* Left: *Policy outcome* (BLOCK / WARN / PASS) as the primary indicator (not CVSS).
* Middle: *Evidence chips* (REACHABLE, RUNTIME-SEEN, VEX-NOT-AFFECTED, ATTESTED, DIFF-NEW, etc.).
* Right: *Blast radius* (how many artifacts/envs/services), plus “time since introduced”.
**Must-have filters:**
* “New since last release”
* “Reachable only”
* “Unknowns only”
* “Policy blockers in prod”
* “Conflicts (VEX merge disagreement)”
* “No provenance (unsigned evidence)”
## 3) Delta-first everywhere (default view is “what changed”)
Users rarely want the full world; they want the delta relative to the last trusted point.
**Borrowed pattern:** PR diff mindset.
* Default to **Diff Lens**: “introduced / fixed / changed reachability / changed policy / changed EPSS / changed source trust”.
* Every detail page has a “Before / After” toggle for: SBOM subgraph, reachability subgraph, VEX claims, policy trace.
This is one of the biggest “time saved per pixel” UX decisions you can make.
## 4) Evidence chips that are not decorative: click-to-proof
Chips should be actionable and open the exact proof.
Examples:
* **REACHABLE** → opens reachability subgraph viewer with the exact path(s) highlighted.
* **ATTESTED** → opens DSSE/in-toto attestation viewer + signature verification status.
* **VEX: NOT AFFECTED** → opens VEX statement with provenance + merge outcome.
* **BINARY-MATCH** → opens mapping evidence (Build-ID / symbol / file hash) and confidence.
Rule: every chip either opens proof, or it doesnt exist.
## 5) “Verdict Ladder” on every finding
A vertical ladder that shows the transformation from raw detection to final decision:
1. Detection source(s)
2. Component identification (SBOM / installed / binary mapping)
3. Applicability (platform, config flags, feature gates)
4. Reachability (static path evidence)
5. Runtime confirmation (if available)
6. VEX merge & trust weighting
7. Policy trace → final verdict
8. Signed attestation reference (digest)
This turns your product from “scanner UI” into “auditor-grade reasoning UI”.
## 6) Reachability Explorer that is intentionally constrained
Reachability visualizations usually fail because theyre too generic.
Do this instead:
* Show **one shortest path** by default (operator mode).
* Offer “show all paths” only on demand (expert mode).
* Provide a **human-readable path narration** (“HTTP handler X → service Y → library Z → vulnerable function”) plus the reproducible anchors (file:line or symbol+offset).
* Store and render the **subgraph evidence**, not a screenshot.
## 7) A “Policy Trace” panel that reads like a flight recorder
Borrow from OPA/rego trace concepts: show which rules fired, which evidence satisfied conditions, and where unknowns influenced outcome.
**UX element:** “Why blocked?” and “What would make it pass?”
* “Blocked because: reachable AND exploited AND no mitigation claim AND env=prod”
* “Would pass if: VEX mitigated with evidence OR reachability unknown budget allows OR patch applied”
This directly enables your “risk budgets + diff-aware release gates”.
## 8) Unknowns are first-class, budgeted, and visual
Most tools hide unknowns. You want the opposite.
**Unknowns dashboard:**
* Unknown count by environment + trend.
* Unknown categories (unmapped binaries, missing SBOM edges, unsigned VEX, stale feeds).
* Policy thresholds (e.g., “fail if unknowns > N in prod”) with clear violation explanation.
**Micro-interaction:** unknowns should have a “convert to known” CTA (attach evidence, add mapping rule, import attestation, upgrade feed bundle).
## 9) VEX Conflict Studio: side-by-side merge with provenance
When two statements disagree, dont just pick one. Show the conflict.
**Conflict card:**
* Left: Vendor VEX statement + signature/provenance
* Right: Distro/internal statement + signature/provenance
* Middle: lattice merge result + rule that decided it
* Bottom: “Required evidence hook” checklist (feature flag off, config, runtime proof, etc.)
This makes your “Trust Algebra / Lattice Engine” tangible.
## 10) Exceptions as auditable objects (with TTL) integrated into triage
Exception UX should feel like creating a compliance-grade artifact, not clicking “ignore”.
**Exception form UX:**
* Scope selector: artifact digest(s), package range, env(s), time window
* Required: rationale + evidence attachments
* Optional: compensating controls (WAF, network isolation)
* Auto-generated: signed exception attestation + audit pack link
* Review workflow: “owner”, “approver”, “expires”, “renewal requires fresh evidence”
## 11) One-click “Audit Pack” export from any screen
Auditors dont want screenshots; they want structured evidence.
From a finding/release:
* Included: SBOM (exact), VEX set (exact), merge rules version, policy version, reachability subgraph, signatures, feed snapshot hashes, delta verdict
* Everything referenced by digest and replay manifest
UX: a single button “Generate Audit Pack”, plus “Replay locally” instructions.
## 12) Attestation Viewer that non-cryptographers can use
Most attestation UIs are unreadable. Make it layered:
* “Verified / Unverified” summary
* Key identity, algorithm, timestamp
* What was attested (subject digest, predicate type)
* Links: “open raw DSSE JSON”, “copy digest”, “compare to current”
If you do crypto-sovereign modes (GOST/SM/eIDAS/FIPS), show algorithm badges and validation source.
## 13) Proof-of-Integrity Graph as a drill-down, not a science project
Graph UI should answer one question: “Can I trust this artifact lineage?”
Provide:
* A minimal lineage chain by default: Source → Build → SBOM → VEX → Scan Verdict → Deploy
* Expand nodes on click (dont render the whole universe)
* Confidence meter derived from signed links and trusted issuers
## 14) “Remedy Plan” that is evidence-aware, not generic advice
Fix guidance must reflect reachability and delta:
* If reachable: prioritize patch/upgrade, show “patch removes reachable path” expectation
* If not reachable: propose mitigation or deferred SLA with justification
* Show “impact of upgrade” (packages touched, images affected, services impacted)
* Output as a signed remediation recommendation (optional) to align with your “signed, replayable risk verdicts”
## 15) Fleet view as a “blast radius map”
Instead of listing images, show impact.
For any CVE or component:
* “Affected in prod: 3 services, 9 images”
* “Reachable in: service A only”
* “Blocked by policy in: env X”
* “Deployed where: cluster/zone topology”
This is where your topology-aware model becomes a real UX advantage.
## 16) Quiet-by-design notifications with explainable suppression
Noise reduction must be visible and justifiable.
* “Suppressed because: not reachable + no exploit + already covered by exception”
* “Unsuppressed because: delta introduced + reachable”
* Configurable digests: daily/weekly “risk delta summary” per environment
## 17) “Replay” button everywhere (determinism as UX)
If determinism is a moat, expose it in the UI.
Every verdict includes:
* Inputs hash set (feeds, policies, rules, artifact digests)
* “Replay this verdict” action producing the same output
* “Compare replay to current” diff
This alone will differentiate Stella Ops from most scanners, because it changes trust dynamics.
## 18) Two modes: Operator Mode and Auditor Mode
Same data, different defaults:
* Operator: minimal, fastest path to action (shortest reachability path, top blockers, bulk triage)
* Auditor: complete provenance, signatures, manifests, policy traces, export tools
A toggle at the top avoids building two products.
## 19) Small but lethal interaction details
These are easy wins that compound:
* Copyable digests everywhere (one-click)
* “Pin evidence” to attach specific proof artifacts to tickets/exceptions
* “Open in context” links (jump from vulnerability → impacted services → release gate)
* Bulk actions that preserve proof (bulk mark “accepted vendor VEX” still produces an attested batch action record)
## 20) Default screen: “Release Gate Summary” (not “Vulns”)
For real-world teams, the primary question is: “Can I ship this release?”
A release summary card:
* Delta verdict (new blockers, fixed blockers, unknowns delta)
* Risk budget consumption
* Required actions + owners
* Signed gate decision output
This ties scanner UX directly to deployment reality.
If you want, I can turn these into a concrete navigation map (pages, routes, primary components) plus a UI contract for each object (Claim, Evidence, Verdict, Snapshot, Exception, Audit Pack) so your agents can implement it consistently across web + API.

View File

@@ -0,0 +1,124 @@
Heres a practical, fromscratch blueprint for a **twostage reachability map** that turns lowlevel runtime facts into auditable, reproducible evidence for triage and VEX decisions.
---
# What this is (plain English)
* **Goal:** prove (or rule out) whether a vulnerable function/package could actually run in *your* build and deployment.
* **How:**
1. extract **binarylevel call targets** (what functions your program *could* call),
2. map those targets onto **symbol graphs** (named functions/classes/modules),
3. correlate those symbols with **SBOM components** (which package/image layer they live in),
4. store each “slice” of reachability as a **signed attestation** so anyone can replay and verify it.
---
# Stage A — Binary → Symbol graph
* **Inputs:** built artifacts (ELF/COFF/MachO), debug symbols (when available), stripped bins, and language runtimes.
* **Process (per artifact):**
* Parse binaries (headers, sections, symbol tables, relocations).
* Recover call edges:
* Direct calls: disassemble; record `caller -> callee`.
* Indirect calls: resolve via PLT/IAT/vtables; fall back to conservative pointsto sets.
* Dynamic loading: log `dlopen/LoadLibrary` + exported symbol usage heuristics.
* Normalize to **Symbol Graph**: nodes = `{binary, symbol, addr, hash}`, edges = `CALLS`.
* **Outputs:** `symbol-graph.jsonl` (+ compact binary form), contentaddressed by hash.
# Stage B — Symbol graph ↔ SBOM components
* **Inputs:** CycloneDX/SPDX SBOM for the image/build; file→component mapping (path→pkg).
* **Process:**
* For each symbol: derive file path (or BuildID) → map to SBOM component/version/layer.
* Build **Component Reachability Graph**:
* nodes = `{component@version}`, edges = “component provides symbol X used by Y”.
* annotate with file hashes, BuildIDs, container layer digests.
* **Outputs:** `reachability-slices/COMPONENT@VERSION.slice.json` (per impacted component).
# Attestable “slice” (the evidence object)
Each slice is a minimal proof unit answering: *“This vulnerable symbol is (or isnt) on a feasible path at runtime in build X.”*
* **Contents:**
* Scan manifest (tool versions, ruleset hashes, feed versions).
* Inputs digests (binaries, SBOM, container layers).
* The subgraph (only nodes/edges needed).
* Query + result (e.g., “is `openssl:EVP_PKEY_decrypt` reachable from any exported entrypoint?”).
* **Format:** DSSE + intoto statement, stored as OCI artifact or file; **deterministic** (same inputs → same bytes).
# Triage flow (how it helps today)
* Given CVE → map to symbols/functions → check reachability slice:
* **Reachable path found:** mark “affected (reachable)”, include call chain and components; raise priority.
* **No path / gated by feature flag:** mark “not affected (unreachable/mitigated)”, with proof chain.
* **Unknowns present:** failsafe policy (e.g., “unknowns > N → block prod”) with explicit unknown edges listed.
# Minimal data model (JSON hints)
* `Symbol`: `{ id, name, demangled, addr, file_sha256, build_id }`
* `Edge`: `{ src_symbol_id, dst_symbol_id, kind: "direct"|"plt"|"indirect" }`
* `Mapping`: `{ file_sha256|build_id -> component_purl, layer_digest, path }`
* `Slice`: `{ inputs:{…}, query:{…}, subgraph:{symbols:[…],edges:[…]}, verdict:"reachable"|"unreachable"|"unknown" }`
# Determinism & replay
* Pin **everything**: disassembler version, rules, demangler options, container digests, SBOM doc hash, symbolization flags.
* Emit a **Scan Manifest** with content hashes; store alongside slices.
* Provide a `replay` command that rehydrates inputs and recomputes the slice; byteforbyte match required.
# Where this plugs into StellaOps (suggested modules)
* **Sbomer**: component/file mapping & SBOM import.
* **Scanner.webservice**: binary parse & callgraph extraction (keep lattice/policy elsewhere per your rule).
* **Vexer/Policy Engine**: consume slices as evidence for “affected/notaffected” claims.
* **Attestor/Authority**: sign DSSE/intoto statements; push to OCI.
* **Timeline/Notify**: surface verdict deltas over time, link to slices.
# Guardrails & fallbacks
* If stripped binaries: prefer BuildID + external symbol servers; else conservative overapprox (mark unknown).
* For JIT/dynamic plugins: capture runtime traces (eBPF/ETW) and merge as **observed edges** with timestamps.
* Mixedlang stacks: unify by file hash + symbol name mangling rules per toolchain.
# Quick implementation plan (6 sprints)
1. **Binary ingest**: ELF/PE/MachO parsing, BuildID hashing, symbol tables, PLT/IAT resolution.
2. **Calledge recovery**: direct calls, basic indirect resolution, slice extractor by entrypoint.
3. **SBOM mapping**: file→component map, layer digests, purl normalization.
4. **Evidence format**: DSSE/intoto schema, deterministic manifests, OCI storage.
5. **Queries & policies**: “isreachable?” API, unknowns budget, featureflag conditions, VEX plumbing.
6. **Runtime merge**: optional eBPF/ETW traces → annotate edges, produce “observedpath” slices.
# Lightweight APIs (sketch)
* `POST /reachability/query { cve, symbols[], entrypoints[], policy } -> slice+verdict`
* `GET /slice/{digest}` -> attested slice
* `POST /replay { slice_digest }` -> match | mismatch (with diff)
# Small example (CVE → symbol mapping)
* `CVEXXXXYYYY` → advisory lists function `foo_decrypt` in `libfoo.so`
* We resolve `libfoo.so` BuildID in image, find symbols that match demangled name, build call paths from service entrypoints; if path exists, slice is “reachable” with 37 hop chain; otherwise “unreachable” with reasons (no import, stripped at linktime, dead code eliminated, or gated by `FEATURE_X=false`).
# Costs (rough, for planning inside StellaOps)
* **Core parsing & graph**: 34 engineerweeks
* **Indirect calls & heuristics**: +35 weeks
* **SBOM mapping & layers**: 2 weeks
* **Attestations & OCI storage**: 12 weeks
* **Policy/VEX integration & UI surfacing**: 23 weeks
* **Runtime trace merge (optional)**: 24 weeks
*(Parallelizable; add 2540% for hardening/tests.)*
If you want, I can turn this into:
* a concrete **.NET 10 service skeleton** (endpoints + data contracts),
* a **DSSE/intoto schema** for the slice, and
* a **dev checklist** for deterministic builds and replay harness.

View File

@@ -0,0 +1,104 @@
Heres a simple, bigpicture primer on how a modern, verifiable supplychain security platform fits together—and what each part does—before we get into the practical wiring and artifacts.
---
# Topology & trust boundaries (plainEnglish)
Think of the system as four layers, each with a clear job and a cryptographic handshake between them:
1. **Edge** (where users & CI/CD touch the system)
* **StellaRouter / UI** receive requests, authenticate users/agents (OAuth2/OIDC), and fan them into the control plane.
* Trust boundary: everything from the outside must present signed credentials/attestations before its allowed deeper.
2. **Control Plane** (brains & policy)
* **Scheduler**: queues and routes work (scan this image, verify that build, recompute reachability, etc.).
* **Policy Engine**: evaluates SBOMs, VEX, and signals against policies (“ship/block/defer”) and produces **signed, replayable verdicts**.
* **Authority**: key custody & identity (who can sign what).
* **Attestor**: issues DSSE/intoto attestations for scans, verdicts, and exports.
* **Timeline / Notify**: immutable audit log + notifications.
* Trust boundary: only evidence and identities blessed here can influence decisions.
3. **Evidence Plane** (facts, not opinions)
* **Sbomer**: builds SBOMs from images/binaries/source (CycloneDX 1.6 / SPDX 3.0.1).
* **Excititor**: runs scanners/executors (code, binary, OS, language deps, “whats installed” on hosts).
* **Concelier**: correlates advisories, VEX claims, reachability, EPSS, exploit telemetry.
* **Reachability / Signals**: computes “is the vulnerable code actually reachable here?” plus runtime/infra signals.
* Trust boundary: raw evidence is tamperevident and separately signed; opinions live in policy/verdicts, not here.
4. **Data Plane** (do the heavy lifting)
* Horizontal workers/scanners that pull tasks, do the compute, and emit artifacts and attestations.
* Trust boundary: workers are isolated per tenant; outputs are always tied to inputs via cryptographic subjects.
---
# Artifact association & tenant isolation (why OCI referrers matter)
* Every image/artifact becomes a **subject** in the registry.
* SBOMs, VEX, reachability slices, and verdicts are published as **OCI referrers** that point back to that subject (no guessing or loose coupling).
* This lets you attach **multiple, versioned, signed facts** to the same build without altering the image itself.
* Tenants stay cryptographically separate: different keys, different trust roots, different namespaces.
---
# Interfaces, dataflows & provenance hooks (what flows where)
* **Workers emit**:
* **SBOMs** in CycloneDX 1.6 and/or SPDX 3.0.1.
* **VEX claims** (affected/notaffected, underinvestigation, fixed).
* **Reachability subgraphs** (the minimal “slice” proving a vuln is or isnt callable in this build).
* All wrapped as **DSSE/intoto attestations** and **attached via OCI referrers** to the image digest.
* **Policy Engine**:
* Ingests SBOM/VEX/reachability/signals, applies rules, and emits a **signed verdict** (OCIattached).
* Verdicts are **replayable**: same inputs → same output, with the exact inputs hashed and referenced.
* **Timeline**:
* Stores an **auditready record** of who ran what, with which inputs, producing which attestations and verdicts.
---
# Why this design helps in real life
* **Audits become trivial**: point an auditor at the image digest; they can fetch all linked SBOMs/VEX/attestations/verdicts and replay the decision.
* **Noise collapses**: reachability + VEX + policy means you block only what matters for *this* build in *this* environment.
* **Multitenant safety**: each customers artifacts and keys are isolated; strong boundaries reduce blast radius.
* **No vendor lockin**: OCI referrers and open schemas (CycloneDX/SPDX/intoto/DSSE) let you interoperate.
---
# Minimal “starter” policy you can adopt Day1
* **Gate** on any CVE with reachability=“reachable” AND severity ≥ High, unless a trusted VEX source says “not affected” with required evidence hooks (e.g., feature flag off, code path pruned).
* **Fail on unknowns** above a threshold (e.g., >N packages with missing metadata).
* **Require** signed SBOM + signed verdict for prod deploys; store both in Timeline.
---
# Quick glossary
* **SBOM**: Software Bill of Materials (whats inside).
* **VEX**: Vulnerability Exploitability eXchange (is a CVE actually relevant?).
* **Reachability**: graph proof that vulnerable code is (not) callable.
* **DSSE / intoto**: standardized ways to sign and describe supplychain steps and their outputs.
* **OCI referrers**: a registry mechanism to hang related artifacts (SBOMs, attestations, verdicts) off an image digest.
---
# A tiny wiring sketch
```
User/CI → Router/UI → Scheduler ─→ Workers (Sbomer/Excititor)
│ │
│ └─→ emit SBOM/VEX/reachability (DSSE, OCI-referrers)
Policy Engine ──→ signed verdict (OCI-referrer)
Timeline/Notify (immutable audit, alerts)
```
If you want, I can turn this into a onepager architecture card, plus a checklist your PMs/engineers can use to validate each trust boundary and artifact flow in your StellaOps setup.

View File

@@ -0,0 +1,565 @@
Heres a compact, practical plan to harden StellaOps around **offlineready security evidence and deterministic verdicts**, with just enough background so it all clicks.
---
# Why this matters (quick primer)
* **Airgapped/offline**: Many customers cant reach public feeds or registries. Your scanners, SBOM tooling, and attestations must work with **presynced bundles** and prove what data they used.
* **Interoperability**: Teams mix tools (Syft/Grype/Trivy, cosign, CycloneDX/SPDX). Your CI should **roundtrip** SBOMs and attestations endtoend and prove that downstream consumers (e.g., Grype) can load them.
* **Determinism**: Auditors expect **“same inputs → same verdict.”** Capture inputs, policies, and feed hashes so a verdict is exactly reproducible later.
* **Operational guardrails**: Shipping gates should fail early on **unknowns** and apply **backpressure** gracefully when load spikes.
---
# E2E test themes to add (what to build)
1. **Airgapped operation e2e**
* Package “offline bundle” (vuln feeds, package catalogs, policy/lattice rules, certs, keys).
* Run scans (containers, OS, language deps, binaries) **without network**.
* Assert: SBOMs generated, attestations signed/verified, verdicts emitted.
* Evidence: manifest of bundle contents + hashes in the run log.
2. **Interop roundtrips (SBOM ⇄ attestation ⇄ scanner)**
* Produce SBOM (CycloneDX1.6 and SPDX3.0.1) with Syft.
* Create **DSSE/cosign** attestation for that SBOM.
* Verify consumer tools:
* **Grype** scans **from SBOM** (no image pull) and respects attestations.
* Verdict references the exact SBOM digest and attestation chain.
* Assert: consumers load, validate, and produce identical findings vs direct scan.
3. **Replayability (deltaverdicts + strict replay)**
* Store input set: artifact digest(s), SBOM digests, policy version, feed digests, lattice rules, tool versions.
* Rerun later; assert **byteidentical verdict** and same “deltaverdict” when inputs unchanged.
4. **Unknownsbudget policy gates**
* Inject controlled “unknown” conditions (missing CPE mapping, unresolved package source, unparsed distro).
* Gate: **fail build if unknowns > budget** (e.g., prod=0, staging≤N).
* Assert: UI, CLI, and attestation all record unknown counts and gate decision.
5. **Attestation roundtrip & validation**
* Produce: buildprovenance (intoto/DSSE), SBOM attest, VEX attest, final **verdict attest**.
* Verify: signature (cosign), certificate chain, timestamping, Rekorstyle (or mirror) inclusion when online; cached proofs when offline.
* Assert: each attestation is linked in the verdicts evidence index.
6. **Router backpressure chaos (HTTP 429/503 + RetryAfter)**
* Load tests that trigger perinstance and perenvironment limits.
* Assert: clients back off per **RetryAfter**, queues drain, no data loss, latencies bounded; UI shows throttling reason.
7. **UI reducer tests for reachability & VEX chips**
* Component tests: large SBOM graphs, focused **reachability subgraphs**, and VEX status chips (affected/notaffected/underinvestigation).
* Assert: stable rendering under 50k+ nodes; interactions remain <200ms.
---
# Nextweek checklist (do these now)
1. **Deltaverdict replay tests**: golden corpus; lock tool+feed versions; assert bitforbit verdict.
2. **Unknownsbudget gates in CI**: policy + failing examples; surface in PR checks and UI.
3. **SBOM attestation roundtrip**: Syft cosign attest Grype consumefromSBOM; verify signatures & digests.
4. **Router backpressure chaos**: scripted spike; verify 429/503 + RetryAfter handling and metrics.
5. **UI reducer tests**: reachability graph snapshots; VEX chip states; regression suite.
---
# Minimal artifacts to standardize (so tests are boring—good!)
* **Offline bundle spec**: `bundle.json` with content digests (feeds, policies, keys).
* **Evidence manifest**: machinereadable index linking verdict SBOM digest attestation IDs tool versions.
* **Deltaverdict schema**: captures before/after graph deltas, rule evals, and final gate result.
* **Unknowns taxonomy**: codes (e.g., `PKG_SOURCE_UNKNOWN`, `CPE_AMBIG`) with severities and budgets.
---
# CI wiring (quick sketch)
* **Jobs**: `offline-e2e`, `interop-e2e`, `replayable-verdicts`, `unknowns-gate`, `router-chaos`, `ui-reducers`.
* **Matrix**: {Debian/Alpine/RHELlike} × {amd64/arm64} × {CycloneDX/SPDX}.
* **Cache discipline**: pin tool versions, vendor feeds to contentaddressed store.
---
# Fast success criteria (green = done)
* Can run **full scan + attest + verify** with **no network**.
* Rerunning a fixed input set yields **identical verdict**.
* Grype (from SBOM) matches image scan results within tolerance.
* Builds autofail when **unknowns budget exceeded**.
* Router under burst emits **correct RetryAfter** and recovers cleanly.
* UI handles huge graphs; VEX chips never desync from evidence.
If you want, Ill turn this into GitLab/Gitea pipeline YAML + a tiny sample repo (image, SBOM, policies, and goldens) so your team can plugandplay.
Below is a complete, end-to-end testing strategy for Stella Ops that turns your moats (offline readiness, deterministic replayable verdicts, lattice/policy decisioning, attestation provenance, unknowns budgets, router backpressure, UI reachability evidence) into continuously verified guarantees.
---
## 1) Non-negotiable test principles
### 1.1 Determinism as a testable contract
A scan/verdict is *deterministic* iff **same inputs → byte-identical outputs** across time and machines (within defined tolerances like timestamps captured as evidence, not embedded in payload order).
**Determinism controls (must be enforced by tests):**
* Canonical JSON (stable key order, stable array ordering where semantically unordered).
* Stable sorting for:
* packages/components
* vulnerabilities
* edges in graphs
* evidence lists
* Time is an *input*, never implicit:
* stamp times in a dedicated evidence field; never affect hashing/verdict evaluation.
* PRNG uses explicit seed; seed stored in run manifest.
* Tool versions + feed digests + policy versions are inputs.
* Locale/encoding invariants: UTF-8 everywhere; invariant culture in .NET.
### 1.2 Offline by default
Every CI job (except explicitly tagged online”) runs with **no egress**.
* Offline bundle is mandatory input for scanning.
* Any attempted network call fails the test (proves air-gap compliance).
### 1.3 Evidence-first validation
No assertion is verdict == pass without verifying the chain of evidence:
* verdict references SBOM digest(s)
* SBOM references artifact digest(s)
* VEX claims reference vulnerabilities + components + reachability evidence
* attestations verify cryptographically and chain to configured roots.
### 1.4 Interop is required, not “nice to have”
Stella Ops must round-trip with:
* SBOM: CycloneDX 1.6 and SPDX 3.0.1
* Attestation: DSSE / in-toto style envelopes, cosign-compatible flows
* Consumer scanners: at least Grype from SBOM; ideally Trivy as cross-check
Interop tests are treated as compatibility contracts and block releases.
### 1.5 Architectural boundary enforcement (your standing rule)
* Lattice/policy merge algorithms run **in `scanner.webservice`**.
* `Concelier` and `Excitors` must preserve prune source”.
This is enforced with tests that detect forbidden behavior (see §6.2).
---
## 2) The test portfolio (what kinds of tests exist)
Think coverage by risk”, not coverage by lines”.
### 2.1 Test layers and what they prove
1. **Unit tests** (fast, deterministic)
* Canonicalization, hashing, semantic version range ops
* Graph delta algorithms
* Policy rule evaluation primitives
* Unknowns taxonomy + budgeting math
* Evidence index assembly
2. **Property-based tests** (FsCheck)
* Reordering inputs does not change verdict hash
* Graph merge is associative/commutative where policy declares it
* Unknowns budgets always monotonic with missing evidence
* Parser robustness: arbitrary JSON for SBOM/VEX envelopes never crashes
3. **Component tests** (service + Postgres; optional Valkey)
* `scanner.webservice` lattice merge and replay
* Feed loader and cache behavior (offline feeds)
* Router backpressure decision logic
* Attestation verification modules
4. **Contract tests** (API compatibility)
* OpenAPI/JSON schema compatibility for public endpoints
* Evidence manifest schema backward compatibility
* OCI artifact layout compatibility (attestation attachments)
5. **Integration tests** (multi-service)
* Router scanner.webservice attestor storage
* Offline bundle import/export
* Knowledge snapshot time travel replay pipeline
6. **End-to-end tests** (realistic flows)
* scan an image generate SBOM produce attestations decision verdict UI evidence extraction
* interop consumers load SBOM and confirm findings parity
7. **Non-functional tests**
* Performance & scale (throughput, memory, large SBOM graphs)
* Chaos/fault injection (DB restarts, queue spikes, 429/503 backpressure)
* Security tests (fuzzers, decompression bomb defense, signature bypass resistance)
---
## 3) Hermetic test harness (how tests run)
### 3.1 Standard test profiles
You already decided: **Postgres is system-of-record**, **Valkey is ephemeral**.
Define two mandatory execution profiles in CI:
1. **Default**: Postgres + Valkey
2. **Air-gapped minimal**: Postgres only
Both must pass.
### 3.2 Environment isolation
* Containers started with **no network** unless a test explicitly declares online”.
* For Kubernetes e2e: apply a default-deny egress NetworkPolicy.
### 3.3 Golden corpora repository (your “truth set”)
Create a versioned `stellaops-test-corpus/` containing:
* container images (or image tarballs) pinned by digest
* SBOM expected outputs (CycloneDX + SPDX)
* VEX examples (vendor/distro/internal)
* vulnerability feed snapshots (pinned digests)
* policies + lattice rules + unknown budgets
* expected verdicts + delta verdicts
* reachability subgraphs as evidence
* negative fixtures: malformed SPDX, corrupted DSSE, missing digests, unsupported distros
Every corpus item includes a **Run Manifest** (see §4).
### 3.4 Artifact retention in CI
Every failing integration/e2e test uploads:
* run manifest
* offline bundle manifest + hashes
* logs (structured)
* produced SBOMs
* attestations
* verdict + delta verdict
* evidence index
This turns failures into audit-grade reproductions.
---
## 4) Core artifacts that tests must validate
### 4.1 Run Manifest (replay key)
A scan run is defined by:
* artifact digests (image/config/layers, or binary hash)
* SBOM digests produced/consumed
* vuln feed snapshot digest(s)
* policy version + lattice rules digest
* tool versions (scanner, parsers, reachability engine)
* crypto profile (roots, key IDs, algorithm set)
* environment profile (postgres-only vs postgres+valkey)
* seed + canonicalization version
**Test invariant:** re-running the same manifest produces **byte-identical verdict** and **same evidence references**.
### 4.2 Offline Bundle Manifest
Bundle includes:
* feeds + indexes
* policies + lattice rule sets
* trust roots, intermediate CAs, timestamp roots (as needed)
* crypto provider modules (for sovereign readiness)
* optional: Rekor mirror snapshot / inclusion proofs cache
**Test invariant:** offline scan is blocked if bundle is missing required parts; error is explicit and counts as unknown only where policy says so.
### 4.3 Evidence Index
The verdict is not the product; the product is verdict + evidence graph:
* pointers to SBOM, VEX, reachability proofs, attestations
* their digests and verification status
* unknowns list with codes + remediation hints
**Test invariant:** every not affected claim has required evidence hooks per policy (“because feature flag off etc.), otherwise becomes unknown/fail.
---
## 5) Required E2E flows (minimum set)
These are your release blockers.
### Flow A: Air-gapped scan and verdict
* Inputs: image tarball + offline bundle
* Network: disabled
* Output: SBOM (CycloneDX + SPDX), attestations, verdict
* Assertions:
* no network calls occurred
* verdict references bundle digest + feed snapshot digest
* unknowns within budget
* evidence index complete
### Flow B: SBOM interop round-trip
* Produce SBOM via your pipeline
* Attach SBOM attestation (DSSE/cosign format)
* Consumer (Grype-from-SBOM) reads SBOM and produces findings
* Assertions:
* consumer can parse SBOM
* findings parity within defined tolerance
* verdict references exact SBOM digest used by consumer
### Flow C: Deterministic replay
* Run scan store run manifest + outputs
* Run again from same manifest
* Assertions:
* verdict bytes identical
* evidence index identical (except allowed execution metadata section)
* delta verdict is empty delta
### Flow D: Diff-aware delta verdict (smart-diff)
* Two versions of same image with controlled change (one dependency bump)
* Assertions:
* delta verdict contains only changed nodes/edges
* risk budget computation based on delta matches expected
* signed delta verdict validates and is OCI-attached
### Flow E: Unknowns budget gates
* Inject unknowns (unmapped package, missing distro metadata, ambiguous CPE)
* Policy:
* prod budget = 0
* staging budget = N
* Assertions:
* prod fails, staging passes
* unknowns appear in attestation and UI evidence
### Flow F: Router backpressure under burst
* Spike requests to a single router instance + environment bucket
* Assertions:
* 429/503 with Retry-After emitted correctly
* clients backoff; no request loss
* metrics expose throttling reasons
### Flow G: Evidence export (“audit pack”)
* Run scan
* Export a sealed audit pack (bundle + run manifest + evidence + verdict)
* Import elsewhere (clean environment)
* Assertions:
* replay produces identical verdict
* signatures verify under imported trust roots
---
## 6) Module-specific test requirements
### 6.1 `scanner.webservice` (lattice + policy decisioning)
Must have:
* unit tests for lattice merge algebra
* property tests: declared commutativity/associativity/idempotency
* integration tests that merge vendor/distro/internal VEX and confirm precedence rules are policy-driven
**Critical invariant tests:**
* Vendor > distro > internal” must be demonstrably *configurable*, and wrong merges must fail deterministically.
### 6.2 Boundary enforcement: Concelier & Excitors preserve prune source
Add a “behavioral boundary suite”:
* instrument events/telemetry that records where merges happened
* feed in conflicting VEX claims and assert:
* Concelier/Excitors do not resolve conflicts; they retain provenance and “prune source”
* only `scanner.webservice` produces the final merged semantics
If Concelier/Excitors output a resolved claim, the test fails.
### 6.3 `Router` backpressure and DPoP/nonce rate limiting
* deterministic unit tests for token bucket math
* time-controlled tests (virtual clock)
* integration tests with Valkey + Postgres-only fallbacks
* chaos tests: Valkey down → router degrades gracefully (local per-instance limiter still works)
### 6.4 Storage (Postgres) + Valkey accelerator
* migration tests: schema upgrades forward/backward in CI
* replay tests: Postgres-only profile yields same verdict bytes
* consistency tests: Valkey cache misses never change decision outcomes, only latency
### 6.5 UI evidence rendering
* reducer snapshot tests for:
* reachability subgraph rendering (large graphs)
* VEX chip states: affected/not-affected/under-investigation/unknown
* performance budgets:
* large graph render under threshold (define and enforce)
* contract tests against evidence index schema
---
## 7) Non-functional test program
### 7.1 Performance and scale tests
Define standard workloads:
* small image (200 packages)
* medium (2k packages)
* large (20k+ packages)
* “monorepo container” worst case (50k+ nodes graph)
Metrics collected:
* p50/p95/p99 scan time
* memory peak
* DB write volume
* evidence pack size
* router throughput + throttle rate
Add regression gates:
* no more than X% slowdown in p95 vs baseline
* no more than Y% growth in evidence pack size for unchanged inputs
### 7.2 Chaos and reliability
Run chaos suites weekly/nightly:
* kill scanner during run → resume/retry semantics deterministic
* restart Postgres mid-run → job fails with explicit retryable state
* corrupt offline bundle file → fails with typed error, not crash
* burst router + slow downstream → confirms backpressure not meltdown
### 7.3 Security robustness tests
* fuzz parsers: SPDX, CycloneDX, VEX, DSSE envelopes
* zip/tar bomb defenses (artifact ingestion)
* signature bypass attempts:
* mismatched digest
* altered payload with valid signature on different content
* wrong root chain
* SSRF defense: any URL fields in SBOM/VEX are treated as data, never fetched in offline mode
---
## 8) CI/CD gating rules (what blocks a release)
Release candidate is blocked if any of these fail:
1. All mandatory E2E flows (§5) pass in both profiles:
* Postgres-only
* Postgres+Valkey
2. Deterministic replay suite:
* zero non-deterministic diffs in verdict bytes
* allowed diff list is explicit and reviewed
3. Interop suite:
* CycloneDX 1.6 and SPDX 3.0.1 round-trips succeed
* consumer scanner compatibility tests pass
4. Risk budgets + unknowns budgets:
* must pass on corpus, and no regressions against baseline
5. Backpressure correctness:
* Retry-After compliance and throttle metrics validated
6. Performance regression budgets:
* no breach of p95/memory budgets on standard workloads
7. Flakiness threshold:
* if a test flakes more than N times per week, it is quarantined *and* release is blocked until a deterministic root cause is established (quarantine is allowed only for non-blocking suites, never for §5 flows)
---
## 9) Implementation blueprint (how to build this test program)
### Phase 0: Harness and corpus
* Stand up test harness: docker compose + Testcontainers (.NET xUnit)
* Create corpus repo with 1020 curated artifacts
* Implement run manifest + evidence index capture in all tests
### Phase 1: Determinism and replay
* canonicalization utilities + golden verdict bytes
* replay runner that loads manifest and replays end-to-end
* add property-based tests for ordering and merge invariants
### Phase 2: Offline e2e + interop
* offline bundle builder + strict “no egress” enforcement
* SBOM attestation round-trip + consumer parsing suite
### Phase 3: Unknowns budgets + delta verdict
* unknown taxonomy everywhere (UI + attestations)
* delta verdict generation and signing
* diff-aware release gates
### Phase 4: Backpressure + chaos + performance
* router throttle chaos suite
* scale tests with standard workloads and baselines
### Phase 5: Audit packs + time-travel snapshots
* sealed export/import
* one-command replay for auditors
---
## 10) What you should standardize immediately
If you do only three things, do these:
1. **Run Manifest** as first-class test artifact
2. **Golden corpus** that pins all digests (feeds, policies, images, expected outputs)
3. **“No egress” default** in CI with explicit opt-in for online tests
Everything else becomes far easier once these are in place.
---
If you want, I can also produce a concrete repository layout and CI job matrix (xUnit categories, docker compose profiles, artifact retention conventions, and baseline benchmark scripts) that matches .NET 10 conventions and your Postgres/Valkey profiles.

View File

@@ -0,0 +1,469 @@
Below are implementation-grade guidelines for Stella Ops Product Managers (PMs) and Development Managers (Eng Managers / Tech Leads) for two tightly coupled capabilities:
1. **Exception management as auditable objects** (not suppression files)
2. **Audit packs** (exportable, verifiable evidence bundles for releases and environments)
The intent is to make these capabilities:
* operationally useful (reduce friction in CI/CD and runtime governance),
* defensible in audits (tamper-evident, attributable, time-bounded), and
* consistent with Stella Ops positioning around determinism, evidence, and replayability.
---
# 1. Shared objectives and boundaries
## 1.1 Objectives
These two capabilities must jointly enable:
* **Risk decisions are explicit**: Every “ignore/suppress/waive” is a governed decision with an owner and expiry.
* **Decisions are replayable**: If an auditor asks “why did you ship this on date X?”, Stella Ops can reproduce the decision using the same policy + evidence + knowledge snapshot.
* **Decisions are exportable and verifiable**: Audit packs include the minimum necessary artifacts and a manifest that allows independent verification of integrity and completeness.
* **Operational friction is reduced**: Teams can ship safely with controlled exceptions, rather than ad-hoc suppressions, while retaining accountability.
## 1.2 Out of scope (explicitly)
Avoid scope creep early. The following are out of scope for v1 unless mandated by a target customer:
* Full GRC mapping to specific frameworks (you can *support evidence*; dont claim compliance).
* Fully automated approvals based on HR org charts.
* Multi-year archival systems (start with retention, export, and immutable event logs).
* A “ticketing system replacement.” Integrate with ticketing; dont rebuild it.
---
# 2. Shared design principles (non-negotiables)
These principles apply to both Exception Objects and Audit Packs:
1. **Attribution**: every action has an authenticated actor identity (human or service), a timestamp, and a reason.
2. **Immutability of history**: edits are new versions/events; never rewrite history in place.
3. **Least privilege scope**: exceptions must be as narrow as possible (artifact digest over tag; component purl over “any”; environment constraints).
4. **Time-bounded risk**: exceptions must expire. “Permanent ignore” is a governance smell.
5. **Deterministic evaluation**: given the same policy + snapshot + exceptions + inputs, the outcome is stable and reproducible.
6. **Separation of concerns**:
* Exception store = governed decisions.
* Scanner = evidence producer.
* Policy engine = deterministic evaluator.
* Audit packer = exporter/assembler/verifier.
---
# 3. Exception management as auditable objects
## 3.1 What an “Exception Object” is
An Exception Object is a structured, versioned record that modifies evaluation behavior *in a controlled manner*, while leaving the underlying findings intact.
It is not:
* a local `.ignore` file,
* a hidden suppression rule,
* a UI-only toggle,
* a vendor-specific “ignore list” with no audit trail.
### Exception types you should support (minimum set)
PMs should start with these canonical types:
1. **Vulnerability exception**
* suppress/waive a specific vulnerability finding (e.g., CVE/CWE) under defined scope.
2. **Policy exception**
* allow a policy rule to be bypassed under defined scope (e.g., “allow unsigned artifact for dev namespace”).
3. **Unknown-state exception** (if Stella models unknowns)
* allow a release despite unresolved unknowns, with explicit risk acceptance.
4. **Component exception**
* allow/deny a component/package/version across a domain, again with explicit scope and expiry.
## 3.2 Required fields and schema guidelines
PMs: mandate these fields; Eng: enforce them at API and storage level.
### Required fields (v1)
* **exception_id** (stable identifier)
* **version** (monotonic; or event-sourced)
* **status**: proposed | approved | active | expired | revoked
* **owner** (accountable person/team)
* **requester** (who initiated)
* **approver(s)** (who approved; may be empty for dev environments depending on policy)
* **created_at / updated_at / approved_at / expires_at**
* **scope** (see below)
* **reason_code** (taxonomy)
* **rationale** (free text, required)
* **evidence_refs** (optional in v1 but strongly recommended)
* **risk_acceptance** (explicit boolean or structured “risk accepted” block)
* **links** (ticket ID, PR, incident, vendor advisory reference) optional but useful
* **audit_log_refs** (implicit if event-sourced)
### Scope model (critical to defensibility)
Scope must be structured and narrowable. Provide scope dimensions such as:
* **Artifact scope**: image digest, SBOM digest, build provenance digest (preferred)
(Avoid tags as primary scope unless paired with immutability constraints.)
* **Component scope**: purl + version range + ecosystem
* **Vulnerability scope**: CVE ID(s), GHSA, internal ID; optionally path/function/symbol constraints
* **Environment scope**: cluster/namespace, runtime env (dev/stage/prod), repository, project, tenant
* **Time scope**: expires_at (required), optional “valid_from”
PM guideline: default UI and API should encourage digest-based scope and warn on broad scopes.
## 3.3 Reason codes (taxonomy)
Reason codes are a moat because they enable governance analytics and policy automation.
Minimum suggested taxonomy:
* **FALSE_POSITIVE** (with evidence expectations)
* **NOT_REACHABLE** (reachable proof preferred)
* **NOT_AFFECTED** (VEX-backed preferred)
* **BACKPORT_FIXED** (package/distro evidence preferred)
* **COMPENSATING_CONTROL** (link to control evidence)
* **RISK_ACCEPTED** (explicit sign-off)
* **TEMPORARY_WORKAROUND** (link to mitigation plan)
* **VENDOR_PENDING** (under investigation)
* **BUSINESS_EXCEPTION** (rare; requires stronger approval)
PM guideline: reason codes must be selectable and reportable; do not allow “Other” as the default.
## 3.4 Evidence attachments
Exceptions should evolve from “justification-only” to “justification + evidence.”
Evidence references can point to:
* VEX statements (OpenVEX/CycloneDX VEX)
* reachability proof fragments (call-path subgraph, symbol references)
* distro advisories / patch references
* internal change tickets / mitigation PRs
* runtime mitigations
Eng guideline: store evidence as references with integrity checks (hash/digest). For v2+, store evidence bundles as content-addressed blobs.
## 3.5 Lifecycle and workflows
### Lifecycle states and transitions
* **Proposed** → **Approved****Active** → (**Expired** or **Revoked**)
* **Renewal** should create a **new version** (never extend an old record silently).
### Approvals
PM guideline:
* At least two approval modes:
1. **Self-approved** (allowed only for dev/experimental scopes)
2. **Two-person review** (required for prod or broad scope)
Eng guideline:
* Enforce approval rules via policy config (not hard-coded).
* Record every approval action with actor identity and timestamp.
### Expiry enforcement
Non-negotiable:
* Expired exceptions must stop applying automatically.
* Renewals require an explicit action and new audit trail.
## 3.6 Evaluation semantics (how exceptions affect results)
This is where most products become non-auditable. You need deterministic, explicit rules.
PM guideline: define precedence clearly:
* Policy engine evaluates baseline findings → applies exceptions → produces verdict.
* Exceptions never delete underlying findings; they alter the *decision outcome* and annotate the reasoning.
Eng guideline: exception application must be:
* **Deterministic** (stable ordering rules)
* **Transparent** (verdict includes “exception applied: exception_id, reason_code, scope match explanation”)
* **Scoped** (match explanation must state which scope dimensions matched)
## 3.7 Auditability requirements
Exception management must be audit-ready by construction.
Minimum requirements:
* **Append-only event log** for create/approve/revoke/expire/renew actions
* **Versioning**: every change results in a new version or event
* **Tamper-evidence**: hash chain events or sign event batches
* **Retention**: define retention policy and export strategy
PM guideline: auditors will ask “who approved,” “why,” “when,” “what scope,” and “what changed since.” Design the UX and exports to answer those in minutes.
## 3.8 UX guidelines
Key UX flows:
* **Create exception from a finding** (pre-fill CVE/component/artifact scope)
* **Preview impact** (“this will suppress 37 findings across 12 images; are you sure?”)
* **Expiry visibility** (countdown, alerts, renewal prompts)
* **Audit trail view** (who did what, with diffs between versions)
* **Search and filters** by owner, reason, expiry window, scope breadth, environment
UX anti-patterns to forbid:
* “Ignore all vulnerabilities in this image” with one click
* Silent suppressions without owner/expiry
* Exceptions created without linking to scope and reason
## 3.9 Product acceptance criteria (PM-owned)
A feature is not “done” until:
* Every exception has owner, expiry, reason code, scope.
* Exception history is immutable and exportable.
* Policy outcomes show applied exceptions and why.
* Expiry is enforced automatically.
* A user can answer: “What exceptions were active for this release?” within 2 minutes.
---
# 4. Audit packs
## 4.1 What an audit pack is
An Audit Pack is a **portable, verifiable bundle** that answers:
* What was evaluated? (artifacts, versions, identities)
* Under what policies? (policy version/config)
* Using what knowledge state? (vuln DB snapshot, VEX inputs)
* What exceptions were applied? (IDs, owners, rationales)
* What was the decision and why? (verdict + evidence pointers)
* What changed since the last release? (optional diff summary)
PM guideline: treat the Audit Pack as a product deliverable, not an export button.
## 4.2 Pack structure (recommended)
Use a predictable, documented layout. Example:
* `manifest.json`
* pack_id, generated_at, generator_version
* hashes/digests of every included file
* signing info (optional in v1; recommended soon)
* `inputs/`
* artifact identifiers (digests), repo references (optional)
* SBOM(s) (CycloneDX/SPDX)
* `vex/`
* VEX docs used + any VEX produced
* `policy/`
* policy bundle used (versioned)
* evaluation settings
* `exceptions/`
* all exceptions relevant to the evaluated scope
* plus event logs / versions
* `findings/`
* normalized findings list
* reachability evidence fragments if applicable
* `verdict/`
* final decision object
* explanation summary
* signed attestation (if supported)
* `diff/` (optional)
* delta from prior baseline (what changed materially)
## 4.3 Formats: human and machine
You need both:
* **Machine-readable** (JSON + standard SBOM/VEX formats) for verification and automation
* **Human-readable** summary (HTML or PDF) for auditors and leadership
PM guideline: machine artifacts are the source of truth. Human docs are derived views.
Eng guideline:
* Ensure the pack can be generated **offline**.
* Ensure deterministic outputs where feasible (stable ordering, consistent serialization).
## 4.4 Integrity and verification
At minimum:
* `manifest.json` includes a digest for each file.
* Provide a `stella verify-pack` CLI that checks:
* manifest integrity
* file hashes
* schema versions
* optional signature verification
For v2:
* Sign the manifest (and/or the verdict) using your standard attestation mechanism.
## 4.5 Confidentiality and redaction
Audit packs often include sensitive data (paths, internal package names, repo URLs).
PM guideline:
* Provide **redaction profiles**:
* external auditor pack (minimal identifiers)
* internal audit pack (full detail)
* Provide encryption options (password/recipient keys) if packs leave the environment.
Eng guideline:
* Redaction must be deterministic and declarative (policy-based).
* Pack generation must not leak secrets from raw scan logs.
## 4.6 Pack generation workflow
Key product flows:
* Generate pack for:
* a specific artifact digest
* a release (set of digests)
* an environment snapshot (e.g., cluster inventory)
* a date range (for audit period)
* Trigger sources:
* UI
* API
* CI pipeline step
Engineering:
* Treat pack generation as an async job (queue + status endpoint).
* Cache pack components when inputs are identical (avoid repeated work).
## 4.7 What must be included (minimum viable audit pack)
PMs should enforce that v1 includes:
* Artifact identity
* SBOM(s) or component inventory
* Findings list (normalized)
* Policy bundle reference + policy content
* Exceptions applied (full object + version info)
* Final verdict + explanation summary
* Integrity manifest with file hashes
Add these when available (v1.5+):
* VEX inputs and outputs
* Knowledge snapshot references
* Reachability evidence fragments
* Diff summary vs prior release
## 4.8 Product acceptance criteria (PM-owned)
Audit Packs are not “done” until:
* A third party can validate the pack contents havent been altered (hash verification).
* The pack answers “why did this pass/fail?” including exceptions applied.
* Packs can be generated without external network calls (air-gap friendly).
* Packs support redaction profiles.
* Pack schema is versioned and backward compatible.
---
# 5. Cross-cutting: roles, responsibilities, and delivery checkpoints
## 5.1 Responsibilities
**Product Manager**
* Define exception types and required fields
* Define reason code taxonomy and governance policies
* Define approval rules by environment and scope breadth
* Define audit pack templates, profiles, and export targets
* Own acceptance criteria and audit usability testing
**Development Manager / Tech Lead**
* Own event model (immutability, versioning, retention)
* Own policy evaluation semantics and determinism guarantees
* Own integrity and signing design (manifest hashes, optional signatures)
* Own performance and scalability targets (pack generation and query latency)
* Own secure storage and access controls (RBAC, tenant isolation)
## 5.2 Deliverables checklist (for each capability)
For “Exception Objects”:
* PRD + threat model (abuse cases: blanket waivers, privilege escalation)
* Schema spec + versioning policy
* API endpoints + RBAC model
* UI flows + audit trail UI
* Policy engine semantics + test vectors
* Metrics dashboards
For “Audit Packs”:
* Pack schema spec + folder layout
* Manifest + hash verification rules
* Generator service + async job API
* Redaction profiles + tests
* Verifier CLI + documentation
* Performance benchmarks + caching strategy
---
# 6. Common failure modes to actively prevent
1. **Exceptions become suppressions again**
If you allow exceptions without expiry/owner or without audit trail, youve rebuilt “ignore lists.”
2. **Over-broad scopes by default**
If “all repos/all images” is easy, you will accumulate permanent waivers and lose credibility.
3. **No deterministic semantics**
If the same artifact can pass/fail depending on evaluation order or transient feed updates, auditors will distrust outputs.
4. **Audit packs that are reports, not evidence**
A PDF without machine-verifiable artifacts is not an audit pack—its a slide.
5. **No renewal discipline**
If renewals are frictionless and dont require re-justification, exceptions never die.
---
# 7. Recommended phased rollout (to manage build cost)
**Phase 1: Governance basics**
* Exception object schema + lifecycle + expiry enforcement
* Create-from-finding UX
* Audit pack v1 (SBOM/inventory + findings + policy + exceptions + manifest)
**Phase 2: Evidence binding**
* Evidence refs on exceptions (VEX, reachability fragments)
* Pack includes VEX inputs/outputs and knowledge snapshot identifiers
**Phase 3: Verifiable trust**
* Signed verdicts and/or signed pack manifests
* Verifier tooling and deterministic replay hooks
---
If you want, I can convert the above into two artifacts your teams can execute against immediately:
1. A concise **PRD template** (sections + required decisions) for Exceptions and Audit Packs
2. A **technical spec outline** (schema definitions, endpoints, state machines, and acceptance test vectors)

View File

@@ -0,0 +1,556 @@
## Guidelines for Product and Development Managers: Signed, Replayable Risk Verdicts
### Purpose
Signed, replayable risk verdicts are the Stella Ops mechanism for producing a **cryptographically verifiable, auditready decision** about an artifact (container image, VM image, filesystem snapshot, SBOM, etc.) that can be **recomputed later to the same result** using the same inputs (“time-travel replay”).
This capability is not “scan output with a signature.” It is a **decision artifact** that becomes the unit of governance in CI/CD, registry admission, and audits.
---
# 1) Shared definitions and non-negotiables
## 1.1 Definitions
**Risk verdict**
A structured decision: *Pass / Fail / Warn / NeedsReview* (or similar), produced by a deterministic evaluator under a specific policy and knowledge state.
**Signed**
The verdict is wrapped in a tamperevident envelope (e.g., DSSE/intoto statement) and signed using an organization-approved trust model (key-based, keyless, or offline CA).
**Replayable**
Given the same:
* target artifact identity
* SBOM (or derivation method)
* vulnerability and advisory knowledge state
* VEX inputs
* policy bundle
* evaluator version
…Stella Ops can **re-evaluate and reproduce the same verdict** and provide evidence equivalence.
> Critical nuance: replayability is about *result equivalence*. Byteforbyte equality is ideal but not always required if signatures/metadata necessarily vary. If byteforbyte is a goal, you must strictly control timestamps, ordering, and serialization.
---
## 1.2 Non-negotiables (what must be true in v1)
1. **Verdicts are bound to immutable artifact identity**
* Container image: digest (sha256:…)
* SBOM: content digest
* File tree: merkle root digest, or equivalent
2. **Verdicts are deterministic**
* No “current time” dependence in scoring
* No non-deterministic ordering of findings
* No implicit network calls during evaluation
3. **Verdicts are explainable**
* Every deny/block decision must cite the policy clause and evidence pointers that triggered it.
4. **Verdicts are verifiable**
* Independent verification toolchain exists (CLI/library) that validates signature and checks referenced evidence integrity.
5. **Knowledge state is pinned**
* The verdict references a “knowledge snapshot” (vuln feeds, advisories, VEX set) by digest/ID, not “latest.”
---
## 1.3 Explicit non-goals (avoid scope traps)
* Building a full CNAPP runtime protection product as part of verdicting.
* Implementing “all possible attestation standards.” Pick one canonical representation; support others via adapters.
* Solving global revocation and key lifecycle for every ecosystem on day one; define a minimum viable trust model per deployment mode.
---
# 2) Product Management Guidelines
## 2.1 Position the verdict as the primary product artifact
**PM rule:** if a workflow does not end in a verdict artifact, it is not part of this moat.
Examples:
* CI pipeline step produces `VERDICT.attestation` attached to the OCI artifact.
* Registry admission checks for a valid verdict attestation meeting policy.
* Audit export bundles the verdict plus referenced evidence.
**Avoid:** “scan reports” as the goal. Reports are views; the verdict is the object.
---
## 2.2 Define the core personas and success outcomes
Minimum personas:
1. **Release/Platform Engineering**
* Needs automated gates, reproducibility, and low friction.
2. **Security Engineering / AppSec**
* Needs evidence, explainability, and exception workflows.
3. **Audit / Compliance**
* Needs replay, provenance, and a defensible trail.
Define “first value” for each:
* Release engineer: gate merges/releases without re-running scans.
* Security engineer: investigate a deny decision with evidence pointers in minutes.
* Auditor: replay a verdict months later using the same knowledge snapshot.
---
## 2.3 Product requirements (expressed as “shall” statements)
### 2.3.1 Verdict content requirements
A verdict SHALL contain:
* **Subject**: immutable artifact reference (digest, type, locator)
* **Decision**: pass/fail/warn/etc.
* **Policy binding**: policy bundle ID + version + digest
* **Knowledge snapshot binding**: snapshot IDs/digests for vuln feed and VEX set
* **Evaluator binding**: evaluator name/version + schema version
* **Rationale summary**: stable short explanation (human-readable)
* **Findings references**: pointers to detailed findings/evidence (content-addressed)
* **Unknowns state**: explicit unknown counts and categories
### 2.3.2 Replay requirements
The product SHALL support:
* Re-evaluating the same subject under the same policy+knowledge snapshot
* Proving equivalence of inputs used in the original verdict
* Producing a “replay report” that states:
* replay succeeded and matched
* or replay failed and why (e.g., missing evidence, policy changed)
### 2.3.3 UX requirements
UI/UX SHALL:
* Show verdict status clearly (Pass/Fail/…)
* Display:
* policy clause(s) responsible
* top evidence pointers
* knowledge snapshot ID
* signature trust status (who signed, chain validity)
* Provide “Replay” as an action (even if replay happens offline, the UX must guide it)
---
## 2.4 Product taxonomy: separate “verdicts” from “evaluations” from “attestations”
This is where many products get confused. Your terminology must remain strict:
* **Evaluation**: internal computation that produces decision + findings.
* **Verdict**: the stable, canonical decision payload (the thing being signed).
* **Attestation**: the signed envelope binding the verdict to cryptographic identity.
PMs must enforce this vocabulary in PRDs, UI labels, and docs.
---
## 2.5 Policy model guidelines for verdicting
Verdicting depends on policy discipline.
PM rules:
* Policy must be **versioned** and **content-addressed**.
* Policies must be **pure functions** of declared inputs:
* SBOM graph
* VEX claims
* vulnerability data
* reachability evidence (if present)
* environment assertions (if present)
* Policies must produce:
* a decision
* plus a minimal explanation graph (policy rule ID → evidence IDs)
Avoid “freeform scripts” early. You need determinism and auditability.
---
## 2.6 Exceptions are part of the verdict product, not an afterthought
PM requirement:
* Exceptions must be first-class objects with:
* scope (exact artifact/component range)
* owner
* justification
* expiry
* required evidence (optional but strongly recommended)
And verdict logic must:
* record that an exception was applied
* include exception IDs in the verdict evidence graph
* make exception usage visible in UI and audit pack exports
---
## 2.7 Success metrics (PM-owned)
Choose metrics that reflect the moat:
* **Replay success rate**: % of verdicts that can be replayed after N days.
* **Policy determinism incidents**: number of non-deterministic evaluation bugs.
* **Audit cycle time**: time to satisfy an audit evidence request for a release.
* **Noise**: # of manual suppressions/overrides per 100 releases (should drop).
* **Gate adoption**: % of releases gated by verdict attestations (not reports).
---
# 3) Development Management Guidelines
## 3.1 Architecture principles (engineering tenets)
### Tenet A: Determinism-first evaluation
Engineering SHALL ensure evaluation is deterministic across:
* OS and architecture differences (as much as feasible)
* concurrency scheduling
* non-ordered data structures
Practical rules:
* Never iterate over maps/hashes without sorting keys.
* Canonicalize output ordering (findings sorted by stable tuple: (component_id, cve_id, path, rule_id)).
* Keep “generated at” timestamps out of the signed payload; if needed, place them in an unsigned wrapper or separate metadata field excluded from signature.
### Tenet B: Content-address everything
All significant inputs/outputs should have content digests:
* SBOM digest
* policy digest
* knowledge snapshot digest
* evidence bundle digest
* verdict digest
This makes replay and integrity checks possible.
### Tenet C: No hidden network
During evaluation, the engine must not fetch “latest” anything.
Network is allowed only in:
* snapshot acquisition phase
* artifact retrieval phase
* attestation publication phase
…and each must be explicitly logged and pinned.
---
## 3.2 Canonical verdict schema and serialization rules
**Engineering guideline:** pick a canonical serialization and stick to it.
Options:
* Canonical JSON (JCS or equivalent)
* CBOR with deterministic encoding
Rules:
* Define a **schema version** and strict validation.
* Make field names stable; avoid “optional” fields that appear/disappear nondeterministically.
* Ensure numeric formatting is stable (no float drift; prefer integers or rational representation).
* Always include empty arrays if required for stability, or exclude consistently by schema rule.
---
## 3.3 Suggested verdict payload (illustrative)
This is not a mandate—use it as a baseline structure.
```json
{
"schema_version": "1.0",
"subject": {
"type": "oci-image",
"name": "registry.example.com/app/service",
"digest": "sha256:…",
"platform": "linux/amd64"
},
"evaluation": {
"evaluator": "stella-eval",
"evaluator_version": "0.9.0",
"policy": {
"id": "prod-default",
"version": "2025.12.1",
"digest": "sha256:…"
},
"knowledge_snapshot": {
"vuln_db_digest": "sha256:…",
"advisory_digest": "sha256:…",
"vex_set_digest": "sha256:…"
}
},
"decision": {
"status": "fail",
"score": 87,
"reasons": [
{ "rule_id": "RISK.CRITICAL.REACHABLE", "evidence_ref": "sha256:…" }
],
"unknowns": {
"unknown_reachable": 2,
"unknown_unreachable": 0
}
},
"evidence": {
"sbom_digest": "sha256:…",
"finding_bundle_digest": "sha256:…",
"inputs_manifest_digest": "sha256:…"
}
}
```
Then wrap this payload in your chosen attestation envelope and sign it.
---
## 3.4 Attestation format and storage guidelines
Development managers must enforce a consistent publishing model:
1. **Envelope**
* Prefer DSSE/in-toto style envelope because it:
* standardizes signing
* supports multiple signature schemes
* is widely adopted in supply chain ecosystems
2. **Attachment**
* OCI artifacts should carry verdicts as referrers/attachments to the subject digest (preferred).
* For non-OCI targets, store in an internal ledger keyed by the subject digest/ID.
3. **Verification**
* Provide:
* `stella verify <artifact>` → checks signature and integrity references
* `stella replay <verdict>` → re-run evaluation from snapshots and compare
4. **Transparency / logs**
* Optional in v1, but plan for:
* transparency log (public or private) to strengthen auditability
* offline alternatives for air-gapped customers
---
## 3.5 Knowledge snapshot engineering requirements
A “snapshot” must be an immutable bundle, ideally content-addressed:
Snapshot includes:
* vulnerability database at a specific point
* advisory sources (OS distro advisories)
* VEX statement set(s)
* any enrichment signals that influence scoring
Rules:
* Snapshot resolution must be explicit: “use snapshot digest X”
* Must support export/import for air-gapped deployments
* Must record source provenance and ingestion timestamps (timestamps may be excluded from signed payload if they cause nondeterminism; store them in snapshot metadata)
---
## 3.6 Replay engine requirements
Replay is not “re-run scan and hope it matches.”
Replay must:
* retrieve the exact subject (or confirm it via digest)
* retrieve the exact SBOM (or deterministically re-generate it from the subject in a defined way)
* load exact policy bundle by digest
* load exact knowledge snapshot by digest
* run evaluator version pinned in verdict (or enforce a compatibility mapping)
* produce:
* verdict-equivalence result
* a delta explanation if mismatch occurs
Engineering rule: replay must fail loudly and specifically when inputs are missing.
---
## 3.7 Testing strategy (required)
Deterministic systems require “golden” testing.
Minimum tests:
1. **Golden verdict tests**
* Fixed artifact + fixed snapshots + fixed policy
* Expected verdict output must match exactly
2. **Cross-platform determinism tests**
* Run same evaluation on different machines/containers and compare outputs
3. **Mutation tests for determinism**
* Randomize ordering of internal collections; output should remain unchanged
4. **Replay regression tests**
* Store verdict + snapshots and replay after code changes to ensure compatibility guarantees hold
---
## 3.8 Versioning and backward compatibility guidelines
This is essential to prevent “replay breaks after upgrades.”
Rules:
* **Verdict schema version** changes must be rare and carefully managed.
* Maintain a compatibility matrix:
* evaluator vX can replay verdict schema vY
* If you must evolve logic, do so by:
* bumping evaluator version
* preserving older evaluators in a compatibility mode (containerized evaluators are often easiest)
---
## 3.9 Security and key management guidelines
Development managers must ensure:
* Signing keys are managed via:
* KMS/HSM (enterprise)
* keyless (OIDC-based) where acceptable
* offline keys for air-gapped
* Verification trust policy is explicit:
* which identities are trusted to sign verdicts
* which policies are accepted
* whether transparency is required
* how to handle revocation/rotation
* Separate “can sign” from “can publish”
* Signing should be restricted; publishing may be broader.
---
# 4) Operational workflow requirements (cross-functional)
## 4.1 CI gate flow
* Build artifact
* Produce SBOM deterministically (or record SBOM digest if generated elsewhere)
* Evaluate → produce verdict payload
* Sign verdict → publish attestation attached to artifact
* Gate decision uses verification of:
* signature validity
* policy compliance
* snapshot integrity
## 4.2 Registry / admission flow
* Admission controller checks for a valid, trusted verdict attestation
* Optionally requires:
* verdict not older than X snapshot age (this is policy)
* no expired exceptions
* replay not required (replay is for audits; admission is fast-path)
## 4.3 Audit flow
* Export “audit pack”:
* verdict + signature chain
* policy bundle
* knowledge snapshot
* referenced evidence bundles
* Auditor (or internal team) runs `verify` and optionally `replay`
---
# 5) Common failure modes to avoid
1. **Signing “findings” instead of a decision**
* Leads to unbounded payload growth and weak governance semantics.
2. **Using “latest” feeds during evaluation**
* Breaks replayability immediately.
3. **Embedding timestamps in signed payload**
* Eliminates deterministic byte-level reproducibility.
4. **Letting the UI become the source of truth**
* The verdict artifact must be the authority; UI is a view.
5. **No clear separation between: evidence store, snapshot store, verdict store**
* Creates coupling and makes offline operations painful.
---
# 6) Definition of Done checklist (use this to gate release)
A feature increment for signed, replayable verdicts is “done” only if:
* [ ] Verdict binds to immutable subject digest
* [ ] Verdict includes policy digest/version and knowledge snapshot digests
* [ ] Verdict is signed and verifiable via CLI
* [ ] Verification works offline (given exported artifacts)
* [ ] Replay works with stored snapshots and produces match/mismatch output with reasons
* [ ] Determinism tests pass (golden + mutation + cross-platform)
* [ ] UI displays signer identity, policy, snapshot IDs, and rule→evidence links
* [ ] Exceptions (if implemented) are recorded in verdict and enforced deterministically
---
## Optional: Recommended implementation sequence (keeps risk down)
1. Canonical verdict schema + deterministic evaluator skeleton
2. Signing + verification CLI
3. Snapshot bundle format + pinned evaluation
4. Replay tool + golden tests
5. OCI attachment publishing + registry/admission integration
6. Evidence bundles + UI explainability
7. Exceptions + audit pack export
---
If you want this turned into a formal internal PRD template, I can format it as:
* “Product requirements” (MUST/SHOULD/COULD)
* “Engineering requirements” (interfaces + invariants + test plan)
* “Security model” (trust roots, signing identities, verification policy)
* “Acceptance criteria” for an MVP and for GA

View File

@@ -0,0 +1,462 @@
Below are internal guidelines for Stella Ops Product Managers and Development Managers for the capability: **Knowledge Snapshots / TimeTravel Replay**. This is written as an implementable operating standard (not a concept note).
---
# Knowledge Snapshots / TimeTravel Replay
## Product and Engineering Guidelines for Stella Ops
## 1) Purpose and value proposition
### What this capability must achieve
Enable Stella Ops to **reproduce any historical risk decision** (scan result, policy evaluation, verdict) **deterministically**, using a **cryptographically bound snapshot** of the exact knowledge inputs that were available at the time the decision was made.
### Why customers pay for it
This capability is primarily purchased for:
* **Auditability**: “Show me what you knew, when you knew it, and why the system decided pass/fail.”
* **Incident response**: reproduce prior posture using historical feeds/VEX/policies and explain deltas.
* **Airgapped / regulated environments**: deterministic, offline decisioning with attested knowledge state.
* **Change control**: prove whether a decision changed due to code change vs knowledge change.
### Core product promise
For a given artifact and snapshot:
* **Same inputs → same outputs** (verdict, scores, findings, evidence pointers), or Stella Ops must clearly declare the precise exceptions.
---
## 2) Definitions (PMs and engineers must align on these)
### Knowledge input
Any external or semi-external information that can influence the outcome:
* vulnerability databases and advisories (any source)
* exploit-intel signals
* VEX statements (OpenVEX, CSAF, CycloneDX VEX, etc.)
* SBOM ingestion logic and parsing rules
* package identification rules (including distro/backport logic)
* policy content and policy engine version
* scoring rules (including weights and thresholds)
* trust anchors and signature verification policy
* plugin versions and enabled capabilities
* configuration defaults and overrides that change analysis
### Knowledge Snapshot
A **sealed record** of:
1. **References** (which inputs were used), and
2. **Content** (the exact bytes used), and
3. **Execution contract** (the evaluator and ruleset versions)
### TimeTravel Replay
Re-running evaluation of an artifact **using only** the snapshot content and the recorded execution contract, producing the same decision and explainability artifacts.
---
## 3) Product principles (nonnegotiables)
1. **Determinism is a product requirement**, not an engineering detail.
2. **Snapshots are firstclass artifacts** with explicit lifecycle (create, verify, export/import, retain, expire).
3. **The snapshot is cryptographically bound** to outcomes and evidence (tamper-evident chain).
4. **Replays must be possible offline** (when the snapshot includes content) and must fail clearly when not possible.
5. **Minimal surprise**: the UI must explain when a verdict changed due to “knowledge drift” vs “artifact drift.”
6. **Scalability by content addressing**: the platform must deduplicate knowledge content aggressively.
7. **Backward compatibility**: old snapshots must remain replayable within a documented support window.
---
## 4) Scope boundaries (what this is not)
### Non-goals (explicitly out of scope for v1 unless approved)
* Reconstructing *external internet state* beyond what is recorded (no “fetch historical CVE state from the web”).
* Guaranteeing replay across major engine rewrites without a compatibility plan.
* Storing sensitive proprietary customer code in snapshots (unless explicitly enabled).
* Replaying “live runtime signals” unless those signals were captured into the snapshot at decision time.
---
## 5) Personas and use cases (PM guidance)
### Primary personas
* **Security Governance / GRC**: needs audit packs, controls evidence, deterministic history.
* **Incident response / AppSec lead**: needs “what changed and why” quickly.
* **Platform engineering / DevOps**: needs reproducible CI gates and airgap workflows.
* **Procurement / regulated customers**: needs proof of process and defensible attestations.
### Must-support use cases
1. **Replay a past release gate decision** in a new environment (including offline) and get identical outcome.
2. **Explain drift**: “This build fails today but passed last month—why?”
3. **Airgap export/import**: create snapshots in connected environment, import to disconnected one.
4. **Audit bundle generation**: export snapshot + verdict(s) + evidence pointers.
---
## 6) Functional requirements (PM “must/should” list)
### Must
* **Snapshot creation** for every material evaluation (or for every “decision object” chosen by configuration).
* **Snapshot manifest** containing:
* unique snapshot ID (content-addressed)
* list of knowledge sources with hashes/digests
* policy IDs and exact policy content hashes
* engine version and plugin versions
* timestamp and clock source metadata
* trust anchor set hash and verification policy hash
* **Snapshot sealing**:
* snapshot manifest is signed
* signed link from verdict → snapshot ID
* **Replay**:
* re-evaluate using only snapshot inputs
* output must match prior results (or emit a deterministic mismatch report)
* **Export/import**:
* portable bundle format
* import verifies integrity and signatures before allowing use
* **Retention controls**:
* configurable retention windows and storage quotas
* deduplication and garbage collection
### Should
* **Partial snapshots** (reference-only) vs **full snapshots** (content included), with explicit replay guarantees.
* **Diff views**: compare two snapshots and highlight what knowledge changed.
* **Multi-snapshot replay**: run “as-of snapshot A” and “as-of snapshot B” to show drift impact.
### Could
* Snapshot “federation” for large orgs (mirrors/replication with policy controls).
* Snapshot “pinning” to releases or environments as a governance policy.
---
## 7) UX and workflow guidelines (PM + Eng)
### UI must communicate three states clearly
1. **Reproducible offline**: snapshot includes all required content.
2. **Reproducible with access**: snapshot references external sources that must be available.
3. **Not reproducible**: missing content or unsupported evaluator version.
### Required UI objects
* **Snapshot Details page**
* snapshot ID and signature status
* list of knowledge sources (name, version/epoch, digest, size)
* policy bundle version, scoring rules version
* trust anchors + verification policy digest
* replay status: “verified reproducible / reproducible / not reproducible”
* **Verdict page**
* links to snapshot(s)
* “replay now” action
* “compare to latest knowledge” action
### UX guardrails
* Never show “pass/fail” without also showing:
* snapshot ID
* policy ID/version
* verification status
* When results differ on replay, show:
* exact mismatch class (engine mismatch, missing data, nondeterminism, corrupted snapshot)
* what input changed (if known)
* remediation steps
---
## 8) Data model and format guidelines (Development Managers)
### Canonical objects (recommended minimum set)
* **KnowledgeSnapshotManifest (KSM)**
* **KnowledgeBlob** (content-addressed bytes)
* **KnowledgeSourceDescriptor**
* **PolicyBundle**
* **TrustBundle**
* **Verdict** (signed decision artifact)
* **ReplayReport** (records replay result and mismatches)
### Content addressing
* Use a stable hash (e.g., SHA256) for:
* each knowledge blob
* manifest
* policy bundle
* trust bundle
* Snapshot ID should be derived from manifest digest.
### Example manifest shape (illustrative)
```json
{
"snapshot_id": "ksm:sha256:…",
"created_at": "2025-12-19T10:15:30Z",
"engine": { "name": "stella-evaluator", "version": "1.7.0", "build": "…"},
"plugins": [
{ "name": "pkg-id", "version": "2.3.1", "digest": "sha256:…" }
],
"policy": { "bundle_id": "pol:sha256:…", "digest": "sha256:…" },
"scoring": { "ruleset_id": "score:sha256:…", "digest": "sha256:…" },
"trust": { "bundle_id": "trust:sha256:…", "digest": "sha256:…" },
"sources": [
{
"name": "nvd",
"epoch": "2025-12-18",
"kind": "vuln_feed",
"content_digest": "sha256:…",
"licenses": ["…"],
"origin": { "uri": "…", "retrieved_at": "…" }
},
{
"name": "customer-vex",
"kind": "vex",
"content_digest": "sha256:…"
}
],
"environment": {
"determinism_profile": "strict",
"timezone": "UTC",
"normalization": { "line_endings": "LF", "sort_order": "canonical" }
}
}
```
### Versioning rules
* Every object is immutable once written.
* Changes create new digests; never mutate in place.
* Support schema evolution via:
* `schema_version`
* strict validation + migration tooling
* Keep manifests small; store large data as blobs.
---
## 9) Determinism contract (Engineering must enforce)
### Determinism requirements
* Stable ordering: sort inputs and outputs canonically.
* Stable timestamps: timestamps may exist but must not change computed scores/verdict.
* Stable randomization: no RNG; if unavoidable, fixed seed recorded in snapshot.
* Stable parsers: parser versions are pinned by digest; parsing must be deterministic.
### Allowed nondeterminism (if any) must be explicit
If you must allow nondeterminism, it must be:
* documented,
* surfaced in UI,
* included in replay report as “non-deterministic factor,”
* and excluded from the signed decision if it affects pass/fail.
---
## 10) Security model (Development Managers)
### Threats this feature must address
* Feed poisoning (tampered vulnerability data)
* Time-of-check/time-of-use drift (same artifact evaluated against moving feeds)
* Replay manipulation (swap snapshot content)
* “Policy drift hiding” (claiming old decision used different policies)
* Signature bypass (trust anchors altered)
### Controls required
* Sign manifests and verdicts.
* Bind verdict → snapshot ID → policy bundle hash → trust bundle hash.
* Verify on every import and on every replay invocation.
* Audit log:
* snapshot created
* snapshot imported
* replay executed
* verification failures
### Key handling
* Decide and document:
* who signs snapshots/verdicts (service keys vs tenant keys)
* rotation policy
* revocation/compromise handling
* Avoid designing cryptography from scratch; use well-established signing formats and separation of duties.
---
## 11) Offline / airgapped requirements
### Snapshot levels (PM packaging guideline)
Offer explicit snapshot types with clear guarantees:
* **Level A: Reference-only snapshot**
* stores hashes + source descriptors
* replay requires access to original sources
* **Level B: Portable snapshot**
* includes blobs necessary for replay
* replay works offline
* **Level C: Sealed portable snapshot**
* portable + signed + includes trust anchors
* replay works offline and can be verified independently
Do not market airgap support without specifying which level is provided.
---
## 12) Performance and storage guidelines
### Principles
* Content-address knowledge blobs to maximize deduplication.
* Separate “hot” knowledge (recent epochs) from cold storage.
* Support snapshot compaction and garbage collection.
### Operational requirements
* Retention policies per tenant/project/environment.
* Quotas and alerting when snapshot storage approaches limits.
* Export bundles should be chunked/streamable for large feeds.
---
## 13) Testing and acceptance criteria
### Required test categories
1. **Golden replay tests**
* same artifact + same snapshot → identical outputs
2. **Corruption tests**
* bit flips in blobs/manifests are detected and rejected
3. **Version skew tests**
* old snapshot + new engine should either replay deterministically or fail with a clear incompatibility report
4. **Airgap tests**
* export → import → replay without network access
5. **Diff accuracy tests**
* compare snapshots and ensure the diff identifies actual knowledge changes, not noise
### Definition of Done (DoD) for the feature
* Snapshots are created automatically according to policy.
* Snapshots can be exported and imported with verified integrity.
* Replay produces matching verdicts for a representative corpus.
* UI exposes snapshot provenance and replay status.
* Audit log records snapshot lifecycle events.
* Clear failure modes exist (missing blobs, incompatible engine, signature failure).
---
## 14) Metrics (PM ownership)
Track metrics that prove this is a moat, not a checkbox.
### Core KPIs
* **Replay success rate** (strict determinism)
* **Time to explain drift** (median time from “why changed” to root cause)
* **% verdicts with sealed portable snapshots**
* **Audit effort reduction** (customer-reported or measured via workflow steps)
* **Storage efficiency** (dedup ratio; bytes per snapshot over time)
### Guardrail metrics
* Snapshot creation latency impact on CI
* Snapshot storage growth per tenant
* Verification failure rates
---
## 15) Common failure modes (what to prevent)
1. Treating snapshots as “metadata only” and still claiming replayability.
2. Allowing “latest feed fetch” during replay (breaks the promise).
3. Not pinning parser/policy/scoring versions—causes silent drift.
4. Missing clear UX around replay limitations and failure reasons.
5. Overcapturing sensitive inputs (privacy and customer trust risk).
6. Underinvesting in dedup/retention (cost blowups).
---
## 16) Management checklists
### PM checklist (before commitment)
* Precisely define “replay” guarantee level (A/B/C) for each SKU/environment.
* Define which inputs are in scope (feeds, VEX, policies, trust bundles, plugins).
* Define customer-facing workflows:
* “replay now”
* “compare to latest”
* “export for audit / air-gap”
* Confirm governance outcomes:
* audit pack integration
* exception linkage
* release gate linkage
### Development Manager checklist (before build)
* Establish canonical schemas and versioning plan.
* Establish content-addressed storage + dedup plan.
* Establish signing and trust anchor strategy.
* Establish deterministic evaluation contract and test harness.
* Establish import/export packaging and verification.
* Establish retention, quotas, and GC.
---
## 17) Minimal phased delivery (recommended)
**Phase 1: Reference snapshot + verdict binding**
* Record source descriptors + hashes, policy/scoring/trust digests.
* Bind snapshot ID into verdict artifacts.
**Phase 2: Portable snapshots**
* Store knowledge blobs locally with dedup.
* Export/import with integrity verification.
**Phase 3: Sealed portable snapshots + replay tooling**
* Sign snapshots.
* Deterministic replay pipeline + replay report.
* UI surfacing and audit logs.
**Phase 4: Snapshot diff + drift explainability**
* Compare snapshots.
* Attribute decision drift to knowledge changes vs artifact changes.
---
If you want this turned into an internal PRD template, I can rewrite it into a structured PRD format with: objectives, user stories, functional requirements, non-functional requirements, security/compliance, dependencies, risks, and acceptance tests—ready for Jira/Linear epics and engineering design review.

View File

@@ -0,0 +1,497 @@
## Stella Ops Guidelines
### Risk Budgets and Diff-Aware Release Gates
**Audience:** Product Managers (PMs) and Development Managers (DMs)
**Applies to:** All customer-impacting software and configuration changes shipped by Stella Ops (code, infrastructure-as-code, runtime config, feature flags, data migrations, dependency upgrades).
---
## 1) What we are optimizing for
Stella Ops ships quickly **without** letting change-driven incidents, security regressions, or data integrity failures become the hidden cost of “speed.”
These guidelines enforce two linked controls:
1. **Risk Budgets** — a quantitative “capacity to take risk” that prevents reliability and trust from being silently depleted.
2. **Diff-Aware Release Gates** — release checks whose strictness scales with *what changed* (the diff), not with generic process.
Together they let us move fast on low-risk diffs and slow down only when the change warrants it.
---
## 2) Non-negotiable principles
1. **All changes are risk-bearing** (even “small” diffs). We quantify and route them accordingly.
2. **Risk is managed at the product/service boundary** (each service has its own budget and gating profile).
3. **Automation first, approvals last**. Humans review what automation cannot reliably verify.
4. **Blast radius is a first-class variable**. A safe rollout beats a perfect code review.
5. **Exceptions are allowed but never free**. Every bypass is logged, justified, and paid back via budget reduction and follow-up controls.
---
## 3) Definitions
### 3.1 Risk Budget (what it is)
A **Risk Budget** is the amount of change-risk a product/service is allowed to take over a defined window (typically a sprint or month) **without increasing the probability of customer harm beyond the agreed tolerance**.
It is a management control, not a theoretical score.
### 3.2 Risk Budget vs. Error Budget (important distinction)
* **Error Budget** (classic SRE): backward-looking tolerance for *actual* unreliability vs. SLO.
* **Risk Budget** (this policy): forward-looking tolerance for *change risk* before shipping.
They interact:
* If error budget is burned (service is unstable), risk budget is automatically constrained.
* If risk budget is low, release gates tighten by policy.
### 3.3 Diff-aware release gates (what it is)
A **release gate** is a set of required checks (tests, scans, reviews, rollout controls) that must pass before a change can progress.
**Diff-aware** means the gate level is determined by:
* what changed (diff classification),
* where it changed (criticality),
* how it ships (blast radius controls),
* and current operational context (incidents, SLO health, budget remaining).
---
## 4) Roles and accountability
### Product Manager (PM) — accountable for risk appetite
PM responsibilities:
* Define product-level risk tolerance with stakeholders (customer impact tolerance, regulatory constraints).
* Approve the **Risk Budget Policy settings** for their product/service tier (criticality level, default gates).
* Prioritize reliability work when budgets are constrained.
* Own customer communications for degraded service or risk-driven release deferrals.
### Development Manager (DM) — accountable for enforcement and engineering hygiene
DM responsibilities:
* Ensure pipelines implement diff classification and enforce gates.
* Ensure tests, telemetry, rollout mechanisms, and rollback procedures exist and are maintained.
* Ensure “exceptions” process is real (logged, postmortemed, paid back).
* Own staffing/rotation decisions to ensure safe releases (on-call readiness, release captains).
### Shared responsibilities
PM + DM jointly:
* Review risk budget status weekly.
* Resolve trade-offs: feature velocity vs. reliability/security work.
* Approve gate profile changes (tighten/loosen) based on evidence.
---
## 5) Risk Budgets
### 5.1 Establish service tiers (criticality)
Each service/product component must be assigned a **Criticality Tier**:
* **Tier 0 Internal only** (no external customers; low business impact)
* **Tier 1 Customer-facing non-critical** (degradation tolerated; limited blast radius)
* **Tier 2 Customer-facing critical** (core workflows; meaningful revenue/trust impact)
* **Tier 3 Safety/financial/data-critical** (payments, auth, permissions, PII, regulated workflows)
Tier drives default budgets and minimum gates.
### 5.2 Choose a budget window and units
**Window:** default to **monthly** with weekly tracking; optionally sprint-based if release cadence is sprint-coupled.
**Units:** use **Risk Points (RP)** — consumed by each change. (Do not overcomplicate at first; tune with data.)
Recommended initial monthly budgets (adjust after 23 cycles with evidence):
* Tier 0: 300 RP/month
* Tier 1: 200 RP/month
* Tier 2: 120 RP/month
* Tier 3: 80 RP/month
> Interpretation: Tier 3 ships fewer “risky” changes; it can still ship frequently, but changes must be decomposed into low-risk diffs and shipped with strong controls.
### 5.3 Risk Point scoring (how changes consume budget)
Every change gets a **Release Risk Score (RRS)** in RP.
A practical baseline model:
**RRS = Base(criticality) + Diff Risk + Operational Context Mitigations**
**Base (criticality):**
* Tier 0: +1
* Tier 1: +3
* Tier 2: +6
* Tier 3: +10
**Diff Risk (additive):**
* +1: docs, comments, non-executed code paths, telemetry-only additions
* +3: UI changes, non-core logic changes, refactors with high test coverage
* +6: API contract changes, dependency upgrades, medium-complexity logic in a core path
* +10: database schema migrations, auth/permission logic, data retention/PII handling
* +15: infra/networking changes, encryption/key handling, payment flows, queue semantics changes
**Operational Context (additive):**
* +5: service currently in incident or had Sev1/Sev2 in last 7 days
* +3: error budget < 50% remaining
* +2: on-call load high (paging above normal baseline)
* +5: release during restricted windows (holidays/freeze) via exception
**Mitigations (subtract):**
* 3: feature flag with staged rollout + instant kill switch verified
* 3: canary + automated health gates + rollback tested in last 30 days
* 2: high-confidence integration coverage for touched components
* 2: no data migration OR backward-compatible migration with proven rollback
* 2: change isolated behind permission boundary / limited cohort
**Minimum RRS floor:** never below 1 RP.
DM is responsible for making sure the pipeline can calculate a *default* RRS automatically and require humans only for edge cases.
### 5.4 Budget operating rules
**Budget ledger:** Maintain a per-service ledger:
* Budget allocated for the window
* RP consumed per release
* RP remaining
* Trendline (projected depletion date)
* Exceptions (break-glass releases)
**Control thresholds:**
* **Green (≥60% remaining):** normal operation
* **Yellow (3059%):** additional caution; gates tighten by 1 level for medium/high-risk diffs
* **Red (<30%):** freeze high-risk diffs; allow only low-risk changes or reliability/security work
* **Exhausted (≤0%):** releases restricted to incident fixes, security fixes, and rollback-only, with tightened gates and explicit sign-off
### 5.5 What to do when budget is low (expected behavior)
When Yellow/Red:
* PM shifts roadmap execution toward:
* reliability work, defect burn-down,
* decomposing large changes into smaller, reversible diffs,
* reducing scope of risky features.
* DM enforces:
* smaller diffs,
* increased feature flagging,
* staged rollout requirements,
* improved test/observability coverage.
Budget constraints are a signal, not a punishment.
### 5.6 Budget replenishment and incentives
Budgets replenish on the window boundary, but we also allow **earned capacity**:
* If a service improves change failure rate and MTTR for 2 consecutive windows, it may earn:
* +1020% budget increase **or**
* one gate level relaxation for specific change categories
This must be evidence-driven (metrics, not opinions).
---
## 6) Diff-Aware Release Gates
### 6.1 Diff classification (what the pipeline must detect)
At minimum, automatically classify diffs into these categories:
**Code scope**
* Executable code vs docs-only
* Core vs non-core modules (define module ownership boundaries)
* Hot paths (latency-sensitive), correctness-sensitive paths
**Data scope**
* Schema migration (additive vs breaking)
* Backfill jobs / batch jobs
* Data model changes impacting downstream consumers
* PII / regulated data touchpoints
**Security scope**
* Authn/authz logic
* Permission checks
* Secrets, key handling, encryption changes
* Dependency changes with known CVEs
**Infra scope**
* IaC changes, networking, load balancer, DNS, autoscaling
* Runtime config changes (feature flags, limits, thresholds)
* Queue/topic changes, retention settings
**Interface scope**
* Public API contract changes
* Backward compatibility of payloads/events
* Client version dependency
### 6.2 Gate levels
Define **Gate Levels G0G4**. The pipeline assigns one based on diff + context + budget.
#### G0 — No-risk / administrative
Use for:
* docs-only, comments-only, non-functional metadata
Requirements:
* Lint/format checks
* Basic CI pass (build)
#### G1 — Low risk
Use for:
* small, localized code changes with strong unit coverage
* non-core UI changes
* telemetry additions (no removal)
Requirements:
* All automated unit tests
* Static analysis/linting
* 1 peer review (code owner not required if outside critical modules)
* Automated deploy to staging
* Post-deploy smoke checks
#### G2 — Moderate risk
Use for:
* moderate logic changes in customer-facing paths
* dependency upgrades
* API changes that are backward compatible
* config changes affecting behavior
Requirements:
* G1 +
* Integration tests relevant to impacted modules
* Code owner review for touched modules
* Feature flag required if customer impact possible
* Staged rollout: canary or small cohort
* Rollback plan documented in PR
#### G3 — High risk
Use for:
* schema migrations
* auth/permission changes
* core business logic in critical flows
* infra changes affecting availability
* non-trivial concurrency/queue semantics changes
Requirements:
* G2 +
* Security scan + dependency audit (must pass, exceptions logged)
* Migration plan (forward + rollback) reviewed
* Load/performance checks if in hot path
* Observability: new/updated dashboards/alerts for the change
* Release captain / on-call sign-off (someone accountable live)
* Progressive delivery with automatic health gates (error rate/latency)
#### G4 — Very high risk / safety-critical / budget-constrained releases
Use for:
* Tier 3 critical systems with low budget remaining
* changes during freeze windows via exception
* broad blast radius changes (platform-wide)
* remediation after major incident where recurrence risk is high
Requirements:
* G3 +
* Formal risk review (PM+DM+Security/SRE) in writing
* Explicit rollback rehearsal or prior proven rollback path
* Extended canary period with success criteria and abort criteria
* Customer comms plan if impact is plausible
* Post-release verification checklist executed and logged
### 6.3 Gate selection logic (policy)
Default rule:
1. Compute **RRS** (Risk Points) from diff + context.
2. Map RRS to default gate:
* 15 RP G1
* 612 RP G2
* 1320 RP G3
* 21+ RP G4
3. Apply modifiers:
* If **budget Yellow**: escalate one gate for changes G2
* If **budget Red**: escalate one gate for changes G1 and block high-risk categories unless exception
* If active incident or error budget severely degraded: block non-fix releases by default
DM must ensure the pipeline enforces this mapping automatically.
### 6.4 “Diff-aware” also means “blast-radius aware”
If the diff is inherently risky, reduce risk operationally:
* feature flags with cohort controls
* dark launches (ship code disabled)
* canary deployments
* blue/green with quick revert
* backwards-compatible DB migrations (expand/contract pattern)
* circuit breakers and rate limiting
* progressive exposure by tenant / region / account segment
Large diffs are not made safe by more reviewers; they are made safe by **reversibility and containment**.
---
## 7) Exceptions (“break glass”) policy
Exceptions are permitted only when one of these is true:
* incident mitigation or customer harm prevention,
* urgent security fix (actively exploited or high severity),
* legal/compliance deadline.
**Requirements for any exception:**
* Recorded rationale in the PR/release ticket
* Named approver(s): DM + on-call owner; PM for customer-impacting risk
* Mandatory follow-up within 5 business days:
* post-incident or post-release review
* remediation tasks created and prioritized
* **Budget penalty:** subtract additional RP (e.g., +50% of the changes RRS) to reflect unmanaged risk
Repeated exceptions are a governance failure and trigger gate tightening.
---
## 8) Operational metrics (what PMs and DMs must review)
Minimum weekly review dashboard per service:
* **Risk budget remaining** (RP and %)
* **Deploy frequency**
* **Change failure rate**
* **MTTR**
* **Sev1/Sev2 count** (rolling 30/90 days)
* **SLO / error budget status**
* **Gate compliance rate** (how often gates were bypassed)
* **Diff size distribution** (are we shipping huge diffs?)
* **Rollback frequency and time-to-rollback**
Policy expectation:
* If change failure rate or MTTR worsens materially over 2 windows, budgets tighten and gate mapping escalates until stability returns.
---
## 9) Practical operating cadence
### Weekly (PM + DM)
* Review budgets and trends
* Identify upcoming high-risk releases and plan staged rollouts
* Confirm staffing for release windows (release captain / on-call coverage)
* Decide whether to defer, decompose, or harden changes
### Per release (DM-led, PM informed)
* Ensure correct gate level
* Verify rollout + rollback readiness
* Confirm monitoring/alerts exist and are watched during rollout
* Execute post-release verification checklist
### Monthly (leadership)
* Adjust tier assignments if product criticality changed
* Recalibrate budget numbers based on measured outcomes
* Identify systemic causes: test gaps, observability gaps, deployment tooling gaps
---
## 10) Required templates (standardize execution)
### 10.1 Release Plan (required for G2+)
* What is changing (13 bullets)
* Expected customer impact (or none”)
* Diff category flags (DB/auth/infra/API/etc.)
* Rollout strategy (canary/cohort/blue-green)
* Abort criteria (exact metrics/thresholds)
* Rollback steps (exact commands/process)
* Owners during rollout (names)
### 10.2 Migration Plan (required for schema/data changes)
* Migration type: additive / expand-contract / breaking (breaking is disallowed without explicit G4 approval)
* Backfill approach and rate limits
* Validation checks (row counts, invariants)
* Rollback strategy (including data implications)
### 10.3 Post-release Verification Checklist (G1+)
* Smoke test results
* Key dashboards checked (latency, error rate, saturation)
* Alerts status
* User-facing workflows validated (as applicable)
* Ticket updated with outcome
---
## 11) What “good” looks like
* Low-risk diffs ship quickly with minimal ceremony (G0G1).
* High-risk diffs are decomposed and shipped progressively, not heroically.
* Risk budgets are visible, used in planning, and treated as a real constraint.
* Exceptions are rare and followed by concrete remediation.
* Over time: deploy frequency stays high while change failure rate and MTTR decrease.
---
## 12) Immediate adoption checklist (first 30 days)
**DM deliverables**
* Implement diff classification in CI/CD (at least: DB/auth/infra/API/deps/config)
* Implement automatic gate mapping and enforcement
* Add release plan and rollback plan checks for G2+
* Add logging for gate overrides
**PM deliverables**
* Confirm service tiering for owned areas
* Approve initial monthly RP budgets
* Add risk budget review to the weekly product/engineering ritual
* Reprioritize work when budgets hit Yellow/Red (explicitly)
---
If you want, I can also provide:
* a concrete scoring worksheet (ready to paste into Confluence/Notion),
* a CI/CD policy example (e.g., GitHub Actions / GitLab rules) that computes gate level from diff patterns,
* and a one-page Release Captain Runbook aligned to G2G4.