Here’s a compact playbook for making Stella Ops stand out on **binary‑only analysis** quality and **deterministic, explainable scoring**—from concepts to dev‑ready specs.

# Binary‑only analysis & call‑graph fidelity

**Goal:** prove we reach the *right* code, not just flag files.

**Why it matters (plain English):**

* Many scanners “see” a CVE but can’t show how execution reaches it. You need proof you can actually hit the bad function from an app entrypoint.

**North‑star metrics (automate in CI):**

* **Precision / Recall** vs a small **ground‑truth corpus** (curated samples with known reachable/unreachable sinks).
* **TTFRP (Time‑to‑First‑Reachable‑Path)**: ms from analyzer start to first valid call‑path.
* **Runnable call‑stack snippets %**: fraction of findings that include a minimal, compilable snippet (or pseudo‑IR) reproducing the call chain.
* **Deterministic replay %**: identical proofs (hash‑equal) across OS/CPU/container.

**Reproducible‑run contract:**

* **Scan Manifest (DSSE‑signed)**: inputs, toolchain versions, lattice policies, feed hashes, CFG/CG build params, symbolization mode, and hash of the “proof‑builder”.
* **Proof Bundle**:
  * `/proofs/{findingId}/callgraph.pb` (protobuf/flatbuffers)
  * `/proofs/{findingId}/path_0.ir` (SSA/IL)
  * `/proofs/{findingId}/snippet_0/` (repro harness)
  * `/attestations/` (rekor‑ready, optional PQ mode)
* **Determinism switch**: `--deterministic --seed <32b> --clock fake --fs-order stable`.

**Reachability engine (binary‑only) – minimal architecture:**

* **Loader**: ELF/PE/Mach‑O parser; symbolizer; DWARF/PDB if present.
* **IR lifter**: (capstone/keystone‑style) → SSA/typed IL with conservatively modeled calls (PLT/IAT, vtables, GOT).
* **CG/CFG builder**: merges static edges + lightweight dynamic summaries (known stdlib shims); optional ML‑assisted indirect‑call resolution gated by proofs.
* **Path search**: bounded BFS/IDDFS from trusted entrypoints to vulnerable sinks; emits **proof trees**.
* **Snippet builder**: replays path with mocks for I/O; generates runnable harness or pseudo‑IR transcript.

**Ground‑truth corpus (starter set):**

* 20 binaries with injected sinks: 10 reachable, 10 unreachable, mixed obfuscation, stripped/unstripped, PIE/ASLR on/off, with/without CFI.
* Tag each sample with `sink_signature`, `expected_paths`, `expected_unreachable_reasons`.

**CI tasks (agents can implement now):**

* `scanner.webservice`: `/bench/run` → runs corpus; exports metrics JSON + HTML.
* `scheduler.webservice`: nightly + per‑PR comparisons; **fail gate** if precision or deterministic‑replay dips > 1.0 pt vs baseline.
* `notify.webservice`: posts TTFRP trend + top regressions to PR.

---

# Deterministic score proofs & Unknowns ranking

**Goal:** every risk score must be *explainable and replayable*. Unknowns shouldn’t be noisy; they should be transparently *ranked*.

**Plain English:**

* A score should read like a ledger: “Input X + Rule Y → +0.12 risk, because Z”. Unknowns are the “we don’t know yet” items—rank them by potential blast radius and thin evidence.

**Signed proof‑trees (spec):**

* **Node types:** `Input` (SBOM/VEX/event), `Transform` (policy/lattice op), `Delta` (numeric change), `Score`.
* **Fields:** `id`, `parentIds[]`, `sha256`, `ruleId`, `evidenceRefs[]`, `timestamp`, `actor` (module), `determinismSeed`.
* **Encoding:** CBOR/Flatbuffers; DSSE‑signed; top hash anchored to ledger (optional Rekor v2 mirror).
* **Replayer:** `stella score replay --bundle proofs/ --seed <seed>` must output identical totals and per‑rule deltas.

**Unknowns Registry & ranking:**

* **Unknown** = missing VEX, missing exploitability signal, ambiguous call edge, missing version provenance, or opaque packer.
* **Rank factors (weighted):**
  * **Blast radius:** transitive dependents, runtime privilege, exposure surface (net‑facing? in container PID 1?).
  * **Evidence scarcity:** how many critical facts are missing?
  * **Exploit pressure:** EPSS percentile (if available), KEV presence, chatter density (feeds).
  * **Containment signals:** sandboxing, seccomp, read‑only FS, eBPF/LSM denies observed.
* **Output:** `unknowns.score ∈ [0,1]` + **proof path** explaining the rank.

**Quiet‑update UX (proof‑linked):**

* Unknown cards are **gated**: collapsed by default; show top 3 reasons with “View proof”.
* As VEX/EPSS feeds refresh, the proof‑tree updates; the UI shows **what changed and why** (delta view).

---

# Minimal schemas (drop‑in to Stella Ops)

```yaml
# scoring/proof-tree.fbs (conceptual)
Table Node {
  id:string;
  kind:enum{Input,Transform,Delta,Score};
  parentIds:[string];
  ruleId:string;
  sha256:string;
  evidenceRefs:[string];
  ts:ulong;
  actor:string;
  delta:float;
  total:float;
  seed:[ubyte];
}

# unknowns/unknown-item.json
{
  "id": "unk_…",
  "artifactPurl": "pkg:…",
  "reasons": ["missing_vex", "ambiguous_indirect_call"],
  "blastRadius": { "dependents": 42, "privilege": "root", "netFacing": true },
  "evidenceScarcity": 0.7,
  "exploitPressure": { "epss": 0.83, "kev": false },
  "containment": { "seccomp": "enforced", "fs": "ro" },
  "score": 0.66,
  "proofRef": "proofs/unk_…/tree.cbor"
}
```

---

# Triggering & pipelines (existing services)

* **scanner.webservice**
  * Emits **Proof Bundle** + Unknowns for each image/binary.
  * API: `POST /scan?deterministic=true&seed=…&emitProofs=true`.
* **scheduled.webservice**
  * Periodic **feed refresh** (VEX/EPSS/KEV) → runs **proof replayer**; updates Unknowns ranks (no rescans).
* **notify.webservice**
  * Sends **delta‑proof digests** to PRs/Chat: “EPSS↑ from 0.41→0.58, Unknown score +0.06 (proof link)”.
* **concelier** (feeds)
  * Normalizes EPSS, KEV, vendor advisories; versioned with hashes in the **Scan Manifest**.
* **excititor** (VEX aggregator)
  * Produces **explainable VEX merges**: emits **Transform nodes** with ruleIds referencing lattice policies.

---

# Developer guidelines (do this first)

1. **Add deterministic flags** to all scanners and proof emitters (`--deterministic`, `--seed`).
2. **Implement Proof Bundle writer** (Flatbuffers/CBOR + DSSE). Include per‑rule deltas and top hash.
3. **Create Ground‑Truth Corpus** repo and CI job; publish precision/recall/TTFRP dashboards.
4. **Unknowns Registry** micro‑model + ranking function; expose `/unknowns/list?sort=score`.
5. **Quiet‑update UI**: Unknowns cards with “View proof”; delta badges when feeds change.
6. **Replay CLI**: `stella score replay` + `stella proof verify` (DSSE + hash match).
7. **Audit doc**: one‑pager “How to reproduce my score”—copy/paste commands from the manifest.

---

# Tiny .NET 10 sketch (partial, compile‑ready)

```csharp
public record ProofNode(
    string Id, string Kind, string[] ParentIds, string RuleId, string Sha256,
    string[] EvidenceRefs, DateTimeOffset Ts, string Actor,
    double Delta, double Total, byte[] Seed);

public interface IScoreLedger
{
    void Append(ProofNode node);
    double CurrentTotal { get; }
}

public sealed class DeterministicLedger : IScoreLedger
{
    private readonly List<ProofNode> _nodes = new();
    private double _total;

    public void Append(ProofNode n)
    {
        // Deterministic ordering by (Ts, Id) already enforced upstream.
        _total = n.Total;
        _nodes.Add(n);
    }

    public double CurrentTotal => _total;
}

public static class UnknownRanker
{
    public static double Rank(Blast b, double scarcity, double epss, bool kev, Containment c)
    {
        var br = (Math.Min(b.Dependents / 50.0, 1.0) + (b.NetFacing ? 0.5 : 0) + (b.Privilege == "root" ? 0.5 : 0)) / 2.0;
        var ep = Math.Min(epss + (kev ? 0.3 : 0), 1.0);
        var ct = (c.Seccomp == "enforced" ? -0.1 : 0) + (c.Fs == "ro" ? -0.1 : 0);
        return Math.Clamp(0.6 * br + 0.3 * scarcity + 0.3 * ep + ct, 0, 1);
    }
}
```

---

# What you get if you ship this

* **Trust‑on‑paper → trust‑in‑proofs**: every score and “unknown” is backed by a tamper‑evident path.
* **Noise control**: Unknowns don’t spam—ranked, gated, and auto‑updated when new evidence arrives.
* **Moat**: reproducible evidence + runnable call‑stacks are hard to copy and easy to demo.

If you want, I can turn this into concrete tickets for `scanner.webservice`, `excititor`, `concelier`, `notify`, plus a first corpus seed and CI wiring.

What I described is two **evidence upgrades** that turn Stella Ops from “SBOM/VEX parity” into “provable, replayable security decisions”:

1. **Binary-only reachability proofs**
2. **Deterministic score proofs + ranked Unknowns**

Below is the purpose (why you want it) and a concrete implementation plan for Stella Ops (aligned with your rule: **lattice algorithms run in `scanner.webservice`; Concelier/Excititor preserve prune source**).

---

## 1) Binary-only reachability: purpose

Most scanners stop at: “this image contains libX version Y with CVE-Z”. That creates noise because:

* The vulnerable function may be **present but never callable** from any real entrypoint.
* The vulnerability may be in a code path guarded by config, privilege, seccomp, or missing inputs.

**Reachability** answers the only question that matters operationally:

> “Can execution reach the vulnerable sink from a real entry point in this container/app?”

**What Stella should output for a “reachable” finding**

* “Entry: nginx worker → module init → … → vulnerable function”
* A **call path proof** (graph + concrete nodes/addresses/symbols)
* Optional: a minimal repro harness/snippet or IR transcript

**Why this is a moat**

* It reduces false positives materially (and you can *measure* it).
* It produces auditor-friendly evidence (“show me the path”).

---

## 2) Deterministic score proofs + ranked Unknowns: purpose

Security teams distrust opaque scores. Auditors and regulated clients require repeatability.

**Deterministic scoring proof** means:

* Every score is a **ledger** of deltas (“+0.12 because EPSS=…, +0.18 because reachable path exists, −0.07 because seccomp enforced…”).
* The score can be **replayed** later and must match bit-for-bit given the same inputs (feeds, rules, policies, seed).

**Unknowns** are the “we don’t know yet” facts (missing VEX, ambiguous versions, unresolved indirect call edges). Instead of spamming, Stella ranks Unknowns by **likely impact** so DevOps sees the top 1–5 that actually matter.

---

# Implementation plan for Stella Ops

## Phase 0 — Lay the foundation (1 sprint)

**Goal:** make scans replayable and attach proofs to findings even before reachability is “perfect”.

### 0.1 Create a signed Scan Manifest (system-of-record in Postgres)

A manifest is a declarative capture of *everything that affects results*.

**Store:**

* artifact digest(s)
* tool versions (scanner workers + rule engine)
* Concelier snapshot hash(es) used
* Excititor snapshot hash(es) used
* lattice/policy digest (executed in `scanner.webservice`)
* deterministic flags + seed
* config knobs (depth limits, indirect-call resolution mode, etc.)

**Deliverables**

* `scan_manifest` table in Postgres
* DSSE signature for the manifest
* `GET /scan/{id}/manifest` endpoint

### 0.2 Proof Bundle format + storage

**Store proof artifacts content-addressed** (zip or directory) and reference them from findings.

**Bundle contains**

* callgraph subset (or placeholder graph in v0)
* score proof tree (CBOR/FlatBuffers)
* references to evidence inputs (SBOM/VEX/feeds digests)

**Deliverables**

* `proof_bundle` metadata table in Postgres (uri, root_hash, dsse_envelope)
* filesystem/S3-compatible storage adapter
* `GET /scan/{id}/proofs/{findingId}` endpoint

---

## Phase 1 — Deterministic scoring + Unknowns (1–2 sprints)

**Goal:** every score becomes replayable; Unknowns become a controlled queue.
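The replayability goal (and Phase 0’s “identical manifest hash” DoD) rests on canonical serialization: the same manifest content must hash identically regardless of key order or whitespace. A minimal sketch in Python, assuming canonicalization via recursive key sort and compact separators; the `manifest_hash` helper name is illustrative:

```python
import hashlib
import json

def canonical_bytes(obj) -> bytes:
    # sort_keys sorts object keys recursively; compact separators remove
    # whitespace variance, so equal content yields equal bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def manifest_hash(manifest: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(manifest)).hexdigest()

# Same manifest content, different key order → identical hash.
a = {"scanId": "s1", "deterministic": True, "seed": "AAAA"}
b = {"seed": "AAAA", "scanId": "s1", "deterministic": True}
```

The derived hash is what would land in the `manifest_hash` column of `scan_manifest`, and replay re-derives and compares it.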
### 1.1 Score Proof Tree “ledger”

Implement a small internal library in .NET:

* pure functions: inputs → score + proof nodes
* nodes: `Input`, `Transform`, `Delta`, `Score`
* deterministic ordering and hashing

**Deliverables**

* `stella score replay --scan <scanId> --seed <seed>` CLI (or internal job)
* `POST /score/replay` in `scanner.webservice` (recompute score without rescanning binaries)
* `score_proofs` stored in the Proof Bundle

### 1.2 Unknowns registry + ranking (computed in scanner.webservice)

Unknown reasons (examples):

* missing VEX for a CVE/component
* version provenance uncertain
* ambiguous indirect call edge for reachability
* packed/stripped binary blocking symbolization

**Ranking model (deterministic)**

* blast radius (dependents, privilege, net-facing)
* evidence scarcity (how many critical facts missing)
* exploit pressure (EPSS/KEV presence if available via Concelier snapshot)
* containment signals (seccomp/RO-fs observed)

**Deliverables**

* `unknowns` table + API `GET /unknowns?sort=score`
* unknown proof tree (why it’s ranked #1)
* UI: Unknowns collapsed by default; top reasons + “view proof”

### 1.3 Feed refresh re-scores without rescans

Respect your architecture rule:

* Concelier/Excititor publish **snapshots** (preserve prune source)
* `scanner.webservice` runs lattice + scoring

**Flow**

1. Scheduled detects a new Concelier/Excititor snapshot hash
2. Scheduled calls `scanner.webservice /score/replay` for impacted scans
3. Notify emits “score delta” + proof link

**Deliverables**

* `scheduled.webservice` job: “rescore impacted scans”
* `notify.webservice` message template: “what changed + proof root hash”

---

## Phase 2 — Binary reachability engine v1 (2–3 sprints)

**Goal:** ship a reachability proof that is *useful today*, then iterate fidelity.
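Before moving to reachability, the 1.3 flow above (“new snapshot → find impacted scans → replay → emit delta”) can be sketched end to end. A minimal sketch with illustrative names (`impacted_scans`, `replay_and_diff` are hypothetical, not Stella Ops APIs):

```python
# Sketch of the 1.3 rescore flow: a new Concelier snapshot triggers score
# replay with no rescan. All helper names here are illustrative.
def impacted_scans(scans, new_snapshot_hash):
    """Scans whose manifest pinned an older feed snapshot."""
    return [s for s in scans if s["concelierSnapshotHash"] != new_snapshot_hash]

def replay_and_diff(scan, old_score_fn, new_score_fn):
    """Recompute the score under the new snapshot and report the delta
    that notify.webservice would attach to a proof link."""
    old, new = old_score_fn(scan), new_score_fn(scan)
    return {"scanId": scan["id"], "delta": round(new - old, 4)}

scans = [
    {"id": "s1", "concelierSnapshotHash": "h1"},
    {"id": "s2", "concelierSnapshotHash": "h2"},
]
todo = impacted_scans(scans, "h2")  # only s1 pinned the stale snapshot
```

The score functions here stand in for the deterministic replay endpoint; in the real flow both evaluations run the same lattice/scoring code with different pinned snapshot hashes.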
### 2.1 v1 scope (pragmatic)

Start with:

* ELF (Linux containers) first
* imports/exports + PLT/GOT edges
* direct calls + conservative handling of indirect calls
* entrypoints: `main`, exported functions, known framework entry hooks

**What v1 outputs**

* “reachable / not proven reachable / unknown”
* shortest path found (bounded depth)
* proof subgraph: nodes + edges + address ranges + symbol names if present

**Deliverables**

* `scanner.worker.binary` (or binary module inside scanner worker) produces:
  * CFG/CG summary artifact
  * per-finding path proof (if found)
  * TTFRP metric (time-to-first-reachable-path)

### 2.2 Proof format for reachability

For each finding:

* `callgraph.pb` (or flatbuffers)
* `path_0.ir` (text SSA/IL transcript OR “disasm trace” v1)
* `evidence.json` (addresses, symbolization mode, loader metadata)

### 2.3 Ground-truth corpus + CI gates

Create a small repo of curated binaries with known reachable/unreachable sinks. Run nightly and per-PR.

**Gates**

* precision/recall must not regress
* deterministic replay must remain 100% on corpus
* TTFRP tracked (trend, not hard fail initially)

**Deliverables**

* `scanner.webservice /bench/run`
* scheduler nightly bench
* notify posts regressions in PR

---

## Phase 3 — “Best in class” improvements (ongoing)

* Better indirect call resolution (vtables, function pointers) with proof constraints
* Stripped binary symbol recovery heuristics
* Optional snippet/harness generator (start as IR transcript, evolve to runnable)
* Multi-arch support (arm64) and PE/Mach-O if needed

---

# Concrete service responsibilities (so your team doesn’t misplace logic)

### Concelier (feeds)

* ingest EPSS/KEV/advisories
* produce **versioned snapshots** with hashes
* **does not** run lattice/scoring

### Excititor (VEX)

* ingest vendor/openvex/csaf signals
* normalize and emit snapshots with hashes
* **preserve prune source**
* **does not** run lattice/scoring

### Scanner.webservice

* orchestrates scanning workers
* runs lattice merge + scoring
* emits manifest + proofs
* exposes replay endpoints

### Scheduled.webservice

* detects new snapshot hashes
* triggers replay/rescore jobs (no rescans)
* schedules benchmark runs

### Notify.webservice

* delivers deltas with proof links (root hash + manifest hash)

---

# Minimum “definition of done” checklist

You’ll know this is implemented when:

1. Every scan stores a DSSE-signed **Scan Manifest** in Postgres.
2. Every finding has a **proofRef** (score proof always; reachability proof where available).
3. `score replay` reproduces the same total and per-rule deltas given the manifest.
4. Unknowns are visible, ranked, and proof-linked, and updates from feeds change ranks via replay (not rescans).
5. Corpus CI exists and prevents silent regressions.

---

If you want, I can translate this plan into:

* Postgres schema migrations (tables + indexes),
* REST endpoints for `scanner.webservice`, `scheduled.webservice`, `notify.webservice`,
* and a sprint-ready TASKS.md split per module with acceptance tests.

Below is an “implementation extension” with (1) concrete goals/DoD and (2) major code snippets your developers can lift directly into Stella Ops (.NET 10/C#). I’m keeping the architecture rule intact: **Concelier + Excititor only emit snapshots (preserve prune source); `scanner.webservice` runs lattice/scoring and emits proofs**. System of record is **Postgres**, with **Valkey optional/ephemeral**.

---

## 0) Concrete goals and Definition of Done

### Phase A — Deterministic scan + Proof infrastructure (must ship first)

**Goal A1 — Scan Manifest exists and is DSSE-signed**

* Every scan produces a `ScanManifest` containing:
  * artifact digest(s) (image digest, file digest)
  * scanner versions
  * **concelierSnapshotHash**, **excititorSnapshotHash**
  * lattice/policy hash (executed in `scanner.webservice`)
  * deterministic flags + seed
  * config knobs (depth limits, indirect-call resolution mode, etc.)
* Manifest stored in Postgres and in the Proof Bundle.
* Manifest DSSE signature verified by `stella proof verify`.

**Goal A2 — Proof Bundle exists for every scan**

* Proof bundle is content-addressed: `rootHash` + DSSE envelope stored.
* Bundle contains at minimum:
  * `manifest.json` (canonical)
  * `score_proof.cbor` (or canonical JSON v1)
  * `evidence_refs.json` (digests of inputs)

**DoD**

* Same scan inputs + same seed produce identical manifest hash and identical proof root hash.

---

### Phase B — Deterministic scoring ledger + replay

**Goal B1 — Scoring is a pure function**

* `Score = f(Manifest, Findings, FeedSnapshot, VEXSnapshot, RuntimeSignals?, Seed)`
* Every numeric change is recorded as a proof node (`Delta`) with evidence references.

**Goal B2 — Replay**

* `POST /score/replay` recomputes scores from manifest + snapshot hashes without rescanning binaries.
* Replay output (proof root hash + totals) is identical across runs.

**DoD**

* Replay for a prior scan must reproduce bit-identical proof output (hash match).

---

### Phase C — Unknowns registry + deterministic ranking

**Goal C1 — Unknowns are first-class**

* Unknown item emitted when evidence is missing or ambiguous:
  * missing VEX, ambiguous component version, unresolved indirect-call edge, packed binary, etc.
* Unknowns ranked deterministically with a proof trail.

**DoD**

* UI shows top-ranked Unknowns collapsed by default; every Unknown has “View proof”.

---

### Phase D — Binary-only reachability v1 (useful quickly)

**Goal D1 — Reachability classification**

* Each vulnerable sink gets: `Reachable | NotProvenReachable | Unknown`
* When reachable, emit a shortest path proof (bounded BFS) from entrypoint.

**Goal D2 — TTFRP metric**

* Emit TTFRP and store per scan.

**DoD**

* Corpus benchmark job runs nightly and tracks precision/recall + TTFRP trends.
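Goal D1’s classification and shortest-path proof reduce to a bounded breadth-first search over the call graph. A minimal sketch, assuming the graph is an adjacency dict and that unresolved indirect call edges downgrade “not proven reachable” to “unknown” (names and the `max_depth` default are illustrative):

```python
from collections import deque

REACHABLE, NOT_PROVEN, UNKNOWN = "Reachable", "NotProvenReachable", "Unknown"

def classify(edges, entrypoints, sink, max_depth=64, has_unresolved_indirect=False):
    """Bounded BFS from entrypoints toward sink.
    Returns (classification, shortest path or None)."""
    parents = {}
    seen = set(entrypoints)
    frontier = deque((e, 0) for e in entrypoints)
    while frontier:
        node, depth = frontier.popleft()
        if node == sink:
            # Walk parents back to an entrypoint; BFS guarantees shortest path.
            path = [node]
            while path[-1] in parents:
                path.append(parents[path[-1]])
            return REACHABLE, path[::-1]
        if depth < max_depth:
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    parents[nxt] = node
                    frontier.append((nxt, depth + 1))
    # No path found: ambiguity in the graph downgrades the verdict to Unknown.
    return (UNKNOWN, None) if has_unresolved_indirect else (NOT_PROVEN, None)

edges = {"main": ["init", "serve"], "serve": ["parse"], "parse": ["vuln_sink"]}
cls, path = classify(edges, ["main"], "vuln_sink")
```

The returned path is exactly the proof subgraph v1 needs (nodes in order); wall-clock time to this first result is the TTFRP measurement.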
---

## 1) Core data models (Manifest, Proof Nodes, Unknowns)

### 1.1 ScanManifest (canonical JSON for hashing)

```csharp
public sealed record ScanManifest(
    string ScanId,
    DateTimeOffset CreatedAtUtc,
    string ArtifactDigest,            // sha256:... or image digest
    string ArtifactPurl,              // optional
    string ScannerVersion,            // scanner.webservice version
    string WorkerVersion,             // scanner.worker.* version
    string ConcelierSnapshotHash,     // immutable feed snapshot digest
    string ExcititorSnapshotHash,     // immutable vex snapshot digest
    string LatticePolicyHash,         // policy bundle digest
    bool Deterministic,
    byte[] Seed,                      // 32 bytes
    IReadOnlyDictionary<string, string> Knobs // depth limits etc.
);
```

### 1.2 ProofNode (ledger entries)

```csharp
public enum ProofNodeKind { Input, Transform, Delta, Score }

public sealed record ProofNode(
    string Id,
    ProofNodeKind Kind,
    string RuleId,
    string[] ParentIds,
    string[] EvidenceRefs,   // digests / refs inside bundle
    double Delta,            // 0 for non-Delta nodes
    double Total,            // running total at this node
    string Actor,            // module name
    DateTimeOffset TsUtc,
    byte[] Seed,
    string NodeHash          // sha256 over canonical node (excluding NodeHash)
);
```

### 1.3 UnknownItem

```csharp
public sealed record UnknownItem(
    string Id,
    string ArtifactDigest,
    string ArtifactPurl,
    string[] Reasons,
    BlastRadius BlastRadius,
    double EvidenceScarcity,
    ExploitPressure ExploitPressure,
    ContainmentSignals Containment,
    double Score,       // 0..1
    string ProofRef     // path inside proof bundle
);

public sealed record BlastRadius(int Dependents, bool NetFacing, string Privilege); // "root"/"user"
public sealed record ExploitPressure(double? Epss, bool Kev);
public sealed record ContainmentSignals(string Seccomp, string Fs); // "enforced"/"none", "ro"/"rw"
```

---

## 2) Canonical JSON + hashing (determinism foundation)

### 2.1 Canonicalize JSON (sort object keys recursively)

```csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class CanonJson
{
    public static byte[] Canonicalize<T>(T obj)
    {
        var json = JsonSerializer.SerializeToUtf8Bytes(obj, new JsonSerializerOptions
        {
            WriteIndented = false,
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        });
        using var doc = JsonDocument.Parse(json);
        using var ms = new MemoryStream();
        using var writer = new Utf8JsonWriter(ms, new JsonWriterOptions { Indented = false });
        WriteElementSorted(doc.RootElement, writer);
        writer.Flush();
        return ms.ToArray();
    }

    private static void WriteElementSorted(JsonElement el, Utf8JsonWriter w)
    {
        switch (el.ValueKind)
        {
            case JsonValueKind.Object:
                w.WriteStartObject();
                foreach (var prop in el.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
                {
                    w.WritePropertyName(prop.Name);
                    WriteElementSorted(prop.Value, w);
                }
                w.WriteEndObject();
                break;
            case JsonValueKind.Array:
                w.WriteStartArray();
                foreach (var item in el.EnumerateArray()) WriteElementSorted(item, w);
                w.WriteEndArray();
                break;
            default:
                el.WriteTo(w);
                break;
        }
    }

    public static string Sha256Hex(ReadOnlySpan<byte> bytes)
        => Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
}
```

---

## 3) DSSE envelope (sign manifests and proof roots)

### 3.1 DSSE types + signer abstraction

```csharp
public sealed record DsseEnvelope(
    string PayloadType,
    string Payload,            // base64
    DsseSignature[] Signatures
);

public sealed record DsseSignature(string KeyId, string Sig); // base64 sig

public interface IContentSigner
{
    string KeyId { get; }
    byte[] Sign(ReadOnlySpan<byte> message);
    bool Verify(ReadOnlySpan<byte> message, ReadOnlySpan<byte> signature);
}
```

### 3.2 DSSE build (DSSE pre-auth encoding)

```csharp
using System.Text;

public static class Dsse
{
    // DSSE PAE:
    // PAE(payloadType, payload) = "DSSEv1" SP LEN(payloadType) SP payloadType SP LEN(payload) SP payload
    public static byte[] PAE(string payloadType, ReadOnlySpan<byte> payload)
    {
        var pt = Encoding.UTF8.GetBytes(payloadType);
        using var ms = new MemoryStream();
        void Write(byte[] b) => ms.Write(b, 0, b.Length);
        void WriteLen(int len) => Write(Encoding.UTF8.GetBytes(len.ToString()));
        Write(Encoding.UTF8.GetBytes("DSSEv1"));
        ms.WriteByte((byte)' ');
        WriteLen(pt.Length); ms.WriteByte((byte)' ');
        Write(pt); ms.WriteByte((byte)' ');
        WriteLen(payload.Length); ms.WriteByte((byte)' ');
        ms.Write(payload);
        return ms.ToArray();
    }

    public static DsseEnvelope SignJson<T>(string payloadType, T payloadObj, IContentSigner signer)
    {
        var payload = CanonJson.Canonicalize(payloadObj);
        var pae = PAE(payloadType, payload);
        var sig = signer.Sign(pae);
        return new DsseEnvelope(
            payloadType,
            Convert.ToBase64String(payload),
            new[] { new DsseSignature(signer.KeyId, Convert.ToBase64String(sig)) }
        );
    }
}
```

### 3.3 ECDSA P-256 signer (portable default)

```csharp
using System.Security.Cryptography;

public sealed class EcdsaP256Signer : IContentSigner, IDisposable
{
    private readonly ECDsa _ecdsa;
    public string KeyId { get; }

    public EcdsaP256Signer(string keyId, ECDsa ecdsa) { KeyId = keyId; _ecdsa = ecdsa; }

    public byte[] Sign(ReadOnlySpan<byte> message)
        => _ecdsa.SignData(message.ToArray(), HashAlgorithmName.SHA256);

    public bool Verify(ReadOnlySpan<byte> message, ReadOnlySpan<byte> signature)
        => _ecdsa.VerifyData(message.ToArray(), signature.ToArray(), HashAlgorithmName.SHA256);

    public void Dispose() => _ecdsa.Dispose();
}
```

---

## 4) Proof ledger: append nodes, compute node hashes, compute root hash

### 4.1 Node hashing (exclude NodeHash itself)

```csharp
public static class ProofHashing
{
    public static ProofNode WithHash(ProofNode n)
    {
        var canonical = CanonJson.Canonicalize(new
        {
            n.Id, n.Kind, n.RuleId, n.ParentIds, n.EvidenceRefs,
            n.Delta, n.Total, n.Actor, n.TsUtc,
            Seed = Convert.ToBase64String(n.Seed)
        });
        return n with { NodeHash = "sha256:" + CanonJson.Sha256Hex(canonical) };
    }

    public static string ComputeRootHash(IEnumerable<ProofNode> nodesInOrder)
    {
        // Deterministic: root hash over canonical JSON array of node hashes in order.
        var arr = nodesInOrder.Select(n => n.NodeHash).ToArray();
        var bytes = CanonJson.Canonicalize(arr);
        return "sha256:" + CanonJson.Sha256Hex(bytes);
    }
}
```

### 4.2 Minimal ledger (deterministic ordering enforced by append order)

```csharp
public sealed class ProofLedger
{
    private readonly List<ProofNode> _nodes = new();
    public IReadOnlyList<ProofNode> Nodes => _nodes;

    public void Append(ProofNode node) => _nodes.Add(ProofHashing.WithHash(node));

    public string RootHash() => ProofHashing.ComputeRootHash(_nodes);
}
```

---

## 5) Deterministic scoring function (with proof nodes)

### 5.1 Example scoring pipeline (CVSS + EPSS + reachability + containment)

```csharp
public sealed record ScoreInputs(
    double CvssBase,                   // 0..10
    double? Epss,                      // 0..1
    bool Kev,
    ReachabilityClass Reachability,    // Reachable/NotProven/Unknown
    ContainmentSignals Containment
);

public enum ReachabilityClass { Reachable, NotProvenReachable, Unknown }

public static class RiskScoring
{
    public static (double Score01, ProofLedger Ledger) Score(
        ScoreInputs input, string scanId, byte[] seed, DateTimeOffset tsUtc)
    {
        var ledger = new ProofLedger();
        var total = 0.0;

        // Input node
        ledger.Append(new ProofNode(
            Id: $"in:{scanId}", Kind: ProofNodeKind.Input, RuleId: "inputs.v1",
            ParentIds: Array.Empty<string>(), EvidenceRefs: Array.Empty<string>(),
            Delta: 0, Total: total, Actor: "scanner.webservice",
            TsUtc: tsUtc, Seed: seed, NodeHash: ""
        ));

        // CVSS base mapping
        var cvss01 = Math.Clamp(input.CvssBase / 10.0, 0, 1);
        total += 0.55 * cvss01;
        ledger.Append(new ProofNode(
            Id: $"d:cvss:{scanId}", Kind: ProofNodeKind.Delta, RuleId: "score.cvss_base.weighted",
            ParentIds: new[] { $"in:{scanId}" }, EvidenceRefs: new[] { $"cvss:{input.CvssBase:0.0}" },
            Delta: 0.55 * cvss01, Total: total, Actor: "scanner.webservice",
            TsUtc: tsUtc, Seed: seed, NodeHash: ""
        ));

        // EPSS (optional)
        if (input.Epss is { } epss)
        {
            total += 0.25 * Math.Clamp(epss, 0, 1);
            ledger.Append(new ProofNode(
                Id: $"d:epss:{scanId}", Kind: ProofNodeKind.Delta, RuleId: "score.epss.weighted",
                ParentIds: new[] { $"d:cvss:{scanId}" }, EvidenceRefs: new[] { $"epss:{epss:0.0000}" },
                Delta: 0.25 * epss, Total: total, Actor: "scanner.webservice",
                TsUtc: tsUtc, Seed: seed, NodeHash: ""
            ));
        }

        // KEV boosts urgency
        if (input.Kev)
        {
            total += 0.15;
            ledger.Append(new ProofNode(
                Id: $"d:kev:{scanId}", Kind: ProofNodeKind.Delta, RuleId: "score.kev.bump",
                ParentIds: new[] { $"d:cvss:{scanId}" }, EvidenceRefs: new[] { "kev:true" },
                Delta: 0.15, Total: total, Actor: "scanner.webservice",
                TsUtc: tsUtc, Seed: seed, NodeHash: ""
            ));
        }

        // Reachability
        var reachDelta = input.Reachability switch
        {
            ReachabilityClass.Reachable => 0.20,
            ReachabilityClass.NotProvenReachable => 0.00,
            ReachabilityClass.Unknown => 0.08, // unknown still adds risk, but less than proven reachable
            _ => 0.00
        };
        total += reachDelta;
        ledger.Append(new ProofNode(
            Id: $"d:reach:{scanId}", Kind: ProofNodeKind.Delta, RuleId: "score.reachability",
            ParentIds: new[] { $"d:cvss:{scanId}" }, EvidenceRefs: new[] { $"reach:{input.Reachability}" },
            Delta: reachDelta, Total: total, Actor: "scanner.webservice",
            TsUtc: tsUtc, Seed: seed, NodeHash: ""
        ));

        // Containment deductions (examples)
        var containmentDelta = 0.0;
        if (string.Equals(input.Containment.Seccomp, "enforced", StringComparison.OrdinalIgnoreCase)) containmentDelta -= 0.05;
        if (string.Equals(input.Containment.Fs, "ro", StringComparison.OrdinalIgnoreCase)) containmentDelta -= 0.03;
        total = Math.Clamp(total + containmentDelta, 0, 1);
        ledger.Append(new ProofNode(
            Id: $"d:contain:{scanId}", Kind: ProofNodeKind.Delta, RuleId: "score.containment",
            ParentIds: new[] { $"d:reach:{scanId}" },
            EvidenceRefs: new[] { $"seccomp:{input.Containment.Seccomp}", $"fs:{input.Containment.Fs}" },
            Delta: containmentDelta, Total: total, Actor: "scanner.webservice",
            TsUtc: tsUtc, Seed: seed, NodeHash: ""
        ));
        // Final score node
        ledger.Append(new ProofNode(
            Id: $"s:{scanId}", Kind: ProofNodeKind.Score, RuleId: "score.final",
            ParentIds: new[] { $"d:contain:{scanId}" }, EvidenceRefs: new[] { "root" },
            Delta: 0, Total: total, Actor: "scanner.webservice",
            TsUtc: tsUtc, Seed: seed, NodeHash: ""
        ));

        return (total, ledger);
    }
}
```

---

## 6) Unknown ranking (deterministic) + proof

### 6.1 Ranking function

```csharp
public static class UnknownRanker
{
    public static double Rank(BlastRadius b, double scarcity, ExploitPressure ep, ContainmentSignals c)
    {
        var dependents01 = Math.Clamp(b.Dependents / 50.0, 0, 1);
        var net = b.NetFacing ? 0.5 : 0.0;
        var priv = string.Equals(b.Privilege, "root", StringComparison.OrdinalIgnoreCase) ? 0.5 : 0.0;
        var blast = Math.Clamp((dependents01 + net + priv) / 2.0, 0, 1);

        var epss01 = ep.Epss is null ? 0.35 : Math.Clamp(ep.Epss.Value, 0, 1); // default mild pressure
        var kev = ep.Kev ? 0.30 : 0.0;
        var pressure = Math.Clamp(epss01 + kev, 0, 1);

        var containment = 0.0;
        if (string.Equals(c.Seccomp, "enforced", StringComparison.OrdinalIgnoreCase)) containment -= 0.10;
        if (string.Equals(c.Fs, "ro", StringComparison.OrdinalIgnoreCase)) containment -= 0.10;

        return Math.Clamp(0.60 * blast + 0.30 * scarcity + 0.30 * pressure + containment, 0, 1);
    }
}
```

### 6.2 Unknown proof node pattern

When you compute Unknown rank, emit a mini ledger identical to score proofs:

* Input node: reasons + evidence scarcity facts
* Delta nodes: blast/pressure/containment components
* Score node: final unknown score

Store it in `proofs/unknowns/{unkId}/tree.json`.
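The 6.2 pattern can be sketched end to end: build Input → Delta → Score nodes for one Unknown, hash each node excluding its own hash field, and derive a root hash over the ordered node hashes. A minimal Python sketch mirroring the ledger shape (weights follow 6.1; the helper names and rule ids are illustrative):

```python
import hashlib
import json

def node_hash(node: dict) -> str:
    # Hash the canonical node, excluding the nodeHash field itself (see 4.1).
    body = {k: v for k, v in node.items() if k != "nodeHash"}
    canon = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canon).hexdigest()

def unknown_ledger(unk_id, blast, scarcity, pressure, containment):
    nodes = [{"id": f"in:{unk_id}", "kind": "Input", "delta": 0.0, "total": 0.0}]
    total = 0.0
    for rule, delta in (("unknown.blast", 0.60 * blast),
                        ("unknown.scarcity", 0.30 * scarcity),
                        ("unknown.pressure", 0.30 * pressure),
                        ("unknown.containment", containment)):
        total += delta
        nodes.append({"id": f"d:{rule}:{unk_id}", "kind": "Delta",
                      "delta": delta, "total": total})
    total = min(max(total, 0.0), 1.0)  # clamp only the final score, as in 6.1
    nodes.append({"id": f"s:{unk_id}", "kind": "Score", "delta": 0.0, "total": total})
    for n in nodes:
        n["nodeHash"] = node_hash(n)
    root = "sha256:" + hashlib.sha256(
        json.dumps([n["nodeHash"] for n in nodes],
                   separators=(",", ":")).encode()).hexdigest()
    return nodes, root, total

# blast≈0.92 corresponds to the earlier unknown-item example (42 dependents,
# net-facing, root); the containment deduction is seccomp + read-only FS.
nodes, root, score = unknown_ledger("unk1", blast=0.92, scarcity=0.7,
                                    pressure=0.83, containment=-0.2)
```

Re-running the function with the same inputs yields the same root hash, which is the property the replay DoD checks.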
---

## 7) Proof Bundle writer (zip + root hash + DSSE)

```csharp
using System.IO.Compression;

public sealed class ProofBundleWriter
{
    public static async Task<(string RootHash, string BundlePath)> WriteAsync(
        string baseDir,
        ScanManifest manifest,
        ProofLedger scoreLedger,
        DsseEnvelope manifestDsse,
        IContentSigner signer,
        CancellationToken ct)
    {
        Directory.CreateDirectory(baseDir);

        var manifestBytes = CanonJson.Canonicalize(manifest);
        var ledgerBytes = CanonJson.Canonicalize(scoreLedger.Nodes); // v1 JSON; swap to CBOR later

        // Root hash covers canonical content (manifest + ledger)
        var rootMaterial = CanonJson.Canonicalize(new
        {
            manifest = "sha256:" + CanonJson.Sha256Hex(manifestBytes),
            scoreProof = "sha256:" + CanonJson.Sha256Hex(ledgerBytes),
            scoreRoot = scoreLedger.RootHash()
        });
        var rootHash = "sha256:" + CanonJson.Sha256Hex(rootMaterial);

        // DSSE sign the root descriptor
        var rootDsse = Dsse.SignJson(
            "application/vnd.stellaops.proof-root.v1+json",
            new { rootHash, scoreRoot = scoreLedger.RootHash() },
            signer);

        var bundleName = $"{manifest.ScanId}_{rootHash.Replace("sha256:", "")}.zip";
        var bundlePath = Path.Combine(baseDir, bundleName);

        await using var fs = File.Create(bundlePath);
        using var zip = new ZipArchive(fs, ZipArchiveMode.Create, leaveOpen: false);

        void Add(string name, byte[] content)
        {
            var e = zip.CreateEntry(name, CompressionLevel.Optimal);
            using var s = e.Open();
            s.Write(content, 0, content.Length);
        }

        Add("manifest.json", manifestBytes);
        Add("manifest.dsse.json", CanonJson.Canonicalize(manifestDsse));
        Add("score_proof.json", ledgerBytes);
        Add("proof_root.dsse.json", CanonJson.Canonicalize(rootDsse));
        // meta.json sits outside the root hash, so its wall-clock timestamp
        // does not affect deterministic replay of the proofs themselves.
        Add("meta.json", CanonJson.Canonicalize(new { rootHash, createdAtUtc = DateTimeOffset.UtcNow }));

        return (rootHash, bundlePath);
    }
}
```

---

## 8) Postgres schema (authoritative) and EF Core skeleton

### 8.1 Tables (SQL snippet)

```sql
create table scan_manifest (
  scan_id text primary key,
  created_at_utc timestamptz not null,
  artifact_digest text not null,
  concelier_snapshot_hash text not null,
  excititor_snapshot_hash text not null,
  lattice_policy_hash text not null,
  deterministic boolean not null,
  seed bytea not null,
  manifest_json jsonb not null,
  manifest_dsse_json jsonb not null,
  manifest_hash text not null
);

create table proof_bundle (
  scan_id text not null references scan_manifest(scan_id),
  root_hash text not null,
  bundle_uri text not null,
  proof_root_dsse_json jsonb not null,
  created_at_utc timestamptz not null,
  primary key (scan_id, root_hash)
);

create index ix_scan_manifest_artifact on scan_manifest(artifact_digest);
create index ix_scan_manifest_snapshots on scan_manifest(concelier_snapshot_hash, excititor_snapshot_hash);
```

### 8.2 EF Core entities (minimal)

```csharp
public sealed class ScannerDbContext : DbContext
{
    public DbSet<ScanManifestRow> ScanManifests => Set<ScanManifestRow>();
    public DbSet<ProofBundleRow> ProofBundles => Set<ProofBundleRow>();

    public ScannerDbContext(DbContextOptions<ScannerDbContext> options) : base(options) { }

    protected override void OnModelCreating(ModelBuilder b)
    {
        b.Entity<ScanManifestRow>().HasKey(x => x.ScanId);
        b.Entity<ProofBundleRow>().HasKey(x => new { x.ScanId, x.RootHash });
        b.Entity<ScanManifestRow>().HasIndex(x => x.ArtifactDigest);
    }
}

public sealed class ScanManifestRow
{
    public string ScanId { get; set; } = default!;
    public DateTimeOffset CreatedAtUtc { get; set; }
    public string ArtifactDigest { get; set; } = default!;
    public string ConcelierSnapshotHash { get; set; } = default!;
    public string ExcititorSnapshotHash { get; set; } = default!;
    public string LatticePolicyHash { get; set; } = default!;
    public bool Deterministic { get; set; }
    public byte[] Seed { get; set; } = default!;
    public string ManifestHash { get; set; } = default!;
    public string ManifestJson { get; set; } = default!; // store canonical JSON string
    public string ManifestDsseJson { get; set; } = default!;
}

public sealed class ProofBundleRow
{
    public string ScanId { get; set; } = default!;
    public string RootHash { get; set; } = default!;
    public string BundleUri { get; set; } = default!;
    public DateTimeOffset
        CreatedAtUtc { get; set; }
    public string ProofRootDsseJson { get; set; } = default!;
}
```

---

## 9) `scanner.webservice` endpoints (minimal APIs)

```csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.EntityFrameworkCore;

var app = WebApplication.CreateBuilder(args)
    .AddServices() // your DI registration extension (DbContext, signer factory, ...)
    .Build();

app.MapPost("/scan", async (ScanRequest req, ScannerDbContext db, CancellationToken ct) =>
{
    var scanId = Guid.NewGuid().ToString("n");
    var seed = req.Seed ?? RandomNumberGenerator.GetBytes(32);
    var created = DateTimeOffset.UtcNow;

    // Snapshot hashes come from your snapshot selector (by policy/environment)
    var concelierHash = req.ConcelierSnapshotHash;
    var excititorHash = req.ExcititorSnapshotHash;

    var manifest = new ScanManifest(
        ScanId: scanId,
        CreatedAtUtc: created,
        ArtifactDigest: req.ArtifactDigest,
        ArtifactPurl: req.ArtifactPurl ?? "",
        ScannerVersion: req.ScannerVersion,
        WorkerVersion: req.WorkerVersion,
        ConcelierSnapshotHash: concelierHash,
        ExcititorSnapshotHash: excititorHash,
        LatticePolicyHash: req.LatticePolicyHash,
        Deterministic: req.Deterministic,
        Seed: seed,
        Knobs: req.Knobs ??
            new Dictionary<string, string>()
    );

    var manifestHash = "sha256:" + CanonJson.Sha256Hex(CanonJson.Canonicalize(manifest));

    // Sign DSSE
    using var signer = YourSignerFactory.Create(); // ECDSA or other profile
    var dsse = Dsse.SignJson("application/vnd.stellaops.scan-manifest.v1+json", manifest, signer);

    db.ScanManifests.Add(new ScanManifestRow
    {
        ScanId = scanId,
        CreatedAtUtc = created,
        ArtifactDigest = req.ArtifactDigest,
        ConcelierSnapshotHash = concelierHash,
        ExcititorSnapshotHash = excititorHash,
        LatticePolicyHash = req.LatticePolicyHash,
        Deterministic = req.Deterministic,
        Seed = seed,
        ManifestHash = manifestHash,
        ManifestJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(manifest)),
        ManifestDsseJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(dsse))
    });
    await db.SaveChangesAsync(ct);

    return Results.Ok(new { scanId, manifestHash });
});

app.MapGet("/scan/{scanId}/manifest", async (string scanId, ScannerDbContext db, CancellationToken ct) =>
{
    var row = await db.ScanManifests.AsNoTracking().SingleAsync(x => x.ScanId == scanId, ct);
    return Results.Text(row.ManifestJson, "application/json");
});

app.MapPost("/scan/{scanId}/score/replay", async (string scanId, ScannerDbContext db, CancellationToken ct) =>
{
    var row = await db.ScanManifests.AsNoTracking().SingleAsync(x => x.ScanId == scanId, ct);
    var manifest = JsonSerializer.Deserialize<ScanManifest>(row.ManifestJson)!;

    // Load findings + snapshots by hash (your repositories)
    var inputs = new ScoreInputs(
        CvssBase: 9.1,
        Epss: 0.62,
        Kev: false,
        Reachability: ReachabilityClass.Unknown,
        Containment: new ContainmentSignals("enforced", "ro"));

    var (score, ledger) = RiskScoring.Score(inputs, scanId, manifest.Seed, DateTimeOffset.UtcNow);

    using var signer = YourSignerFactory.Create();
    var (rootHash, bundlePath) = await ProofBundleWriter.WriteAsync(
        baseDir: "/var/lib/stellaops/proofs",
        manifest: manifest,
        scoreLedger: ledger,
        manifestDsse: JsonSerializer.Deserialize<DsseEnvelope>(row.ManifestDsseJson)!,
        signer: signer,
        ct: ct);

    db.ProofBundles.Add(new
    ProofBundleRow
    {
        ScanId = scanId,
        RootHash = rootHash,
        BundleUri = bundlePath,
        CreatedAtUtc = DateTimeOffset.UtcNow,
        ProofRootDsseJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(
            Dsse.SignJson("application/vnd.stellaops.proof-root.v1+json", new { rootHash }, signer)))
    });
    await db.SaveChangesAsync(ct);

    return Results.Ok(new { score, rootHash, bundleUri = bundlePath });
});

app.Run();

public sealed record ScanRequest(
    string ArtifactDigest,
    string? ArtifactPurl,
    string ScannerVersion,
    string WorkerVersion,
    string ConcelierSnapshotHash,
    string ExcititorSnapshotHash,
    string LatticePolicyHash,
    bool Deterministic,
    byte[]? Seed,
    Dictionary<string, string>? Knobs
);
```

---

## 10) Binary reachability v1: major skeleton (bounded BFS over a naive callgraph)

This is intentionally “v1”: direct calls + imports + conservative unknowns. It still delivers value fast.

```csharp
public sealed record FuncNode(ulong Address, string Name);
public sealed record CallEdge(ulong From, ulong To, string Kind); // "direct"/"import"/"indirect"

public sealed class CallGraph
{
    public Dictionary<ulong, FuncNode> Nodes { get; } = new();
    public List<CallEdge> Edges { get; } = new();

    public IEnumerable<ulong> Neighbors(ulong from) =>
        Edges.Where(e => e.From == from).Select(e => e.To);
}

public static class Reachability
{
    public static (ReachabilityClass Class, ulong[]?
        Path) FindPath(
        CallGraph cg, IEnumerable<ulong> entrypoints, ulong sink, int maxDepth)
    {
        var visited = new HashSet<ulong>();
        var parent = new Dictionary<ulong, ulong>();
        var q = new Queue<(ulong node, int depth)>();

        foreach (var ep in entrypoints) { q.Enqueue((ep, 0)); visited.Add(ep); }

        while (q.Count > 0)
        {
            var (cur, depth) = q.Dequeue();
            if (cur == sink) return (ReachabilityClass.Reachable, Reconstruct(parent, cur));
            if (depth >= maxDepth) continue;

            foreach (var nxt in cg.Neighbors(cur))
            {
                if (visited.Add(nxt)) { parent[nxt] = cur; q.Enqueue((nxt, depth + 1)); }
            }
        }
        return (ReachabilityClass.NotProvenReachable, null);
    }

    private static ulong[] Reconstruct(Dictionary<ulong, ulong> parent, ulong end)
    {
        var path = new List<ulong> { end };
        while (parent.TryGetValue(end, out var p)) { path.Add(p); end = p; }
        path.Reverse();
        return path.ToArray();
    }
}
```

**Proof emission for reachability**

* Store:
  * `callgraph.json` (nodes + edges subset relevant to this sink)
  * `path_0.json` (address chain + symbol names)
* Create a scoring delta node referencing `reach:path_0.json` when reachable.

---

## 11) Determinism test (xUnit “hash must match”)

```csharp
public class DeterminismTests
{
    [Fact]
    public void Score_Replay_IsBitIdentical()
    {
        var seed = Enumerable.Repeat((byte)7, 32).ToArray();
        var inputs = new ScoreInputs(9.0, 0.50, false, ReachabilityClass.Unknown, new("enforced", "ro"));

        var (s1, l1) = RiskScoring.Score(inputs, "scanA", seed, DateTimeOffset.Parse("2025-01-01T00:00:00Z"));
        var (s2, l2) = RiskScoring.Score(inputs, "scanA", seed, DateTimeOffset.Parse("2025-01-01T00:00:00Z"));

        Assert.Equal(s1, s2, 10);
        Assert.Equal(l1.RootHash(), l2.RootHash());
        Assert.True(l1.Nodes.Zip(l2.Nodes).All(z => z.First.NodeHash == z.Second.NodeHash));
    }
}
```

---

## 12) What developers should implement next (priority order)

1. **Canonical JSON + hashing** (Phase A prerequisite)
2. **Manifest + DSSE signing + Postgres persistence**
3. **Proof ledger + root hash + Proof Bundle writer**
4.
**Replay endpoint** (`/score/replay`) and a scheduler hook to rescore on new snapshot hashes
5. **Unknown registry + deterministic ranking + proof**
6. **Reachability v1** (callgraph + bounded BFS + proof emission)
7. **Corpus bench** and CI regression gates

If you want, I can convert this into repo-ready `TASKS.md` blocks per module (`scanner.webservice`, `scheduler.webservice`, `notify.webservice`) with acceptance tests and a minimal migration set aligned to your existing naming conventions.
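To close the loop on the reachability work, here is a hedged sketch of the proof-emission step from section 10. `ReachabilityProofEmitter` is an illustrative name, not a fixed API; `CanonJson` and the `CallGraph`/`FuncNode` types are the ones defined earlier, and the file names follow the proof-emission bullets:

```csharp
// Illustrative sketch: persist the callgraph subset and the proven path so a
// scoring delta node can reference "reach:path_0.json" as evidence.
public static class ReachabilityProofEmitter
{
    public static void Emit(string findingDir, CallGraph cg, ulong[] path)
    {
        Directory.CreateDirectory(findingDir);
        var onPath = path.ToHashSet();

        // callgraph.json: only the nodes/edges touching the proven path.
        var subset = new
        {
            nodes = cg.Nodes.Values.Where(n => onPath.Contains(n.Address)).OrderBy(n => n.Address),
            edges = cg.Edges.Where(e => onPath.Contains(e.From) && onPath.Contains(e.To))
                            .OrderBy(e => e.From).ThenBy(e => e.To)
        };
        File.WriteAllBytes(Path.Combine(findingDir, "callgraph.json"), CanonJson.Canonicalize(subset));

        // path_0.json: address chain + symbol names, in call order.
        var pathDoc = path.Select(a => new
        {
            address = a,
            name = cg.Nodes.TryGetValue(a, out var f) ? f.Name : "<unknown>"
        });
        File.WriteAllBytes(Path.Combine(findingDir, "path_0.json"), CanonJson.Canonicalize(pathDoc));
    }
}
```

Sorting nodes and edges before serialization keeps `callgraph.json` byte-stable across runs, which feeds directly into the deterministic-replay metric.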