Here’s a compact playbook for making Stella Ops stand out on binary‑only analysis quality and deterministic, explainable scoring—from concepts to dev‑ready specs.
Binary‑only analysis & call‑graph fidelity
Goal: prove we reach the right code, not just flag files.
Why it matters (plain English):
- Many scanners “see” a CVE but can’t show how execution reaches it. You need proof you can actually hit the bad function from an app entrypoint.
North‑star metrics (automate in CI):
- Precision / Recall vs a small ground‑truth corpus (curated samples with known reachable/unreachable sinks).
- TTFRP (Time‑to‑First‑Reachable‑Path): ms from analyzer start to first valid call‑path.
- Runnable call‑stack snippets %: fraction of findings that include a minimal, compilable snippet (or pseudo‑IR) reproducing the call chain.
- Deterministic replay %: identical proofs (hash‑equal) across OS/CPU/container.
Reproducible‑run contract:
- Scan Manifest (DSSE‑signed): inputs, toolchain versions, lattice policies, feed hashes, CFG/CG build params, symbolization mode, and hash of the “proof‑builder”.
- Proof Bundle:
  - /proofs/{findingId}/callgraph.pb (protobuf/flatbuffers)
  - /proofs/{findingId}/path_0.ir (SSA/IL)
  - /proofs/{findingId}/snippet_0/ (repro harness)
  - /attestations/ (rekor‑ready, optional PQ mode)
- Determinism switch: --deterministic --seed <32b> --clock fake --fs-order stable.
Reachability engine (binary‑only) – minimal architecture:
- Loader: ELF/PE/Mach‑O parser; symbolizer; DWARF/PDB if present.
- IR lifter: (capstone/keystone‑style) → SSA/typed IL with conservatively modeled calls (PLT/IAT, vtables, GOT).
- CG/CFG builder: merges static edges + lightweight dynamic summaries (known stdlib shims); optional ML‑assisted indirect‑call resolution gated by proofs.
- Path search: bounded BFS/IDDFS from trusted entrypoints to vulnerable sinks; emits proof trees.
- Snippet builder: replays path with mocks for I/O; generates runnable harness or pseudo‑IR transcript.
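To make the path-search stage concrete, here is a minimal, hypothetical sketch of the bounded BFS idea: the call graph is an adjacency map, and the search returns the shortest entry→sink path within a depth bound, or null (“not proven reachable”). Names and the graph shape are illustrative, not the real engine’s types.

```csharp
using System;
using System.Collections.Generic;

// Bounded BFS from an entrypoint to a vulnerable sink; parent links let us
// reconstruct the shortest path, which becomes the proof path.
List<string>? FindShortestPath(
    IReadOnlyDictionary<string, string[]> callGraph,
    string entry, string sink, int maxDepth)
{
    var parent = new Dictionary<string, string> { [entry] = entry };
    var queue = new Queue<(string Node, int Depth)>();
    queue.Enqueue((entry, 0));
    while (queue.Count > 0)
    {
        var (node, depth) = queue.Dequeue();
        if (node == sink)
        {
            // Walk parent links back to the entrypoint: this is the proof path.
            var path = new List<string> { node };
            while (node != entry) { node = parent[node]; path.Add(node); }
            path.Reverse();
            return path;
        }
        if (depth >= maxDepth || !callGraph.TryGetValue(node, out var callees)) continue;
        foreach (var callee in callees)
        {
            if (parent.ContainsKey(callee)) continue; // already visited
            parent[callee] = node;
            queue.Enqueue((callee, depth + 1));
        }
    }
    return null; // not proven reachable within the bound
}

var graph = new Dictionary<string, string[]>
{
    ["main"] = new[] { "init", "handler" },
    ["handler"] = new[] { "parse" },
    ["parse"] = new[] { "vuln_sink" }
};
var path = FindShortestPath(graph, "main", "vuln_sink", maxDepth: 10);
// path: main -> handler -> parse -> vuln_sink
```

The depth bound is what keeps TTFRP predictable: a sink deeper than the bound comes back as “not proven reachable” rather than stalling the scan.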
Ground‑truth corpus (starter set):
- 20 binaries with injected sinks: 10 reachable, 10 unreachable, mixed obfuscation, stripped/unstripped, PIE/ASLR on/off, with/without CFI.
- Tag each sample with sink_signature, expected_paths, expected_unreachable_reasons.
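As a sketch of what a per-sample tag file could look like (the JSON shape below is an assumption, simply mirroring the three fields named above):

```csharp
using System;
using System.Text.Json;

// Hypothetical corpus tag: one small JSON document per sample, serialized
// here from an anonymous object for illustration.
var tag = new
{
    sink_signature = "memcpy@0x401200",                       // illustrative
    expected_paths = new[] { new[] { "main", "parse", "memcpy" } },
    expected_unreachable_reasons = Array.Empty<string>()      // empty when reachable
};
var json = JsonSerializer.Serialize(tag);
var doc = JsonDocument.Parse(json);
bool hasAll = doc.RootElement.TryGetProperty("sink_signature", out _)
           && doc.RootElement.TryGetProperty("expected_paths", out _)
           && doc.RootElement.TryGetProperty("expected_unreachable_reasons", out _);
```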
CI tasks (agents can implement now):
- scanner.webservice: /bench/run → runs corpus; exports metrics JSON + HTML.
- scheduler.webservice: nightly + per‑PR comparisons; fail gate if precision or deterministic‑replay dips > 1.0 pt vs baseline.
- notify.webservice: posts TTFRP trend + top regressions to PR.
Deterministic score proofs & Unknowns ranking
Goal: every risk score must be explainable and replayable. Unknowns shouldn’t be noisy; they should be transparently ranked.
Plain English:
- A score should read like a ledger: “Input X + Rule Y → +0.12 risk, because Z”. Unknowns are the “we don’t know yet” items—rank them by potential blast radius and thin evidence.
Signed proof‑trees (spec):
- Node types: Input (SBOM/VEX/event), Transform (policy/lattice op), Delta (numeric change), Score.
- Fields: id, parentIds[], sha256, ruleId, evidenceRefs[], timestamp, actor (module), determinismSeed.
- Encoding: CBOR/Flatbuffers; DSSE‑signed; top hash anchored to ledger (optional Rekor v2 mirror).
- Replayer: stella score replay --bundle proofs/ --seed <seed> must output identical totals and per‑rule deltas.
Unknowns Registry & ranking:
- Unknown = missing VEX, missing exploitability signal, ambiguous call edge, missing version provenance, or opaque packer.
- Rank factors (weighted):
  - Blast radius: transitive dependents, runtime privilege, exposure surface (net‑facing? in container PID 1?).
  - Evidence scarcity: how many critical facts are missing?
  - Exploit pressure: EPSS percentile (if available), KEV presence, chatter density (feeds).
  - Containment signals: sandboxing, seccomp, read‑only FS, eBPF/LSM denies observed.
- Output: unknowns.score ∈ [0,1] + proof path explaining the rank.
Quiet‑update UX (proof‑linked):
- Unknown cards are gated: collapsed by default; show top 3 reasons with “View proof”.
- As VEX/EPSS feeds refresh, the proof‑tree updates; the UI shows what changed and why (delta view).
Minimal schemas (drop‑in to Stella Ops)
# scoring/proof-tree.fbs (conceptual)
table Node {
id:string; kind:enum{Input,Transform,Delta,Score};
parentIds:[string]; ruleId:string; sha256:string;
evidenceRefs:[string]; ts:ulong; actor:string;
delta:float; total:float; seed:[ubyte];
}
# unknowns/unknown-item.json
{
"id": "unk_…",
"artifactPurl": "pkg:…",
"reasons": ["missing_vex", "ambiguous_indirect_call"],
"blastRadius": { "dependents": 42, "privilege": "root", "netFacing": true },
"evidenceScarcity": 0.7,
"exploitPressure": { "epss": 0.83, "kev": false },
"containment": { "seccomp": "enforced", "fs": "ro" },
"score": 0.66,
"proofRef": "proofs/unk_…/tree.cbor"
}
Triggering & pipelines (existing services)
- scanner.webservice
  - Emits Proof Bundle + Unknowns for each image/binary.
  - API: POST /scan?deterministic=true&seed=…&emitProofs=true.
- scheduled.webservice
  - Periodic feed refresh (VEX/EPSS/KEV) → runs proof replayer; updates Unknowns ranks (no rescans).
- notify.webservice
  - Sends delta‑proof digests to PRs/Chat: “EPSS↑ from 0.41→0.58, Unknown score +0.06 (proof link)”.
- concelier (feeds)
  - Normalizes EPSS, KEV, vendor advisories; versioned with hashes in the Scan Manifest.
- excititor (VEX aggregator)
  - Produces explainable VEX merges: emits Transform nodes with ruleIds referencing lattice policies.
Developer guidelines (do this first)
- Add deterministic flags to all scanners and proof emitters (--deterministic, --seed).
- Implement Proof Bundle writer (Flatbuffers/CBOR + DSSE). Include per‑rule deltas and top hash.
- Create Ground‑Truth Corpus repo and CI job; publish precision/recall/TTFRP dashboards.
- Unknowns Registry micro‑model + ranking function; expose /unknowns/list?sort=score.
- Quiet‑update UI: Unknowns cards with “View proof”; delta badges when feeds change.
- Replay CLI: stella score replay + stella proof verify (DSSE + hash match).
- Audit doc: one‑pager “How to reproduce my score”—copy/paste commands from the manifest.
Tiny .NET 10 sketch (partial, compile‑ready)
public record ProofNode(
string Id, string Kind, string[] ParentIds, string RuleId,
string Sha256, string[] EvidenceRefs, DateTimeOffset Ts,
string Actor, double Delta, double Total, byte[] Seed);
public interface IScoreLedger {
void Append(ProofNode node);
double CurrentTotal { get; }
}
public sealed class DeterministicLedger : IScoreLedger {
private readonly List<ProofNode> _nodes = new();
private double _total;
public void Append(ProofNode n) {
// Deterministic ordering by (Ts, Id) already enforced upstream.
_total = n.Total; _nodes.Add(n);
}
public double CurrentTotal => _total;
}
public sealed record Blast(int Dependents, bool NetFacing, string Privilege); // "root"/"user"
public sealed record Containment(string Seccomp, string Fs);                  // "enforced"/"none", "ro"/"rw"
public static class UnknownRanker {
  public static double Rank(Blast b, double scarcity, double epss, bool kev, Containment c) {
    var br = (Math.Min(b.Dependents / 50.0, 1.0) + (b.NetFacing ? 0.5 : 0) + (b.Privilege == "root" ? 0.5 : 0)) / 2.0;
    var ep = Math.Min(epss + (kev ? 0.3 : 0), 1.0);
    var ct = (c.Seccomp == "enforced" ? -0.1 : 0) + (c.Fs == "ro" ? -0.1 : 0);
    // Weights sum to > 1 by design; the clamp keeps the result in [0,1].
    return Math.Clamp(0.6 * br + 0.3 * scarcity + 0.3 * ep + ct, 0, 1);
  }
}
What you get if you ship this
- Trust‑on‑paper → trust‑in‑proofs: every score and “unknown” is backed by a tamper‑evident path.
- Noise control: Unknowns don’t spam—ranked, gated, and auto‑updated when new evidence arrives.
- Moat: reproducible evidence + runnable call‑stacks is hard to copy and easy to demo.
If you want, I can turn this into concrete tickets for scanner.webservice, excititor, concelier, notify, plus a first corpus seed and CI wiring.
What I described is two evidence upgrades that turn Stella Ops from “SBOM/VEX parity” into “provable, replayable security decisions”:
- Binary-only reachability proofs
- Deterministic score proofs + ranked Unknowns
Below is the purpose (why you want it) and a concrete implementation plan for Stella Ops (aligned with your rule: lattice algorithms run in scanner.webservice; Concelier/Excititor preserve prune source).
1) Binary-only reachability: purpose
Most scanners stop at: “this image contains libX version Y with CVE-Z”.
That creates noise because:
- The vulnerable function may be present but never callable from any real entrypoint.
- The vulnerability may be in a code path guarded by config, privilege, seccomp, or missing inputs.
Reachability answers the only question that matters operationally:
“Can execution reach the vulnerable sink from a real entry point in this container/app?”
What Stella should output for a “reachable” finding
- “Entry: nginx worker → module init → … → vulnerable function”
- A call path proof (graph + concrete nodes/addresses/symbols)
- Optional: a minimal repro harness/snippet or IR transcript
Why this is a moat
- It reduces false positives materially (and you can measure it).
- It produces auditor-friendly evidence (“show me the path”).
2) Deterministic score proofs + ranked Unknowns: purpose
Security teams distrust opaque scores. Auditors and regulated clients require repeatability.
Deterministic scoring proof means:
- Every score is a ledger of deltas (“+0.12 because EPSS=…, +0.18 because reachable path exists, −0.07 because seccomp enforced…”).
- The score can be replayed later and must match bit-for-bit given the same inputs (feeds, rules, policies, seed).
Unknowns are the “we don’t know yet” facts (missing VEX, ambiguous versions, unresolved indirect call edges). Instead of spamming, Stella ranks Unknowns by likely impact so DevOps sees the top 1–5 that actually matter.
Implementation plan for Stella Ops
Phase 0 — Lay the foundation (1 sprint)
Goal: make scans replayable and attach proofs to findings even before reachability is “perfect”.
0.1 Create a signed Scan Manifest (system-of-record in Postgres)
A manifest is a declarative capture of everything that affects results.
Store:
- artifact digest(s)
- tool versions (scanner workers + rule engine)
- Concelier snapshot hash(es) used
- Excititor snapshot hash(es) used
- lattice/policy digest (executed in scanner.webservice)
- deterministic flags + seed
- config knobs (depth limits, indirect-call resolution mode, etc.)
Deliverables
- scan_manifest table in Postgres
- DSSE signature for the manifest
- GET /scan/{id}/manifest endpoint
0.2 Proof Bundle format + storage
Store proof artifacts content-addressed (zip or directory) and reference them from findings.
Bundle contains
- callgraph subset (or placeholder graph in v0)
- score proof tree (CBOR/FlatBuffers)
- references to evidence inputs (SBOM/VEX/feeds digests)
Deliverables
- proof_bundle metadata table in Postgres (uri, root_hash, dsse_envelope)
- filesystem/S3-compatible storage adapter
- GET /scan/{id}/proofs/{findingId} endpoint
Phase 1 — Deterministic scoring + Unknowns (1–2 sprints)
Goal: every score becomes replayable; Unknowns become a controlled queue.
1.1 Score Proof Tree “ledger”
Implement a small internal library in .NET:
- pure functions: inputs → score + proof nodes
- nodes: Input, Transform, Delta, Score
- deterministic ordering and hashing
Deliverables
- stella score replay --scan <id> --seed <seed> CLI (or internal job)
- POST /score/replay in scanner.webservice (recompute score without rescanning binaries)
- score_proofs stored in the Proof Bundle
1.2 Unknowns registry + ranking (computed in scanner.webservice)
Unknown reasons (examples):
- missing VEX for a CVE/component
- version provenance uncertain
- ambiguous indirect call edge for reachability
- packed/stripped binary blocking symbolization
Ranking model (deterministic)
- blast radius (dependents, privilege, net-facing)
- evidence scarcity (how many critical facts missing)
- exploit pressure (EPSS/KEV presence if available via Concelier snapshot)
- containment signals (seccomp/RO-fs observed)
Deliverables
- unknowns table + API GET /unknowns?sort=score
- UI: Unknowns collapsed by default; top reasons + “view proof”
1.3 Feed refresh re-scores without rescans
Respect your architecture rule:
- Concelier/Excititor publish snapshots (preserve prune source)
- scanner.webservice runs lattice + scoring
Flow
- Scheduled detects a new Concelier/Excititor snapshot hash
- Scheduled calls scanner.webservice /score/replay for impacted scans
- Notify emits “score delta” + proof link
Deliverables
- scheduled.webservice job: “rescore impacted scans”
- notify.webservice message template: “what changed + proof root hash”
Phase 2 — Binary reachability engine v1 (2–3 sprints)
Goal: ship a reachability proof that is useful today, then iterate fidelity.
2.1 v1 scope (pragmatic)
Start with:
- ELF (Linux containers) first
- imports/exports + PLT/GOT edges
- direct calls + conservative handling of indirect calls
- entrypoints: main, exported functions, known framework entry hooks
What v1 outputs
- “reachable / not proven reachable / unknown”
- shortest path found (bounded depth)
- proof subgraph: nodes + edges + address ranges + symbol names if present
Deliverables
- scanner.worker.binary (or binary module inside scanner worker) produces:
  - CFG/CG summary artifact
  - per-finding path proof (if found)
- TTFRP metric (time-to-first-reachable-path)
2.2 Proof format for reachability
For each finding:
- callgraph.pb (or flatbuffers)
- path_0.ir (text SSA/IL transcript OR “disasm trace” v1)
- evidence.json (addresses, symbolization mode, loader metadata)
2.3 Ground-truth corpus + CI gates
Create a small repo of curated binaries with known reachable/unreachable sinks. Run nightly and per-PR.
Gates
- precision/recall must not regress
- deterministic replay must remain 100% on corpus
- TTFRP tracked (trend, not hard fail initially)
Deliverables
- scanner.webservice /bench/run
- notify posts regressions in PR
Phase 3 — “Best in class” improvements (ongoing)
- Better indirect call resolution (vtables, function pointers) with proof constraints
- Stripped binary symbol recovery heuristics
- Optional snippet/harness generator (start as IR transcript, evolve to runnable)
- Multi-arch support (arm64) and PE/Mach-O if needed
Concrete service responsibilities (so your team doesn’t misplace logic)
Concelier (feeds)
- ingest EPSS/KEV/advisories
- produce versioned snapshots with hashes
- does not run lattice/scoring
Excititor (VEX)
- ingest vendor/openvex/csaf signals
- normalize and emit snapshots with hashes
- preserve prune source
- does not run lattice/scoring
Scanner.webservice
- orchestrates scanning workers
- runs lattice merge + scoring
- emits manifest + proofs
- exposes replay endpoints
Scheduled.webservice
- detects new snapshot hashes
- triggers replay/rescore jobs (no rescans)
- schedules benchmark runs
Notify.webservice
- delivers deltas with proof links (root hash + manifest hash)
Minimum “definition of done” checklist
You’ll know this is implemented when:
- Every scan stores a DSSE-signed Scan Manifest in Postgres.
- Every finding has a proofRef (score proof always; reachability proof where available).
- score replay reproduces the same total and per-rule deltas given the manifest.
- Unknowns are visible, ranked, and proof-linked, and updates from feeds change ranks via replay (not rescans).
- Corpus CI exists and prevents silent regressions.
If you want, I can translate this plan into:
- Postgres schema migrations (tables + indexes),
- REST endpoints for scanner.webservice, scheduled.webservice, notify.webservice,
- and a sprint-ready TASKS.md split per module with acceptance tests.
Below is an “implementation extension” with (1) concrete goals/DoD and (2) major code snippets your developers can lift directly into Stella Ops (.NET 10/C#).
I’m keeping the architecture rule intact: Concelier + Excititor only emit snapshots (preserve prune source); scanner.webservice runs lattice/scoring and emits proofs. System of record is Postgres, with Valkey optional/ephemeral.
0) Concrete goals and Definition of Done
Phase A — Deterministic scan + Proof infrastructure (must ship first)
Goal A1 — Scan Manifest exists and is DSSE-signed
- Every scan produces a ScanManifest containing:
  - artifact digest(s) (image digest, file digest)
  - scanner versions
  - concelierSnapshotHash, excititorSnapshotHash
  - lattice/policy hash (executed in scanner.webservice)
  - deterministic flags + seed
  - config knobs (depth limits, indirect-call resolution mode, etc.)
- Manifest stored in Postgres and in the Proof Bundle.
- Manifest DSSE signature verified by stella proof verify.
Goal A2 — Proof Bundle exists for every scan
- Proof bundle is content-addressed: rootHash + DSSE envelope stored.
- Bundle contains at minimum:
  - manifest.json (canonical)
  - score_proof.cbor (or canonical JSON v1)
  - evidence_refs.json (digests of inputs)
DoD
- Same scan inputs + same seed produce identical manifest hash and identical proof root hash.
Phase B — Deterministic scoring ledger + replay
Goal B1 — Scoring is a pure function
- Score = f(Manifest, Findings, FeedSnapshot, VEXSnapshot, RuntimeSignals?, Seed)
- Every numeric change is recorded as a proof node (Delta) with evidence references.
Goal B2 — Replay
- POST /score/replay recomputes scores from manifest + snapshot hashes without rescanning binaries.
- Replay output (proof root hash + totals) is identical across runs.
DoD
- Replay for a prior scan must reproduce bit-identical proof output (hash match).
Phase C — Unknowns registry + deterministic ranking
Goal C1 — Unknowns are first-class
- Unknown item emitted when evidence is missing or ambiguous:
  - missing VEX, ambiguous component version, unresolved indirect-call edge, packed binary, etc.
- Unknowns ranked deterministically with a proof trail.
DoD
- UI shows top-ranked Unknowns collapsed by default; every Unknown has “View proof”.
Phase D — Binary-only reachability v1 (useful quickly)
Goal D1 — Reachability classification
- Each vulnerable sink gets: Reachable | NotProvenReachable | Unknown
- When reachable, emit a shortest path proof (bounded BFS) from entrypoint.
Goal D2 — TTFRP metric
- Emit TTFRP and store per scan.
DoD
- Corpus benchmark job runs nightly and tracks precision/recall + TTFRP trends.
1) Core data models (Manifest, Proof Nodes, Unknowns)
1.1 ScanManifest (canonical JSON for hashing)
public sealed record ScanManifest(
string ScanId,
DateTimeOffset CreatedAtUtc,
string ArtifactDigest, // sha256:... or image digest
string ArtifactPurl, // optional
string ScannerVersion, // scanner.webservice version
string WorkerVersion, // scanner.worker.* version
string ConcelierSnapshotHash, // immutable feed snapshot digest
string ExcititorSnapshotHash, // immutable vex snapshot digest
string LatticePolicyHash, // policy bundle digest
bool Deterministic,
byte[] Seed, // 32 bytes
IReadOnlyDictionary<string,string> Knobs // depth limits etc.
);
1.2 ProofNode (ledger entries)
public enum ProofNodeKind { Input, Transform, Delta, Score }
public sealed record ProofNode(
string Id,
ProofNodeKind Kind,
string RuleId,
string[] ParentIds,
string[] EvidenceRefs, // digests / refs inside bundle
double Delta, // 0 for non-Delta nodes
double Total, // running total at this node
string Actor, // module name
DateTimeOffset TsUtc,
byte[] Seed,
string NodeHash // sha256 over canonical node (excluding NodeHash)
);
1.3 UnknownItem
public sealed record UnknownItem(
string Id,
string ArtifactDigest,
string ArtifactPurl,
string[] Reasons,
BlastRadius BlastRadius,
double EvidenceScarcity,
ExploitPressure ExploitPressure,
ContainmentSignals Containment,
double Score, // 0..1
string ProofRef // path inside proof bundle
);
public sealed record BlastRadius(int Dependents, bool NetFacing, string Privilege); // "root"/"user"
public sealed record ExploitPressure(double? Epss, bool Kev);
public sealed record ContainmentSignals(string Seccomp, string Fs); // "enforced"/"none", "ro"/"rw"
2) Canonical JSON + hashing (determinism foundation)
2.1 Canonicalize JSON (sort object keys recursively)
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;
public static class CanonJson
{
public static byte[] Canonicalize<T>(T obj)
{
var json = JsonSerializer.SerializeToUtf8Bytes(obj, new JsonSerializerOptions
{
WriteIndented = false,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});
using var doc = JsonDocument.Parse(json);
using var ms = new MemoryStream();
using var writer = new Utf8JsonWriter(ms, new JsonWriterOptions { Indented = false });
WriteElementSorted(doc.RootElement, writer);
writer.Flush();
return ms.ToArray();
}
private static void WriteElementSorted(JsonElement el, Utf8JsonWriter w)
{
switch (el.ValueKind)
{
case JsonValueKind.Object:
w.WriteStartObject();
foreach (var prop in el.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
{
w.WritePropertyName(prop.Name);
WriteElementSorted(prop.Value, w);
}
w.WriteEndObject();
break;
case JsonValueKind.Array:
w.WriteStartArray();
foreach (var item in el.EnumerateArray())
WriteElementSorted(item, w);
w.WriteEndArray();
break;
default:
el.WriteTo(w);
break;
}
}
public static string Sha256Hex(ReadOnlySpan<byte> bytes)
=> Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
}
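A standalone illustration of why the key sort above matters (independent of CanonJson): hashing serialized JSON is only deterministic once key order is fixed, which an ordinal-sorted dictionary demonstrates in a few lines.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text.Json;

// Hash the JSON of an ordinal-sorted view of the input; the result is the
// same regardless of the insertion order of the original dictionary.
string HashOf(IDictionary<string, int> data)
{
    var sorted = new SortedDictionary<string, int>(data, StringComparer.Ordinal);
    var bytes = JsonSerializer.SerializeToUtf8Bytes(sorted);
    return Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
}

var h1 = HashOf(new Dictionary<string, int> { ["b"] = 2, ["a"] = 1 });
var h2 = HashOf(new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 });
// h1 == h2 despite different insertion order
```

Without the sort, two semantically identical manifests could hash differently, which would break the “identical manifest hash” DoD.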
3) DSSE envelope (sign manifests and proof roots)
3.1 DSSE types + signer abstraction
public sealed record DsseEnvelope(
string PayloadType,
string Payload, // base64
DsseSignature[] Signatures
);
public sealed record DsseSignature(string KeyId, string Sig); // base64 sig
public interface IContentSigner
{
string KeyId { get; }
byte[] Sign(ReadOnlySpan<byte> message);
bool Verify(ReadOnlySpan<byte> message, ReadOnlySpan<byte> signature);
}
3.2 DSSE build (DSSE preauth encoding)
using System.IO;
using System.Text;
public static class Dsse
{
    // DSSE PAE: "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload
    // Note: "DSSEv1" is a literal prefix; only type and payload are length-prefixed.
    public static byte[] PAE(string payloadType, ReadOnlySpan<byte> payload)
    {
        var pt = Encoding.UTF8.GetBytes(payloadType);
        using var ms = new MemoryStream();
        void WriteLenPrefixed(byte[] part)
        {
            var len = Encoding.UTF8.GetBytes(part.Length.ToString());
            ms.Write(len, 0, len.Length);
            ms.WriteByte((byte)' ');
            ms.Write(part, 0, part.Length);
            ms.WriteByte((byte)' ');
        }
        var prefix = Encoding.UTF8.GetBytes("DSSEv1 ");
        ms.Write(prefix, 0, prefix.Length);
        WriteLenPrefixed(pt);
        var bodyLen = Encoding.UTF8.GetBytes(payload.Length.ToString());
        ms.Write(bodyLen, 0, bodyLen.Length);
        ms.WriteByte((byte)' ');
        ms.Write(payload);
        return ms.ToArray();
    }
public static DsseEnvelope SignJson<T>(string payloadType, T payloadObj, IContentSigner signer)
{
var payload = CanonJson.Canonicalize(payloadObj);
var pae = PAE(payloadType, payload);
var sig = signer.Sign(pae);
return new DsseEnvelope(
payloadType,
Convert.ToBase64String(payload),
new[] { new DsseSignature(signer.KeyId, Convert.ToBase64String(sig)) }
);
}
}
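The PAE encoding can be sanity-checked against the example given in the DSSE specification; a string-based sketch (fine here because the inputs are ASCII) makes the expected layout explicit:

```csharp
using System;
using System.Text;

// PAE layout per the DSSE spec: "DSSEv1" is a literal prefix; only the
// payloadType and payload are length-prefixed (UTF-8 byte lengths).
string Pae(string payloadType, string payload)
{
    var typeLen = Encoding.UTF8.GetByteCount(payloadType);
    var bodyLen = Encoding.UTF8.GetByteCount(payload);
    return $"DSSEv1 {typeLen} {payloadType} {bodyLen} {payload}";
}

var pae = Pae("http://example.com/HelloWorld", "hello world");
// pae == "DSSEv1 29 http://example.com/HelloWorld 11 hello world"
```

Wiring this known vector into a unit test catches the most common PAE mistake (length-prefixing the "DSSEv1" tag itself).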
3.3 ECDSA P-256 signer (portable default)
using System.Security.Cryptography;
public sealed class EcdsaP256Signer : IContentSigner, IDisposable
{
private readonly ECDsa _ecdsa;
public string KeyId { get; }
public EcdsaP256Signer(string keyId, ECDsa ecdsa)
{
KeyId = keyId;
_ecdsa = ecdsa;
}
public byte[] Sign(ReadOnlySpan<byte> message)
=> _ecdsa.SignData(message.ToArray(), HashAlgorithmName.SHA256);
public bool Verify(ReadOnlySpan<byte> message, ReadOnlySpan<byte> signature)
=> _ecdsa.VerifyData(message.ToArray(), signature.ToArray(), HashAlgorithmName.SHA256);
public void Dispose() => _ecdsa.Dispose();
}
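A minimal usage sketch for the signer abstraction: a sign/verify round trip over PAE-style bytes must succeed, and any tampering with the signed bytes must fail verification (the message string is illustrative only).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Round-trip an ECDSA P-256 signature over some PAE-style bytes.
using var ecdsa = ECDsa.Create(ECCurve.NamedCurves.nistP256);
var message = Encoding.UTF8.GetBytes("DSSEv1 4 test 5 hello");
var sig = ecdsa.SignData(message, HashAlgorithmName.SHA256);
bool ok = ecdsa.VerifyData(message, sig, HashAlgorithmName.SHA256);

message[0] ^= 0xFF; // simulate tampering with the signed bytes
bool tampered = ecdsa.VerifyData(message, sig, HashAlgorithmName.SHA256);
// ok == true, tampered == false
```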
4) Proof ledger: append nodes, compute node hashes, compute root hash
4.1 Node hashing (exclude NodeHash itself)
public static class ProofHashing
{
public static ProofNode WithHash(ProofNode n)
{
var canonical = CanonJson.Canonicalize(new
{
n.Id, n.Kind, n.RuleId, n.ParentIds, n.EvidenceRefs, n.Delta, n.Total,
n.Actor, n.TsUtc, Seed = Convert.ToBase64String(n.Seed)
});
return n with { NodeHash = "sha256:" + CanonJson.Sha256Hex(canonical) };
}
public static string ComputeRootHash(IEnumerable<ProofNode> nodesInOrder)
{
// Deterministic: root hash over canonical JSON array of node hashes in order.
var arr = nodesInOrder.Select(n => n.NodeHash).ToArray();
var bytes = CanonJson.Canonicalize(arr);
return "sha256:" + CanonJson.Sha256Hex(bytes);
}
}
4.2 Minimal ledger (deterministic ordering enforced by append order)
public sealed class ProofLedger
{
private readonly List<ProofNode> _nodes = new();
public IReadOnlyList<ProofNode> Nodes => _nodes;
public void Append(ProofNode node)
{
_nodes.Add(ProofHashing.WithHash(node));
}
public string RootHash() => ProofHashing.ComputeRootHash(_nodes);
}
5) Deterministic scoring function (with proof nodes)
5.1 Example scoring pipeline (CVSS + EPSS + reachability + containment)
public sealed record ScoreInputs(
double CvssBase, // 0..10
double? Epss, // 0..1
bool Kev,
ReachabilityClass Reachability, // Reachable/NotProven/Unknown
ContainmentSignals Containment
);
public enum ReachabilityClass { Reachable, NotProvenReachable, Unknown }
public static class RiskScoring
{
public static (double Score01, ProofLedger Ledger) Score(
ScoreInputs input,
string scanId,
byte[] seed,
DateTimeOffset tsUtc)
{
var ledger = new ProofLedger();
var total = 0.0;
// Input node
ledger.Append(new ProofNode(
Id: $"in:{scanId}",
Kind: ProofNodeKind.Input,
RuleId: "inputs.v1",
ParentIds: Array.Empty<string>(),
EvidenceRefs: Array.Empty<string>(),
Delta: 0,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// CVSS base mapping
var cvss01 = Math.Clamp(input.CvssBase / 10.0, 0, 1);
total += 0.55 * cvss01;
ledger.Append(new ProofNode(
Id: $"d:cvss:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.cvss_base.weighted",
ParentIds: new[] { $"in:{scanId}" },
EvidenceRefs: new[] { $"cvss:{input.CvssBase:0.0}" },
Delta: 0.55 * cvss01,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// EPSS (optional)
if (input.Epss is { } epss)
{
total += 0.25 * Math.Clamp(epss, 0, 1);
ledger.Append(new ProofNode(
Id: $"d:epss:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.epss.weighted",
ParentIds: new[] { $"d:cvss:{scanId}" },
EvidenceRefs: new[] { $"epss:{epss:0.0000}" },
Delta: 0.25 * epss,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
}
// KEV boosts urgency
if (input.Kev)
{
total += 0.15;
ledger.Append(new ProofNode(
Id: $"d:kev:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.kev.bump",
ParentIds: new[] { $"d:cvss:{scanId}" },
EvidenceRefs: new[] { "kev:true" },
Delta: 0.15,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
}
// Reachability
var reachDelta = input.Reachability switch
{
ReachabilityClass.Reachable => 0.20,
ReachabilityClass.NotProvenReachable => 0.00,
ReachabilityClass.Unknown => 0.08, // unknown still adds risk, but less than proven reachable
_ => 0.00
};
total += reachDelta;
ledger.Append(new ProofNode(
Id: $"d:reach:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.reachability",
ParentIds: new[] { $"d:cvss:{scanId}" },
EvidenceRefs: new[] { $"reach:{input.Reachability}" },
Delta: reachDelta,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// Containment deductions (examples)
var containmentDelta = 0.0;
if (string.Equals(input.Containment.Seccomp, "enforced", StringComparison.OrdinalIgnoreCase))
containmentDelta -= 0.05;
if (string.Equals(input.Containment.Fs, "ro", StringComparison.OrdinalIgnoreCase))
containmentDelta -= 0.03;
total = Math.Clamp(total + containmentDelta, 0, 1);
ledger.Append(new ProofNode(
Id: $"d:contain:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.containment",
ParentIds: new[] { $"d:reach:{scanId}" },
EvidenceRefs: new[] { $"seccomp:{input.Containment.Seccomp}", $"fs:{input.Containment.Fs}" },
Delta: containmentDelta,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// Final score node
ledger.Append(new ProofNode(
Id: $"s:{scanId}",
Kind: ProofNodeKind.Score,
RuleId: "score.final",
ParentIds: new[] { $"d:contain:{scanId}" },
EvidenceRefs: new[] { "root" },
Delta: 0,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
return (total, ledger);
}
}
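To see the weights above in action, here is a worked arithmetic-only example (0.55·CVSS/10, 0.25·EPSS, +0.15 KEV, +0.20 reachable, small containment deductions, clamp to [0,1]). It simplifies reachability to a boolean (the pipeline above also scores Unknown at +0.08) and omits the proof nodes; it is a sketch of the math, not the real scorer.

```csharp
using System;

// Mirror the delta weights used in RiskScoring, arithmetic only.
double Score(double cvssBase, double? epss, bool kev, bool reachable,
             bool seccompEnforced, bool roFs)
{
    var total = 0.55 * Math.Clamp(cvssBase / 10.0, 0, 1);
    if (epss is { } e) total += 0.25 * Math.Clamp(e, 0, 1);
    if (kev) total += 0.15;
    if (reachable) total += 0.20;
    if (seccompEnforced) total -= 0.05;
    if (roFs) total -= 0.03;
    return Math.Clamp(total, 0, 1);
}

var mid = Score(8.0, 0.4, kev: false, reachable: true, seccompEnforced: true, roFs: false);
// 0.44 + 0.10 + 0.20 − 0.05 ≈ 0.69
var capped = Score(10.0, 0.95, kev: true, reachable: true, seccompEnforced: false, roFs: false);
// 0.55 + 0.2375 + 0.15 + 0.20 = 1.1375, clamped to 1.0
```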
6) Unknown ranking (deterministic) + proof
6.1 Ranking function
public static class UnknownRanker
{
public static double Rank(BlastRadius b, double scarcity, ExploitPressure ep, ContainmentSignals c)
{
var dependents01 = Math.Clamp(b.Dependents / 50.0, 0, 1);
var net = b.NetFacing ? 0.5 : 0.0;
var priv = string.Equals(b.Privilege, "root", StringComparison.OrdinalIgnoreCase) ? 0.5 : 0.0;
var blast = Math.Clamp((dependents01 + net + priv) / 2.0, 0, 1);
var epss01 = ep.Epss is null ? 0.35 : Math.Clamp(ep.Epss.Value, 0, 1); // default mild pressure
var kev = ep.Kev ? 0.30 : 0.0;
var pressure = Math.Clamp(epss01 + kev, 0, 1);
var containment = 0.0;
if (string.Equals(c.Seccomp, "enforced", StringComparison.OrdinalIgnoreCase)) containment -= 0.10;
if (string.Equals(c.Fs, "ro", StringComparison.OrdinalIgnoreCase)) containment -= 0.10;
return Math.Clamp(0.60 * blast + 0.30 * scarcity + 0.30 * pressure + containment, 0, 1);
}
}
6.2 Unknown proof node pattern
When you compute Unknown rank, emit a mini ledger identical to score proofs:
- Input node: reasons + evidence scarcity facts
- Delta nodes: blast/pressure/containment components
- Score node: final unknown score
Store it in proofs/unknowns/{unkId}/tree.json.
7) Proof Bundle writer (zip + root hash + DSSE)
using System.IO.Compression;
public sealed class ProofBundleWriter
{
public static async Task<(string RootHash, string BundlePath)> WriteAsync(
string baseDir,
ScanManifest manifest,
ProofLedger scoreLedger,
DsseEnvelope manifestDsse,
IContentSigner signer,
CancellationToken ct)
{
Directory.CreateDirectory(baseDir);
var manifestBytes = CanonJson.Canonicalize(manifest);
var ledgerBytes = CanonJson.Canonicalize(scoreLedger.Nodes); // v1 JSON; swap to CBOR later
// Root hash covers canonical content (manifest + ledger)
var rootMaterial = CanonJson.Canonicalize(new
{
manifest = "sha256:" + CanonJson.Sha256Hex(manifestBytes),
scoreProof = "sha256:" + CanonJson.Sha256Hex(ledgerBytes),
scoreRoot = scoreLedger.RootHash()
});
var rootHash = "sha256:" + CanonJson.Sha256Hex(rootMaterial);
// DSSE sign the root descriptor
var rootDsse = Dsse.SignJson("application/vnd.stellaops.proof-root.v1+json", new
{
rootHash,
scoreRoot = scoreLedger.RootHash()
}, signer);
var bundleName = $"{manifest.ScanId}_{rootHash.Replace("sha256:", "")}.zip";
var bundlePath = Path.Combine(baseDir, bundleName);
await using var fs = File.Create(bundlePath);
using var zip = new ZipArchive(fs, ZipArchiveMode.Create, leaveOpen: false);
void Add(string name, byte[] content)
{
var e = zip.CreateEntry(name, CompressionLevel.Optimal);
using var s = e.Open();
s.Write(content, 0, content.Length);
}
Add("manifest.json", manifestBytes);
Add("manifest.dsse.json", CanonJson.Canonicalize(manifestDsse));
Add("score_proof.json", ledgerBytes);
Add("proof_root.dsse.json", CanonJson.Canonicalize(rootDsse));
Add("meta.json", CanonJson.Canonicalize(new { rootHash, createdAtUtc = DateTimeOffset.UtcNow }));
return (rootHash, bundlePath);
}
}
8) Postgres schema (authoritative) and EF Core skeleton
8.1 Tables (SQL snippet)
create table scan_manifest (
scan_id text primary key,
created_at_utc timestamptz not null,
artifact_digest text not null,
concelier_snapshot_hash text not null,
excititor_snapshot_hash text not null,
lattice_policy_hash text not null,
deterministic boolean not null,
seed bytea not null,
manifest_json jsonb not null,
manifest_dsse_json jsonb not null,
manifest_hash text not null
);
create table proof_bundle (
scan_id text not null references scan_manifest(scan_id),
root_hash text not null,
bundle_uri text not null,
proof_root_dsse_json jsonb not null,
created_at_utc timestamptz not null,
primary key (scan_id, root_hash)
);
create index ix_scan_manifest_artifact on scan_manifest(artifact_digest);
create index ix_scan_manifest_snapshots on scan_manifest(concelier_snapshot_hash, excititor_snapshot_hash);
8.2 EF Core entities (minimal)
public sealed class ScannerDbContext : DbContext
{
public DbSet<ScanManifestRow> ScanManifests => Set<ScanManifestRow>();
public DbSet<ProofBundleRow> ProofBundles => Set<ProofBundleRow>();
public ScannerDbContext(DbContextOptions<ScannerDbContext> options) : base(options) { }
protected override void OnModelCreating(ModelBuilder b)
{
b.Entity<ScanManifestRow>().HasKey(x => x.ScanId);
b.Entity<ProofBundleRow>().HasKey(x => new { x.ScanId, x.RootHash });
b.Entity<ScanManifestRow>().HasIndex(x => x.ArtifactDigest);
// The DDL declares these columns jsonb; map the string-backed properties explicitly (Npgsql).
b.Entity<ScanManifestRow>().Property(x => x.ManifestJson).HasColumnType("jsonb");
b.Entity<ScanManifestRow>().Property(x => x.ManifestDsseJson).HasColumnType("jsonb");
b.Entity<ProofBundleRow>().Property(x => x.ProofRootDsseJson).HasColumnType("jsonb");
}
}
public sealed class ScanManifestRow
{
public string ScanId { get; set; } = default!;
public DateTimeOffset CreatedAtUtc { get; set; }
public string ArtifactDigest { get; set; } = default!;
public string ConcelierSnapshotHash { get; set; } = default!;
public string ExcititorSnapshotHash { get; set; } = default!;
public string LatticePolicyHash { get; set; } = default!;
public bool Deterministic { get; set; }
public byte[] Seed { get; set; } = default!;
public string ManifestHash { get; set; } = default!;
public string ManifestJson { get; set; } = default!; // store canonical JSON string
public string ManifestDsseJson { get; set; } = default!;
}
public sealed class ProofBundleRow
{
public string ScanId { get; set; } = default!;
public string RootHash { get; set; } = default!;
public string BundleUri { get; set; } = default!;
public DateTimeOffset CreatedAtUtc { get; set; }
public string ProofRootDsseJson { get; set; } = default!;
}
9) scanner.webservice endpoints (minimal APIs)
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.EntityFrameworkCore;
var app = WebApplication.CreateBuilder(args)
.AddServices() // your DI extension: DbContext, signer factory, repositories
.Build();
app.MapPost("/scan", async (ScanRequest req, ScannerDbContext db, CancellationToken ct) =>
{
var scanId = Guid.NewGuid().ToString("n");
var seed = req.Seed ?? RandomNumberGenerator.GetBytes(32);
var created = DateTimeOffset.UtcNow;
// Snapshot hashes come from your snapshot selector (by policy/environment)
var concelierHash = req.ConcelierSnapshotHash;
var excititorHash = req.ExcititorSnapshotHash;
var manifest = new ScanManifest(
ScanId: scanId,
CreatedAtUtc: created,
ArtifactDigest: req.ArtifactDigest,
ArtifactPurl: req.ArtifactPurl ?? "",
ScannerVersion: req.ScannerVersion,
WorkerVersion: req.WorkerVersion,
ConcelierSnapshotHash: concelierHash,
ExcititorSnapshotHash: excititorHash,
LatticePolicyHash: req.LatticePolicyHash,
Deterministic: req.Deterministic,
Seed: seed,
Knobs: req.Knobs ?? new Dictionary<string,string>()
);
var manifestHash = "sha256:" + CanonJson.Sha256Hex(CanonJson.Canonicalize(manifest));
// Sign DSSE
using var signer = YourSignerFactory.Create(); // ECDSA or other profile
var dsse = Dsse.SignJson("application/vnd.stellaops.scan-manifest.v1+json", manifest, signer);
db.ScanManifests.Add(new ScanManifestRow
{
ScanId = scanId,
CreatedAtUtc = created,
ArtifactDigest = req.ArtifactDigest,
ConcelierSnapshotHash = concelierHash,
ExcititorSnapshotHash = excititorHash,
LatticePolicyHash = req.LatticePolicyHash,
Deterministic = req.Deterministic,
Seed = seed,
ManifestHash = manifestHash,
ManifestJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(manifest)),
ManifestDsseJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(dsse))
});
await db.SaveChangesAsync(ct);
return Results.Ok(new { scanId, manifestHash });
});
app.MapGet("/scan/{scanId}/manifest", async (string scanId, ScannerDbContext db, CancellationToken ct) =>
{
var row = await db.ScanManifests.AsNoTracking().SingleOrDefaultAsync(x => x.ScanId == scanId, ct);
return row is null ? Results.NotFound() : Results.Text(row.ManifestJson, "application/json");
});
app.MapPost("/scan/{scanId}/score/replay", async (string scanId, ScannerDbContext db, CancellationToken ct) =>
{
var row = await db.ScanManifests.AsNoTracking().SingleOrDefaultAsync(x => x.ScanId == scanId, ct);
if (row is null) return Results.NotFound();
var manifest = JsonSerializer.Deserialize<ScanManifest>(row.ManifestJson)!;
// Load findings + snapshots by hash (your repositories).
// The values below are placeholders; a real replay must source them from the pinned snapshots.
var inputs = new ScoreInputs(
CvssBase: 9.1,
Epss: 0.62,
Kev: false,
Reachability: ReachabilityClass.Unknown,
Containment: new ContainmentSignals("enforced", "ro")
);
// NOTE: for --deterministic replays, substitute the fake clock for UtcNow.
var (score, ledger) = RiskScoring.Score(inputs, scanId, manifest.Seed, DateTimeOffset.UtcNow);
using var signer = YourSignerFactory.Create();
var (rootHash, bundlePath) = await ProofBundleWriter.WriteAsync(
baseDir: "/var/lib/stellaops/proofs",
manifest: manifest,
scoreLedger: ledger,
manifestDsse: JsonSerializer.Deserialize<DsseEnvelope>(row.ManifestDsseJson)!,
signer: signer,
ct: ct);
db.ProofBundles.Add(new ProofBundleRow
{
ScanId = scanId,
RootHash = rootHash,
BundleUri = bundlePath,
CreatedAtUtc = DateTimeOffset.UtcNow,
ProofRootDsseJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(
Dsse.SignJson("application/vnd.stellaops.proof-root.v1+json", new { rootHash }, signer)))
});
await db.SaveChangesAsync(ct);
return Results.Ok(new { score, rootHash, bundleUri = bundlePath });
});
app.Run();
public sealed record ScanRequest(
string ArtifactDigest,
string? ArtifactPurl,
string ScannerVersion,
string WorkerVersion,
string ConcelierSnapshotHash,
string ExcititorSnapshotHash,
string LatticePolicyHash,
bool Deterministic,
byte[]? Seed,
Dictionary<string,string>? Knobs
);
10) Binary reachability v1: major skeleton (bounded BFS over a naive callgraph)
This is intentionally “v1”: direct calls + imports + conservative unknowns. It still delivers value fast.
public sealed record FuncNode(ulong Address, string Name);
public sealed record CallEdge(ulong From, ulong To, string Kind); // "direct"/"import"/"indirect"
public sealed class CallGraph
{
public Dictionary<ulong, FuncNode> Nodes { get; } = new();
public List<CallEdge> Edges { get; } = new();
// Linear scan keeps v1 simple; index Edges by From for large graphs.
public IEnumerable<ulong> Neighbors(ulong from)
=> Edges.Where(e => e.From == from).Select(e => e.To);
}
public static class Reachability
{
public static (ReachabilityClass Class, ulong[]? Path) FindPath(
CallGraph cg,
IEnumerable<ulong> entrypoints,
ulong sink,
int maxDepth)
{
var visited = new HashSet<ulong>();
var parent = new Dictionary<ulong, ulong>();
var q = new Queue<(ulong node, int depth)>();
foreach (var ep in entrypoints)
{
q.Enqueue((ep, 0));
visited.Add(ep);
}
while (q.Count > 0)
{
var (cur, depth) = q.Dequeue();
if (cur == sink)
return (ReachabilityClass.Reachable, Reconstruct(parent, cur));
if (depth >= maxDepth) continue;
foreach (var nxt in cg.Neighbors(cur))
{
if (visited.Add(nxt))
{
parent[nxt] = cur;
q.Enqueue((nxt, depth + 1));
}
}
}
return (ReachabilityClass.NotProvenReachable, null);
}
private static ulong[] Reconstruct(Dictionary<ulong, ulong> parent, ulong end)
{
var path = new List<ulong> { end };
while (parent.TryGetValue(end, out var p))
{
path.Add(p);
end = p;
}
path.Reverse();
return path.ToArray();
}
}
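A usage sketch of the bounded BFS on a toy three-function binary (addresses and the adjacency-list shape are invented for illustration; the logic mirrors `Reachability.FindPath` above, inlined so the example is self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy three-function binary: entry 0x100 -> parse 0x200 -> sink 0x300.
var edges = new Dictionary<ulong, ulong[]>
{
    [0x100] = new ulong[] { 0x200 },
    [0x200] = new ulong[] { 0x300 },
};

// Same bounded BFS as Reachability.FindPath, over the adjacency list above.
ulong[]? FindPath(ulong entry, ulong sink, int maxDepth)
{
    var parent = new Dictionary<ulong, ulong>();
    var visited = new HashSet<ulong> { entry };
    var q = new Queue<(ulong Node, int Depth)>();
    q.Enqueue((entry, 0));
    while (q.Count > 0)
    {
        var (cur, depth) = q.Dequeue();
        if (cur == sink)
        {
            // Walk parent links back to the entrypoint, then reverse.
            var path = new List<ulong> { cur };
            while (parent.TryGetValue(path[^1], out var p)) path.Add(p);
            path.Reverse();
            return path.ToArray();
        }
        if (depth >= maxDepth) continue;
        foreach (var nxt in edges.GetValueOrDefault(cur, Array.Empty<ulong>()))
        {
            if (visited.Add(nxt)) { parent[nxt] = cur; q.Enqueue((nxt, depth + 1)); }
        }
    }
    return null; // NotProvenReachable under this depth bound
}

var found = FindPath(0x100, 0x300, maxDepth: 8)!;
Console.WriteLine(string.Join(" -> ", found.Select(a => $"0x{a:X}")));
// 0x100 -> 0x200 -> 0x300
```

Note the `NotProvenReachable` return on exhaustion: the depth bound means a null result is a bounded claim, not proof of unreachability, which is exactly why the class is not named `Unreachable`.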
Proof emission for reachability
- Store: callgraph.json (nodes + edges subset relevant to this sink) and path_0.json (address chain + symbol names)
- Create a scoring delta node referencing reach:path_0.json when reachable.
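To make the proof files concrete, a hypothetical path_0.json payload might look like the following; every address, symbol, and field name here is invented for illustration, and the real schema is whatever your proof-builder emits:

```csharp
using System;
using System.Text.Json;

// Hypothetical reachable-path proof: address chain + symbol names, as
// described above. All values are invented for illustration.
var pathProof = new
{
    sinkAddress = "0x401F30",
    addresses = new[] { "0x401000", "0x401A20", "0x401F30" },
    symbols = new[] { "main", "parse_config", "png_handle_chunk" },
    edgeKinds = new[] { "direct", "import" },
    maxDepth = 64,
    builder = "bounded-bfs/v1"
};
Console.WriteLine(JsonSerializer.Serialize(pathProof, new JsonSerializerOptions { WriteIndented = true }));
```

Whatever the final shape, the file should carry enough to re-walk the path against callgraph.json without re-running the analyzer.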
11) Determinism test (xUnit “hash must match”)
public class DeterminismTests
{
[Fact]
public void Score_Replay_IsBitIdentical()
{
var seed = Enumerable.Repeat((byte)7, 32).ToArray();
var inputs = new ScoreInputs(9.0, 0.50, false, ReachabilityClass.Unknown, new("enforced","ro"));
var (s1, l1) = RiskScoring.Score(inputs, "scanA", seed, DateTimeOffset.Parse("2025-01-01T00:00:00Z"));
var (s2, l2) = RiskScoring.Score(inputs, "scanA", seed, DateTimeOffset.Parse("2025-01-01T00:00:00Z"));
Assert.Equal(s1, s2, 10);
Assert.Equal(l1.RootHash(), l2.RootHash());
Assert.True(l1.Nodes.Zip(l2.Nodes).All(z => z.First.NodeHash == z.Second.NodeHash));
}
}
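The root-hash equality this test asserts rests on the ledger being an order-sensitive hash chain. A toy stand-in illustrates the property (not the real ledger; the chain rule H(H(node) || previous) is an assumption for this sketch):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Toy ledger root: fold each node's hash into the running root. Order-sensitive,
// so replaying identical nodes in identical order is the only way to reproduce it.
string Root(string[] nodes)
{
    var acc = new byte[32]; // zeroed genesis
    foreach (var n in nodes)
        acc = SHA256.HashData(SHA256.HashData(Encoding.UTF8.GetBytes(n)).Concat(acc).ToArray());
    return "sha256:" + Convert.ToHexString(acc).ToLowerInvariant();
}

var r1 = Root(new[] { "base:9.0", "epss:0.5", "reach:unknown" });
var r2 = Root(new[] { "base:9.0", "epss:0.5", "reach:unknown" });
var r3 = Root(new[] { "epss:0.5", "base:9.0", "reach:unknown" });
Console.WriteLine(r1 == r2); // True: replay reproduces the root bit-for-bit
Console.WriteLine(r1 == r3); // False: reordering nodes changes the root
```

This is why the test checks both `RootHash()` equality and the per-node hash zip: the former catches any drift, the latter tells you which node drifted.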
12) What developers should implement next (priority order)
- Canonical JSON + hashing (Phase A prerequisite)
- Manifest + DSSE signing + Postgres persistence
- Proof ledger + root hash + Proof Bundle writer
- Replay endpoint (/score/replay) and scheduler hook to rescore on new snapshot hashes
- Unknown registry + deterministic ranking + proof
- Reachability v1 (callgraph + bounded BFS + proof emission)
- Corpus bench and CI regression gates
If you want, I can convert this into repo-ready TASKS.md blocks per module (scanner.webservice, scheduled.webservice, notify.webservice) with acceptance tests and a minimal migration set aligned to your existing naming conventions.