Here’s a compact playbook for making Stella Ops stand out on binary‑only analysis quality and deterministic, explainable scoring—from concepts to dev‑ready specs.
Binary‑only analysis & call‑graph fidelity
Goal: prove we reach the right code, not just flag files.
Why it matters (plain English):
- Many scanners “see” a CVE but can’t show how execution reaches it. You need proof you can actually hit the bad function from an app entrypoint.
North‑star metrics (automate in CI):
- Precision / Recall vs a small ground‑truth corpus (curated samples with known reachable/unreachable sinks).
- TTFRP (Time‑to‑First‑Reachable‑Path): ms from analyzer start to first valid call‑path.
- Runnable call‑stack snippets %: fraction of findings that include a minimal, compilable snippet (or pseudo‑IR) reproducing the call chain.
- Deterministic replay %: identical proofs (hash‑equal) across OS/CPU/container.
Reproducible‑run contract:
- Scan Manifest (DSSE‑signed): inputs, toolchain versions, lattice policies, feed hashes, CFG/CG build params, symbolization mode, and hash of the “proof‑builder”.
- Proof Bundle:
  - /proofs/{findingId}/callgraph.pb (protobuf/flatbuffers)
  - /proofs/{findingId}/path_0.ir (SSA/IL)
  - /proofs/{findingId}/snippet_0/ (repro harness)
  - /attestations/ (rekor‑ready, optional PQ mode)
- Determinism switch: --deterministic --seed <32b> --clock fake --fs-order stable.
Reachability engine (binary‑only) – minimal architecture:
- Loader: ELF/PE/Mach‑O parser; symbolizer; DWARF/PDB if present.
- IR lifter: (capstone/keystone‑style) → SSA/typed IL with conservatively modeled calls (PLT/IAT, vtables, GOT).
- CG/CFG builder: merges static edges + lightweight dynamic summaries (known stdlib shims); optional ML‑assisted indirect‑call resolution gated by proofs.
- Path search: bounded BFS/IDDFS from trusted entrypoints to vulnerable sinks; emits proof trees.
- Snippet builder: replays path with mocks for I/O; generates runnable harness or pseudo‑IR transcript.
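To make the path-search stage concrete, here is a minimal, hypothetical sketch of the bounded BFS idea: the call graph is an adjacency map, and the search returns the shortest entry→sink path within a depth bound, or null (“not proven reachable”). Names and the graph shape are illustrative, not the real engine’s types.

```csharp
using System;
using System.Collections.Generic;

// Bounded BFS from an entrypoint to a vulnerable sink; parent links let us
// reconstruct the shortest path, which becomes the proof path.
List<string>? FindShortestPath(
    IReadOnlyDictionary<string, string[]> callGraph,
    string entry, string sink, int maxDepth)
{
    var parent = new Dictionary<string, string> { [entry] = entry };
    var queue = new Queue<(string Node, int Depth)>();
    queue.Enqueue((entry, 0));
    while (queue.Count > 0)
    {
        var (node, depth) = queue.Dequeue();
        if (node == sink)
        {
            // Walk parent links back to the entrypoint: this is the proof path.
            var path = new List<string> { node };
            while (node != entry) { node = parent[node]; path.Add(node); }
            path.Reverse();
            return path;
        }
        if (depth >= maxDepth || !callGraph.TryGetValue(node, out var callees)) continue;
        foreach (var callee in callees)
        {
            if (parent.ContainsKey(callee)) continue; // already visited
            parent[callee] = node;
            queue.Enqueue((callee, depth + 1));
        }
    }
    return null; // not proven reachable within the bound
}

var graph = new Dictionary<string, string[]>
{
    ["main"] = new[] { "init", "handler" },
    ["handler"] = new[] { "parse" },
    ["parse"] = new[] { "vuln_sink" }
};
var path = FindShortestPath(graph, "main", "vuln_sink", maxDepth: 10);
// path: main -> handler -> parse -> vuln_sink
```

The depth bound is what keeps TTFRP predictable: a sink deeper than the bound comes back as “not proven reachable” rather than stalling the scan.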
Ground‑truth corpus (starter set):
- 20 binaries with injected sinks: 10 reachable, 10 unreachable, mixed obfuscation, stripped/unstripped, PIE/ASLR on/off, with/without CFI.
- Tag each sample with sink_signature, expected_paths, expected_unreachable_reasons.
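As a sketch of what a per-sample tag file could look like (the JSON shape below is an assumption, simply mirroring the three fields named above):

```csharp
using System;
using System.Text.Json;

// Hypothetical corpus tag: one small JSON document per sample, serialized
// here from an anonymous object for illustration.
var tag = new
{
    sink_signature = "memcpy@0x401200",                       // illustrative
    expected_paths = new[] { new[] { "main", "parse", "memcpy" } },
    expected_unreachable_reasons = Array.Empty<string>()      // empty when reachable
};
var json = JsonSerializer.Serialize(tag);
var doc = JsonDocument.Parse(json);
bool hasAll = doc.RootElement.TryGetProperty("sink_signature", out _)
           && doc.RootElement.TryGetProperty("expected_paths", out _)
           && doc.RootElement.TryGetProperty("expected_unreachable_reasons", out _);
```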
CI tasks (agents can implement now):
- scanner.webservice: /bench/run → runs corpus; exports metrics JSON + HTML.
- scheduler.webservice: nightly + per‑PR comparisons; fail gate if precision or deterministic‑replay dips > 1.0 pt vs baseline.
- notify.webservice: posts TTFRP trend + top regressions to PR.
Deterministic score proofs & Unknowns ranking
Goal: every risk score must be explainable and replayable. Unknowns shouldn’t be noisy; they should be transparently ranked.
Plain English:
- A score should read like a ledger: “Input X + Rule Y → +0.12 risk, because Z”. Unknowns are the “we don’t know yet” items—rank them by potential blast radius and thin evidence.
Signed proof‑trees (spec):
- Node types: Input (SBOM/VEX/event), Transform (policy/lattice op), Delta (numeric change), Score.
- Fields: id, parentIds[], sha256, ruleId, evidenceRefs[], timestamp, actor (module), determinismSeed.
- Encoding: CBOR/Flatbuffers; DSSE‑signed; top hash anchored to ledger (optional Rekor v2 mirror).
- Replayer: stella score replay --bundle proofs/ --seed <seed> must output identical totals and per‑rule deltas.
Unknowns Registry & ranking:
- Unknown = missing VEX, missing exploitability signal, ambiguous call edge, missing version provenance, or opaque packer.
- Rank factors (weighted):
  - Blast radius: transitive dependents, runtime privilege, exposure surface (net‑facing? in container PID 1?).
  - Evidence scarcity: how many critical facts are missing?
  - Exploit pressure: EPSS percentile (if available), KEV presence, chatter density (feeds).
  - Containment signals: sandboxing, seccomp, read‑only FS, eBPF/LSM denies observed.
- Output: unknowns.score ∈ [0,1] + proof path explaining the rank.
Quiet‑update UX (proof‑linked):
- Unknown cards are gated: collapsed by default; show top 3 reasons with “View proof”.
- As VEX/EPSS feeds refresh, the proof‑tree updates; the UI shows what changed and why (delta view).
Minimal schemas (drop‑in to Stella Ops)
# scoring/proof-tree.fbs (conceptual)
table Node {
id:string; kind:enum{Input,Transform,Delta,Score};
parentIds:[string]; ruleId:string; sha256:string;
evidenceRefs:[string]; ts:ulong; actor:string;
delta:float; total:float; seed:[ubyte];
}
# unknowns/unknown-item.json
{
"id": "unk_…",
"artifactPurl": "pkg:…",
"reasons": ["missing_vex", "ambiguous_indirect_call"],
"blastRadius": { "dependents": 42, "privilege": "root", "netFacing": true },
"evidenceScarcity": 0.7,
"exploitPressure": { "epss": 0.83, "kev": false },
"containment": { "seccomp": "enforced", "fs": "ro" },
"score": 0.66,
"proofRef": "proofs/unk_…/tree.cbor"
}
Triggering & pipelines (existing services)
- scanner.webservice
  - Emits Proof Bundle + Unknowns for each image/binary.
  - API: POST /scan?deterministic=true&seed=…&emitProofs=true.
- scheduled.webservice
  - Periodic feed refresh (VEX/EPSS/KEV) → runs proof replayer; updates Unknowns ranks (no rescans).
- notify.webservice
  - Sends delta‑proof digests to PRs/Chat: “EPSS↑ from 0.41→0.58, Unknown score +0.06 (proof link)”.
- concelier (feeds)
  - Normalizes EPSS, KEV, vendor advisories; versioned with hashes in the Scan Manifest.
- excititor (VEX aggregator)
  - Produces explainable VEX merges: emits Transform nodes with ruleIds referencing lattice policies.
Developer guidelines (do this first)
- Add deterministic flags to all scanners and proof emitters (--deterministic, --seed).
- Implement Proof Bundle writer (Flatbuffers/CBOR + DSSE). Include per‑rule deltas and top hash.
- Create Ground‑Truth Corpus repo and CI job; publish precision/recall/TTFRP dashboards.
- Unknowns Registry micro‑model + ranking function; expose /unknowns/list?sort=score.
- Quiet‑update UI: Unknowns cards with “View proof”; delta badges when feeds change.
- Replay CLI: stella score replay + stella proof verify (DSSE + hash match).
- Audit doc: one‑pager “How to reproduce my score”—copy/paste commands from the manifest.
Tiny .NET 10 sketch (partial, compile‑ready)
public record ProofNode(
string Id, string Kind, string[] ParentIds, string RuleId,
string Sha256, string[] EvidenceRefs, DateTimeOffset Ts,
string Actor, double Delta, double Total, byte[] Seed);
public interface IScoreLedger {
void Append(ProofNode node);
double CurrentTotal { get; }
}
public sealed class DeterministicLedger : IScoreLedger {
private readonly List<ProofNode> _nodes = new();
private double _total;
public void Append(ProofNode n) {
// Deterministic ordering by (Ts, Id) already enforced upstream.
_total = n.Total; _nodes.Add(n);
}
public double CurrentTotal => _total;
}
public sealed record Blast(int Dependents, bool NetFacing, string Privilege); // "root"/"user"
public sealed record Containment(string Seccomp, string Fs);                  // "enforced"/"none", "ro"/"rw"
public static class UnknownRanker {
  public static double Rank(Blast b, double scarcity, double epss, bool kev, Containment c) {
    var br = (Math.Min(b.Dependents / 50.0, 1.0) + (b.NetFacing ? 0.5 : 0) + (b.Privilege == "root" ? 0.5 : 0)) / 2.0;
    var ep = Math.Min(epss + (kev ? 0.3 : 0), 1.0);
    var ct = (c.Seccomp == "enforced" ? -0.1 : 0) + (c.Fs == "ro" ? -0.1 : 0);
    // Weights sum to > 1 by design; the clamp keeps the result in [0,1].
    return Math.Clamp(0.6 * br + 0.3 * scarcity + 0.3 * ep + ct, 0, 1);
  }
}
What you get if you ship this
- Trust‑on‑paper → trust‑in‑proofs: every score and “unknown” is backed by a tamper‑evident path.
- Noise control: Unknowns don’t spam—ranked, gated, and auto‑updated when new evidence arrives.
- Moat: reproducible evidence + runnable call‑stacks is hard to copy and easy to demo.
If you want, I can turn this into concrete tickets for scanner.webservice, excititor, concelier, notify, plus a first corpus seed and CI wiring.
What I described is two evidence upgrades that turn Stella Ops from “SBOM/VEX parity” into “provable, replayable security decisions”:
- Binary-only reachability proofs
- Deterministic score proofs + ranked Unknowns
Below is the purpose (why you want it) and a concrete implementation plan for Stella Ops (aligned with your rule: lattice algorithms run in scanner.webservice; Concelier/Excititor preserve prune source).
1) Binary-only reachability: purpose
Most scanners stop at: “this image contains libX version Y with CVE-Z”.
That creates noise because:
- The vulnerable function may be present but never callable from any real entrypoint.
- The vulnerability may be in a code path guarded by config, privilege, seccomp, or missing inputs.
Reachability answers the only question that matters operationally:
“Can execution reach the vulnerable sink from a real entry point in this container/app?”
What Stella should output for a “reachable” finding
- “Entry: nginx worker → module init → … → vulnerable function”
- A call path proof (graph + concrete nodes/addresses/symbols)
- Optional: a minimal repro harness/snippet or IR transcript
Why this is a moat
- It reduces false positives materially (and you can measure it).
- It produces auditor-friendly evidence (“show me the path”).
2) Deterministic score proofs + ranked Unknowns: purpose
Security teams distrust opaque scores. Auditors and regulated clients require repeatability.
Deterministic scoring proof means:
- Every score is a ledger of deltas (“+0.12 because EPSS=…, +0.18 because reachable path exists, −0.07 because seccomp enforced…”).
- The score can be replayed later and must match bit-for-bit given the same inputs (feeds, rules, policies, seed).
Unknowns are the “we don’t know yet” facts (missing VEX, ambiguous versions, unresolved indirect call edges). Instead of spamming, Stella ranks Unknowns by likely impact so DevOps sees the top 1–5 that actually matter.
Implementation plan for Stella Ops
Phase 0 — Lay the foundation (1 sprint)
Goal: make scans replayable and attach proofs to findings even before reachability is “perfect”.
0.1 Create a signed Scan Manifest (system-of-record in Postgres)
A manifest is a declarative capture of everything that affects results.
Store:
- artifact digest(s)
- tool versions (scanner workers + rule engine)
- Concelier snapshot hash(es) used
- Excititor snapshot hash(es) used
- lattice/policy digest (executed in scanner.webservice)
- deterministic flags + seed
- config knobs (depth limits, indirect-call resolution mode, etc.)
Deliverables
- scan_manifest table in Postgres
- DSSE signature for the manifest
- GET /scan/{id}/manifest endpoint
0.2 Proof Bundle format + storage
Store proof artifacts content-addressed (zip or directory) and reference them from findings.
Bundle contains
- callgraph subset (or placeholder graph in v0)
- score proof tree (CBOR/FlatBuffers)
- references to evidence inputs (SBOM/VEX/feeds digests)
Deliverables
- proof_bundle metadata table in Postgres (uri, root_hash, dsse_envelope)
- filesystem/S3-compatible storage adapter
- GET /scan/{id}/proofs/{findingId} endpoint
Phase 1 — Deterministic scoring + Unknowns (1–2 sprints)
Goal: every score becomes replayable; Unknowns become a controlled queue.
1.1 Score Proof Tree “ledger”
Implement a small internal library in .NET:
- pure functions: inputs → score + proof nodes
- nodes: Input, Transform, Delta, Score
- deterministic ordering and hashing
Deliverables
- stella score replay --scan <id> --seed <seed> CLI (or internal job)
- POST /score/replay in scanner.webservice (recompute score without rescanning binaries)
- score_proofs stored in the Proof Bundle
1.2 Unknowns registry + ranking (computed in scanner.webservice)
Unknown reasons (examples):
- missing VEX for a CVE/component
- version provenance uncertain
- ambiguous indirect call edge for reachability
- packed/stripped binary blocking symbolization
Ranking model (deterministic)
- blast radius (dependents, privilege, net-facing)
- evidence scarcity (how many critical facts missing)
- exploit pressure (EPSS/KEV presence if available via Concelier snapshot)
- containment signals (seccomp/RO-fs observed)
Deliverables
- unknowns table + API GET /unknowns?sort=score
- UI: Unknowns collapsed by default; top reasons + “view proof”
1.3 Feed refresh re-scores without rescans
Respect your architecture rule:
- Concelier/Excititor publish snapshots (preserve prune source)
- scanner.webservice runs lattice + scoring
Flow
- Scheduled detects a new Concelier/Excititor snapshot hash
- Scheduled calls scanner.webservice /score/replay for impacted scans
- Notify emits “score delta” + proof link
Deliverables
- scheduled.webservice job: “rescore impacted scans”
- notify.webservice message template: “what changed + proof root hash”
Phase 2 — Binary reachability engine v1 (2–3 sprints)
Goal: ship a reachability proof that is useful today, then iterate fidelity.
2.1 v1 scope (pragmatic)
Start with:
- ELF (Linux containers) first
- imports/exports + PLT/GOT edges
- direct calls + conservative handling of indirect calls
- entrypoints: main, exported functions, known framework entry hooks
What v1 outputs
- “reachable / not proven reachable / unknown”
- shortest path found (bounded depth)
- proof subgraph: nodes + edges + address ranges + symbol names if present
Deliverables
- scanner.worker.binary (or binary module inside scanner worker) produces:
  - CFG/CG summary artifact
  - per-finding path proof (if found)
- TTFRP metric (time-to-first-reachable-path)
2.2 Proof format for reachability
For each finding:
- callgraph.pb (or flatbuffers)
- path_0.ir (text SSA/IL transcript OR “disasm trace” v1)
- evidence.json (addresses, symbolization mode, loader metadata)
2.3 Ground-truth corpus + CI gates
Create a small repo of curated binaries with known reachable/unreachable sinks. Run nightly and per-PR.
Gates
- precision/recall must not regress
- deterministic replay must remain 100% on corpus
- TTFRP tracked (trend, not hard fail initially)
Deliverables
- scanner.webservice /bench/run
- notify posts regressions in PR
Phase 3 — “Best in class” improvements (ongoing)
- Better indirect call resolution (vtables, function pointers) with proof constraints
- Stripped binary symbol recovery heuristics
- Optional snippet/harness generator (start as IR transcript, evolve to runnable)
- Multi-arch support (arm64) and PE/Mach-O if needed
Concrete service responsibilities (so your team doesn’t misplace logic)
Concelier (feeds)
- ingest EPSS/KEV/advisories
- produce versioned snapshots with hashes
- does not run lattice/scoring
Excititor (VEX)
- ingest vendor/openvex/csaf signals
- normalize and emit snapshots with hashes
- preserve prune source
- does not run lattice/scoring
Scanner.webservice
- orchestrates scanning workers
- runs lattice merge + scoring
- emits manifest + proofs
- exposes replay endpoints
Scheduled.webservice
- detects new snapshot hashes
- triggers replay/rescore jobs (no rescans)
- schedules benchmark runs
Notify.webservice
- delivers deltas with proof links (root hash + manifest hash)
Minimum “definition of done” checklist
You’ll know this is implemented when:
- Every scan stores a DSSE-signed Scan Manifest in Postgres.
- Every finding has a proofRef (score proof always; reachability proof where available).
- score replay reproduces the same total and per-rule deltas given the manifest.
- Unknowns are visible, ranked, and proof-linked, and updates from feeds change ranks via replay (not rescans).
- Corpus CI exists and prevents silent regressions.
If you want, I can translate this plan into:
- Postgres schema migrations (tables + indexes),
- REST endpoints for scanner.webservice, scheduled.webservice, notify.webservice,
- and a sprint-ready TASKS.md split per module with acceptance tests.
Below is an “implementation extension” with (1) concrete goals/DoD and (2) major code snippets your developers can lift directly into Stella Ops (.NET 10/C#).
I’m keeping the architecture rule intact: Concelier + Excititor only emit snapshots (preserve prune source); scanner.webservice runs lattice/scoring and emits proofs. System of record is Postgres, with Valkey optional/ephemeral.
0) Concrete goals and Definition of Done
Phase A — Deterministic scan + Proof infrastructure (must ship first)
Goal A1 — Scan Manifest exists and is DSSE-signed
- Every scan produces a ScanManifest containing:
  - artifact digest(s) (image digest, file digest)
  - scanner versions
  - concelierSnapshotHash, excititorSnapshotHash
  - lattice/policy hash (executed in scanner.webservice)
  - deterministic flags + seed
  - config knobs (depth limits, indirect-call resolution mode, etc.)
- Manifest stored in Postgres and in the Proof Bundle.
- Manifest DSSE signature verified by stella proof verify.
Goal A2 — Proof Bundle exists for every scan
- Proof bundle is content-addressed: rootHash + DSSE envelope stored.
- Bundle contains at minimum:
  - manifest.json (canonical)
  - score_proof.cbor (or canonical JSON v1)
  - evidence_refs.json (digests of inputs)
DoD
- Same scan inputs + same seed produce identical manifest hash and identical proof root hash.
Phase B — Deterministic scoring ledger + replay
Goal B1 — Scoring is a pure function
- Score = f(Manifest, Findings, FeedSnapshot, VEXSnapshot, RuntimeSignals?, Seed)
- Every numeric change is recorded as a proof node (Delta) with evidence references.
Goal B2 — Replay
- POST /score/replay recomputes scores from manifest + snapshot hashes without rescanning binaries.
- Replay output (proof root hash + totals) is identical across runs.
DoD
- Replay for a prior scan must reproduce bit-identical proof output (hash match).
Phase C — Unknowns registry + deterministic ranking
Goal C1 — Unknowns are first-class
- Unknown item emitted when evidence is missing or ambiguous:
  - missing VEX, ambiguous component version, unresolved indirect-call edge, packed binary, etc.
- Unknowns ranked deterministically with a proof trail.
DoD
- UI shows top-ranked Unknowns collapsed by default; every Unknown has “View proof”.
Phase D — Binary-only reachability v1 (useful quickly)
Goal D1 — Reachability classification
- Each vulnerable sink gets: Reachable | NotProvenReachable | Unknown
- When reachable, emit a shortest path proof (bounded BFS) from entrypoint.
Goal D2 — TTFRP metric
- Emit TTFRP and store per scan.
DoD
- Corpus benchmark job runs nightly and tracks precision/recall + TTFRP trends.
1) Core data models (Manifest, Proof Nodes, Unknowns)
1.1 ScanManifest (canonical JSON for hashing)
public sealed record ScanManifest(
string ScanId,
DateTimeOffset CreatedAtUtc,
string ArtifactDigest, // sha256:... or image digest
string ArtifactPurl, // optional
string ScannerVersion, // scanner.webservice version
string WorkerVersion, // scanner.worker.* version
string ConcelierSnapshotHash, // immutable feed snapshot digest
string ExcititorSnapshotHash, // immutable vex snapshot digest
string LatticePolicyHash, // policy bundle digest
bool Deterministic,
byte[] Seed, // 32 bytes
IReadOnlyDictionary<string,string> Knobs // depth limits etc.
);
1.2 ProofNode (ledger entries)
public enum ProofNodeKind { Input, Transform, Delta, Score }
public sealed record ProofNode(
string Id,
ProofNodeKind Kind,
string RuleId,
string[] ParentIds,
string[] EvidenceRefs, // digests / refs inside bundle
double Delta, // 0 for non-Delta nodes
double Total, // running total at this node
string Actor, // module name
DateTimeOffset TsUtc,
byte[] Seed,
string NodeHash // sha256 over canonical node (excluding NodeHash)
);
1.3 UnknownItem
public sealed record UnknownItem(
string Id,
string ArtifactDigest,
string ArtifactPurl,
string[] Reasons,
BlastRadius BlastRadius,
double EvidenceScarcity,
ExploitPressure ExploitPressure,
ContainmentSignals Containment,
double Score, // 0..1
string ProofRef // path inside proof bundle
);
public sealed record BlastRadius(int Dependents, bool NetFacing, string Privilege); // "root"/"user"
public sealed record ExploitPressure(double? Epss, bool Kev);
public sealed record ContainmentSignals(string Seccomp, string Fs); // "enforced"/"none", "ro"/"rw"
2) Canonical JSON + hashing (determinism foundation)
2.1 Canonicalize JSON (sort object keys recursively)
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;
public static class CanonJson
{
public static byte[] Canonicalize<T>(T obj)
{
var json = JsonSerializer.SerializeToUtf8Bytes(obj, new JsonSerializerOptions
{
WriteIndented = false,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});
using var doc = JsonDocument.Parse(json);
using var ms = new MemoryStream();
using var writer = new Utf8JsonWriter(ms, new JsonWriterOptions { Indented = false });
WriteElementSorted(doc.RootElement, writer);
writer.Flush();
return ms.ToArray();
}
private static void WriteElementSorted(JsonElement el, Utf8JsonWriter w)
{
switch (el.ValueKind)
{
case JsonValueKind.Object:
w.WriteStartObject();
foreach (var prop in el.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
{
w.WritePropertyName(prop.Name);
WriteElementSorted(prop.Value, w);
}
w.WriteEndObject();
break;
case JsonValueKind.Array:
w.WriteStartArray();
foreach (var item in el.EnumerateArray())
WriteElementSorted(item, w);
w.WriteEndArray();
break;
default:
el.WriteTo(w);
break;
}
}
public static string Sha256Hex(ReadOnlySpan<byte> bytes)
=> Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
}
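A standalone illustration of why the key sort above matters (independent of CanonJson): hashing serialized JSON is only deterministic once key order is fixed, which an ordinal-sorted dictionary demonstrates in a few lines.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text.Json;

// Hash the JSON of an ordinal-sorted view of the input; the result is the
// same regardless of the insertion order of the original dictionary.
string HashOf(IDictionary<string, int> data)
{
    var sorted = new SortedDictionary<string, int>(data, StringComparer.Ordinal);
    var bytes = JsonSerializer.SerializeToUtf8Bytes(sorted);
    return Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
}

var h1 = HashOf(new Dictionary<string, int> { ["b"] = 2, ["a"] = 1 });
var h2 = HashOf(new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 });
// h1 == h2 despite different insertion order
```

Without the sort, two semantically identical manifests could hash differently, which would break the “identical manifest hash” DoD.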
3) DSSE envelope (sign manifests and proof roots)
3.1 DSSE types + signer abstraction
public sealed record DsseEnvelope(
string PayloadType,
string Payload, // base64
DsseSignature[] Signatures
);
public sealed record DsseSignature(string KeyId, string Sig); // base64 sig
public interface IContentSigner
{
string KeyId { get; }
byte[] Sign(ReadOnlySpan<byte> message);
bool Verify(ReadOnlySpan<byte> message, ReadOnlySpan<byte> signature);
}
3.2 DSSE build (DSSE preauth encoding)
using System.IO;
using System.Text;
public static class Dsse
{
    // DSSE PAE: "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload
    // Note: "DSSEv1" is a literal prefix; only type and payload are length-prefixed.
    public static byte[] PAE(string payloadType, ReadOnlySpan<byte> payload)
    {
        var pt = Encoding.UTF8.GetBytes(payloadType);
        using var ms = new MemoryStream();
        void WriteLenPrefixed(byte[] part)
        {
            var len = Encoding.UTF8.GetBytes(part.Length.ToString());
            ms.Write(len, 0, len.Length);
            ms.WriteByte((byte)' ');
            ms.Write(part, 0, part.Length);
            ms.WriteByte((byte)' ');
        }
        var prefix = Encoding.UTF8.GetBytes("DSSEv1 ");
        ms.Write(prefix, 0, prefix.Length);
        WriteLenPrefixed(pt);
        var bodyLen = Encoding.UTF8.GetBytes(payload.Length.ToString());
        ms.Write(bodyLen, 0, bodyLen.Length);
        ms.WriteByte((byte)' ');
        ms.Write(payload);
        return ms.ToArray();
    }
public static DsseEnvelope SignJson<T>(string payloadType, T payloadObj, IContentSigner signer)
{
var payload = CanonJson.Canonicalize(payloadObj);
var pae = PAE(payloadType, payload);
var sig = signer.Sign(pae);
return new DsseEnvelope(
payloadType,
Convert.ToBase64String(payload),
new[] { new DsseSignature(signer.KeyId, Convert.ToBase64String(sig)) }
);
}
}
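The PAE encoding can be sanity-checked against the example given in the DSSE specification; a string-based sketch (fine here because the inputs are ASCII) makes the expected layout explicit:

```csharp
using System;
using System.Text;

// PAE layout per the DSSE spec: "DSSEv1" is a literal prefix; only the
// payloadType and payload are length-prefixed (UTF-8 byte lengths).
string Pae(string payloadType, string payload)
{
    var typeLen = Encoding.UTF8.GetByteCount(payloadType);
    var bodyLen = Encoding.UTF8.GetByteCount(payload);
    return $"DSSEv1 {typeLen} {payloadType} {bodyLen} {payload}";
}

var pae = Pae("http://example.com/HelloWorld", "hello world");
// pae == "DSSEv1 29 http://example.com/HelloWorld 11 hello world"
```

Wiring this known vector into a unit test catches the most common PAE mistake (length-prefixing the "DSSEv1" tag itself).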
3.3 ECDSA P-256 signer (portable default)
using System.Security.Cryptography;
public sealed class EcdsaP256Signer : IContentSigner, IDisposable
{
private readonly ECDsa _ecdsa;
public string KeyId { get; }
public EcdsaP256Signer(string keyId, ECDsa ecdsa)
{
KeyId = keyId;
_ecdsa = ecdsa;
}
public byte[] Sign(ReadOnlySpan<byte> message)
=> _ecdsa.SignData(message.ToArray(), HashAlgorithmName.SHA256);
public bool Verify(ReadOnlySpan<byte> message, ReadOnlySpan<byte> signature)
=> _ecdsa.VerifyData(message.ToArray(), signature.ToArray(), HashAlgorithmName.SHA256);
public void Dispose() => _ecdsa.Dispose();
}
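A minimal usage sketch for the signer abstraction: a sign/verify round trip over PAE-style bytes must succeed, and any tampering with the signed bytes must fail verification (the message string is illustrative only).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Round-trip an ECDSA P-256 signature over some PAE-style bytes.
using var ecdsa = ECDsa.Create(ECCurve.NamedCurves.nistP256);
var message = Encoding.UTF8.GetBytes("DSSEv1 4 test 5 hello");
var sig = ecdsa.SignData(message, HashAlgorithmName.SHA256);
bool ok = ecdsa.VerifyData(message, sig, HashAlgorithmName.SHA256);

message[0] ^= 0xFF; // simulate tampering with the signed bytes
bool tampered = ecdsa.VerifyData(message, sig, HashAlgorithmName.SHA256);
// ok == true, tampered == false
```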
4) Proof ledger: append nodes, compute node hashes, compute root hash
4.1 Node hashing (exclude NodeHash itself)
public static class ProofHashing
{
public static ProofNode WithHash(ProofNode n)
{
var canonical = CanonJson.Canonicalize(new
{
n.Id, n.Kind, n.RuleId, n.ParentIds, n.EvidenceRefs, n.Delta, n.Total,
n.Actor, n.TsUtc, Seed = Convert.ToBase64String(n.Seed)
});
return n with { NodeHash = "sha256:" + CanonJson.Sha256Hex(canonical) };
}
public static string ComputeRootHash(IEnumerable<ProofNode> nodesInOrder)
{
// Deterministic: root hash over canonical JSON array of node hashes in order.
var arr = nodesInOrder.Select(n => n.NodeHash).ToArray();
var bytes = CanonJson.Canonicalize(arr);
return "sha256:" + CanonJson.Sha256Hex(bytes);
}
}
4.2 Minimal ledger (deterministic ordering enforced by append order)
public sealed class ProofLedger
{
private readonly List<ProofNode> _nodes = new();
public IReadOnlyList<ProofNode> Nodes => _nodes;
public void Append(ProofNode node)
{
_nodes.Add(ProofHashing.WithHash(node));
}
public string RootHash() => ProofHashing.ComputeRootHash(_nodes);
}
5) Deterministic scoring function (with proof nodes)
5.1 Example scoring pipeline (CVSS + EPSS + reachability + containment)
public sealed record ScoreInputs(
double CvssBase, // 0..10
double? Epss, // 0..1
bool Kev,
ReachabilityClass Reachability, // Reachable/NotProven/Unknown
ContainmentSignals Containment
);
public enum ReachabilityClass { Reachable, NotProvenReachable, Unknown }
public static class RiskScoring
{
public static (double Score01, ProofLedger Ledger) Score(
ScoreInputs input,
string scanId,
byte[] seed,
DateTimeOffset tsUtc)
{
var ledger = new ProofLedger();
var total = 0.0;
// Input node
ledger.Append(new ProofNode(
Id: $"in:{scanId}",
Kind: ProofNodeKind.Input,
RuleId: "inputs.v1",
ParentIds: Array.Empty<string>(),
EvidenceRefs: Array.Empty<string>(),
Delta: 0,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// CVSS base mapping
var cvss01 = Math.Clamp(input.CvssBase / 10.0, 0, 1);
total += 0.55 * cvss01;
ledger.Append(new ProofNode(
Id: $"d:cvss:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.cvss_base.weighted",
ParentIds: new[] { $"in:{scanId}" },
EvidenceRefs: new[] { $"cvss:{input.CvssBase:0.0}" },
Delta: 0.55 * cvss01,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// EPSS (optional)
if (input.Epss is { } epss)
{
total += 0.25 * Math.Clamp(epss, 0, 1);
ledger.Append(new ProofNode(
Id: $"d:epss:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.epss.weighted",
ParentIds: new[] { $"d:cvss:{scanId}" },
EvidenceRefs: new[] { $"epss:{epss:0.0000}" },
Delta: 0.25 * epss,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
}
// KEV boosts urgency
if (input.Kev)
{
total += 0.15;
ledger.Append(new ProofNode(
Id: $"d:kev:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.kev.bump",
ParentIds: new[] { $"d:cvss:{scanId}" },
EvidenceRefs: new[] { "kev:true" },
Delta: 0.15,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
}
// Reachability
var reachDelta = input.Reachability switch
{
ReachabilityClass.Reachable => 0.20,
ReachabilityClass.NotProvenReachable => 0.00,
ReachabilityClass.Unknown => 0.08, // unknown still adds risk, but less than proven reachable
_ => 0.00
};
total += reachDelta;
ledger.Append(new ProofNode(
Id: $"d:reach:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.reachability",
ParentIds: new[] { $"d:cvss:{scanId}" },
EvidenceRefs: new[] { $"reach:{input.Reachability}" },
Delta: reachDelta,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// Containment deductions (examples)
var containmentDelta = 0.0;
if (string.Equals(input.Containment.Seccomp, "enforced", StringComparison.OrdinalIgnoreCase))
containmentDelta -= 0.05;
if (string.Equals(input.Containment.Fs, "ro", StringComparison.OrdinalIgnoreCase))
containmentDelta -= 0.03;
total = Math.Clamp(total + containmentDelta, 0, 1);
ledger.Append(new ProofNode(
Id: $"d:contain:{scanId}",
Kind: ProofNodeKind.Delta,
RuleId: "score.containment",
ParentIds: new[] { $"d:reach:{scanId}" },
EvidenceRefs: new[] { $"seccomp:{input.Containment.Seccomp}", $"fs:{input.Containment.Fs}" },
Delta: containmentDelta,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
// Final score node
ledger.Append(new ProofNode(
Id: $"s:{scanId}",
Kind: ProofNodeKind.Score,
RuleId: "score.final",
ParentIds: new[] { $"d:contain:{scanId}" },
EvidenceRefs: new[] { "root" },
Delta: 0,
Total: total,
Actor: "scanner.webservice",
TsUtc: tsUtc,
Seed: seed,
NodeHash: ""
));
return (total, ledger);
}
}
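To see the weights above in action, here is a worked arithmetic-only example (0.55·CVSS/10, 0.25·EPSS, +0.15 KEV, +0.20 reachable, small containment deductions, clamp to [0,1]). It simplifies reachability to a boolean (the pipeline above also scores Unknown at +0.08) and omits the proof nodes; it is a sketch of the math, not the real scorer.

```csharp
using System;

// Mirror the delta weights used in RiskScoring, arithmetic only.
double Score(double cvssBase, double? epss, bool kev, bool reachable,
             bool seccompEnforced, bool roFs)
{
    var total = 0.55 * Math.Clamp(cvssBase / 10.0, 0, 1);
    if (epss is { } e) total += 0.25 * Math.Clamp(e, 0, 1);
    if (kev) total += 0.15;
    if (reachable) total += 0.20;
    if (seccompEnforced) total -= 0.05;
    if (roFs) total -= 0.03;
    return Math.Clamp(total, 0, 1);
}

var mid = Score(8.0, 0.4, kev: false, reachable: true, seccompEnforced: true, roFs: false);
// 0.44 + 0.10 + 0.20 − 0.05 ≈ 0.69
var capped = Score(10.0, 0.95, kev: true, reachable: true, seccompEnforced: false, roFs: false);
// 0.55 + 0.2375 + 0.15 + 0.20 = 1.1375, clamped to 1.0
```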
6) Unknown ranking (deterministic) + proof
6.1 Ranking function
public static class UnknownRanker
{
public static double Rank(BlastRadius b, double scarcity, ExploitPressure ep, ContainmentSignals c)
{
var dependents01 = Math.Clamp(b.Dependents / 50.0, 0, 1);
var net = b.NetFacing ? 0.5 : 0.0;
var priv = string.Equals(b.Privilege, "root", StringComparison.OrdinalIgnoreCase) ? 0.5 : 0.0;
var blast = Math.Clamp((dependents01 + net + priv) / 2.0, 0, 1);
var epss01 = ep.Epss is null ? 0.35 : Math.Clamp(ep.Epss.Value, 0, 1); // default mild pressure
var kev = ep.Kev ? 0.30 : 0.0;
var pressure = Math.Clamp(epss01 + kev, 0, 1);
var containment = 0.0;
if (string.Equals(c.Seccomp, "enforced", StringComparison.OrdinalIgnoreCase)) containment -= 0.10;
if (string.Equals(c.Fs, "ro", StringComparison.OrdinalIgnoreCase)) containment -= 0.10;
return Math.Clamp(0.60 * blast + 0.30 * scarcity + 0.30 * pressure + containment, 0, 1);
}
}
6.2 Unknown proof node pattern
When you compute Unknown rank, emit a mini ledger identical to score proofs:
- Input node: reasons + evidence scarcity facts
- Delta nodes: blast/pressure/containment components
- Score node: final unknown score
Store it in proofs/unknowns/{unkId}/tree.json.
7) Proof Bundle writer (zip + root hash + DSSE)
using System.IO.Compression;
public sealed class ProofBundleWriter
{
public static async Task<(string RootHash, string BundlePath)> WriteAsync(
string baseDir,
ScanManifest manifest,
ProofLedger scoreLedger,
DsseEnvelope manifestDsse,
IContentSigner signer,
CancellationToken ct)
{
Directory.CreateDirectory(baseDir);
var manifestBytes = CanonJson.Canonicalize(manifest);
var ledgerBytes = CanonJson.Canonicalize(scoreLedger.Nodes); // v1 JSON; swap to CBOR later
// Root hash covers canonical content (manifest + ledger)
var rootMaterial = CanonJson.Canonicalize(new
{
manifest = "sha256:" + CanonJson.Sha256Hex(manifestBytes),
scoreProof = "sha256:" + CanonJson.Sha256Hex(ledgerBytes),
scoreRoot = scoreLedger.RootHash()
});
var rootHash = "sha256:" + CanonJson.Sha256Hex(rootMaterial);
// DSSE sign the root descriptor
var rootDsse = Dsse.SignJson("application/vnd.stellaops.proof-root.v1+json", new
{
rootHash,
scoreRoot = scoreLedger.RootHash()
}, signer);
var bundleName = $"{manifest.ScanId}_{rootHash.Replace("sha256:", "")}.zip";
var bundlePath = Path.Combine(baseDir, bundleName);
await using var fs = File.Create(bundlePath);
using var zip = new ZipArchive(fs, ZipArchiveMode.Create, leaveOpen: false);
void Add(string name, byte[] content)
{
var e = zip.CreateEntry(name, CompressionLevel.Optimal);
using var s = e.Open();
s.Write(content, 0, content.Length);
}
Add("manifest.json", manifestBytes);
Add("manifest.dsse.json", CanonJson.Canonicalize(manifestDsse));
Add("score_proof.json", ledgerBytes);
Add("proof_root.dsse.json", CanonJson.Canonicalize(rootDsse));
Add("meta.json", CanonJson.Canonicalize(new { rootHash, createdAtUtc = DateTimeOffset.UtcNow }));
return (rootHash, bundlePath);
}
}
8) Postgres schema (authoritative) and EF Core skeleton
8.1 Tables (SQL snippet)
create table scan_manifest (
scan_id text primary key,
created_at_utc timestamptz not null,
artifact_digest text not null,
concelier_snapshot_hash text not null,
excititor_snapshot_hash text not null,
lattice_policy_hash text not null,
deterministic boolean not null,
seed bytea not null,
manifest_json jsonb not null,
manifest_dsse_json jsonb not null,
manifest_hash text not null
);
create table proof_bundle (
scan_id text not null references scan_manifest(scan_id),
root_hash text not null,
bundle_uri text not null,
proof_root_dsse_json jsonb not null,
created_at_utc timestamptz not null,
primary key (scan_id, root_hash)
);
create index ix_scan_manifest_artifact on scan_manifest(artifact_digest);
create index ix_scan_manifest_snapshots on scan_manifest(concelier_snapshot_hash, excititor_snapshot_hash);
8.2 EF Core entities (minimal)
public sealed class ScannerDbContext : DbContext
{
public DbSet<ScanManifestRow> ScanManifests => Set<ScanManifestRow>();
public DbSet<ProofBundleRow> ProofBundles => Set<ProofBundleRow>();
public ScannerDbContext(DbContextOptions<ScannerDbContext> options) : base(options) { }
protected override void OnModelCreating(ModelBuilder b)
{
b.Entity<ScanManifestRow>().HasKey(x => x.ScanId);
b.Entity<ProofBundleRow>().HasKey(x => new { x.ScanId, x.RootHash });
b.Entity<ScanManifestRow>().HasIndex(x => x.ArtifactDigest);
// The DDL declares these columns jsonb; map the string-backed properties explicitly (Npgsql).
b.Entity<ScanManifestRow>().Property(x => x.ManifestJson).HasColumnType("jsonb");
b.Entity<ScanManifestRow>().Property(x => x.ManifestDsseJson).HasColumnType("jsonb");
b.Entity<ProofBundleRow>().Property(x => x.ProofRootDsseJson).HasColumnType("jsonb");
}
}
public sealed class ScanManifestRow
{
public string ScanId { get; set; } = default!;
public DateTimeOffset CreatedAtUtc { get; set; }
public string ArtifactDigest { get; set; } = default!;
public string ConcelierSnapshotHash { get; set; } = default!;
public string ExcititorSnapshotHash { get; set; } = default!;
public string LatticePolicyHash { get; set; } = default!;
public bool Deterministic { get; set; }
public byte[] Seed { get; set; } = default!;
public string ManifestHash { get; set; } = default!;
public string ManifestJson { get; set; } = default!; // store canonical JSON string
public string ManifestDsseJson { get; set; } = default!;
}
public sealed class ProofBundleRow
{
public string ScanId { get; set; } = default!;
public string RootHash { get; set; } = default!;
public string BundleUri { get; set; } = default!;
public DateTimeOffset CreatedAtUtc { get; set; }
public string ProofRootDsseJson { get; set; } = default!;
}
9) scanner.webservice endpoints (minimal APIs)
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.EntityFrameworkCore;
var app = WebApplication.CreateBuilder(args)
.AddServices() // your DI extension: DbContext, signer factory, repositories
.Build();
app.MapPost("/scan", async (ScanRequest req, ScannerDbContext db, CancellationToken ct) =>
{
var scanId = Guid.NewGuid().ToString("n");
var seed = req.Seed ?? RandomNumberGenerator.GetBytes(32);
var created = DateTimeOffset.UtcNow;
// Snapshot hashes come from your snapshot selector (by policy/environment)
var concelierHash = req.ConcelierSnapshotHash;
var excititorHash = req.ExcititorSnapshotHash;
var manifest = new ScanManifest(
ScanId: scanId,
CreatedAtUtc: created,
ArtifactDigest: req.ArtifactDigest,
ArtifactPurl: req.ArtifactPurl ?? "",
ScannerVersion: req.ScannerVersion,
WorkerVersion: req.WorkerVersion,
ConcelierSnapshotHash: concelierHash,
ExcititorSnapshotHash: excititorHash,
LatticePolicyHash: req.LatticePolicyHash,
Deterministic: req.Deterministic,
Seed: seed,
Knobs: req.Knobs ?? new Dictionary<string,string>()
);
var manifestHash = "sha256:" + CanonJson.Sha256Hex(CanonJson.Canonicalize(manifest));
// Sign DSSE
using var signer = YourSignerFactory.Create(); // ECDSA or other profile
var dsse = Dsse.SignJson("application/vnd.stellaops.scan-manifest.v1+json", manifest, signer);
db.ScanManifests.Add(new ScanManifestRow
{
ScanId = scanId,
CreatedAtUtc = created,
ArtifactDigest = req.ArtifactDigest,
ConcelierSnapshotHash = concelierHash,
ExcititorSnapshotHash = excititorHash,
LatticePolicyHash = req.LatticePolicyHash,
Deterministic = req.Deterministic,
Seed = seed,
ManifestHash = manifestHash,
ManifestJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(manifest)),
ManifestDsseJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(dsse))
});
await db.SaveChangesAsync(ct);
return Results.Ok(new { scanId, manifestHash });
});
app.MapGet("/scan/{scanId}/manifest", async (string scanId, ScannerDbContext db, CancellationToken ct) =>
{
var row = await db.ScanManifests.AsNoTracking().SingleOrDefaultAsync(x => x.ScanId == scanId, ct);
return row is null ? Results.NotFound() : Results.Text(row.ManifestJson, "application/json");
});
app.MapPost("/scan/{scanId}/score/replay", async (string scanId, ScannerDbContext db, CancellationToken ct) =>
{
var row = await db.ScanManifests.AsNoTracking().SingleOrDefaultAsync(x => x.ScanId == scanId, ct);
if (row is null) return Results.NotFound();
var manifest = JsonSerializer.Deserialize<ScanManifest>(row.ManifestJson)!;
// Load findings + snapshots by hash (your repositories).
// The values below are placeholders; a real replay must source them from the pinned snapshots.
var inputs = new ScoreInputs(
CvssBase: 9.1,
Epss: 0.62,
Kev: false,
Reachability: ReachabilityClass.Unknown,
Containment: new ContainmentSignals("enforced", "ro")
);
// NOTE: for --deterministic replays, substitute the fake clock for UtcNow.
var (score, ledger) = RiskScoring.Score(inputs, scanId, manifest.Seed, DateTimeOffset.UtcNow);
using var signer = YourSignerFactory.Create();
var (rootHash, bundlePath) = await ProofBundleWriter.WriteAsync(
baseDir: "/var/lib/stellaops/proofs",
manifest: manifest,
scoreLedger: ledger,
manifestDsse: JsonSerializer.Deserialize<DsseEnvelope>(row.ManifestDsseJson)!,
signer: signer,
ct: ct);
db.ProofBundles.Add(new ProofBundleRow
{
ScanId = scanId,
RootHash = rootHash,
BundleUri = bundlePath,
CreatedAtUtc = DateTimeOffset.UtcNow,
ProofRootDsseJson = Encoding.UTF8.GetString(CanonJson.Canonicalize(
Dsse.SignJson("application/vnd.stellaops.proof-root.v1+json", new { rootHash }, signer)))
});
await db.SaveChangesAsync(ct);
return Results.Ok(new { score, rootHash, bundleUri = bundlePath });
});
app.Run();
public sealed record ScanRequest(
string ArtifactDigest,
string? ArtifactPurl,
string ScannerVersion,
string WorkerVersion,
string ConcelierSnapshotHash,
string ExcititorSnapshotHash,
string LatticePolicyHash,
bool Deterministic,
byte[]? Seed,
Dictionary<string,string>? Knobs
);
10) Binary reachability v1: major skeleton (bounded BFS over a naive callgraph)
This is intentionally “v1”: direct calls + imports + conservative unknowns. It still delivers value fast.
public sealed record FuncNode(ulong Address, string Name);
public sealed record CallEdge(ulong From, ulong To, string Kind); // "direct"/"import"/"indirect"
public sealed class CallGraph
{
public Dictionary<ulong, FuncNode> Nodes { get; } = new();
public List<CallEdge> Edges { get; } = new();
// Linear scan keeps v1 simple; index Edges by From for large graphs.
public IEnumerable<ulong> Neighbors(ulong from)
=> Edges.Where(e => e.From == from).Select(e => e.To);
}
public static class Reachability
{
public static (ReachabilityClass Class, ulong[]? Path) FindPath(
CallGraph cg,
IEnumerable<ulong> entrypoints,
ulong sink,
int maxDepth)
{
var visited = new HashSet<ulong>();
var parent = new Dictionary<ulong, ulong>();
var q = new Queue<(ulong node, int depth)>();
foreach (var ep in entrypoints)
{
q.Enqueue((ep, 0));
visited.Add(ep);
}
while (q.Count > 0)
{
var (cur, depth) = q.Dequeue();
if (cur == sink)
return (ReachabilityClass.Reachable, Reconstruct(parent, cur));
if (depth >= maxDepth) continue;
foreach (var nxt in cg.Neighbors(cur))
{
if (visited.Add(nxt))
{
parent[nxt] = cur;
q.Enqueue((nxt, depth + 1));
}
}
}
return (ReachabilityClass.NotProvenReachable, null);
}
private static ulong[] Reconstruct(Dictionary<ulong, ulong> parent, ulong end)
{
var path = new List<ulong> { end };
while (parent.TryGetValue(end, out var p))
{
path.Add(p);
end = p;
}
path.Reverse();
return path.ToArray();
}
}
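A usage sketch of the bounded BFS on a toy three-function binary (addresses and the adjacency-list shape are invented for illustration; the logic mirrors `Reachability.FindPath` above, inlined so the example is self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy three-function binary: entry 0x100 -> parse 0x200 -> sink 0x300.
var edges = new Dictionary<ulong, ulong[]>
{
    [0x100] = new ulong[] { 0x200 },
    [0x200] = new ulong[] { 0x300 },
};

// Same bounded BFS as Reachability.FindPath, over the adjacency list above.
ulong[]? FindPath(ulong entry, ulong sink, int maxDepth)
{
    var parent = new Dictionary<ulong, ulong>();
    var visited = new HashSet<ulong> { entry };
    var q = new Queue<(ulong Node, int Depth)>();
    q.Enqueue((entry, 0));
    while (q.Count > 0)
    {
        var (cur, depth) = q.Dequeue();
        if (cur == sink)
        {
            // Walk parent links back to the entrypoint, then reverse.
            var path = new List<ulong> { cur };
            while (parent.TryGetValue(path[^1], out var p)) path.Add(p);
            path.Reverse();
            return path.ToArray();
        }
        if (depth >= maxDepth) continue;
        foreach (var nxt in edges.GetValueOrDefault(cur, Array.Empty<ulong>()))
        {
            if (visited.Add(nxt)) { parent[nxt] = cur; q.Enqueue((nxt, depth + 1)); }
        }
    }
    return null; // NotProvenReachable under this depth bound
}

var found = FindPath(0x100, 0x300, maxDepth: 8)!;
Console.WriteLine(string.Join(" -> ", found.Select(a => $"0x{a:X}")));
// 0x100 -> 0x200 -> 0x300
```

Note the `NotProvenReachable` return on exhaustion: the depth bound means a null result is a bounded claim, not proof of unreachability, which is exactly why the class is not named `Unreachable`.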
Proof emission for reachability
- Store: callgraph.json (nodes + edges subset relevant to this sink) and path_0.json (address chain + symbol names)
- Create a scoring delta node referencing reach:path_0.json when reachable.
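To make the proof files concrete, a hypothetical path_0.json payload might look like the following; every address, symbol, and field name here is invented for illustration, and the real schema is whatever your proof-builder emits:

```csharp
using System;
using System.Text.Json;

// Hypothetical reachable-path proof: address chain + symbol names, as
// described above. All values are invented for illustration.
var pathProof = new
{
    sinkAddress = "0x401F30",
    addresses = new[] { "0x401000", "0x401A20", "0x401F30" },
    symbols = new[] { "main", "parse_config", "png_handle_chunk" },
    edgeKinds = new[] { "direct", "import" },
    maxDepth = 64,
    builder = "bounded-bfs/v1"
};
Console.WriteLine(JsonSerializer.Serialize(pathProof, new JsonSerializerOptions { WriteIndented = true }));
```

Whatever the final shape, the file should carry enough to re-walk the path against callgraph.json without re-running the analyzer.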
11) Determinism test (xUnit “hash must match”)
public class DeterminismTests
{
[Fact]
public void Score_Replay_IsBitIdentical()
{
var seed = Enumerable.Repeat((byte)7, 32).ToArray();
var inputs = new ScoreInputs(9.0, 0.50, false, ReachabilityClass.Unknown, new("enforced","ro"));
var (s1, l1) = RiskScoring.Score(inputs, "scanA", seed, DateTimeOffset.Parse("2025-01-01T00:00:00Z"));
var (s2, l2) = RiskScoring.Score(inputs, "scanA", seed, DateTimeOffset.Parse("2025-01-01T00:00:00Z"));
Assert.Equal(s1, s2, 10);
Assert.Equal(l1.RootHash(), l2.RootHash());
Assert.True(l1.Nodes.Zip(l2.Nodes).All(z => z.First.NodeHash == z.Second.NodeHash));
}
}
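The root-hash equality this test asserts rests on the ledger being an order-sensitive hash chain. A toy stand-in illustrates the property (not the real ledger; the chain rule H(H(node) || previous) is an assumption for this sketch):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Toy ledger root: fold each node's hash into the running root. Order-sensitive,
// so replaying identical nodes in identical order is the only way to reproduce it.
string Root(string[] nodes)
{
    var acc = new byte[32]; // zeroed genesis
    foreach (var n in nodes)
        acc = SHA256.HashData(SHA256.HashData(Encoding.UTF8.GetBytes(n)).Concat(acc).ToArray());
    return "sha256:" + Convert.ToHexString(acc).ToLowerInvariant();
}

var r1 = Root(new[] { "base:9.0", "epss:0.5", "reach:unknown" });
var r2 = Root(new[] { "base:9.0", "epss:0.5", "reach:unknown" });
var r3 = Root(new[] { "epss:0.5", "base:9.0", "reach:unknown" });
Console.WriteLine(r1 == r2); // True: replay reproduces the root bit-for-bit
Console.WriteLine(r1 == r3); // False: reordering nodes changes the root
```

This is why the test checks both `RootHash()` equality and the per-node hash zip: the former catches any drift, the latter tells you which node drifted.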
12) What developers should implement next (priority order)
- Canonical JSON + hashing (Phase A prerequisite)
- Manifest + DSSE signing + Postgres persistence
- Proof ledger + root hash + Proof Bundle writer
- Replay endpoint (/score/replay) and scheduler hook to rescore on new snapshot hashes
- Unknown registry + deterministic ranking + proof
- Reachability v1 (callgraph + bounded BFS + proof emission)
- Corpus bench and CI regression gates
If you want, I can convert this into repo-ready TASKS.md blocks per module (scanner.webservice, scheduled.webservice, notify.webservice) with acceptance tests and a minimal migration set aligned to your existing naming conventions.