git.stella-ops.org/docs/product-advisories/02-Dec-2025 - Benchmarking a Testable Security Moat.md
2025-12-09 20:23:50 +02:00


Here's a crisp, plug-in set of reproducible benchmarks you can bake into StellaOps so buyers, auditors, and your own team can see measurable wins, without hand-wavy heuristics.

Benchmarks StellaOps should standardize

1) Time-to-Evidence (TTE): how fast StellaOps turns a “suspicion” into a signed, auditor-usable proof (e.g., VEX+attestations).

  • Definition: TTE = t(proof_ready) - t(artifact_ingested)

  • Scope: scanning, reachability, policy evaluation, proof generation, notarization, and publication to your proof ledger.

  • Targets:

    • P50 < 2m for typical container images (≤ 500 MB, known ecosystems).
    • P95 < 5m including cold-start/offline-bundle mode.
  • Report: Median/P95 by artifact size bucket; break down stages (fetch → analyze → reachability → VEX → sign → publish).

  • Auditable logs: DSSE/DSD signatures, policy hash, feed set IDs, scanner build hash.
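
To make the TTE definition and the P50/P95 targets concrete, here is a minimal C# sketch. It assumes the nearest-rank percentile method and made-up sample timings; `TteStats` and the sample values are hypothetical and not part of StellaOps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical samples: (t_artifact_ingested, t_proof_ready) in seconds on a
// monotonic clock. Real values would come from the per-stage timers.
var samples = new (double Ingested, double ProofReady)[]
{
    (0, 95), (0, 110), (0, 80), (0, 240), (0, 130),
};

// TTE = t(proof_ready) - t(artifact_ingested)
var tte = samples.Select(s => s.ProofReady - s.Ingested).ToList();

Console.WriteLine($"P50 = {TteStats.Percentile(tte, 50)} s"); // P50 = 110 s
Console.WriteLine($"P95 = {TteStats.Percentile(tte, 95)} s"); // P95 = 240 s

public static class TteStats
{
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(IReadOnlyCollection<double> values, double p)
    {
        if (values.Count == 0) throw new ArgumentException("No samples.", nameof(values));
        if (p <= 0 || p > 100) throw new ArgumentOutOfRangeException(nameof(p));

        var sorted = values.OrderBy(v => v).ToArray();
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With only five samples the P95 equals the maximum; a real report would bucket by artifact size first and need far more samples per bucket for the percentiles to be meaningful.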

2) False-Negative Drift Rate (FN-Drift): catches when a previously “clean” artifact later becomes “affected” because the world changed (new CVE, rule, or feed).

  • Definition (rolling window 30d): FN-Drift = (# artifacts reclassified from {unaffected/unknown} → affected) / (total artifacts re-evaluated)
  • Stratify by cause: feed delta, rule delta, lattice/policy delta, reachability delta.
  • Goal: keep feed-caused FN-Drift low by faster deltas (good) while keeping engine-caused FN-Drift near zero (stability).
  • Guardrails: require explanations on reclassification: include diff of feeds, rule versions, and lattice policy commit.
  • Badge: “No engine-caused FN drift in 90d” (hash-linked evidence bundle).
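
The FN-Drift definition and its stratification by cause can be sketched as follows. This is an illustration under assumptions: the `Reclassification` record, the `FnDrift` helper, and the sample data are hypothetical, and the cause labels simply reuse the names from the schema later in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical 30-day window: 2000 artifacts re-evaluated, 4 status changes logged.
var changes = new List<Reclassification>
{
    new("sha256:aaa", "unaffected", "affected",   "feed_delta"),
    new("sha256:bbb", "unknown",    "affected",   "feed_delta"),
    new("sha256:ccc", "unaffected", "affected",   "engine_delta"),
    new("sha256:ddd", "affected",   "unaffected", "policy_delta"), // not FN drift
};

foreach (var (cause, rate) in FnDrift.ByCause(changes, totalReevaluated: 2000))
{
    string pct = (rate * 100).ToString("F2", CultureInfo.InvariantCulture);
    Console.WriteLine($"{cause}: {pct}%");
}
// engine_delta: 0.05%
// feed_delta: 0.10%

public sealed record Reclassification(
    string ArtifactDigest, string PreviousStatus, string NewStatus, string Cause);

public static class FnDrift
{
    // FN-Drift per cause = distinct artifacts that moved from
    // {unaffected, unknown} to affected, divided by all re-evaluated artifacts.
    public static IReadOnlyList<(string Cause, double Rate)> ByCause(
        IEnumerable<Reclassification> changes, int totalReevaluated)
    {
        if (totalReevaluated <= 0)
            throw new ArgumentOutOfRangeException(nameof(totalReevaluated));

        return changes
            .Where(c => c.NewStatus == "affected"
                     && (c.PreviousStatus is "unaffected" or "unknown"))
            .GroupBy(c => c.Cause)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => (
                Cause: g.Key,
                Rate: g.Select(c => c.ArtifactDigest).Distinct().Count()
                      / (double)totalReevaluated))
            .ToList();
    }
}
```

The denominator is passed in separately because a change log alone cannot say how many artifacts were re-evaluated without changing status.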

3) Deterministic Re-scan Reproducibility (Hash-Stable Proofs): same inputs → same outputs, byte-for-byte, including proofs. Crucial for audits and regulated buys.

  • Definition: Given a scan manifest (artifact digest, feed snapshots, engine build hash, lattice/policy hash), a re-scan must produce identical: findings set, VEX decisions, proofs, and top-level bundle hash.

  • Metric: Repro rate = identical_outputs / total_replays (target 100%).

  • Proof object:

    {
      artifact_digest,
      scan_manifest_hash,
      feeds_merkle_root,
      engine_build_hash,
      policy_lattice_hash,
      findings_sha256,
      vex_bundle_sha256,
      proof_bundle_sha256
    }
    
  • CI check: nightly replay of a fixed corpus; fail pipeline on any nondeterminism (with diff).
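
As an illustration of the repro-rate metric and the "fail with diff" behaviour, the sketch below compares the output hashes of each replay against the original scan. `ProofHashes` and `Repro` are hypothetical names, and the short hash strings are placeholders rather than real digests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var original = new ProofHashes("f1", "v1", "p1");
var replays = new[]
{
    new ProofHashes("f1", "v1", "p1"), // identical replay
    new ProofHashes("f1", "v2", "p9"), // VEX changed, so the bundle hash changed too
};

foreach (var replay in replays)
{
    var diff = Repro.Diff(original, replay);
    Console.WriteLine(diff.Count == 0 ? "identical" : "differs: " + string.Join(",", diff));
}
// identical
// differs: vex_bundle,proof_bundle

Console.WriteLine(Repro.Rate(original, replays) == 1.0 ? "PASS" : "FAIL"); // FAIL

public sealed record ProofHashes(
    string FindingsSha256, string VexBundleSha256, string ProofBundleSha256);

public static class Repro
{
    // Names of the outputs whose hashes differ between two runs.
    public static IReadOnlyList<string> Diff(ProofHashes expected, ProofHashes actual)
    {
        var diffs = new List<string>();
        if (expected.FindingsSha256 != actual.FindingsSha256) diffs.Add("findings");
        if (expected.VexBundleSha256 != actual.VexBundleSha256) diffs.Add("vex_bundle");
        if (expected.ProofBundleSha256 != actual.ProofBundleSha256) diffs.Add("proof_bundle");
        return diffs;
    }

    // Repro rate = identical_outputs / total_replays.
    public static double Rate(ProofHashes expected, IReadOnlyCollection<ProofHashes> replays)
        => replays.Count == 0
            ? throw new ArgumentException("No replays.", nameof(replays))
            : replays.Count(r => r == expected) / (double)replays.Count;
}
```

Record equality compares all three hashes, so a replay only counts as identical when every output matches.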

Minimal implementation plan (developerready)

  • Canonical Scan Manifest (CSM): immutable JSON (canonicalized), covering: artifact digests; feed URIs + content hashes; engine build + ruleset hashes; lattice/policy hash; config flags; environment fingerprint (CPU features, locale). Store CSM + DSSE envelope.
  • Stage timers: emit monotonic timestamps for each stage; roll up to TTE. Persist per-artifact in Postgres (time-series table by artifact_digest).
  • Delta re-eval daemon: on any feed/rule/policy change, re-score the corpus referenced by that feed snapshot; log reclassifications with cause; compute FN-Drift daily.
  • Replay harness: given a CSM, re-run the pipeline in sealed mode (no network, feeds from snapshot); recompute bundle hashes; assert equality.
  • Proof bundle: tar/zip with canonical ordering; include SBOM slice, reachability graph, VEX, signatures, and an index.json (canonical). The bundle's SHA-256 is your public “proof hash.”

What to put on dashboards & in SLAs

  • TTE panel: P50/P95 by image size; stacked bars by stage; alerts when P95 breaches SLO.
  • FN-Drift panel: overall and by cause; red flag if engine-caused drift > 0.1% in 30d.
  • Repro panel: last 24h/7d replay pass rate (goal 100%); list any nondeterministic modules.

Why this wins sales & audits

  • Auditors: can pick any proof hash → replay from CSM → get the exact same signed outcome.
  • Buyers: TTE proves speed; FN-Drift proves stability and feed hygiene; Repro proves you're not heuristic-wobbly.
  • Competitors: many can't show deterministic replay or attribute drift causes; your “hash-stable proofs” make that gap obvious.

If you want, I can generate the exact PostgreSQL schema, .NET 10 structs, and a nightly replay GitLab job that enforces these three metrics out-of-the-box.

Below is the complete, implementation-ready package you asked for: PostgreSQL schema, .NET 10 types, and a CI replay job for the three Stella Ops benchmarks: Time-to-Evidence (TTE), False-Negative Drift (FN-Drift), and Deterministic Replayability.

This is written so your mid-level developers can drop it directly into Stella Ops without re-architecting anything.


1. PostgreSQL Schema (Canonical, Deterministic, Normalized)

1.1 Table: scan_manifest

Immutable record describing exactly what was used for a scan.

CREATE TABLE scan_manifest (
    manifest_id UUID PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    artifact_digest TEXT NOT NULL,
    feeds_merkle_root TEXT NOT NULL,
    engine_build_hash TEXT NOT NULL,
    policy_lattice_hash TEXT NOT NULL,

    ruleset_hash TEXT NOT NULL,
    config_flags JSONB NOT NULL,

    environment_fingerprint JSONB NOT NULL,

    raw_manifest JSONB NOT NULL,
    raw_manifest_sha256 TEXT NOT NULL
);

Notes:

  • raw_manifest is the canonical JSON used for deterministic replay.
  • raw_manifest_sha256 is the canonicalized-JSON hash, not a hash of the unformatted body.

1.2 Table: scan_execution

One execution corresponds to one run of the scanner with one manifest.

CREATE TABLE scan_execution (
    execution_id UUID PRIMARY KEY,
    manifest_id UUID NOT NULL REFERENCES scan_manifest(manifest_id) ON DELETE CASCADE,

    started_at TIMESTAMPTZ NOT NULL,
    finished_at TIMESTAMPTZ NOT NULL,

    t_ingest_ms INT NOT NULL,
    t_analyze_ms INT NOT NULL,
    t_reachability_ms INT NOT NULL,
    t_vex_ms INT NOT NULL,
    t_sign_ms INT NOT NULL,
    t_publish_ms INT NOT NULL,

    proof_bundle_sha256 TEXT NOT NULL,
    findings_sha256 TEXT NOT NULL,
    vex_bundle_sha256 TEXT NOT NULL,

    replay_mode BOOLEAN NOT NULL DEFAULT FALSE
);

Derived view for Time-to-Evidence:

CREATE VIEW scan_tte AS
SELECT
    execution_id,
    manifest_id,
    (finished_at - started_at) AS tte_interval
FROM scan_execution;

1.3 Table: classification_history

Used for FN-Drift tracking.

CREATE TABLE classification_history (
    id BIGSERIAL PRIMARY KEY,
    artifact_digest TEXT NOT NULL,
    manifest_id UUID NOT NULL REFERENCES scan_manifest(manifest_id) ON DELETE CASCADE,
    execution_id UUID NOT NULL REFERENCES scan_execution(execution_id) ON DELETE CASCADE,

    previous_status TEXT NOT NULL, -- unaffected | unknown | affected
    new_status TEXT NOT NULL,
    cause TEXT NOT NULL,           -- engine_delta | feed_delta | ruleset_delta | policy_delta

    changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

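Materialized view for drift statistics. Because classification_history records only status changes, drift_percent here is the share of logged reclassifications that moved from unaffected/unknown to affected; the FN-Drift rate defined earlier additionally needs the number of re-evaluated artifacts as its denominator.

CREATE MATERIALIZED VIEW fn_drift_stats AS
SELECT
    date_trunc('day', changed_at) AS day_bucket,
    -- Count only transitions from unaffected/unknown to affected.
    COUNT(*) FILTER (
        WHERE new_status = 'affected'
          AND previous_status IN ('unaffected', 'unknown')
    ) AS affected_count,
    COUNT(*) AS total_reclassified,
    ROUND(
        (COUNT(*) FILTER (
            WHERE new_status = 'affected'
              AND previous_status IN ('unaffected', 'unknown')
         )::numeric /
         NULLIF(COUNT(*), 0)) * 100, 4
    ) AS drift_percent
FROM classification_history
GROUP BY 1;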

2. .NET 10 / C# Types (Deterministic, Hash-Stable)

The following records map 1:1 to the DB entities and enforce canonicalization rules.

2.1 CSM Structure

public sealed record CanonicalScanManifest
{
    public required string ArtifactDigest { get; init; }
    public required string FeedsMerkleRoot { get; init; }
    public required string EngineBuildHash { get; init; }
    public required string PolicyLatticeHash { get; init; }
    public required string RulesetHash { get; init; }

    public required IReadOnlyDictionary<string, string> ConfigFlags { get; init; }
    public required EnvironmentFingerprint Environment { get; init; }
}

public sealed record EnvironmentFingerprint
{
    public required string CpuModel { get; init; }
    public required string RuntimeVersion { get; init; }
    public required string Os { get; init; }
    public required IReadOnlyDictionary<string, string> Extra { get; init; }
}

Deterministic canonical-JSON serializer

Your developers must generate stable JSON:

using System.IO;
using System.Text.Json;

internal static class CanonicalJson
{
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = false,
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    public static string Serialize(object obj)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream, new JsonWriterOptions
            {
                Indented = false,
                SkipValidation = false
            }))
        {
            JsonSerializer.Serialize(writer, obj, obj.GetType(), Options);
        }

        var bytes = stream.ToArray();
        // Sort object keys alphabetically and array items in stable order.
        // This step is mandatory to guarantee canonical form:
        var canonical = JsonCanonicalizer.Canonicalize(bytes);

        return canonical;
    }
}

JsonCanonicalizer is your deterministic canonicalization engine (already referenced in other Stella Ops modules).
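
For readers without access to that module, the key-sorting part of canonicalization can be sketched with System.Text.Json.Nodes. This is not the Stella Ops `JsonCanonicalizer`; `KeySortingCanonicalizer` is a hypothetical stand-in that sorts object keys by ordinal order and keeps array order as-is. It does not normalize number formatting or string escaping, which a full scheme such as RFC 8785 (JCS) would also cover. `DeepClone` requires .NET 8 or later.

```csharp
using System;
using System.Linq;
using System.Text.Json.Nodes;

string a = "{\"b\":{\"y\":true,\"x\":[3,1]},\"a\":1}";
string b = "{ \"a\": 1, \"b\": { \"x\": [3, 1], \"y\": true } }";

Console.WriteLine(KeySortingCanonicalizer.Canonicalize(a));
// {"a":1,"b":{"x":[3,1],"y":true}}
Console.WriteLine(
    KeySortingCanonicalizer.Canonicalize(a) == KeySortingCanonicalizer.Canonicalize(b)); // True

public static class KeySortingCanonicalizer
{
    // Parses the input and re-emits it without whitespace and with
    // object keys in ordinal order at every nesting level.
    public static string Canonicalize(string json)
    {
        JsonNode? root = JsonNode.Parse(json);
        return Rebuild(root)?.ToJsonString() ?? "null";
    }

    private static JsonNode? Rebuild(JsonNode? node)
    {
        switch (node)
        {
            case JsonObject obj:
                var sorted = new JsonObject();
                foreach (var kv in obj.OrderBy(p => p.Key, StringComparer.Ordinal))
                    sorted[kv.Key] = Rebuild(kv.Value);
                return sorted;

            case JsonArray arr:
                // Array order is significant in JSON, so it is preserved.
                var copy = new JsonArray();
                foreach (var item in arr)
                    copy.Add(Rebuild(item));
                return copy;

            default:
                // Scalars (or JSON null): clone so the node has no parent.
                return node?.DeepClone();
        }
    }
}
```

Nodes are rebuilt rather than moved because a `JsonNode` that already has a parent cannot be attached to a second container.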


2.2 Execution record

public sealed record ScanExecutionMetrics
{
    public required int IngestMs { get; init; }
    public required int AnalyzeMs { get; init; }
    public required int ReachabilityMs { get; init; }
    public required int VexMs { get; init; }
    public required int SignMs { get; init; }
    public required int PublishMs { get; init; }
}
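
One way these per-stage values might be collected is with the monotonic `Stopwatch` timestamp API (`Stopwatch.GetElapsedTime` needs .NET 7 or later). The `StageTimer` class and the stage names below are hypothetical; the `Thread.Sleep` calls only stand in for real pipeline work.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

var timer = new StageTimer();
timer.Measure("ingest",  () => Thread.Sleep(20)); // stand-in for real work
timer.Measure("analyze", () => Thread.Sleep(30));

foreach (var (stage, ms) in timer.Stages)
    Console.WriteLine($"{stage}: {ms} ms");
Console.WriteLine($"TTE so far: {timer.TotalMs} ms");

public sealed class StageTimer
{
    private readonly List<(string Stage, int Ms)> _stages = new();

    public IReadOnlyList<(string Stage, int Ms)> Stages => _stages;

    // Sum of all recorded stage durations.
    public int TotalMs => _stages.Sum(s => s.Ms);

    // Runs the stage and records its duration from the monotonic
    // Stopwatch source, never from wall-clock time.
    public void Measure(string stage, Action work)
    {
        long start = Stopwatch.GetTimestamp();
        work();
        TimeSpan elapsed = Stopwatch.GetElapsedTime(start);
        _stages.Add((stage, (int)elapsed.TotalMilliseconds));
    }
}
```

Wall-clock values such as started_at and finished_at would still be recorded separately for display; only the durations come from the monotonic source.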

2.3 Replay harness entrypoint

public static class ReplayRunner
{
    public static ReplayResult Replay(Guid manifestId, IScannerEngine engine)
    {
        var manifest = ManifestRepository.Load(manifestId);
        var canonical = CanonicalJson.Serialize(manifest.RawObject);
        var canonicalHash = Sha256(canonical);

        if (canonicalHash != manifest.RawManifestSHA256)
            throw new InvalidOperationException("Manifest integrity violation.");

        using var feeds = FeedSnapshotResolver.Open(manifest.FeedsMerkleRoot);

        var exec = engine.Scan(new ScanRequest
        {
            ArtifactDigest = manifest.ArtifactDigest,
            Feeds = feeds,
            LatticeHash = manifest.PolicyLatticeHash,
            EngineBuildHash = manifest.EngineBuildHash,
            CanonicalManifest = canonical
        });

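        // Assumption: the expected hashes below come from the original
        // (non-replay) scan_execution row loaded alongside the manifest,
        // since scan_manifest itself stores no output hashes.
        return new ReplayResult(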
            exec.FindingsHash == manifest.FindingsSHA256,
            exec.VexBundleHash == manifest.VexBundleSHA256,
            exec.ProofBundleHash == manifest.ProofBundleSHA256,
            exec
        );
    }
}

Replay must run with:

  • no network
  • feeds resolved strictly from snapshots
  • deterministic clock (monotonic timers only)
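
The "no network" rule can be partly enforced in-process, as in the following sketch: an `HttpMessageHandler` that rejects every request. `SealedModeHandler` is a hypothetical name, and this only covers `HttpClient` instances built on it; an OS-level sandbox without network access is the stronger guarantee.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

using var client = new HttpClient(new SealedModeHandler());

try
{
    // Any outbound call made through this client fails before touching the network.
    await client.GetAsync("https://example.invalid/feed.json");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

public sealed class SealedModeHandler : HttpMessageHandler
{
    // Rejects every request so accidental feed or network access
    // surfaces as a hard failure during replay.
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => throw new InvalidOperationException(
            $"Network access blocked in replay mode: {request.RequestUri}");
}
```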

3. GitLab CI Job for Nightly Deterministic Replay

replay-test:
  stage: test
  image: mcr.microsoft.com/dotnet/sdk:10.0
  script:
    - echo "Starting nightly deterministic replay"

    # 1. Export 200 random manifests from Postgres
    - >
      psql "$PG_CONN" -Atc "
      SELECT manifest_id
      FROM scan_manifest
      ORDER BY random()
      LIMIT 200
      " > manifests.txt

    # 2. Replay each manifest
    - |
      while read -r mid; do
        echo "Replaying $mid"
        dotnet run --project src/StellaOps.Scanner.Replay -- \
          --manifest "$mid" || exit 1
      done < manifests.txt

    # 3. Aggregate results
    - |
      if grep -R "NON-DETERMINISTIC" replay-logs; then
        echo "Replay failures detected"
        exit 1
      else
        echo "All replays deterministic"
      fi
  artifacts:
    paths:
      - replay-logs/
    expire_in: 7 days
  only:
    - schedules

Replay job failure criteria:

  • Any mismatch in findings/VEX/proof bundle hash
  • Any non-canonical input or manifest discrepancy
  • Any accidental feed/network access

4. Developer Rules (Should be added to docs/stellaops-developer-rules.md)

  1. A scan is not valid unless the Canonical Scan Manifest (CSM) hash is stored.
  2. Every stage must emit monotonic timestamps for TTE. Do not mix monotonic and wall clock.
  3. Classification changes must always include a cause: no silent reclassification.
  4. Replay mode must never reach network, dynamic rules, cloud feeds, or external clocks.
  5. Proof bundles must be TAR with deterministic ordering: alphabetical filenames, fixed uid/gid=0, fixed mtime=0.
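
Rule 5 can be sketched with the built-in System.Formats.Tar API (.NET 7 or later). This is a minimal illustration, not the Stella Ops bundle writer: `DeterministicTar`, the file names, and the fixed 0644 mode are assumptions. The USTAR format is used here so that the only timestamp written into each header is the fixed mtime.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

var files = new Dictionary<string, byte[]>
{
    ["vex.json"]   = Encoding.UTF8.GetBytes("{\"statements\":[]}"),
    ["index.json"] = Encoding.UTF8.GetBytes("{\"files\":[\"index.json\",\"vex.json\"]}"),
};

// Same content supplied in a different order should yield the same bytes.
byte[] first  = DeterministicTar.Write(files);
byte[] second = DeterministicTar.Write(
    files.Reverse().ToDictionary(kv => kv.Key, kv => kv.Value));

// Expected to print True.
Console.WriteLine(
    Convert.ToHexString(SHA256.HashData(first)) ==
    Convert.ToHexString(SHA256.HashData(second)));

public static class DeterministicTar
{
    public static byte[] Write(IReadOnlyDictionary<string, byte[]> files)
    {
        using var output = new MemoryStream();
        using (var writer = new TarWriter(output, TarEntryFormat.Ustar, leaveOpen: true))
        {
            // Ordinal filename order makes the archive independent of input order.
            foreach (var (name, content) in files.OrderBy(f => f.Key, StringComparer.Ordinal))
            {
                var entry = new UstarTarEntry(TarEntryType.RegularFile, name)
                {
                    ModificationTime = DateTimeOffset.UnixEpoch, // mtime = 0
                    Uid = 0,
                    Gid = 0,
                    Mode = UnixFileMode.UserRead | UnixFileMode.UserWrite
                         | UnixFileMode.GroupRead | UnixFileMode.OtherRead, // 0644
                    DataStream = new MemoryStream(content, writable: false),
                };
                writer.WriteEntry(entry);
            }
        } // disposing the writer appends the end-of-archive blocks
        return output.ToArray();
    }
}
```

The SHA-256 of the returned bytes would then serve as the public proof hash; if the archive is compressed afterwards, the compressor settings need to be pinned as well.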

5. Ready for integration

If you want, I can produce:

  • the full EF Core 9 mapping classes
  • a migration file consistent with your existing Stella Ops module naming
  • the Angular UI block that displays TTE, FN-Drift, and Replay statistics
  • a deterministic TAR writer (C#) for proof bundles

Tell me which part you want next.