Here’s a crisp, plug‑in set of reproducible benchmarks you can bake into Stella Ops so buyers, auditors, and your own team can see measurable wins—without hand‑wavy heuristics.
Benchmarks Stella Ops should standardize
1) Time‑to‑Evidence (TTE)
How fast Stella Ops turns a “suspicion” into a signed, auditor‑usable proof (e.g., VEX + attestations).
- Definition: TTE = t(proof_ready) – t(artifact_ingested)
- Scope: scanning, reachability, policy evaluation, proof generation, notarization, and publication to your proof ledger.
- Targets:
  - P50 < 2 min for typical container images (≤ 500 MB, known ecosystems).
  - P95 < 5 min including cold‑start/offline‑bundle mode.
- Report: median/P95 by artifact size bucket; break down stages (fetch → analyze → reachability → VEX → sign → publish).
- Auditable logs: DSSE/DSD signatures, policy hash, feed set IDs, scanner build hash.
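As a sketch of the stage roll‑up (the stage names mirror the breakdown above; the nearest‑rank percentile helper is illustrative, not a library API):

```python
# Stage timings (ms) per artifact: fetch -> analyze -> reachability -> VEX -> sign -> publish.
def tte_ms(stages: dict) -> int:
    """Time-to-Evidence for one artifact is the sum of its stage durations."""
    return sum(stages.values())

def percentile(values, p):
    """Simple nearest-rank percentile (hypothetical helper for the dashboard roll-up)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

runs = [
    {"fetch": 900,   "analyze": 30_000, "reachability": 12_000, "vex": 4_000, "sign": 800, "publish": 1_200},
    {"fetch": 1_500, "analyze": 55_000, "reachability": 20_000, "vex": 6_000, "sign": 900, "publish": 1_600},
    {"fetch": 700,   "analyze": 25_000, "reachability": 9_000,  "vex": 3_500, "sign": 700, "publish": 1_000},
]
ttes = [tte_ms(r) for r in runs]
p50 = percentile(ttes, 50)  # 48_900 ms, under the 2-minute P50 target
p95 = percentile(ttes, 95)  # 85_000 ms, under the 5-minute P95 target
```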
2) False‑Negative Drift Rate (FN‑Drift)
Catches when a previously “clean” artifact later becomes “affected” because the world changed (new CVE, rule, or feed).
- Definition (rolling 30‑day window): FN‑Drift = (# artifacts re‑classified from {unaffected/unknown} → affected) / (total artifacts re‑evaluated)
- Stratify by cause: feed delta, rule delta, lattice/policy delta, reachability delta.
- Goal: keep feed‑caused FN‑Drift low through faster feed deltas (good), while keeping engine‑caused FN‑Drift near zero (stability).
- Guardrails: require an explanation for every re‑classification, including a diff of the feeds, rule versions, and lattice policy commit.
- Badge: “No engine‑caused FN drift in 90d” (hash‑linked evidence bundle).
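The drift computation, sketched under the status and cause taxonomy above (the event shapes are illustrative):

```python
from collections import Counter

# Re-classification events observed in a 30-day window (hypothetical sample data).
reclassifications = [
    {"from": "unaffected", "to": "affected",   "cause": "feed_delta"},
    {"from": "unknown",    "to": "affected",   "cause": "feed_delta"},
    {"from": "unknown",    "to": "unaffected", "cause": "ruleset_delta"},
    {"from": "unaffected", "to": "affected",   "cause": "engine_delta"},
]
total_reevaluated = 1000  # all artifacts re-evaluated in the window

# FN-Drift counts only {unaffected, unknown} -> affected transitions.
fn_events = [r for r in reclassifications
             if r["from"] in ("unaffected", "unknown") and r["to"] == "affected"]
fn_drift = len(fn_events) / total_reevaluated  # 0.3% overall
by_cause = Counter(r["cause"] for r in fn_events)  # stratification for the guardrails
```

The `engine_delta` bucket is the one the 90‑day badge watches: feed‑caused drift is expected, engine‑caused drift is not.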
3) Deterministic Re‑scan Reproducibility (Hash‑Stable Proofs)
Same inputs → same outputs, byte‑for‑byte, including proofs. Crucial for audits and regulated procurement.
- Definition: given a scan manifest (artifact digest, feed snapshots, engine build hash, lattice/policy hash), a re‑scan must produce an identical findings set, VEX decisions, proofs, and top‑level bundle hash.
- Metric: Repro rate = identical_outputs / total_replays (target 100%).
- Proof object: { artifact_digest, scan_manifest_hash, feeds_merkle_root, engine_build_hash, policy_lattice_hash, findings_sha256, vex_bundle_sha256, proof_bundle_sha256 }
- CI check: nightly replay of a fixed corpus; fail the pipeline on any non‑determinism (with a diff).
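One way to derive a top‑level hash over the proof‑object fields (a sketch; the sorted‑keys JSON canonicalization here is illustrative, not Stella Ops’s actual scheme):

```python
import hashlib
import json

# Hypothetical proof object with the fields listed above (digest values are placeholders).
proof = {
    "artifact_digest":     "sha256:aaaa",
    "scan_manifest_hash":  "sha256:bbbb",
    "feeds_merkle_root":   "sha256:cccc",
    "engine_build_hash":   "sha256:dddd",
    "policy_lattice_hash": "sha256:eeee",
    "findings_sha256":     "sha256:ffff",
    "vex_bundle_sha256":   "sha256:1111",
}

def bundle_hash(fields: dict) -> str:
    # Canonical form: sorted keys, compact separators, so the same inputs
    # always yield the same digest regardless of dict construction order.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

h1 = bundle_hash(proof)
h2 = bundle_hash(dict(reversed(list(proof.items()))))  # key order must not matter
```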
Minimal implementation plan (developer‑ready)
- Canonical Scan Manifest (CSM): immutable JSON (canonicalized), covering: artifact digests; feed URIs + content hashes; engine build + ruleset hashes; lattice/policy hash; config flags; environment fingerprint (CPU features, locale). Store CSM + DSSE envelope.
- Stage timers: emit monotonic timestamps for each stage; roll up to TTE. Persist per‑artifact in Postgres (time‑series table by artifact_digest).
- Delta re‑eval daemon: on any feed/rule/policy change, re‑score the corpus referenced by that feed snapshot; log re‑classifications with cause; compute FN‑Drift daily.
- Replay harness: given a CSM, re‑run pipeline in sealed mode (no network, feeds from snapshot); recompute bundle hashes; assert equality.
- Proof bundle: tar/zip with canonical ordering; include SBOM slice, reachability graph, VEX, signatures, and an index.json (canonical). The bundle’s SHA256 is your public “proof hash.”
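The feeds Merkle root referenced by the CSM could be derived along these lines (a minimal sketch; sorted leaves and odd‑leaf promotion are assumptions, not the shipped scheme):

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaf_hashes: list) -> str:
    """Binary Merkle root over feed content hashes; odd leaves are promoted unchanged."""
    level = sorted(leaf_hashes)  # sort so the root is independent of feed enumeration order
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(_h(level[i] + level[i + 1]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # promote the unpaired leaf to the next level
        level = nxt
    return level[0].hex()

# Leaves are the content hashes of pinned feed snapshots (names are illustrative).
feeds = [_h(b"nvd-2025-01-01"), _h(b"osv-2025-01-01"), _h(b"ghsa-2025-01-01")]
root = merkle_root(feeds)
```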
What to put on dashboards & in SLAs
- TTE panel: P50/P95 by image size; stacked bars by stage; alerts when P95 breaches SLO.
- FN‑Drift panel: overall and by cause; red flag if engine‑caused drift > 0.1% in 30d.
- Repro panel: last 24h/7d replay pass rate (goal 100%); list any non‑deterministic modules.
Why this wins sales & audits
- Auditors: can pick any proof hash → replay from CSM → get the exact same signed outcome.
- Buyers: TTE proves speed; FN‑Drift proves stability and feed hygiene; Repro proves you’re not heuristic‑wobbly.
- Competitors: many can’t show deterministic replay or attribute drift causes—your “hash‑stable proofs” make that gap obvious.
Below is the implementation‑ready package for these three metrics: the exact PostgreSQL schema, .NET 10 types, and a nightly replay GitLab job that enforces Time‑to‑Evidence (TTE), False‑Negative Drift (FN‑Drift), and Deterministic Replayability out of the box.
It is written so mid‑level developers can drop it directly into Stella Ops without re‑architecting anything.
1. PostgreSQL Schema (Canonical, Deterministic, Normalized)
1.1 Table: scan_manifest
Immutable record describing exactly what was used for a scan.
CREATE TABLE scan_manifest (
manifest_id UUID PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
artifact_digest TEXT NOT NULL,
feeds_merkle_root TEXT NOT NULL,
engine_build_hash TEXT NOT NULL,
policy_lattice_hash TEXT NOT NULL,
ruleset_hash TEXT NOT NULL,
config_flags JSONB NOT NULL,
environment_fingerprint JSONB NOT NULL,
raw_manifest JSONB NOT NULL,
raw_manifest_sha256 TEXT NOT NULL
);
Notes:
raw_manifest is the canonical JSON used for deterministic replay. raw_manifest_sha256 is the hash of the canonicalized JSON, not of the unformatted body.
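That canonicalize‑then‑hash rule can be mirrored in a few lines (a sketch assuming RFC 8785‑style key sorting; whitespace and key order must not affect the digest):

```python
import hashlib
import json

def canonical_sha256(manifest: dict) -> str:
    """Hash of the canonical JSON form: sorted keys, no insignificant whitespace, UTF-8."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same content, different key order and formatting: must produce the same raw_manifest_sha256.
a = {"artifact_digest": "sha256:abc", "engine_build_hash": "sha256:def"}
b = {"engine_build_hash": "sha256:def", "artifact_digest": "sha256:abc"}
```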
1.2 Table: scan_execution
One execution corresponds to one run of the scanner with one manifest.
CREATE TABLE scan_execution (
execution_id UUID PRIMARY KEY,
manifest_id UUID NOT NULL REFERENCES scan_manifest(manifest_id) ON DELETE CASCADE,
started_at TIMESTAMPTZ NOT NULL,
finished_at TIMESTAMPTZ NOT NULL,
t_ingest_ms INT NOT NULL,
t_analyze_ms INT NOT NULL,
t_reachability_ms INT NOT NULL,
t_vex_ms INT NOT NULL,
t_sign_ms INT NOT NULL,
t_publish_ms INT NOT NULL,
proof_bundle_sha256 TEXT NOT NULL,
findings_sha256 TEXT NOT NULL,
vex_bundle_sha256 TEXT NOT NULL,
replay_mode BOOLEAN NOT NULL DEFAULT FALSE
);
Derived view for Time-to-Evidence:
CREATE VIEW scan_tte AS
SELECT
execution_id,
manifest_id,
(finished_at - started_at) AS tte_interval
FROM scan_execution;
1.3 Table: classification_history
Used for FN-Drift tracking.
CREATE TABLE classification_history (
id BIGSERIAL PRIMARY KEY,
artifact_digest TEXT NOT NULL,
manifest_id UUID NOT NULL REFERENCES scan_manifest(manifest_id) ON DELETE CASCADE,
execution_id UUID NOT NULL REFERENCES scan_execution(execution_id) ON DELETE CASCADE,
previous_status TEXT NOT NULL, -- unaffected | unknown | affected
new_status TEXT NOT NULL,
cause TEXT NOT NULL, -- engine_delta | feed_delta | ruleset_delta | policy_delta
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Materialized view for drift statistics:
CREATE MATERIALIZED VIEW fn_drift_stats AS
SELECT
date_trunc('day', changed_at) AS day_bucket,
COUNT(*) FILTER (WHERE new_status = 'affected') AS affected_count,
COUNT(*) AS total_reclassified,
ROUND(
(COUNT(*) FILTER (WHERE new_status = 'affected')::numeric /
NULLIF(COUNT(*), 0)) * 100, 4
) AS drift_percent
FROM classification_history
GROUP BY 1;
2. .NET 10 / C# Types (Deterministic, Hash-Stable)
The following record types map 1:1 to the DB entities and enforce the canonicalization rules.
2.1 CSM Structure
public sealed record CanonicalScanManifest
{
public required string ArtifactDigest { get; init; }
public required string FeedsMerkleRoot { get; init; }
public required string EngineBuildHash { get; init; }
public required string PolicyLatticeHash { get; init; }
public required string RulesetHash { get; init; }
public required IReadOnlyDictionary<string, string> ConfigFlags { get; init; }
public required EnvironmentFingerprint Environment { get; init; }
}
public sealed record EnvironmentFingerprint
{
public required string CpuModel { get; init; }
public required string RuntimeVersion { get; init; }
public required string Os { get; init; }
public required IReadOnlyDictionary<string, string> Extra { get; init; }
}
Deterministic canonical-JSON serializer
Your developers must generate stable, canonical JSON:
using System.IO;
using System.Text.Json;

internal static class CanonicalJson
{
private static readonly JsonSerializerOptions Options = new()
{
WriteIndented = false,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
};
public static string Serialize(object obj)
{
using var stream = new MemoryStream();
using (var writer = new Utf8JsonWriter(stream, new JsonWriterOptions
{
Indented = false,
SkipValidation = false
}))
{
JsonSerializer.Serialize(writer, obj, obj.GetType(), Options);
}
var bytes = stream.ToArray();
// Sort object keys alphabetically and array items in stable order.
// This step is mandatory to guarantee canonical form:
var canonical = JsonCanonicalizer.Canonicalize(bytes);
return canonical;
}
}
JsonCanonicalizer is your deterministic canonicalization engine (already referenced in other Stella Ops modules).
2.2 Execution record
public sealed record ScanExecutionMetrics
{
public required int IngestMs { get; init; }
public required int AnalyzeMs { get; init; }
public required int ReachabilityMs { get; init; }
public required int VexMs { get; init; }
public required int SignMs { get; init; }
public required int PublishMs { get; init; }
}
2.3 Replay harness entrypoint
public static class ReplayRunner
{
public static ReplayResult Replay(Guid manifestId, IScannerEngine engine)
{
var manifest = ManifestRepository.Load(manifestId);
var canonical = CanonicalJson.Serialize(manifest.RawObject);
var canonicalHash = Sha256(canonical);
if (canonicalHash != manifest.RawManifestSHA256)
throw new InvalidOperationException("Manifest integrity violation.");
using var feeds = FeedSnapshotResolver.Open(manifest.FeedsMerkleRoot);
var exec = engine.Scan(new ScanRequest
{
ArtifactDigest = manifest.ArtifactDigest,
Feeds = feeds,
LatticeHash = manifest.PolicyLatticeHash,
EngineBuildHash = manifest.EngineBuildHash,
CanonicalManifest = canonical
});
return new ReplayResult(
exec.FindingsHash == manifest.FindingsSHA256,
exec.VexBundleHash == manifest.VexBundleSHA256,
exec.ProofBundleHash == manifest.ProofBundleSHA256,
exec
);
}
}
Replay must run with:
- no network
- feeds resolved strictly from snapshots
- deterministic clock (monotonic timers only)
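The equality check the harness performs can be mirrored in a few lines (hypothetical field names, following ReplayRunner above):

```python
def replay_is_deterministic(recorded: dict, rerun: dict):
    """Compare the three recorded hashes against a sealed-mode re-run; return per-hash verdicts."""
    checks = {
        "findings": recorded["findings_sha256"] == rerun["findings_sha256"],
        "vex":      recorded["vex_bundle_sha256"] == rerun["vex_bundle_sha256"],
        "proof":    recorded["proof_bundle_sha256"] == rerun["proof_bundle_sha256"],
    }
    return all(checks.values()), checks

# Placeholder hash values; in practice these come from scan_execution rows.
recorded = {"findings_sha256": "f1", "vex_bundle_sha256": "v1", "proof_bundle_sha256": "p1"}
rerun_ok = dict(recorded)
rerun_bad = {**recorded, "proof_bundle_sha256": "p2"}  # a single drifted hash fails the replay

ok, _ = replay_is_deterministic(recorded, rerun_ok)
bad, detail = replay_is_deterministic(recorded, rerun_bad)
```

The per‑hash verdicts are what the nightly job should log, so a failure points at the non‑deterministic stage rather than just the bundle.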
3. GitLab CI Job for Nightly Deterministic Replay
replay-test:
  stage: test
  image: mcr.microsoft.com/dotnet/sdk:10.0
  script:
    - echo "Starting nightly deterministic replay"
    - mkdir -p replay-logs
    # 1. Export 200 random manifests from Postgres
    - |
      psql "$PG_CONN" -Atc "
        SELECT manifest_id
        FROM scan_manifest
        ORDER BY random()
        LIMIT 200
      " > manifests.txt
    # 2. Replay each manifest, capturing per-manifest output in replay-logs/
    - |
      while read -r mid; do
        echo "Replaying $mid"
        dotnet run --project src/StellaOps.Scanner.Replay \
          --manifest "$mid" > "replay-logs/$mid.log" 2>&1 || exit 1
      done < manifests.txt
    # 3. Fail the job if any replay reported non-determinism
    - |
      if grep -Rq "NON-DETERMINISTIC" replay-logs; then
        echo "Replay failures detected"
        exit 1
      else
        echo "All replays deterministic"
      fi
  artifacts:
    paths:
      - replay-logs/
    expire_in: 7 days
  only:
    - schedules
Replay job failure criteria:
- Any mismatch in findings/VEX/proof bundle hash
- Any non-canonical input or manifest discrepancy
- Any accidental feed/network access
4. Developer Rules (to be added to docs/stellaops-developer-rules.md)
- A scan is not valid unless the Canonical Scan Manifest (CSM) hash is stored.
- Every stage must emit monotonic timestamps for TTE. Do not mix monotonic and wall clock.
- Classification changes must always include a cause: no silent reclassification.
- Replay mode must never reach network, dynamic rules, cloud feeds, or external clocks.
- Proof bundles must be TAR with deterministic ordering: alphabetical filenames, fixed uid/gid=0, fixed mtime=0.
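A deterministic TAR writer following those rules, sketched in Python (the C# version would apply the same normalization; file names and contents here are placeholders):

```python
import hashlib
import io
import tarfile

def write_proof_bundle(files: dict) -> bytes:
    """Pack files into a TAR with alphabetical names, uid/gid=0, mtime=0:
    the same input set always produces byte-identical output."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # rule: alphabetical ordering
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.uid = info.gid = 0      # rule: fixed uid/gid=0
            info.uname = info.gname = ""
            info.mtime = 0               # rule: fixed mtime=0
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

bundle = {"index.json": b"{}", "vex.json": b"[]", "sbom.json": b"{}"}
h1 = hashlib.sha256(write_proof_bundle(bundle)).hexdigest()
# Insertion order must not matter; only the rules above do.
h2 = hashlib.sha256(write_proof_bundle(dict(reversed(list(bundle.items()))))).hexdigest()
```

The SHA‑256 of the resulting archive is then usable as the public “proof hash” described earlier.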
5. Ready for integration
If you want, I can produce:
- the full EF Core 9 mapping classes
- a migration file consistent with your existing Stella Ops module naming
- the Angular UI block that displays TTE, FN-Drift, and Replay statistics
- a deterministic TAR writer (C#) for proof bundles
Tell me which part you want next.