Here's a compact pattern you can drop into StellaOps to make reachability checks fast, reproducible, and audit-friendly.


Lazy, single-use reachability cache + signed “reach-map” artifacts

Why: reachability queries explode combinatorially; precomputing everything wastes RAM and goes stale. Cache results only when first asked, make them deterministic, and emit a signed artifact so the same evidence can be replayed in VEX proofs.

Core ideas (plain English):

  • Lazy on first call: compute only the exact path/query requested; cache that result.
  • Deterministic key: cache key = algo_signature + inputs_hash + call_path_hash so the same inputs always hit the same entry.
  • Single-use / bounded TTL: entries survive just long enough to serve concurrent deduplicated calls, then get evicted (on TTL or size pressure). This keeps memory tight and avoids stale proofs.
  • Reach-map artifact: every cache fill writes a compact, deterministic JSON “reach-map” (edges, justifications, versions, timestamps) and signs it (DSSE). The artifact is what VEX cites, not volatile memory.
  • Replayable proofs: later runs can skip recomputation by verifying + loading the reach-map, yielding byte-for-byte identical evidence.

Minimal shape (C#/.NET 10):

using System.Collections.Concurrent;

public readonly record struct ReachKey(
    string AlgoSig,      // e.g., "RTA@sha256:…"
    string InputsHash,   // SBOM slice + policy + versions
    string CallPathHash  // normalized query graph (src->sink, opts)
);

public sealed class ReachCache {
    private readonly ConcurrentDictionary<ReachKey, Lazy<Task<ReachResult>>> _memo = new();

    public async Task<ReachResult> GetOrComputeAsync(
        ReachKey key,
        Func<Task<ReachResult>> compute,
        CancellationToken ct)
    {
        // Lazy<Task<…>> coalesces duplicate in-flight calls for the same key:
        // only the first caller runs compute(), the rest await the same task.
        var lazy = _memo.GetOrAdd(key, _ => new Lazy<Task<ReachResult>>(
            compute, LazyThreadSafetyMode.ExecutionAndPublication));

        try {
            return await lazy.Value.WaitAsync(ct);
        }
        catch {
            _memo.TryRemove(key, out _); // don't retain failed or cancelled entries
            throw;
        }
    }

    public void Evict(ReachKey key) => _memo.TryRemove(key, out _);
}

Compute path → emit DSSE reach-map (pseudocode):

var result = await cache.GetOrComputeAsync(key, async () => {
    var graph = BuildSlice(inputs);             // deterministic ordering!
    var paths = FindReachable(graph, query);    // your chosen algo
    var reachMap = Canonicalize(new {
        algo = key.AlgoSig,
        inputs_hash = key.InputsHash,
        call_path = key.CallPathHash,
        edges = paths.Edges,
        witnesses = paths.Witnesses, // file:line, symbol ids, versions
        created = NowUtcIso8601()
    });
    var dsse = Dsse.Sign(reachMap, signingKey); // e.g., in-toto/DSSE
    await ArtifactStore.PutAsync(KeyToPath(key), dsse.Bytes);
    return new ReachResult(paths, dsse.Digest);
}, ct);

Operational rules:

  • Canonical everything: sort nodes/edges, normalize file paths, strip nondeterministic fields.
  • Cache scope: per-scan, per-workspace, or per-feed-version. Evict on feed/policy changes.
  • TTL: e.g., 15–60 minutes; or evict after the pipeline completes. Guard with a max-entries cap.
  • Concurrency: use Lazy<Task<…>> (above) to coalesce duplicate in-flight calls.
  • Validation path: before computing, look for reach-map.dsse by ReachKey; if the signature verifies and the schema version matches, load and return (no compute). A sketch of this loader follows below.
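
A minimal sketch of that validation-path loader, continuing the pseudocode style above; Dsse.Parse/Verify, ArtifactStore.TryGetAsync, and ReachMap.Deserialize are assumed helpers, not existing APIs:

public async Task<ReachResult?> TryLoadVerifiedAsync(ReachKey key, CancellationToken ct) {
    // Look up the previously emitted reach-map at its deterministic path.
    var bytes = await ArtifactStore.TryGetAsync(KeyToPath(key), ct);
    if (bytes is null) return null; // miss: caller falls through to compute

    // Verify the DSSE envelope against bundled trust roots before trusting it.
    var envelope = Dsse.Parse(bytes);
    if (!Dsse.Verify(envelope, trustedKeys)) return null;

    var reachMap = ReachMap.Deserialize(envelope.Payload);
    if (reachMap.SchemaVersion != ExpectedSchemaVersion) return null; // schema drift: recompute

    return new ReachResult(reachMap.Paths, envelope.Digest);
}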

How this helps VEX in StellaOps:

  • Consistency: the DSSE reach-map is the evidence blob your VEX record links to.
  • Speed: repeat scans and parallel microservices reuse cached or pre-signed artifacts.
  • Memory safety: no unbounded precompute; everything is small and query-driven.

Drop-in tasks for your agents:

  1. Define ReachKey builders in Scanner.WebService (inputs hash = SBOM slice + policy + resolver versions).
  2. Add ReachCache as a scoped service with size/TTL config (appsettings → Scanner.Reach.Cache).
  3. Implement Canonicalize + Dsse.Sign in StellaOps.Crypto (support FIPS/eIDAS/GOST modes).
  4. ArtifactStore: write/read reach-map.dsse.json under a deterministic path: artifacts/reach/<algo>/<inputsHash>/<callPathHash>.dsse.json (see the path helper after this list).
  5. Wire VEXer to reference the artifact digest and include a verification note.
  6. Tests: golden fixtures asserting stable bytes for the same inputs; mutation tests to ensure any input change invalidates the cache key.
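
For task 4, a possible KeyToPath helper (used in the pseudocode above); the sanitizer is a hypothetical detail, needed because AlgoSig values like "RTA@sha256:…" contain path-hostile characters:

static string KeyToPath(ReachKey key) =>
    $"artifacts/reach/{Sanitize(key.AlgoSig)}/{key.InputsHash}/{key.CallPathHash}.dsse.json";

// Hash components are already hex digests; only the algorithm signature
// needs cleaning up before it becomes a directory name.
static string Sanitize(string s) => s.Replace(':', '_').Replace('@', '_');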

If you want, I can turn this into a ready-to-commit StellaOps.Scanner.Reach module (interfaces, options, tests, and a stub DSSE signer). Below, I split the bigger picture into two parts:

  1. Stella Ops' concrete advantages (the “moats”).
  2. How developers must build to actually realize them (guidelines and checklists).

1. Stella Ops Advantages: What We Are Optimizing For

1.1 Deterministic, Replayable Security Evidence

Idea: Any scan or VEX decision run today must be replayable bit-for-bit in 3–5 years for audits, disputes, and compliance.

What this means:

  • Every scan has an explicit input manifest (feeds, rules, policies, versions, timestamps).
  • Outputs (findings, reachability, VEX, attestations) are pure functions of that manifest.
  • Evidence is stored as immutable artifacts (DSSE, SBOMs, reach-maps, policy snapshots), not just rows in a DB.

1.2 Reachability-First, Quiet-By-Design Triage

Idea: The main value is not “finding more CVEs” but proving which ones matter in your actual runtime and call graph and keeping noise down.

What this means:

  • Scoring/prioritization is dominated by reachability + runtime context, not just CVSS.
  • Unknowns and partial evidence are surfaced explicitly, not hidden.
  • UX is intentionally quiet: “Can I ship?” → “Yes / No, because of these N concrete, reachable issues.”

1.3 Crypto-Sovereign, Air-Gap-Ready Trust

Idea: The platform must run offline, support local CAs/HSMs, and switch between cryptographic regimes (FIPS, eIDAS, GOST, SM, PQC) by configuration, not by code changes.

What this means:

  • No hard dependency on any public CA, cloud KMS, or single trust provider.
  • All attestations are locally verifiable with bundled roots and policies.
  • Crypto suites are pluggable profiles selected per deployment / tenant.

1.4 Policy / Lattice Engine (“Trust Algebra Studio”)

Idea: Vendors, customers, and regulators speak different languages. Stella Ops provides a formal lattice to merge and reason over:

  • VEX statements
  • Runtime observations
  • Code provenance
  • Organizational policies

…without losing provenance (“who said what”).

What this means:

  • Clear separation between facts (observations) and policies (how we rank/merge them).
  • Lattice merge operations are explicit, testable functions, not hidden heuristics.
  • Same artifact can be interpreted differently by different tenants via different lattice policies.

1.5 Proof-Linked SBOM→VEX Chain

Idea: Every VEX claim must point to concrete, verifiable evidence:

  • Which SBOM / version?
  • Which reachability analysis?
  • Which runtime signals?
  • Which signer/policy?

What this means:

  • VEX is not just a JSON document; it is a graph of links to attestations and analysis artifacts.
  • You can click from a VEX statement to the exact DSSE reach-map / scan run that justified it.

1.6 Proof-of-Integrity Graph (Build → Image → Runtime)

Idea: Connect:

  • Source → Build → Image → SBOM → Scan → VEX → Runtime

…into a single cryptographically verifiable graph.

What this means:

  • Every step has a signed attestation (in-toto/DSSE style).
  • Graph queries like “Show me all running pods that descend from this compromised builder” or “Show me all VEX statements that rely on this revoked key” are first-class.

1.7 AI Codex & Zastava Companion (Explainable by Construction)

Idea: AI is used only as a narrator and planner on top of hard evidence, not as an oracle.

What this means:

  • Zastava never invents facts; it explains what is already in the evidence graph.
  • Remediation plans cite concrete artifacts (scan IDs, attestations, policies) and affected assets.
  • All AI outputs include links back to raw structured data and can be re-generated in future with the same evidence set.

1.8 Proof-Market Ledger & Adaptive Trust Economics

Idea: Over time, vendors publishing good SBOM/VEX evidence should gain trust-credit; sloppy or contradictory publishers lose it.

What this means:

  • A ledger of published proofs, signatures, and revocations.
  • A trust score per artifact / signer / vendor, derived from consistency, coverage, and historical correctness.
  • This feeds into procurement and risk dashboards, not just security triage.

2. Developer Guidelines: How to Build for These Advantages

I will phrase this as rules and checklists you can directly apply in Stella Ops repos (.NET 10, C#, Postgres, MongoDB, etc.).


2.1 Determinism & Replayability

Rules:

  1. Pure functions, explicit manifests

    • Any long-running or non-trivial computation (scan, reachability, lattice merge, trust score) must accept a single, structured input manifest, e.g.:

      {
        "scannerVersion": "1.3.0",
        "rulesetId": "stella-default-2025.11",
        "feeds": {
          "nvdDigest": "sha256:...",
          "osvDigest": "sha256:..."
        },
        "sbomDigest": "sha256:...",
        "policyDigest": "sha256:..."
      }
      
    • No hidden configuration from environment variables, machine-local files, or system clock inside the core algorithm.

  2. Canonicalization everywhere

    • Before hashing or signing:

      • Sort arrays by stable keys.
      • Normalize paths (POSIX style), line endings (LF), and encodings (UTF-8).
    • Provide a shared StellaOps.Core.Canonicalization library used by all services (one possible shape is sketched after these rules).

  3. Stable IDs

    • Every scan, reachability call, lattice evaluation, and VEX bundle gets an opaque but stable ID based on the input manifest hash.
    • Do not use incremental integer IDs for evidence; use digests (hashes) or ULIDs/GUIDs derived from content.
  4. Golden fixtures

    • For each non-trivial algorithm, ship at least one golden fixture:

      • Input manifest JSON
      • Expected output JSON
    • CI must assert byte-for-byte equality for these fixtures (after canonicalization).
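
One possible shape for the canonicalization core from rule 2; this is a sketch, not the real StellaOps.Core.Canonicalization API. It covers key ordering and content hashing, while path/line-ending/encoding normalization is assumed to happen before the JSON is built:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

public static class Canonicalizer {
    // Recursively sorts object properties so identical data always yields
    // identical bytes. Arrays that represent *sets* still need domain-specific
    // sorting by a stable key before this step (rule 2).
    public static string ToCanonicalJson(JsonNode? node) =>
        Normalize(node)?.ToJsonString() ?? "null";

    public static string Digest(JsonNode? node) =>
        "sha256:" + Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(ToCanonicalJson(node)))).ToLowerInvariant();

    private static JsonNode? Normalize(JsonNode? node) => node switch {
        JsonObject obj => new JsonObject(obj
            .OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => KeyValuePair.Create(p.Key, Normalize(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Normalize).ToArray()),
        JsonValue val => val.DeepClone(),
        _ => null
    };
}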

Developer checklist (per feature):

  • Input manifest type defined and versioned.
  • Canonicalization applied before hashing/signing.
  • Output stored with inputsDigest and algoDigest.
  • At least one golden fixture proves determinism.

2.2 Reachability-First Analysis & Quiet UX

Rules:

  1. Reachability lives in Scanner.WebService

    • All lattice/graph heavy lifting for reachability must run in Scanner.WebService (standing architectural rule).
    • Other services (Concelier, Excitor, Feedser) only consume reachability artifacts and must “preserve prune source” (never rewrite paths/proofs, only annotate or filter).
  2. Lazy, query-driven computation

    • Do not precompute reachability for entire SBOMs.

    • Compute per exact query (image + vulnerability or source→sink path).

    • Use an in-memory or short-lived cache keyed by:

      • Algorithm signature
      • Input manifest hash
      • Query description (call-path hash)
  3. Evidence-first, severity-second

    • Internal ranking objects should look like:

      public sealed record FindingRank(
          string FindingId,
          EvidencePointer Evidence,
          ReachabilityScore Reach,
          ExploitStatus Exploit,
          RuntimePresence Runtime,
          double FinalScore);
      
    • UI always has a “Show evidence” or “Explain” action that can be serialized as JSON and re-used by Zastava.

  4. Quiet-by-design UX

    • For any list view, default sort is:

      1. Reachable, exploitable, runtime-present
      2. Reachable, exploitable
      3. Reachable, unknown exploit
      4. Unreachable
    • Show counts by bucket, not only the total CVE count (a bucket sketch follows this list).
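
A sketch of that default ordering as a pure function; the enum names are illustrative stand-ins, not the ReachabilityScore/ExploitStatus/RuntimePresence types from the ranking record above:

public enum Reach { Reachable, Unknown, Unreachable }
public enum Exploit { Exploitable, Unknown, NotExploitable }

// Maps a finding to its triage bucket; lower numbers sort first.
public static int Bucket(Reach reach, Exploit exploit, bool runtimePresent) =>
    (reach, exploit, runtimePresent) switch {
        (Reach.Reachable, Exploit.Exploitable, true)  => 1, // reachable, exploitable, runtime-present
        (Reach.Reachable, Exploit.Exploitable, false) => 2, // reachable, exploitable
        (Reach.Reachable, _, _)                       => 3, // reachable, exploit unknown or absent
        _                                             => 4  // unreachable (or reachability unknown)
    };

List endpoints can then sort by bucket first and FinalScore second, e.g. findings.OrderBy(f => Bucket(…)).ThenByDescending(f => f.FinalScore).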

Developer checklist:

  • Reachability algorithms only in Scanner.WebService.
  • Cache is lazy and keyed by deterministic inputs.
  • Output includes explicit evidence pointers.
  • UI endpoints expose reachability state in structured form.

2.3 Crypto-Sovereign & Air-Gap Mode

Rules:

  1. Cryptography via “profiles”

    • Implement a CryptoProfile abstraction (e.g. FipsProfile, GostProfile, EidasProfile, SmProfile, PqcProfile).
    • All signing/verifying APIs take a CryptoProfile or resolve one from tenant config; no direct calls to RSA.Create() etc. in business code (a sketch of this seam follows the rules list).
  2. No hard dependency on public PKI

    • All verification logic must accept:

      • Provided root cert bundle
      • Local CRL or OCSP-equivalent
    • Never assume internet OCSP/CRL.

  3. Offline bundles

    • Any operation required for air-gapped mode must be satisfiable with:

      • SBOM + feeds + policy bundle + key material
    • Define explicit “offline bundle” formats (zip/tar + manifest) with hashes of all contents.

  4. Key rotation and algorithm agility

    • Metadata for every signature must record:

      • Algorithm
      • Key ID
      • Profile
    • Verification code must fail safely when a profile is disabled, and error messages must be precise.
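
A sketch of the profile seam from rule 1; the interface and record are shape assumptions, not the actual StellaOps.Crypto surface:

using System;
using System.Security.Cryptography.X509Certificates;

public sealed record SignatureInfo(
    byte[] Signature,
    string Algorithm, // rule 4: always recorded alongside the signature
    string KeyId,
    string Profile);

public interface ICryptoProfile {
    string Name { get; } // "fips", "gost", "eidas", "sm", "pqc"
    SignatureInfo Sign(ReadOnlySpan<byte> payload, string keyId);

    // Rule 2: the caller supplies roots and revocation data; verification
    // never reaches out to public PKI.
    bool Verify(ReadOnlySpan<byte> payload, SignatureInfo signature,
                X509Certificate2Collection trustedRoots);
}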

Developer checklist:

  • No direct crypto calls in feature code; only via profile layer.
  • All attestations carry algorithm + key id + profile.
  • Offline bundle type exists for this workflow.
  • Tests for at least 2 different crypto profiles.

2.4 Policy / Lattice Engine

Rules:

  1. Facts vs. Policies separation

    • Facts:

      • SBOM components, CVEs, reachability edges, runtime signals.
    • Policies:

      • “If vendor says not affected and reachability says unreachable, treat as Informational.”
    • Serialize facts and policies separately, with their own digests.

  2. Lattice implementation location

    • Lattice evaluation (trust algebra) for VEX decisions happens in:

      • Scanner.WebService for scan-time interpretation
      • Vexer/Excitor for publishing and transformation into VEX documents
    • Concelier/Feedser must not recompute lattice results, only read them.

  3. Formal merge operations

    • Each lattice merge function must be:

      • Explicitly named (e.g. MaxSeverity, VendorOverridesIfStrongerEvidence, ConservativeIntersection).
      • Versioned and referenced by ID in artifacts (e.g. latticeAlgo: "trust-algebra/v1/max-severity"); a sketch follows this list.
  4. Studio-ready representation

    • Internal data structures must align with a future “Trust Algebra Studio” UI:

      • Nodes = statements (VEX, runtime observation, reachability result)
      • Edges = “derived_from” / “overrides” / “constraints”
      • Policies = transformations over these graphs.
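
A sketch of a named, versioned merge from rule 3; VexStatus and its ordering are illustrative assumptions:

public enum VexStatus { NotAffected = 0, UnderInvestigation = 1, Affected = 2 }

public interface ILatticeMerge {
    string Id { get; } // recorded in artifacts, e.g. "trust-algebra/v1/max-severity"
    VexStatus Merge(VexStatus vendorClaim, VexStatus reachabilityResult);
}

public sealed class ConservativeIntersection : ILatticeMerge {
    public string Id => "trust-algebra/v1/conservative-intersection";

    // Keep the more cautious of the two statuses: report NotAffected only
    // when the vendor claim and the reachability evidence both support it.
    public VexStatus Merge(VexStatus vendorClaim, VexStatus reachabilityResult) =>
        (VexStatus)Math.Max((int)vendorClaim, (int)reachabilityResult);
}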

Developer checklist:

  • Facts and policies are serialized separately.
  • Lattice code is in allowed services only.
  • Merge strategies are named and versioned.
  • Artifacts record which lattice algorithm was used.

2.5 Proof-Linked SBOM→VEX Chain

Rules:

  1. Link, dont merge

    • SBOM, scan result, reachability artifact, and VEX should keep their own schemas.
    • Use linking IDs instead of denormalizing everything into one mega-document.
  2. Evidence pointers in VEX

    • Every VEX statement (per vuln/component) includes (a record sketch follows this list):

      • sbomDigest
      • scanId
      • reachMapDigest
      • policyDigest
      • signerKeyId
  3. DSSE everywhere

    • All analysis artifacts are wrapped in DSSE:

      • Payload = canonical JSON
      • Envelope = signature + key metadata + profile
    • Do not invent yet another custom envelope format.
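
The per-statement pointers from rule 2 as a hypothetical C# record:

public sealed record VexEvidence(
    string SbomDigest,     // which SBOM / version
    string ScanId,         // which scan run produced the finding
    string ReachMapDigest, // which reachability analysis (DSSE digest)
    string PolicyDigest,   // which lattice policy interpreted the facts
    string SignerKeyId);   // who signed the decision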

Developer checklist:

  • VEX schema includes pointers back to all upstream artifacts.
  • No duplication of SBOM or scan content inside VEX.
  • DSSE used as standard envelope type.

2.6 Proof-of-Integrity Graph

Rules:

  1. Graph-first storage model

    • Model the lifecycle as a graph:

      • Nodes: source commit, build, image, SBOM, scan, VEX, runtime instance.
      • Edges: “built_from”, “scanned_as”, “deployed_as”, “derived_from”.
    • Use stable IDs and store in a graph-friendly form (e.g. adjacency collections in Postgres or a document graph in Mongo); a minimal node/edge sketch follows this list.

  2. Attestations as edges

    • Attestations represent edges, not just metadata blobs.
    • Example: a build attestation is an edge: commit -> image, signed by the CI builder.
  3. Queryable from APIs

    • Expose API endpoints like:

      • GET /graph/runtime/{podId}/lineage
      • GET /graph/image/{digest}/vex
    • Zastava and the UI must use the same APIs, not private shortcuts.
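
A minimal node/edge shape under the assumptions above (names illustrative):

public sealed record GraphNode(
    string Id,    // content-derived, stable (see 2.1, rule 3)
    string Kind); // "commit", "build", "image", "sbom", "scan", "vex", "runtime"

public sealed record GraphEdge(
    string FromId,
    string ToId,
    string Relation,            // "built_from", "scanned_as", "deployed_as", "derived_from"
    string AttestationDigest);  // rule 2: the signed attestation *is* the edge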

Developer checklist:

  • Graph nodes and edges modelled explicitly.
  • Each edge type has an attestation schema.
  • At least two graph traversal APIs implemented.

2.7 AI Codex & Zastava Companion

Rules:

  1. Evidence in, explanation out

    • Zastava must receive:

      • Explicit evidence bundle (JSON) for a question.
      • The user's question.
    • It must not be responsible for data retrieval or correlation itself; that is the platform's job.

  2. Stable explanation contracts

    • Define a structured response format, for example:

      {
        "shortAnswer": "You can ship, with 1 reachable critical.",
        "findingsSummary": [...],
        "remediationPlan": [...],
        "evidencePointers": [...]
      }
      
    • This allows regeneration and multi-language rendering later; a C# shape is sketched after this rules list.

  3. No silent decisions

    • Every recommendation must include:

      • Which lattice policy was assumed.
      • Which artifacts were used (by ID).
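
The response contract from rule 2, plus the rule 3 requirements, as a hypothetical C# type:

public sealed record ZastavaAnswer(
    string ShortAnswer,
    IReadOnlyList<string> FindingsSummary,
    IReadOnlyList<string> RemediationPlan,
    IReadOnlyList<string> EvidencePointers, // artifact digests / scan IDs used
    string LatticePolicyId);                // rule 3: the policy that was assumed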

Developer checklist:

  • Zastava APIs accept evidence bundles, not query strings against the DB.
  • Responses are structured and deterministic given the evidence.
  • Explanations include policy and artifact references.

2.8 Proof-Market Ledger & Adaptive Trust

Rules:

  1. Ledger as append-only

    • Treat proof-market ledger as an append-only log:

      • New proofs (SBOM/VEX/attestations)
      • Revocations
      • Corrections / contradictions
    • Do not delete; instead emit revocation events.

  2. Trust-score derivation

    • Trust is not a free-form label; it is a numeric or lattice value computed from:

      • Number of consistent proofs over time.
      • Speed of publishing after a CVE is disclosed.
      • Rate of contradictions or revocations. (An illustrative formula follows this rules list.)
  3. Separation from security decisions

    • Trust scores feed into:

      • Sorting and highlighting.
      • Procurement / vendor dashboards.
    • Do not hard-gate security decisions solely on trust scores.
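
An illustrative trust-score shape for rule 2; the weights and saturation constants here are invented for the example, and the real algorithm must be documented and versioned (see the checklist below):

public static double TrustScore(
    int consistentProofs, double meanDaysToPublish, int contradictions, int totalProofs)
{
    double coverage    = Math.Min(1.0, consistentProofs / 50.0); // consistent proofs over time
    double speed       = 1.0 / (1.0 + meanDaysToPublish / 30.0); // faster publishing after a CVE
    double reliability = totalProofs == 0
        ? 0.0
        : 1.0 - (double)contradictions / totalProofs;            // contradictions / revocations
    return 0.4 * coverage + 0.2 * speed + 0.4 * reliability;     // version it, e.g. "trust-score/v0"
}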

Developer checklist:

  • Ledger is append-only with explicit revocations.
  • Trust scoring algorithm documented and versioned.
  • UI uses trust scores only as a dimension, not a gate.

2.9 Quantum-Resilient Mode

Rules:

  1. Optional PQC

    • PQC algorithms (e.g. Dilithium, Falcon) are an opt-in crypto profile.
    • Artifacts can carry multiple signatures (classical + PQC) to ease migration; a sketch follows below.
  2. No PQC assumption in core logic

    • Core logic must treat the algorithm as opaque; only the crypto layer knows whether it is PQC or classical.
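
A sketch of a multi-signature envelope; DSSE already permits several signatures over one payload, which is what makes the classical + PQC migration path workable (the algorithm strings below are examples):

public sealed record EnvelopeSignature(
    string Algorithm, // e.g. "ecdsa-p256" (classical) or "ml-dsa-65" (Dilithium-family PQC)
    string KeyId,
    string Profile,
    byte[] Signature);

public sealed record MultiSigEnvelope(
    string PayloadType,
    byte[] Payload,
    IReadOnlyList<EnvelopeSignature> Signatures); // classical + PQC side by side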

Developer checklist:

  • PQC profile implemented as a first-class profile.
  • Artifacts support multi-signature envelopes.

3. Definition of Done Templates

You can use this as a per-feature DoD in Stella Ops:

For any new feature that touches scans, VEX, or evidence:

  • Deterministic: input manifest defined, canonicalization applied, golden fixture(s) added.
  • Evidence: outputs are DSSE-wrapped and linked (not merged) into existing artifacts.
  • Reachability / Lattice: if applicable, runs only in allowed services and records algorithm IDs.
  • Crypto: crypto calls go through profile abstraction; tests for at least 2 profiles if security-sensitive.
  • Graph: lineage edges added where appropriate; node/edge IDs stable and queryable.
  • UX/API: at least one API to retrieve structured evidence for Zastava and UI.
  • Tests: unit + golden + at least one integration test with a full SBOM → scan → VEX chain.

If you want, the next step can be to pick one module (e.g. Scanner.WebService or Vexer) and turn these high-level rules into a concrete CONTRIBUTING.md / ARCHITECTURE.md for that service.