Below are internal guidelines for Stella Ops Product Managers and Development Managers for the capability: **Knowledge Snapshots / Time‑Travel Replay**. This is written as an implementable operating standard (not a concept note).

---

# Knowledge Snapshots / Time‑Travel Replay

## Product and Engineering Guidelines for Stella Ops

## 1) Purpose and value proposition

### What this capability must achieve

Enable Stella Ops to **reproduce any historical risk decision** (scan result, policy evaluation, verdict) **deterministically**, using a **cryptographically bound snapshot** of the exact knowledge inputs that were available at the time the decision was made.

### Why customers pay for it

This capability is primarily purchased for:

* **Auditability**: “Show me what you knew, when you knew it, and why the system decided pass/fail.”
* **Incident response**: reproduce prior posture using historical feeds/VEX/policies and explain deltas.
* **Air‑gapped / regulated environments**: deterministic, offline decisioning with attested knowledge state.
* **Change control**: prove whether a decision changed due to code change vs knowledge change.

### Core product promise

For a given artifact and snapshot:

* **Same inputs → same outputs** (verdict, scores, findings, evidence pointers), or Stella Ops must clearly declare the precise exceptions.

---

## 2) Definitions (PMs and engineers must align on these)

### Knowledge input

Any external or semi-external information that can influence the outcome:

* vulnerability databases and advisories (any source)
* exploit-intel signals
* VEX statements (OpenVEX, CSAF, CycloneDX VEX, etc.)
* SBOM ingestion logic and parsing rules
* package identification rules (including distro/backport logic)
* policy content and policy engine version
* scoring rules (including weights and thresholds)
* trust anchors and signature verification policy
* plugin versions and enabled capabilities
* configuration defaults and overrides that change analysis

### Knowledge Snapshot

A **sealed record** of:

1. **References** (which inputs were used), and
2. **Content** (the exact bytes used), and
3. **Execution contract** (the evaluator and ruleset versions)

### Time‑Travel Replay

Re-running evaluation of an artifact **using only** the snapshot content and the recorded execution contract, producing the same decision and explainability artifacts.

---

## 3) Product principles (non‑negotiables)

1. **Determinism is a product requirement**, not an engineering detail.
2. **Snapshots are first‑class artifacts** with explicit lifecycle (create, verify, export/import, retain, expire).
3. **The snapshot is cryptographically bound** to outcomes and evidence (tamper-evident chain).
4. **Replays must be possible offline** (when the snapshot includes content) and must fail clearly when not possible.
5. **Minimal surprise**: the UI must explain when a verdict changed due to “knowledge drift” vs “artifact drift.”
6. **Scalability by content addressing**: the platform must deduplicate knowledge content aggressively (see the sketch after this list).
7. **Backward compatibility**: old snapshots must remain replayable within a documented support window.
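To make principles 3 and 6 concrete: a content-addressed blob store keys every knowledge blob by its digest, so identical feed bytes are stored once no matter how many snapshots reference them, and any later tampering is detectable on read. The Python sketch below is a minimal illustration only; the `KnowledgeBlobStore` name, on-disk layout, and digest prefix are assumptions for this example, not a committed design.

```python
import hashlib
from pathlib import Path

class KnowledgeBlobStore:
    """Minimal content-addressed store sketch: blobs are written once, keyed by digest."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, content: bytes) -> str:
        """Store content and return its digest; identical bytes are deduplicated."""
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        path = self.root / digest.replace(":", "-")
        if not path.exists():  # dedup: a second write of the same bytes is a no-op
            path.write_bytes(content)
        return digest

    def get(self, digest: str) -> bytes:
        """Read a blob and re-verify its digest, making tampering evident on access."""
        data = (self.root / digest.replace(":", "-")).read_bytes()
        if "sha256:" + hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"blob {digest} failed integrity check")
        return data

# Usage: the same advisory bytes referenced by many snapshots occupy storage once.
store = KnowledgeBlobStore(Path("/tmp/stella-blob-demo"))
d1 = store.put(b'{"cve": "CVE-2025-0001", "severity": "high"}')
d2 = store.put(b'{"cve": "CVE-2025-0001", "severity": "high"}')
assert d1 == d2
```

Retention and garbage collection (section 12) can then operate on reference counts over these digests rather than on whole snapshots.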
---

## 4) Scope boundaries (what this is not)

### Non-goals (explicitly out of scope for v1 unless approved)

* Reconstructing *external internet state* beyond what is recorded (no “fetch historical CVE state from the web”).
* Guaranteeing replay across major engine rewrites without a compatibility plan.
* Storing sensitive proprietary customer code in snapshots (unless explicitly enabled).
* Replaying “live runtime signals” unless those signals were captured into the snapshot at decision time.

---

## 5) Personas and use cases (PM guidance)

### Primary personas

* **Security Governance / GRC**: needs audit packs, controls evidence, deterministic history.
* **Incident response / AppSec lead**: needs “what changed and why” quickly.
* **Platform engineering / DevOps**: needs reproducible CI gates and air‑gap workflows.
* **Procurement / regulated customers**: needs proof of process and defensible attestations.

### Must-support use cases

1. **Replay a past release gate decision** in a new environment (including offline) and get an identical outcome.
2. **Explain drift**: “This build fails today but passed last month—why?”
3. **Air‑gap export/import**: create snapshots in a connected environment, import them into a disconnected one.
4. **Audit bundle generation**: export snapshot + verdict(s) + evidence pointers.

---

## 6) Functional requirements (PM “must/should” list)

### Must

* **Snapshot creation** for every material evaluation (or for every “decision object” chosen by configuration).
* **Snapshot manifest** containing:
  * unique snapshot ID (content-addressed)
  * list of knowledge sources with hashes/digests
  * policy IDs and exact policy content hashes
  * engine version and plugin versions
  * timestamp and clock source metadata
  * trust anchor set hash and verification policy hash
* **Snapshot sealing**:
  * snapshot manifest is signed
  * signed link from verdict → snapshot ID
* **Replay**:
  * re-evaluate using only snapshot inputs
  * output must match prior results (or emit a deterministic mismatch report)
* **Export/import**:
  * portable bundle format
  * import verifies integrity and signatures before allowing use
* **Retention controls**:
  * configurable retention windows and storage quotas
  * deduplication and garbage collection

### Should

* **Partial snapshots** (reference-only) vs **full snapshots** (content included), with explicit replay guarantees.
* **Diff views**: compare two snapshots and highlight what knowledge changed.
* **Multi-snapshot replay**: run “as-of snapshot A” and “as-of snapshot B” to show drift impact.

### Could

* Snapshot “federation” for large orgs (mirrors/replication with policy controls).
* Snapshot “pinning” to releases or environments as a governance policy.

---

## 7) UX and workflow guidelines (PM + Eng)

### UI must communicate three states clearly

1. **Reproducible offline**: snapshot includes all required content.
2. **Reproducible with access**: snapshot references external sources that must be available.
3. **Not reproducible**: missing content or unsupported evaluator version.
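As one illustration of how these three states could be computed, the sketch below classifies a manifest by checking the recorded engine version and whether every referenced blob digest is available locally. The manifest field names mirror the illustrative manifest in section 8; `local_digests` and `supported_engines` are assumed inputs for the example, not an existing Stella Ops API.

```python
from enum import Enum

class ReplayReadiness(Enum):
    REPRODUCIBLE_OFFLINE = "reproducible_offline"          # all required content is in the snapshot
    REPRODUCIBLE_WITH_ACCESS = "reproducible_with_access"  # referenced sources must be reachable
    NOT_REPRODUCIBLE = "not_reproducible"                  # missing content or unsupported evaluator

def classify_replay_readiness(manifest: dict,
                              local_digests: set,
                              supported_engines: set) -> ReplayReadiness:
    """Map a snapshot manifest onto the three UI states."""
    engine = f'{manifest["engine"]["name"]}@{manifest["engine"]["version"]}'
    if engine not in supported_engines:
        return ReplayReadiness.NOT_REPRODUCIBLE

    missing = [s for s in manifest["sources"] if s["content_digest"] not in local_digests]
    if not missing:
        return ReplayReadiness.REPRODUCIBLE_OFFLINE
    if all("origin" in s for s in missing):
        # Missing blobs can be re-fetched from their recorded origin and re-verified by digest.
        return ReplayReadiness.REPRODUCIBLE_WITH_ACCESS
    return ReplayReadiness.NOT_REPRODUCIBLE

# Example: an engine that is no longer supported makes the snapshot not reproducible.
manifest = {"engine": {"name": "stella-evaluator", "version": "0.9.0"}, "sources": []}
state = classify_replay_readiness(manifest, set(), {"stella-evaluator@1.7.0"})
assert state is ReplayReadiness.NOT_REPRODUCIBLE
```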
### Required UI objects

* **Snapshot Details page**
  * snapshot ID and signature status
  * list of knowledge sources (name, version/epoch, digest, size)
  * policy bundle version, scoring rules version
  * trust anchors + verification policy digest
  * replay status: “verified reproducible / reproducible / not reproducible”
* **Verdict page**
  * links to snapshot(s)
  * “replay now” action
  * “compare to latest knowledge” action

### UX guardrails

* Never show “pass/fail” without also showing:
  * snapshot ID
  * policy ID/version
  * verification status
* When results differ on replay, show:
  * exact mismatch class (engine mismatch, missing data, nondeterminism, corrupted snapshot)
  * what input changed (if known)
  * remediation steps

---

## 8) Data model and format guidelines (Development Managers)

### Canonical objects (recommended minimum set)

* **KnowledgeSnapshotManifest (KSM)**
* **KnowledgeBlob** (content-addressed bytes)
* **KnowledgeSourceDescriptor**
* **PolicyBundle**
* **TrustBundle**
* **Verdict** (signed decision artifact)
* **ReplayReport** (records replay result and mismatches)

### Content addressing

* Use a stable hash (e.g., SHA‑256) for:
  * each knowledge blob
  * manifest
  * policy bundle
  * trust bundle
* The snapshot ID should be derived from the manifest digest.

### Example manifest shape (illustrative)

```json
{
  "snapshot_id": "ksm:sha256:…",
  "created_at": "2025-12-19T10:15:30Z",
  "engine": { "name": "stella-evaluator", "version": "1.7.0", "build": "…" },
  "plugins": [
    { "name": "pkg-id", "version": "2.3.1", "digest": "sha256:…" }
  ],
  "policy": { "bundle_id": "pol:sha256:…", "digest": "sha256:…" },
  "scoring": { "ruleset_id": "score:sha256:…", "digest": "sha256:…" },
  "trust": { "bundle_id": "trust:sha256:…", "digest": "sha256:…" },
  "sources": [
    {
      "name": "nvd",
      "epoch": "2025-12-18",
      "kind": "vuln_feed",
      "content_digest": "sha256:…",
      "licenses": ["…"],
      "origin": { "uri": "…", "retrieved_at": "…" }
    },
    { "name": "customer-vex", "kind": "vex", "content_digest": "sha256:…" }
  ],
  "environment": {
    "determinism_profile": "strict",
    "timezone": "UTC",
    "normalization": { "line_endings": "LF", "sort_order": "canonical" }
  }
}
```

### Versioning rules

* Every object is immutable once written.
* Changes create new digests; never mutate in place.
* Support schema evolution via:
  * `schema_version`
  * strict validation + migration tooling
* Keep manifests small; store large data as blobs.

---

## 9) Determinism contract (Engineering must enforce)

### Determinism requirements

* Stable ordering: sort inputs and outputs canonically.
* Stable timestamps: timestamps may exist but must not change the computed scores or verdict.
* Stable randomization: avoid RNG use; where unavoidable, record a fixed seed in the snapshot.
* Stable parsers: parser versions are pinned by digest; parsing must be deterministic.

### Allowed nondeterminism (if any) must be explicit

If you must allow nondeterminism, it must be:

* documented,
* surfaced in UI,
* included in the replay report as a “non-deterministic factor,”
* and excluded from the signed decision if it affects pass/fail.
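One way to satisfy both the content-addressing rule in section 8 (snapshot ID derived from the manifest digest) and the stable-ordering requirement above is to canonicalize the manifest before hashing. The sketch below uses sorted keys and fixed separators; a production implementation would more likely adopt an established canonicalization such as RFC 8785 (JCS), so treat the function names here as illustrative assumptions.

```python
import hashlib
import json

def canonical_manifest_bytes(manifest: dict) -> bytes:
    """Serialize a manifest deterministically: sorted keys, fixed separators, UTF-8."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def derive_snapshot_id(manifest: dict) -> str:
    """Derive the content-addressed snapshot ID from the canonical manifest digest."""
    body = {k: v for k, v in manifest.items() if k != "snapshot_id"}  # the ID cannot hash itself
    digest = hashlib.sha256(canonical_manifest_bytes(body)).hexdigest()
    return f"ksm:sha256:{digest}"

# The same logical manifest yields the same ID regardless of key order.
m1 = {"engine": {"name": "stella-evaluator", "version": "1.7.0"}, "sources": []}
m2 = {"sources": [], "engine": {"version": "1.7.0", "name": "stella-evaluator"}}
assert derive_snapshot_id(m1) == derive_snapshot_id(m2)
```

Whatever scheme is chosen, the hashed bytes and the signed bytes must be the same canonical form; otherwise the verdict → snapshot binding becomes ambiguous.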
---

## 10) Security model (Development Managers)

### Threats this feature must address

* Feed poisoning (tampered vulnerability data)
* Time-of-check/time-of-use drift (same artifact evaluated against moving feeds)
* Replay manipulation (swap snapshot content)
* “Policy drift hiding” (claiming old decision used different policies)
* Signature bypass (trust anchors altered)

### Controls required

* Sign manifests and verdicts.
* Bind verdict → snapshot ID → policy bundle hash → trust bundle hash.
* Verify on every import and on every replay invocation.
* Audit log:
  * snapshot created
  * snapshot imported
  * replay executed
  * verification failures

### Key handling

* Decide and document:
  * who signs snapshots/verdicts (service keys vs tenant keys)
  * rotation policy
  * revocation/compromise handling
* Avoid designing cryptography from scratch; use well-established signing formats and separation of duties.

---

## 11) Offline / air‑gapped requirements

### Snapshot levels (PM packaging guideline)

Offer explicit snapshot types with clear guarantees:

* **Level A: Reference-only snapshot**
  * stores hashes + source descriptors
  * replay requires access to original sources
* **Level B: Portable snapshot**
  * includes blobs necessary for replay
  * replay works offline
* **Level C: Sealed portable snapshot**
  * portable + signed + includes trust anchors
  * replay works offline and can be verified independently

Do not market air‑gap support without specifying which level is provided.

---

## 12) Performance and storage guidelines

### Principles

* Content-address knowledge blobs to maximize deduplication.
* Separate “hot” knowledge (recent epochs) from cold storage.
* Support snapshot compaction and garbage collection.

### Operational requirements

* Retention policies per tenant/project/environment.
* Quotas and alerting when snapshot storage approaches limits.
* Export bundles should be chunked/streamable for large feeds.

---

## 13) Testing and acceptance criteria

### Required test categories

1. **Golden replay tests** (see the sketch after this list)
   * same artifact + same snapshot → identical outputs
2. **Corruption tests**
   * bit flips in blobs/manifests are detected and rejected
3. **Version skew tests**
   * old snapshot + new engine should either replay deterministically or fail with a clear incompatibility report
4. **Air‑gap tests**
   * export → import → replay without network access
5. **Diff accuracy tests**
   * compare snapshots and ensure the diff identifies actual knowledge changes, not noise
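A golden replay test can be as simple as the pytest sketch below: replay each corpus case against its recorded snapshot and require byte-for-byte equality of the canonicalized verdict, never a fuzzy comparison. The `replay_harness` module, its `load_snapshot` and `replay_artifact` helpers, and the corpus layout are hypothetical pieces for illustration, not an existing Stella Ops interface.

```python
# test_golden_replay.py: illustrative sketch; the harness module and corpus layout
# below are hypothetical, not an existing Stella Ops API.
import json
from pathlib import Path

import pytest

from replay_harness import load_snapshot, replay_artifact  # hypothetical test harness

CORPUS = sorted(p for p in Path("tests/golden").glob("*") if p.is_dir())

def canonical(verdict: dict) -> bytes:
    """Canonical comparison form, mirroring the determinism contract in section 9."""
    return json.dumps(verdict, sort_keys=True, separators=(",", ":")).encode("utf-8")

@pytest.mark.parametrize("case", CORPUS, ids=lambda p: p.name)
def test_same_snapshot_same_verdict(case: Path):
    snapshot = load_snapshot(case / "snapshot.bundle")
    expected = json.loads((case / "expected_verdict.json").read_text())
    actual = replay_artifact(case / "artifact.sbom.json", snapshot)
    assert canonical(actual) == canonical(expected), (
        f"replay drifted for corpus case {case.name}; inspect the ReplayReport mismatch class"
    )
```

Corruption and version-skew tests can reuse the same corpus by deliberately flipping a byte in a bundle or replaying against an unsupported engine pin.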
### Definition of Done (DoD) for the feature

* Snapshots are created automatically according to policy.
* Snapshots can be exported and imported with verified integrity.
* Replay produces matching verdicts for a representative corpus.
* UI exposes snapshot provenance and replay status.
* Audit log records snapshot lifecycle events.
* Clear failure modes exist (missing blobs, incompatible engine, signature failure).

---

## 14) Metrics (PM ownership)

Track metrics that prove this is a moat, not a checkbox.

### Core KPIs

* **Replay success rate** (strict determinism)
* **Time to explain drift** (median time from “why changed” to root cause)
* **% verdicts with sealed portable snapshots**
* **Audit effort reduction** (customer-reported or measured via workflow steps)
* **Storage efficiency** (dedup ratio; bytes per snapshot over time)

### Guardrail metrics

* Snapshot creation latency impact on CI
* Snapshot storage growth per tenant
* Verification failure rates

---

## 15) Common failure modes (what to prevent)

1. Treating snapshots as “metadata only” and still claiming replayability.
2. Allowing a “latest feed fetch” during replay (breaks the promise).
3. Not pinning parser/policy/scoring versions, which causes silent drift.
4. Missing clear UX around replay limitations and failure reasons.
5. Overcapturing sensitive inputs (privacy and customer trust risk).
6. Underinvesting in dedup/retention (cost blowups).

---

## 16) Management checklists

### PM checklist (before commitment)

* Precisely define the “replay” guarantee level (A/B/C) for each SKU/environment.
* Define which inputs are in scope (feeds, VEX, policies, trust bundles, plugins).
* Define customer-facing workflows:
  * “replay now”
  * “compare to latest”
  * “export for audit / air-gap”
* Confirm governance outcomes:
  * audit pack integration
  * exception linkage
  * release gate linkage

### Development Manager checklist (before build)

* Establish canonical schemas and a versioning plan.
* Establish content-addressed storage + dedup plan.
* Establish signing and trust anchor strategy.
* Establish deterministic evaluation contract and test harness.
* Establish import/export packaging and verification.
* Establish retention, quotas, and GC.

---

## 17) Minimal phased delivery (recommended)

**Phase 1: Reference snapshot + verdict binding**

* Record source descriptors + hashes, policy/scoring/trust digests.
* Bind the snapshot ID into verdict artifacts.

**Phase 2: Portable snapshots**

* Store knowledge blobs locally with dedup.
* Export/import with integrity verification.

**Phase 3: Sealed portable snapshots + replay tooling**

* Sign snapshots.
* Deterministic replay pipeline + replay report.
* UI surfacing and audit logs.

**Phase 4: Snapshot diff + drift explainability**

* Compare snapshots.
* Attribute decision drift to knowledge changes vs artifact changes.

---

These guidelines can be expanded into a structured PRD (objectives, user stories, functional and non-functional requirements, security/compliance, dependencies, risks, and acceptance tests) suitable for Jira/Linear epics and engineering design review.