docs consolidation and others

This commit is contained in:
master
2026-01-06 19:02:21 +02:00
parent d7bdca6d97
commit 4789027317
849 changed files with 16551 additions and 66770 deletions


@@ -0,0 +1,473 @@
# Stella Ops — Deterministic Replay Specification
Version: 1.0

Status: Draft / Internal Technical Reference

Audience: Core developers, module maintainers, audit engineers.

---
## 1. Purpose
Deterministic Replay allows any completed Stella Ops scan to be **reproduced byte-for-byte** with full cryptographic validation.
It guarantees that SBOMs, Findings, and VEX evaluations can be re-executed later to:
- prove historical compliance decisions,
- attribute changes precisely to feeds, rules, or tools,
- support dual-signing (FIPS + regional crypto),
- and anchor cryptographic evidence in offline or public ledgers.

Replay requires that all inputs and environmental conditions are **captured, hashed, and sealed** at scan time.

---
## 2. Architecture Overview
```mermaid
graph TD
A[Scanner.WebService] --> B[Replay Manifest]
A --> C[InputBundle]
A --> D[OutputBundle]
B --> E[DSSE Envelope]
C --> F[Feedser Snapshot Export]
C --> G[Policy/Lattice Bundle]
D --> H["DSSE Outputs (SBOM, Findings, VEX)"]
E --> I[PostgreSQL: replay_runs]
C --> J[Blob Store: Input/Output Bundles]
D --> J
```
### Core Artifacts
| Artifact | Description | Format |
| ------------------- | ------------------------------------------------------ | -------------------------- |
| **Replay Manifest** | Immutable JSON describing all scan inputs and outputs. | JSON (canonicalized) |
| **InputBundle** | Feeds, rules, policies, tool binaries (hashed). | `.tar.zst` |
| **OutputBundle** | SBOM, Findings, VEX, logs. | `.tar.zst` |
| **DSSE Envelope**   | Signed metadata for each artifact.                     | JSON (DSSE envelope)       |
| **Merkle Map** | Layer and feed chunk trees. | JSON (embedded or sidecar) |
---
## 3. Replay Manifest Schema (v1)
### 3.1 Top-level Layout
```jsonc
{
"schemaVersion": "1.0",
"scan": {
"id": "uuid",
"time": "2025-10-29T13:05:33Z",
"mode": "record",
"scannerVersion": "10.1.3",
"cryptoProfile": "FIPS-140-3+GOST-R-34.10-2012"
},
"subject": {
"ociDigest": "sha256:abcd...",
"layers": [
{ "layerDigest": "...", "merkleRoot": "...", "leafCount": 144 }
]
},
"inputs": {
"feeds": [
{
"name": "nvd",
"snapshotHash": "sha256:...",
"snapshotTime": "2025-10-29T12:00:00Z",
"merkleRoot": "..."
}
],
"rulesBundleHash": "sha256:...",
"tools": [
{ "name": "sbomer", "version": "10.1.3", "sha256": "..." },
{ "name": "scanner", "version": "10.1.3", "sha256": "..." },
{ "name": "vexer", "version": "10.1.3", "sha256": "..." }
],
"env": {
"os": "linux",
"arch": "x64",
"locale": "en_US.UTF-8",
"tz": "UTC",
"seed": "H(scan.id||merkleRootAllLayers)",
"flags": ["offline"]
}
},
"policy": {
"latticeHash": "sha256:...",
"mutes": [
{ "id": "MUTE-1234", "reason": "vendor ack", "approvedBy": "authority@example.com", "approvedAt": "2025-10-29T12:55:00Z" }
],
"trustProfile": "sha256:..."
},
"outputs": {
"sbomHash": "sha256:...",
"findingsHash": "sha256:...",
"vexHash": "sha256:...",
"logHash": "sha256:..."
},
"reachability": {
"graphs": [
{
"kind": "static",
"analyzer": "scanner/java@sha256:...",
"casUri": "cas://replay/scan-123/reachability/graphs/static-graph.tar.zst",
"sha256": "abc123"
},
{
"kind": "framework",
"analyzer": "scanner/framework@sha256:...",
"casUri": "cas://replay/scan-123/reachability/graphs/framework-graph.tar.zst",
"sha256": "def456"
}
],
"runtimeTraces": [
{
"source": "zastava",
"casUri": "cas://replay/scan-123/reachability/traces/runtime-trace.ndjson.zst",
"sha256": "feedface",
"recordedAt": "2025-11-07T11:10:00Z"
}
]
},
"provenance": {
"signer": "scanner.authority",
"dsseEnvelopeHash": "sha256:...",
"rekorEntry": "optional"
}
}
```
### 3.2 Reachability Section
The optional `reachability` block captures the inputs needed to replay explainability decisions:
| Field | Description |
|-------|-------------|
| `reachability.graphs[]` | References to static/framework callgraph bundles. Each entry records the graph `kind`, the producing analyzer pinned by digest (`analyzer`), the CAS URI under `cas://replay/<scan-id>/reachability/graphs/` (`casUri`), and the SHA-256 digest of the tarball (`sha256`). |
| `reachability.runtimeTraces[]` | References to runtime observation bundles (e.g., Zastava ND-JSON traces). Each item stores the emitting `source`, the CAS URI, typically under `cas://replay/<scan-id>/reachability/traces/` (`casUri`), the SHA-256 digest (`sha256`), and the capture timestamp (`recordedAt`). |

Replay engines MUST verify every referenced artifact hash before re-evaluating reachability. Missing graphs downgrade the affected signals to `reachability:unknown` and SHOULD raise policy warnings.

Producer note: default clock values in `StellaOps.Replay.Core` are `UnixEpoch` to avoid hidden time drift, so producers MUST set `scan.time` and `reachability.runtimeTraces[].recordedAt` explicitly.

---
## 4. Deterministic Execution Rules
### 4.1 Environment Normalization
* **Clock:** frozen to `scan.time` unless a rule explicitly requires “now”.
* **Random seed:** derived as `H(scan.id || MerkleRootAllLayers)`.
* **Locale/TZ:** enforced per manifest; deviations cause a validation error.
* **Filesystem normalization:**
  * Normalize permissions to 0644/0755.
  * Path separators = `/`.
  * Newlines = LF.
  * JSON key order = lexical.
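The seed rule above leaves `H` and the byte encoding of `scan.id || MerkleRootAllLayers` unspecified. Below is a minimal C# sketch. It assumes that `H` is SHA-256 and that both values are concatenated as UTF-8 text, with the root in lowercase hex and no separator. These are assumptions, not the spec, and `DeriveSeed`/`SeedAsInt32` are illustrative names. The sketch stops at seed bytes because .NET does not guarantee that `System.Random` yields the same sequence for a given seed across major versions.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

// Sketch of seed = H(scan.id || merkleRootAllLayers). Assumptions: H = SHA-256, both values
// hashed as UTF-8 text (root as lowercase hex), concatenated with no separator.
static byte[] DeriveSeed(string scanId, string merkleRootAllLayersHex) =>
    SHA256.HashData(Encoding.UTF8.GetBytes(scanId + merkleRootAllLayersHex.ToLowerInvariant()));

// Narrow to 32 bits (first four bytes, big-endian) for APIs that take an int seed.
static int SeedAsInt32(byte[] seed) => BinaryPrimitives.ReadInt32BigEndian(seed);

byte[] replaySeed = DeriveSeed("3f2c9a1e-7b4d-4e0a-9c1f-2d8e6b5a4c3d", "B81F");
Console.WriteLine($"{Convert.ToHexString(replaySeed).ToLowerInvariant()} -> {SeedAsInt32(replaySeed)}");
```

Lowercasing the root before hashing means an upper-case hex root from another producer still yields the same seed.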
### 4.2 Concurrency & I/O
* File traversal: stable lexicographic order.
* Parallel jobs: ordered reduction by subject path.
* Temporary directories: ephemeral, but named from deterministic hash seeds rather than random values.
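The traversal and reduction rules can be sketched as follows. `StableOrder` and `OrderedReduce` are hypothetical helpers. Using ordinal comparison is an assumption about what "lexicographic" means here; for ASCII paths it matches byte order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Stable traversal: normalize separators to '/' and sort ordinally, so the order does not
// depend on the OS, the filesystem's enumeration order, or the current culture.
static List<string> StableOrder(IEnumerable<string> paths) =>
    paths.Select(p => p.Replace('\\', '/'))
         .OrderBy(p => p, StringComparer.Ordinal)
         .ToList();

// Ordered reduction: per-path work may run in parallel, but results are combined strictly
// in subject-path order, so thread scheduling cannot change the output.
static string OrderedReduce(IEnumerable<string> paths, Func<string, string> work)
{
    var results = new ConcurrentDictionary<string, string>(StringComparer.Ordinal);
    Parallel.ForEach(StableOrder(paths), p => results[p] = work(p));
    return string.Join("\n", results
        .OrderBy(kv => kv.Key, StringComparer.Ordinal)
        .Select(kv => $"{kv.Key}:{kv.Value}"));
}

var ordered = StableOrder(new[] { "usr\\lib\\b.so", "etc/a.conf", "Usr/z" });
Console.WriteLine(string.Join(", ", ordered)); // Usr/z, etc/a.conf, usr/lib/b.so
```

Ordinal order puts upper-case names before lower-case ones. Culture-aware sorting would not, and it can also vary by machine, which is why it is avoided here.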
### 4.3 Feeds & Policies
* All network I/O disabled; feeds must be read from snapshot bundles.
* Policies and suppressions must resolve by hash, not name.
### 4.4 Library hooks (StellaOps.Replay.Core)
Use the shared helpers in `src/__Libraries/StellaOps.Replay.Core` to keep outputs deterministic:
- `CanonicalJson.Serialize(...)` → lexicographic key ordering with relaxed escaping, arrays preserved as-is.
- `DeterministicHash.Sha256Hex(...)` and `DeterministicHash.MerkleRootHex(...)` → lowercase digests and stable Merkle roots for bundle manifests.
- `DssePayloadBuilder.BuildUnsigned(...)` → DSSE payloads for replay manifests using payload type `application/vnd.stellaops.replay+json`.
- `ReplayManifestExtensions.ComputeCanonicalSha256()` → convenience for CAS naming of manifest blobs.
---
## 5. DSSE and Signing
### 5.1 Envelope Structure
```jsonc
{
"payloadType": "application/vnd.stellaops.replay+json",
"payload": "<base64-encoded canonical JSON>",
"signatures": [
{ "keyid": "authority-root-fips", "sig": "..." },
{ "keyid": "authority-root-gost", "sig": "..." }
]
}
```
### 5.2 Verification Steps
1. Decode payload → verify canonical form.
2. Verify each signature chain against RootPack (offline trust anchors).
3. Recompute hash and compare to `dsseEnvelopeHash` in manifest.
4. Optionally verify Rekor inclusion proof.
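The signature checks in step 2 operate on the DSSE pre-authentication encoding (PAE) of the decoded payload, not on the base64 text. The sketch below builds PAE as defined by the DSSE v1 specification and checks a single signature. It assumes an ECDSA P-256/SHA-256 key (the FIPS profile) with .NET's default IEEE P1363 signature encoding; DER-encoded signatures would need `DSASignatureFormat.Rfc3279DerSequence`. `VerifyDsseSignature` is an illustrative name, not part of `StellaOps.Replay.Core`. RootPack chain validation, GOST signatures, and Rekor proofs are out of scope.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// DSSE v1 pre-authentication encoding:
// "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, where LEN is the byte length in decimal.
static byte[] Pae(string payloadType, byte[] payload)
{
    byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);
    byte[] header = Encoding.UTF8.GetBytes($"DSSEv1 {typeBytes.Length} {payloadType} {payload.Length} ");
    var pae = new byte[header.Length + payload.Length];
    header.CopyTo(pae, 0);
    payload.CopyTo(pae, header.Length);
    return pae;
}

// Checks one envelope signature (assumed ECDSA P-256/SHA-256, IEEE P1363 encoding).
static bool VerifyDsseSignature(ECDsa key, string payloadType, string payloadBase64, string sigBase64) =>
    key.VerifyData(Pae(payloadType, Convert.FromBase64String(payloadBase64)),
                   Convert.FromBase64String(sigBase64), HashAlgorithmName.SHA256);

// Demo with a throwaway key; real checks resolve keys from RootPack trust anchors.
using var demoKey = ECDsa.Create(ECCurve.NamedCurves.nistP256);
const string DemoType = "application/vnd.stellaops.replay+json";
string body = Convert.ToBase64String(Encoding.UTF8.GetBytes("{\"schemaVersion\":\"1.0\"}"));
string sig = Convert.ToBase64String(
    demoKey.SignData(Pae(DemoType, Convert.FromBase64String(body)), HashAlgorithmName.SHA256));
Console.WriteLine(VerifyDsseSignature(demoKey, DemoType, body, sig)); // True
```

Because the payload type is part of the signed bytes, an envelope relabelled with a different `payloadType` fails verification even if the payload is unchanged.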
### 5.3 Default payload type
Replay DSSE envelopes emitted by `DssePayloadBuilder` use payload type `application/vnd.stellaops.replay+json`. Consumers should treat this as canonical unless a future manifest revision increments the schema and payload type together.

---
## 6. CLI Interface
### 6.1 Recording a Scan
```bash
stella scan image:tag --record ./out/
```
Produces:
```
out/
├─ manifest.json
├─ manifest.dsse.json
├─ inputbundle.tar.zst
├─ outputbundle.tar.zst
└─ signatures/
```
### 6.2 Verifying
```bash
stella verify manifest.json
```
* Checks all hashes and DSSE envelopes.
* Prints summary:
```
✅ Verified: SBOM, Findings, VEX, Tools, Feeds, Policy
```
### 6.3 Replaying
```bash
stella replay manifest.json --strict
stella replay manifest.json --what-if --vary=feeds
```
* `--strict`: all inputs locked; identical result expected.
* `--what-if`: varies only specified dimension(s).
### 6.4 Diffing
```bash
stella diff manifestA.json manifestB.json
```
Shows field-level differences (feed snapshot, tool, or policy hash).

---
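## 7. PostgreSQL Schema

The row shapes below are shown as JSON for readability; see `docs/db/SPECIFICATION.md` for schema details.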
### 7.1 `replay_runs`
```jsonc
{
"_id": "uuid",
"manifestHash": "sha256:...",
"status": "verified|failed|replayed",
"createdAt": "...",
"updatedAt": "...",
"signatures": [{ "profile": "FIPS", "verified": true }],
"outputs": {
"sbom": "sha256:...",
"findings": "sha256:..."
}
}
```
### 7.2 `bundles`
```jsonc
{
"_id": "sha256:...",
"type": "input|output|rootpack",
"size": 4123123,
"location": "/var/lib/stella/bundles/<sha>.tar.zst"
}
```
### 7.3 `subjects`
```jsonc
{
"ociDigest": "sha256:abcd...",
"layers": [
{ "layerDigest": "...", "merkleRoot": "...", "leafCount": 120 }
]
}
```
---
## 8. Layer Merkle Implementation
### 8.1 Algorithm
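```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Merkle root over fixed 4 MiB chunks of a layer tarball, as lowercase hex.
static string ComputeMerkleRoot(string layerTarPath)
{
    const int ChunkSize = 4 * 1024 * 1024;
    var hashes = new List<byte[]>();
    using var fs = File.OpenRead(layerTarPath);
    var buffer = new byte[ChunkSize];
    int read;
    // ReadAtLeast fills the buffer unless EOF is hit, so chunk boundaries never
    // depend on how many bytes a single Read call happens to return.
    while ((read = fs.ReadAtLeast(buffer, ChunkSize, throwOnEndOfStream: false)) > 0)
        hashes.Add(SHA256.HashData(buffer.AsSpan(0, read)));
    if (hashes.Count == 0)
        throw new InvalidDataException($"Layer '{layerTarPath}' is empty.");
    // Pairwise reduction: H(left || right); an odd trailing node is hashed alone.
    while (hashes.Count > 1)
        hashes = hashes
            .Select((h, i) => (h, i))
            .GroupBy(x => x.i / 2)
            .Select(g => SHA256.HashData(g.SelectMany(x => x.h).ToArray()))
            .ToList();
    return Convert.ToHexString(hashes.Single()).ToLowerInvariant();
}
```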
### 8.2 Stored Values
```json
{
"layerDigest": "sha256:...",
"merkleRoot": "b81f...",
"leafCount": 240,
"leavesHash": "sha256:..."
}
```
---
## 9. Replay Engine Implementation Notes (.NET 10)
### 9.1 Manifest Parsing
Use `System.Text.Json` with deterministic ordering:
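```csharp
// OrderedResolver is not part of System.Text.Json: it stands for a custom
// IJsonTypeInfoResolver that emits properties in ordinal key order. By default,
// System.Text.Json writes properties in declaration order.
var options = new JsonSerializerOptions
{
    WriteIndented = false,
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    TypeInfoResolverChain = { new OrderedResolver() }
};
```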
### 9.2 Stable Output
Normalize SBOM/Findings/VEX JSON:
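```csharp
// Requires System.Linq, System.Collections.Generic and System.Text.Json.Nodes. Re-serializing
// a JsonDocument keeps the source key order, so object keys are sorted explicitly (ordinal).
static JsonNode? SortKeys(JsonNode? n) => n switch {
    JsonObject o => new JsonObject(o.OrderBy(p => p.Key, StringComparer.Ordinal)
        .Select(p => KeyValuePair.Create(p.Key, SortKeys(p.Value)))),
    JsonArray a => new JsonArray(a.Select(SortKeys).ToArray()),
    _ => n?.DeepClone() };
string Canonicalize(string json) => SortKeys(JsonNode.Parse(json))!.ToJsonString(options);
```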
### 9.3 Verification Flow
```csharp
var manifest = Manifest.Load("manifest.json");
VerifySignatures(manifest);
VerifyHashes(manifest);
if (mode == Strict) RunPipeline(manifest);
else RunPipelineWithVariation(manifest, vary);
```
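The `VerifyHashes` step can be pictured as a per-artifact digest comparison that maps failures to the codes in the failure-mode table (9.4). This is a sketch only. `VerifyArtifact` is a hypothetical helper, and it assumes manifest digests use the lowercase `sha256:<hex>` form seen elsewhere in this spec.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper (not the engine's API): compares the "sha256:<hex>" digest recorded in
// the manifest with the file on disk; returns a 9.4 failure code, or null when they match.
static string? VerifyArtifact(string path, string expectedDigest, string driftCode)
{
    if (!File.Exists(path)) return "InputBundleMissing";
    using var stream = File.OpenRead(path);
    string actual = "sha256:" + Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
    return string.Equals(actual, expectedDigest, StringComparison.Ordinal) ? null : driftCode;
}

string bundle = Path.Combine(Path.GetTempPath(), $"feed-{Guid.NewGuid():N}.tar.zst");
File.WriteAllText(bundle, "abc");
const string Recorded = "sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
Console.WriteLine(VerifyArtifact(bundle, Recorded, "FeedSnapshotDrift") ?? "OK"); // OK
```

The caller passes the drift code that fits the artifact kind (for example `FeedSnapshotDrift` for feed snapshots), so a single comparison routine serves every row of the table.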
### 9.4 Failure Modes
| Condition | Action |
| -------------------------------- | ----------------------------- |
| Missing snapshot or bundle | Error: `InputBundleMissing` |
| Feed hash mismatch | Error: `FeedSnapshotDrift` |
| Tool binary hash mismatch | Reject replay |
| Output hash drift in strict mode | Mark as failed, emit diff log |
| Invalid signature | Reject manifest |
---
## 10. Crypto Profiles and RootPack
### 10.1 Example Profiles
| Profile | Algorithms | Notes |
| -------------- | ------------------------------------- | ----------------------- |
| **FIPS-140-3** | ECDSA-P256 / SHA-256 / AES-GCM | Default for US/EU |
| **GOST** | GOST R 34.10-2012 / GOST R 34.11-2012 | Russia |
| **SM** | SM2 / SM3 / SM4 | China |
| **eIDAS** | RSA-PSS / SHA-256 | EU qualified signatures |
### 10.2 Dual-Signing Example
```bash
stella sign manifest.json --profiles=FIPS,GOST
```
Produces:
```
signatures/
├─ manifest.dsse.fips.json
└─ manifest.dsse.gost.json
```
---
## 11. Test Strategy
| Test | Description | Expected Result |
| ---------------------- | ------------------------------------ | --------------------------- |
| **Golden Replay** | Repeat identical scan → same outputs | ✅ identical hashes |
| **Feed Drift Test** | Replay with updated feeds | Only `inputs.feeds` changes |
| **Tool Upgrade Test** | Replay with new scanner version | Reject or diff by `tools` |
| **Policy Change Test** | Different lattice/mutes | Diff by `policy` section |
| **Cross-Arch Test** | x64 vs arm64 | Identical outputs |
| **Corrupted Bundle** | Tamper bundle | Verification fails |
---
## 12. Example Verification Output
```
$ stella verify manifest.json
[✓] Manifest integrity: OK
[✓] DSSE signatures (FIPS,GOST): OK
[✓] Feeds snapshot hash: OK
[✓] Policy + mutes hash: OK
[✓] Toolchain hash: OK
[✓] SBOM/VEX outputs: OK
Result: VERIFIED
```
---
## 13. Future Extensions
* Support **SPDX 3.0.1** alongside CycloneDX 1.6.
* Add **per-file Merkle proofs** for local scans.
* Ledger anchoring (Rekor, distributed Proof-Market).
* Post-quantum signatures (Dilithium/Falcon).
* Replay orchestration API (`/api/replay/:id`).
---
## 14. Summary
Deterministic Replay freezes every element of a scan:
> *image → feeds → policy → toolchain → environment → outputs → signatures.*

By enforcing canonical input/output states and verifiable cryptographic bindings, Stella Ops achieves **regulatory-grade replayability**, **regional crypto compliance**, and **immutable provenance** across all scans.

---


@@ -0,0 +1,116 @@
# Stella Ops — Developer Guide: Deterministic Replay
## Purpose
Deterministic Replay ensures any past scan can be re-executed byte-for-byte, producing identical SBOM, Findings, and VEX results, cryptographically verifiable for audits or compliance.
Replay is the foundation for:
- **Audit proofs** (exact past state reproduction)
- **Diff analysis** (feeds, policies, tool versions)
- **Cross-region verification** (same outputs on different hosts)
- **Long-term cryptographic trust** (re-sign with new crypto profiles)
---
## Core Concepts
| Term | Description |
|------|--------------|
| **Replay Manifest** | Immutable JSON describing all inputs, tools, env, and outputs of a scan. |
| **InputBundle** | Snapshot of feeds, rules, policies, and toolchain binaries used. |
| **OutputBundle** | SBOM, Findings, VEX, and logs from a completed scan. |
| **Layer Merkle** | Per-layer hash tree for precise deduplication and drift detection. |
| **DSSE Envelope** | Digital signature wrapper for each attestation (SBOM, Findings, Manifest, etc.). |
---
## What to Freeze
| Category | Example Contents | Required in Manifest |
|-----------|------------------|----------------------|
| **Subject** | OCI image digest, per-layer Merkle roots | ✅ |
| **Outputs** | SBOM, Findings, VEX, logs (content hashes) | ✅ |
| **Toolchain** | Sbomer, Scanner, Vexer binaries + versions + SHA256 | ✅ |
| **Feeds/VEX sources** | Full or pruned snapshot with Merkle proofs | ✅ |
| **Policy Bundle** | Lattice rules, mutes, trust profiles, thresholds | ✅ |
| **Environment** | OS, arch, locale, TZ, deterministic seed, runtime flags | ✅ |
| **Reachability Evidence** | Callgraphs (`graphs[]`), runtime traces (`runtimeTraces[]`), analyzer/version hashes | ✅ (when reachability is enabled) |
| **Crypto Profile** | Algorithm suites (FIPS, GOST, SM, eIDAS) | ✅ |
---
## Replay Modes
| Mode | Purpose | Input Variation | Expected Output |
|------|----------|-----------------|-----------------|
| **Strict Replay** | Audit proof | None | Bit-for-bit identical |
| **What-If Replay** | Change impact analysis | One dimension (feeds/tools/policy) | Deterministic diff |

Example:
```
stella replay manifest.json --strict
stella replay manifest.json --what-if --vary=feeds
```
---
## Developer Responsibilities
| Module | Role |
|---------|------|
| **Scanner.WebService** | Capture full input set and produce Replay Manifest + DSSE sigs. |
| **Sbomer** | Generate deterministic SBOM; normalize ordering and JSON formatting. |
| **Vexer/Excititor** | Apply lattice and mutes from policy bundle; record gating logic. |
| **Feedser/Concelier** | Freeze and export feed snapshots or Merkle proofs. |
| **Authority** | Manage signer keys and crypto profiles; issue DSSE envelopes. |
| **CLI** | Provide `scan --record`, `replay`, `verify`, `diff` commands. |
---
## Workflow
1. `stella scan image:tag --record out/`
   - Generates the Replay Manifest, InputBundle, OutputBundle, and DSSE signatures.
   - Captures reachability graphs/traces (if enabled) and references them via `reachability.graphs[]` and `reachability.runtimeTraces[]`.
2. `stella verify manifest.json`
   - Validates hashes, signatures, and completeness.
3. `stella replay manifest.json --strict`
   - Re-executes in sealed mode; expect byte-identical results.
4. `stella replay manifest.json --what-if --vary=feeds`
   - Runs with new feeds; the diff is attributed to feeds only.
5. `stella diff manifestA.json manifestB.json`
   - Attributes differences by hash comparison.
---
## Storage
- **PostgreSQL tables** (see `docs/db/SPECIFICATION.md` for schema details)
  - `replay.runs`: manifest hash, status, signatures, outputs
  - `replay.bundles`: digest, type, CAS location, size
  - `replay.subjects`: OCI digests + per-layer Merkle roots
- **Indexes** (canonical names): `runs_manifest_hash_unique`, `runs_status_created_at`, `bundles_type`, `bundles_location`, `subjects_layer_digest`
- **File store**
  - Bundles are stored as `<sha256>.tar.zst` in CAS (`cas://replay/<shard>/<digest>.tar.zst`), where `<shard>` is the first two hex characters of the digest.
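The CAS naming rule for bundles can be sketched in a few lines. `CasUriFor` is a hypothetical helper. The sketch assumes `<digest>` is bare lowercase hex (no `sha256:` prefix), because the layout above does not say otherwise.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Sketch: cas://replay/<shard>/<digest>.tar.zst, where <digest> is the lowercase hex
// SHA-256 of the bundle bytes and <shard> is its first two hex characters.
static string CasUriFor(Stream bundleStream)
{
    string digest = Convert.ToHexString(SHA256.HashData(bundleStream)).ToLowerInvariant();
    return $"cas://replay/{digest[..2]}/{digest}.tar.zst";
}

using var demo = new MemoryStream(Encoding.ASCII.GetBytes("abc"));
Console.WriteLine(CasUriFor(demo));
// cas://replay/ba/ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad.tar.zst
```

A two-character shard spreads bundles across 256 prefixes, which keeps any single directory or key prefix from growing unbounded.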
---
## Developer Checklist
- [ ] All inputs (feeds, policies, tools, env) hashed and recorded.
- [ ] JSON normalization: key order, number format, newline mode.
- [ ] Random seed = `H(scan.id || MerkleRootAllLayers)`.
- [ ] Clock fixed to `scan.time` unless policy requires “now”.
- [ ] DSSE multi-sig supported (FIPS + regional).
- [ ] Manifest signed + optionally anchored to Rekor ledger.
- [ ] Replay comparison mode tested across x64/arm64.
---
## References
See also:
- `DETERMINISTIC_REPLAY.md` — detailed manifest schema & CLI examples.
- `../docs/CRYPTO_SOVEREIGN_READY.md` — RootPack and dual-signature handling.
---


@@ -0,0 +1,30 @@
# Policy Simulation Gaps (PS1–PS10) — Lockfile, Quotas, and Shadow Safety
This note closes POLICY-GAPS-185-006 by defining a signed inputs lock, offline verifier, and shadow isolation guardrails for policy simulations.
## Lockfile
- Schema: `docs/modules/replay/schemas/policy-sim/lock.schema.json`
- Sample: `docs/modules/replay/samples/policy-sim/inputs.lock.sample.json`
- Fields cover the digests of the policy bundle, graph, SBOM, time anchor, and dataset, plus the `shadowIsolation` flag and `requiredScopes`.
- Recommended signing: DSSE over the lockfile with Ed25519; record the envelope digest alongside the artifacts.
## Validation
- Library helper: `PolicySimulationInputLockValidator` in `StellaOps.Replay.Core` compares materialized digests and enforces shadow mode + scope `policy:simulate:shadow`.
- Staleness: pass `maxAge` (suggested 24h) to reject outdated locks.
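The checks described above can be restated as one function. The sketch below is hypothetical: it is not the API of `PolicySimulationInputLockValidator`, the digest names are illustrative rather than taken from `lock.schema.json`, and the return values reuse the verifier script's exit codes. It takes `now` as a parameter so the staleness check stays deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical restatement of the lockfile checks. Return values follow the exit codes of
// verify-policy-sim-lock.sh: 0 OK, 3 hash mismatch, 4 stale, 5 shadow/scope failure.
static int ValidateLock(
    IReadOnlyDictionary<string, string> lockedDigests,       // name -> "sha256:..." from the lockfile
    IReadOnlyDictionary<string, string> materializedDigests, // recomputed from local inputs
    DateTimeOffset lockCreatedAt, DateTimeOffset now, TimeSpan maxAge,
    string runMode, IReadOnlyCollection<string> grantedScopes)
{
    foreach (var (name, expected) in lockedDigests)
        if (!materializedDigests.TryGetValue(name, out var actual) ||
            !string.Equals(actual, expected, StringComparison.Ordinal))
            return 3;
    if (now - lockCreatedAt > maxAge)
        return 4;
    if (runMode != "shadow" || !grantedScopes.Contains("policy:simulate:shadow"))
        return 5;
    return 0;
}

var digests = new Dictionary<string, string> { ["policyBundle"] = "sha256:aa" };
var t0 = new DateTimeOffset(2025, 11, 1, 12, 0, 0, TimeSpan.Zero);
Console.WriteLine(ValidateLock(digests, digests, t0, t0.AddHours(1), TimeSpan.FromHours(24),
    "shadow", new[] { "policy:simulate:shadow" })); // 0
```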
## CLI / CI contract
- Script: `scripts/replay/verify-policy-sim-lock.sh` (offline). Exit codes: 0 OK, 2 missing tools/args, 3 schema/hash mismatch, 4 stale, 5 shadow/scope failure.
- CI should run verifier before simulations and fail fast on non-zero exit.
## Quotas & backpressure
- Default limits: max 10 concurrent shadow runs per tenant; queue depth 100; reject when `policy:simulate:shadow` scope missing.
- Simulators must be read-only: no writes to policy stores; only emit shadow metrics.
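A minimal sketch of the default admission limits, assuming a single-threaded gate and counting per tenant. It covers admission only: releasing a slot when a run finishes, and promoting queued runs, are left out. `Admit` and its result strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-tenant admission gate for shadow simulations using the defaults above.
const int MaxConcurrent = 10, MaxQueued = 100;
var running = new Dictionary<string, int>(StringComparer.Ordinal);
var queued = new Dictionary<string, int>(StringComparer.Ordinal);

string Admit(string tenant, bool hasShadowScope)
{
    if (!hasShadowScope) return "rejected:scope";          // policy:simulate:shadow missing
    int r = running.GetValueOrDefault(tenant);
    if (r < MaxConcurrent) { running[tenant] = r + 1; return "running"; }
    int q = queued.GetValueOrDefault(tenant);
    if (q < MaxQueued) { queued[tenant] = q + 1; return "queued"; }
    return "rejected:backpressure";                        // queue full
}

Console.WriteLine(Admit("tenant-a", hasShadowScope: true)); // running
```

Rejecting outright once the queue is full, rather than blocking the caller, is what turns the limit into backpressure that clients can observe.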
## Offline policy-sim kit
- Lockfile + DSSE, digests of policy/graph/sbom/time-anchor/dataset.
- Bundle alongside replay packs; verifier uses local SHA256 only (no network).
## Shadow isolation & redaction
- Always run in `shadow` mode; block if requested runMode != `shadow`.
- Redact PII fields (`user`, `ip`, `headers`, `secrets`) before storing fixtures; keep only hashes.
- Require DSSE evidence when storing fixtures or responding to API clients.
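The redaction rule can be sketched with `System.Text.Json.Nodes`. The sketch assumes each PII field is replaced by `sha256:<hex>` of its JSON text, matched exactly and recursively. Note that an unsalted hash of a low-entropy value such as an IP address can be reversed by enumeration, so a keyed hash may fit better. `Redact` is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Sketch: replace PII fields with "sha256:<hex>" of their JSON text, recursively.
// Field names come from the list above; matching is exact and case-sensitive.
string[] piiFields = { "user", "ip", "headers", "secrets" };

void Redact(JsonNode? node)
{
    if (node is JsonObject obj)
    {
        foreach (var name in obj.Select(p => p.Key).ToList()) // snapshot keys before mutating
        {
            if (Array.IndexOf(piiFields, name) < 0) { Redact(obj[name]); continue; }
            string raw = obj[name]?.ToJsonString() ?? "null";
            obj[name] = "sha256:" + Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(raw))).ToLowerInvariant();
        }
    }
    else if (node is JsonArray arr)
    {
        foreach (var item in arr) Redact(item);
    }
}

var fixture = JsonNode.Parse("{\"user\":\"alice\",\"path\":\"/scan\"}")!;
Redact(fixture);
Console.WriteLine(fixture.ToJsonString());
```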


@@ -0,0 +1,57 @@
# Replay Test Strategy
> **Imposed rule:** Replay tests must use frozen inputs (SBOM, advisories, VEX, feeds, policy, tools) and fixed seeds/clocks; any non-determinism is a test failure.

This strategy defines how we validate replayability of Scanner outputs and attestations across tool/definition updates and environments.
## 1. Goals
- Prove that a recorded scan bundle (inputs + manifests) replays bit-for-bit across environments.
- Detect drift from feeds, policy, or tooling changes before shipping releases.
- Provide auditors with evidence (hashes, DSSE bundles) that replays are deterministic.
## 2. Test layers
1) **Golden replay**: take a recorded bundle (SBOM/VEX/feeds/policy/tool hashes) and rerun; assert hash equality for SBOM, findings, VEX, logs. Fail on any difference.
2) **Feed drift guard**: rerun bundle after feed update; expect differences; ensure drift is surfaced (hash mismatch, diff report) not silently masked.
3) **Tool upgrade**: rerun with new scanner version; expect stable outputs if no functional change, otherwise require documented diffs.
4) **Policy change**: rerun with updated policy; expect explain trace to show changed rules and hash delta; diff must be recorded.
5) **Offline**: replay in sealed mode using only bundle contents; no network access permitted.
## 3. Inputs
- Replay bundle contents: `sbom`, `feeds.tar.gz`, `policy.tar.gz`, `scanner-image`, `reachability.graph`, `runtime-trace` (optional), `replay.yaml`.
- Hash manifest: SHA-256 for every file; top-level Merkle root.
- DSSE attestations (optional): for replay manifest and artifacts.
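A hash manifest of this kind can be sketched as one `sha256sum`-style line per file (`<hex>  <path>`, the format `replay.hashes` also uses) plus a root over the per-file digests. The sketch makes several assumptions. Files are sorted by relative path with `/` separators. The root reuses the pairwise SHA-256 reduction from the Layer Merkle algorithm in the replay specification. `HashManifest` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Sketch: sorted sha256sum-style lines plus a Merkle root over the per-file digests.
static (List<string> Lines, string Root) HashManifest(string bundleDir)
{
    var entries = Directory.EnumerateFiles(bundleDir, "*", SearchOption.AllDirectories)
        .Select(f => (RelPath: Path.GetRelativePath(bundleDir, f).Replace('\\', '/'), FullPath: f))
        .OrderBy(e => e.RelPath, StringComparer.Ordinal)
        .Select(e => (e.RelPath, Hash: SHA256.HashData(File.ReadAllBytes(e.FullPath))))
        .ToList();
    if (entries.Count == 0) throw new InvalidDataException("Bundle is empty.");
    var lines = entries
        .Select(e => $"{Convert.ToHexString(e.Hash).ToLowerInvariant()}  {e.RelPath}")
        .ToList();
    // Pairwise reduction H(left || right); an odd trailing digest is hashed alone.
    var level = entries.Select(e => e.Hash).ToList();
    while (level.Count > 1)
        level = level.Chunk(2).Select(pair => SHA256.HashData(pair.SelectMany(h => h).ToArray())).ToList();
    return (lines, Convert.ToHexString(level[0]).ToLowerInvariant());
}

// Usage: var (lines, root) = HashManifest("./bundle");
```

Sorting before hashing is what makes the root independent of the order in which the filesystem enumerates files.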
## 4. Determinism settings
- Fixed clock (`--fixed-clock` ISO-8601), RNG seed (`RNG_SEED`), single-threaded mode (`SCANNER_MAX_CONCURRENCY=1`), stable ordering (sorted inputs), log filtering (strip timestamps/PIDs).
- Disable network/egress; rely on bundled feeds/policy.
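Log filtering can be sketched as a pair of regular expressions applied line by line after CRLF normalization. Both patterns are assumptions about the log layout (ISO-8601 timestamps, `pid=<n>` or `pid: <n>` tokens), not the Scanner's actual format.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical log filter: masks ISO-8601 timestamps and pid tokens so logs from different
// runs can be hashed and diffed. The patterns assume a log layout; adjust to the real format.
var timestamp = new Regex(@"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?", RegexOptions.CultureInvariant);
var pid = new Regex(@"\bpid[=:]\s*\d+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

string FilterLine(string line) => pid.Replace(timestamp.Replace(line, "<ts>"), "pid=<pid>");

// Normalize CRLF to LF, then filter each line.
string FilterLog(string log) => string.Join("\n", log.Replace("\r\n", "\n").Split('\n').Select(FilterLine));

Console.WriteLine(FilterLine("2025-11-01T00:00:00.123Z INFO pid=4242 scan done")); // <ts> INFO pid=<pid> scan done
```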
## 5. Assertions
- Hash equality for outputs: SBOMs, findings, VEX, logs (canonicalised), determinism.json (if present).
- Verify DSSE signatures and Rekor proofs when present; fail if any signature or proof does not verify, or if one referenced by the bundle is missing.
- Report diff summary when hashes differ (feed/tool/policy drift).
## 6. Tooling
- CLI: `stella replay run --bundle <path> --fixed-clock 2025-11-01T00:00:00Z --seed 1337 --single-threaded`.
- Scripts: `scripts/replay/verify_bundle.sh` (hash/manifest check), `scripts/replay/run_replay.sh` (orchestrates fixed settings), `scripts/replay/diff_outputs.py` (canonical diffs).
- CI: `bench:determinism` target executes golden replay on reference bundles; fails on hash delta.
## 7. Outputs
- `replay-results.json` with per-artifact hashes, pass/fail, diff counts.
- `replay.log` filtered (no timestamps/PIDs), `replay.hashes` (sha256sum of outputs).
- Optional DSSE attestation for replay results.
## 8. Reporting
- Publish results to CI artifacts; store in Evidence Locker for audit.
- Add summary to release notes when replay is part of a release gate.
## 9. Checklists
- [ ] Bundle verified (hash manifest, DSSE if present).
- [ ] Fixed clock/seed/concurrency applied.
- [ ] Network disabled; feeds/policy/tooling from bundle only.
- [ ] Outputs hashed and compared to baseline; diffs recorded.
- [ ] Replay results stored + (optionally) attested.
## References
- `docs/modules/scanner/determinism-score.md`
- `docs/modules/replay/guides/DETERMINISTIC_REPLAY.md`
- `docs/modules/scanner/entropy.md`


@@ -0,0 +1,455 @@
# Replay Manifest Guide
> **Sprint:** SPRINT_20251228_001_BE_replay_manifest_ci (T6)
> **Purpose:** Complete reference for Replay Manifest export, verification, and CI integration.
## Overview
The Replay Manifest is a self-contained JSON document that captures everything needed to reproduce a scan: inputs, toolchain versions, policies, and expected outputs. When verified, it provides cryptographic proof that a scan is deterministic and reproducible.
## Quick Start
```bash
# Export replay manifest after scanning
stella replay export --scan-id <scan-uuid> --output replay.json
# Or export for a specific image
stella replay export --image myregistry/app:v1.0.0 --output replay.json
# Verify determinism (strict mode)
stella replay verify --manifest replay.json --strict-mode
# Verify with drift failure (for CI)
stella replay verify --manifest replay.json --fail-on-drift
```
---
## Schema Reference
### Schema Version
Current version: `1.0.0`
Schema location: `src/__Libraries/StellaOps.Replay.Core/Schemas/replay-export.schema.json`
### Top-Level Structure
```json
{
"version": "1.0.0",
"snapshot": { ... },
"toolchain": { ... },
"inputs": { ... },
"outputs": { ... },
"verification": { ... }
}
```
### `snapshot` Object
Identifies the scan snapshot this manifest represents.
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique snapshot ID (`snapshot:<sha256>`) |
| `createdAt` | ISO 8601 | UTC timestamp when scan completed |
| `artifact` | object | Reference to scanned artifact (digest, repository, tag) |

Example:
```json
{
"id": "snapshot:a1b2c3d4e5f6...",
"createdAt": "2025-12-28T14:30:00Z",
"artifact": {
"digest": "sha256:abc123...",
"repository": "myregistry/app",
"tag": "v1.0.0"
}
}
```
### `toolchain` Object
Captures exact versions of all tools used during the scan.
| Field | Type | Description |
|-------|------|-------------|
| `scannerVersion` | string | StellaOps Scanner version |
| `policyEngineVersion` | string | Policy Engine version |
| `platform` | string | Platform identifier (e.g., `linux-x64`) |
| `sbomerVersion` | string | SBOM generator version |
| `vexerVersion` | string | VEX processor version |

Example:
```json
{
"scannerVersion": "0.42.0",
"policyEngineVersion": "0.42.0",
"platform": "linux-x64",
"sbomerVersion": "0.42.0",
"vexerVersion": "0.42.0"
}
```
### `inputs` Object
All inputs consumed during the scan, with content hashes.
| Field | Type | Description |
|-------|------|-------------|
| `sboms` | array | SBOM inputs (if layered) |
| `vex` | array | VEX documents used |
| `feeds` | array | Vulnerability feed snapshots |
| `policies` | object | Policy bundle reference |

Feed snapshot example:
```json
{
"feeds": [
{
"name": "nvd",
"snapshotId": "nvd:2025-12-28T00:00:00Z",
"digest": "sha256:def456...",
"recordCount": 245678
}
]
}
```
### `outputs` Object
Expected outputs from the scan, used for verification.
| Field | Type | Description |
|-------|------|-------------|
| `verdictDigest` | string | SHA256 of verdict JSON |
| `decision` | enum | `allow`, `deny`, or `review` |
| `sbomDigest` | string | SHA256 of generated SBOM |
| `findingsDigest` | string | SHA256 of findings JSON |
### `verification` Object
Helper commands and expected hashes for verification.
| Field | Type | Description |
|-------|------|-------------|
| `command` | string | CLI command to reproduce scan |
| `expectedSbomHash` | string | Expected SBOM content hash |
| `expectedVerdictHash` | string | Expected verdict content hash |
---
## CLI Commands
### `stella replay export`
Export a replay manifest from a completed scan.
```bash
stella replay export [OPTIONS]
```
| Option | Required | Description |
|--------|----------|-------------|
| `--scan-id <uuid>` | One of | Scan ID to export |
| `--image <ref>` | One of | Image reference (uses latest scan) |
| `--output <path>` | No | Output path (default: `replay.json`) |
| `--include-feed-snapshots` | No | Include full feed snapshot refs |
| `--no-verification-script` | No | Skip verification command generation |
### `stella replay verify`
Verify a replay manifest by re-executing the scan and comparing outputs.
```bash
stella replay verify [OPTIONS]
```
| Option | Required | Description |
|--------|----------|-------------|
| `--manifest <path>` | Yes | Path to replay manifest |
| `--strict-mode` | No | Require bit-for-bit identical outputs |
| `--fail-on-drift` | No | Exit code 1 on any drift |
| `--output-diff <path>` | No | Write diff report to file |
### Exit Codes
| Code | Meaning |
|------|---------|
| `0` | Verification passed, outputs match |
| `1` | Drift detected, outputs differ |
| `2` | Verification error (missing inputs, invalid manifest, etc.) |
---
## CI Integration
### Gitea Actions
```yaml
name: SBOM Replay Verification

on:
  push:
    branches: [main]
  pull_request:

jobs:
  verify-determinism:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t ${{ github.repository }}:${{ github.sha }} .

      - name: Scan with replay export
        run: |
          stella scan \
            --image ${{ github.repository }}:${{ github.sha }} \
            --output-sbom sbom.json \
            --output-replay replay.json

      - name: Verify determinism
        run: |
          stella replay verify \
            --manifest replay.json \
            --fail-on-drift \
            --strict-mode

      - name: Upload replay manifest
        uses: actions/upload-artifact@v4
        with:
          name: replay-manifest
          path: replay.json
          retention-days: 90
```
### GitHub Actions
```yaml
name: SBOM Replay Verification

on:
  push:
    branches: [main]
  pull_request:

jobs:
  verify-determinism:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up StellaOps
        uses: stellaops/setup-stella@v1
        with:
          version: '0.42.0'

      - name: Build and scan
        run: |
          docker build -t myapp:${{ github.sha }} .
          stella scan --image myapp:${{ github.sha }} \
            --output-sbom sbom.json \
            --output-replay replay.json

      - name: Verify replay
        run: stella replay verify --manifest replay.json --fail-on-drift

      - name: Upload attestations
        uses: actions/upload-artifact@v4
        with:
          name: sbom-attestations
          path: |
            sbom.json
            replay.json
```
### GitLab CI
```yaml
sbom-replay:
  stage: security
  image: stellaops/cli:0.42.0
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - stella scan --image $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA --output-replay replay.json
    - stella replay verify --manifest replay.json --fail-on-drift
  artifacts:
    paths:
      - replay.json
    expire_in: 90 days
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```
---
## Troubleshooting Drift Detection
### Common Drift Causes
| Cause | Symptom | Fix |
|-------|---------|-----|
| Feed update | `findingsDigest` differs | Pin feed snapshot version |
| Policy change | `verdictDigest` differs | Version policy bundles |
| Tool upgrade | All digests differ | Lock toolchain versions |
| Non-deterministic SBOM | `sbomDigest` differs | Enable deterministic mode |
| Timezone issues | Timestamps drift | Ensure UTC everywhere |
### Debugging Steps
1. **Export diff report:**
   ```bash
   stella replay verify --manifest replay.json --output-diff drift-report.json
   ```
2. **Compare inputs:**
   ```bash
   stella replay diff --manifest-a old.json --manifest-b new.json --show-inputs
   ```
3. **Check feed versions:**
   ```bash
   stella feeds list --show-snapshots
   ```
4. **Verify toolchain:**
   ```bash
   stella version --all
   ```
### Feed Snapshot Pinning
For reproducible CI, pin feed snapshots:
```bash
# List available snapshots
stella feeds snapshots --feed nvd
# Pin specific snapshot
stella scan --image myapp:v1.0.0 \
--feed-snapshot nvd:2025-12-28T00:00:00Z \
--output-replay replay.json
```
---
## Best Practices for Deterministic Builds
### 1. Lock All Dependencies
```yaml
# In CI, always specify exact versions
stellaops/cli:0.42.0 # Not :latest
```
### 2. Pin Feed Snapshots
```bash
# Export current snapshot ID
stella feeds export-snapshot --output feeds-snapshot.json
# Use in subsequent scans
stella scan --feed-snapshot-file feeds-snapshot.json
```
### 3. Version Policy Bundles
```bash
# Store policies in version control
git add policies/
git commit -m "Policy bundle v2.3.0"
# Reference by commit in manifest
stella scan --policy-ref policies@abc123
```
### 4. Use Strict Mode in CI
```bash
# Always use strict mode in CI pipelines
stella replay verify --manifest replay.json --strict-mode --fail-on-drift
```
### 5. Archive Replay Manifests
Store replay manifests alongside release artifacts for audit trail:
```bash
# Archive with release
cp replay.json releases/v1.0.0/replay.json
```
---
## API Reference
### `IReplayManifestExporter`
```csharp
public interface IReplayManifestExporter
{
/// <summary>
/// Exports a replay manifest for a completed scan.
/// </summary>
Task<ReplayExportResult> ExportAsync(
string scanId,
ReplayExportOptions options,
CancellationToken ct = default);
}
```
### `ReplayExportOptions`
```csharp
public sealed record ReplayExportOptions
{
/// <summary>Include exact toolchain versions.</summary>
public bool IncludeToolchainVersions { get; init; } = true;
/// <summary>Include feed snapshot references.</summary>
public bool IncludeFeedSnapshots { get; init; } = true;
/// <summary>Generate verification shell command.</summary>
public bool GenerateVerificationScript { get; init; } = true;
/// <summary>Output file path.</summary>
public string OutputPath { get; init; } = "replay.json";
}
```
### `ReplayExportResult`
```csharp
public sealed record ReplayExportResult
{
/// <summary>Path to exported manifest.</summary>
public required string ManifestPath { get; init; }
/// <summary>SHA256 digest of manifest content.</summary>
public required string ManifestDigest { get; init; }
/// <summary>Path to verification script (if generated).</summary>
public string? VerificationScriptPath { get; init; }
}
```
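`ReplayExportResult.ManifestDigest` is described only as the SHA-256 digest of the manifest content. Below is a minimal sketch of how an implementation might compute it. It assumes the `sha256:<hex>` form used for `manifest_hash` elsewhere in these docs. `ComputeManifestDigest` is a hypothetical helper and is not part of the exporter API.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes the exact bytes written to disk and returns
// "sha256:<lowercase hex>" (the prefix format is an assumption).
static string ComputeManifestDigest(string manifestPath)
{
    using FileStream stream = File.OpenRead(manifestPath);
    byte[] hash = SHA256.HashData(stream);
    return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
}

// Usage: hash a freshly exported manifest (the path is illustrative).
string path = Path.Combine(Path.GetTempPath(), "replay-example.json");
File.WriteAllText(path, "{\"schemaVersion\":\"1.0\"}");
Console.WriteLine(ComputeManifestDigest(path));
```

Hashing the file bytes as written, rather than a re-serialized object, means the hex part matches what `sha256sum` reports for the same file.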
---
## Related Documentation
- [Deterministic Replay](DETERMINISTIC_REPLAY.md) - Core concepts and architecture
- [Developer Guide: Replay](DEVS_GUIDE_REPLAY.md) - Implementation details
- [Replay Manifest v2 Acceptance](replay-manifest-v2-acceptance.md) - Schema evolution
- [Test Strategy](TEST_STRATEGY.md) - Replay testing approach
---
## Changelog
| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2025-12-28 | Initial schema and CLI commands |

# Replay Manifest v2 Acceptance Contract
_Last updated: 2025-12-13. Owner: BE-Base Platform Guild._
This document defines the acceptance criteria and test vectors for replay manifest v2, enabling Task 19 (GAP-REP-004) to proceed with implementation.
---
## 1. Overview
Replay manifest v2 introduces:
- **BLAKE3 graph hashes:** Primary hash algorithm for reachability graphs
- **Sorted CAS entries:** Deterministic ordering of all CAS references
- **hashAlg fields:** Explicit algorithm declarations for forward compatibility
- **code_id coverage:** Coverage metrics for stripped binary handling
---
## 2. Schema Changes (v1 → v2)
### 2.1 Version Field
```json
{
"schemaVersion": "2.0",
...
}
```
### 2.2 Hash Algorithm Declaration
Every hash field is now accompanied by an explicit `hashAlg` declaration:
```json
{
"reachability": {
"graphs": [
{
"hash": "blake3:a1b2c3d4e5f6...",
"hashAlg": "blake3-256",
"casUri": "cas://reachability/graphs/blake3:a1b2c3d4..."
}
],
"runtimeTraces": [
{
"hash": "sha256:feedface...",
"hashAlg": "sha256",
"casUri": "cas://reachability/runtime/sha256:feedface..."
}
]
}
}
```
### 2.3 Sorted CAS Entries
The following arrays must be sorted by a deterministic key:
| Array | Sort Key |
|-------|----------|
| `reachability.graphs[]` | `casUri` (lexicographic) |
| `reachability.runtimeTraces[]` | `casUri` (lexicographic) |
| `inputs.feeds[]` | `name` (lexicographic) |
| `inputs.tools[]` | `name` (lexicographic) |
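Below is a minimal sketch of the sort step. It assumes "lexicographic" means ordinal (UTF-16 code-unit) comparison. `SortByKey` is a hypothetical helper, and the tuples stand in for graph entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: orders entries by the table's sort key using ordinal
// comparison, so the result does not depend on the current culture.
static List<T> SortByKey<T>(IEnumerable<T> entries, Func<T, string> key) =>
    entries.OrderBy(key, StringComparer.Ordinal).ToList();

var graphs = new[]
{
    (CasUri: "cas://reachability/graphs/blake3:zzzz", Kind: "framework"),
    (CasUri: "cas://reachability/graphs/blake3:aaaa", Kind: "static"),
};
foreach (var g in SortByKey(graphs, x => x.CasUri))
    Console.WriteLine($"{g.CasUri} {g.Kind}");
```

For determinism, ordinal comparison is the safer reading. `OrderBy` without an explicit comparer uses the culture-sensitive default, which can order the same strings differently on machines with different locales.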
### 2.4 Code ID Coverage
New field for stripped binary support:
```json
{
"reachability": {
"code_id_coverage": {
"total_nodes": 1247,
"nodes_with_symbol_id": 1189,
"nodes_with_code_id": 58,
"coverage_percent": 100.0
}
}
}
```
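The formula behind `coverage_percent` is not stated. The sketch below assumes it is the share of nodes that carry either a `symbol_id` or a `code_id`, and that the two counts do not overlap. Both examples in this document fit that reading: 1189 + 58 = 1247, and 100 + 0 = 100. `CodeIdCoveragePercent` and the one-decimal rounding are hypothetical choices.

```csharp
using System;

// Hypothetical helper. Assumption: a node is counted under symbol_id OR
// code_id, never both, so the two counts can simply be added.
static double CodeIdCoveragePercent(int totalNodes, int withSymbolId, int withCodeId)
{
    if (totalNodes <= 0) return 0.0; // assumption: empty graphs report 0%
    double covered = withSymbolId + withCodeId;
    // Round to one decimal place so the emitted value is stable across runs.
    return Math.Round(100.0 * covered / totalNodes, 1, MidpointRounding.AwayFromZero);
}

Console.WriteLine(CodeIdCoveragePercent(1247, 1189, 58));
```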
---
## 3. CAS Registration Gates
### 3.1 Required Registration
All referenced artifacts must be registered in CAS before manifest finalization:
| Artifact Type | CAS Path Pattern | Required |
|---------------|------------------|----------|
| Graph body | `cas://reachability/graphs/{hash}` | Yes |
| Graph DSSE | `cas://reachability/graphs/{hash}.dsse` | Yes |
| Runtime trace | `cas://reachability/runtime/{hash}` | Conditional |
| Edge bundle | `cas://reachability/edges/{graph_hash}/{bundle_id}` | Conditional |
### 3.2 Registration Validation
Before signing a replay manifest:
1. Verify all `casUri` references resolve to existing CAS objects
2. Verify hash matches CAS content
3. Verify DSSE envelope exists for all graph references
4. Fail manifest creation if any reference is missing
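The four steps can be sketched against an in-memory store. This is an illustration under several stated assumptions:

- A `Dictionary` stands in for CAS.
- `ValidateCasRefs` is a hypothetical helper.
- Only `sha256:` digests are recomputed. .NET has no built-in BLAKE3, so `blake3:` references would need a third-party implementation.
- The error codes are borrowed from the invalid-manifest table in section 4.4.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical validator: returns every problem found; an empty list means
// the manifest may proceed to signing.
static List<string> ValidateCasRefs(
    IReadOnlyDictionary<string, byte[]> store,
    IEnumerable<(string CasUri, string Hash, bool IsGraph)> refs)
{
    var errors = new List<string>();
    foreach (var (casUri, hash, isGraph) in refs)
    {
        // Step 1: the reference must resolve to an existing CAS object.
        if (!store.TryGetValue(casUri, out var content))
        {
            errors.Add($"REPLAY_MANIFEST_CAS_NOT_FOUND: {casUri}");
            continue;
        }
        // Step 2: recompute the digest (sha256 only in this sketch).
        if (hash.StartsWith("sha256:", StringComparison.Ordinal))
        {
            string actual = "sha256:" + Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();
            if (actual != hash) errors.Add($"REPLAY_MANIFEST_HASH_MISMATCH: {casUri}");
        }
        // Step 3: graph bodies also need a DSSE envelope at "{casUri}.dsse".
        if (isGraph && !store.ContainsKey(casUri + ".dsse"))
            errors.Add($"missing DSSE envelope: {casUri}.dsse");
    }
    // Step 4: the caller fails manifest creation when this list is non-empty.
    return errors;
}

var cas = new Dictionary<string, byte[]>();
byte[] body = Encoding.UTF8.GetBytes("graph-body");
string digest = "sha256:" + Convert.ToHexString(SHA256.HashData(body)).ToLowerInvariant();
string graphUri = "cas://reachability/graphs/" + digest;
cas[graphUri] = body; // registered without its .dsse envelope
foreach (string problem in ValidateCasRefs(cas, new[] { (graphUri, digest, true) }))
    Console.WriteLine(problem); // prints the missing-DSSE error only
```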
### 3.3 Validation API
```csharp
public interface ICasValidator
{
Task<CasValidationResult> ValidateAsync(string casUri, string expectedHash);
Task<CasValidationResult> ValidateBatchAsync(IEnumerable<CasReference> refs);
}
public record CasValidationResult(
bool IsValid,
string? ActualHash,
string? Error
);
```
---
## 4. Acceptance Test Vectors
### 4.1 Minimal Valid Manifest v2
```json
{
"schemaVersion": "2.0",
"scan": {
"id": "scan-test-001",
"time": "2025-12-13T10:00:00Z",
"mode": "record",
"scannerVersion": "10.2.0"
},
"subject": {
"ociDigest": "sha256:abc123..."
},
"inputs": {
"feeds": [],
"tools": []
},
"reachability": {
"graphs": [
{
"kind": "static",
"analyzer": "scanner.java@10.2.0",
"hash": "blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2",
"hashAlg": "blake3-256",
"casUri": "cas://reachability/graphs/blake3:a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2"
}
],
"runtimeTraces": [],
"code_id_coverage": {
"total_nodes": 100,
"nodes_with_symbol_id": 100,
"nodes_with_code_id": 0,
"coverage_percent": 100.0
}
},
"outputs": {},
"provenance": {}
}
```
**Expected canonical hash:** `sha256:e7f8a9b0...` (computed from canonical JSON)
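The canonicalization rules behind this hash are not spelled out here. The sketch below assumes "canonical JSON" means three things: object keys sorted ordinally at every depth, array order preserved, and no insignificant whitespace. It does not normalize numbers or escapes the way RFC 8785 (JCS) does, so it will not necessarily reproduce the expected hash above. `Canonicalize` and `CanonicalHash` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical canonicalizer: rebuilds the tree with ordinally sorted keys.
// Arrays keep their order; scalars are deep-cloned so they can be re-parented.
static JsonNode? Canonicalize(JsonNode? node) => node switch
{
    JsonObject obj => new JsonObject(obj
        .OrderBy(p => p.Key, StringComparer.Ordinal)
        .Select(p => KeyValuePair.Create(p.Key, Canonicalize(p.Value)))),
    JsonArray arr => new JsonArray(arr.Select(item => Canonicalize(item)).ToArray()),
    _ => node?.DeepClone(),
};

// Serializes compactly (no indentation) and hashes the UTF-8 bytes.
static string CanonicalHash(string json)
{
    string canonical = Canonicalize(JsonNode.Parse(json))?.ToJsonString() ?? "null";
    byte[] digestBytes = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
    return "sha256:" + Convert.ToHexString(digestBytes).ToLowerInvariant();
}

Console.WriteLine(CanonicalHash("{\"schemaVersion\": \"2.0\", \"outputs\": {}}"));
```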
### 4.2 Manifest with Runtime Traces
```json
{
"schemaVersion": "2.0",
"scan": {
"id": "scan-test-002",
"time": "2025-12-13T11:00:00Z",
"mode": "record",
"scannerVersion": "10.2.0"
},
"reachability": {
"graphs": [
{
"kind": "static",
"analyzer": "scanner.java@10.2.0",
"hash": "blake3:1111111111111111111111111111111111111111111111111111111111111111",
"hashAlg": "blake3-256",
"casUri": "cas://reachability/graphs/blake3:1111111111111111111111111111111111111111111111111111111111111111"
}
],
"runtimeTraces": [
{
"source": "eventpipe",
"hash": "sha256:2222222222222222222222222222222222222222222222222222222222222222",
"hashAlg": "sha256",
"casUri": "cas://reachability/runtime/sha256:2222222222222222222222222222222222222222222222222222222222222222",
"recordedAt": "2025-12-13T10:30:00Z"
}
]
}
}
```
### 4.3 Sorting Validation Vector
Input (unsorted):
```json
{
"reachability": {
"graphs": [
{"casUri": "cas://reachability/graphs/blake3:zzzz...", "kind": "framework"},
{"casUri": "cas://reachability/graphs/blake3:aaaa...", "kind": "static"}
]
}
}
```
Expected output (sorted):
```json
{
"reachability": {
"graphs": [
{"casUri": "cas://reachability/graphs/blake3:aaaa...", "kind": "static"},
{"casUri": "cas://reachability/graphs/blake3:zzzz...", "kind": "framework"}
]
}
}
```
### 4.4 Invalid Manifest Vectors
| Test Case | Input | Expected Error |
|-----------|-------|----------------|
| Missing schemaVersion | `{}` | `REPLAY_MANIFEST_MISSING_VERSION` |
| Invalid version | `{"schemaVersion": "1.0"}` | `REPLAY_MANIFEST_VERSION_MISMATCH` (when v2 required) |
| Missing hashAlg | `{"hash": "blake3:..."}` | `REPLAY_MANIFEST_MISSING_HASH_ALG` |
| Unsorted graphs | See 4.3 input | `REPLAY_MANIFEST_UNSORTED_ENTRIES` |
| Missing CAS reference | `{"casUri": "cas://missing/..."}` | `REPLAY_MANIFEST_CAS_NOT_FOUND` |
| Hash mismatch | CAS content differs | `REPLAY_MANIFEST_HASH_MISMATCH` |
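Below is a sketch of the structural checks behind the first four rows, using `System.Text.Json.Nodes`. `CheckManifest` is a hypothetical helper, and the ordinal sort order is an assumption. The two CAS rows need store access and are left to a store-aware validator.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical checker returning the error codes from the table above.
static List<string> CheckManifest(JsonNode manifest, bool requireV2)
{
    var codes = new List<string>();
    string? version = manifest["schemaVersion"]?.GetValue<string>();
    if (version is null) codes.Add("REPLAY_MANIFEST_MISSING_VERSION");
    else if (requireV2 && version != "2.0") codes.Add("REPLAY_MANIFEST_VERSION_MISMATCH");

    foreach (string arrayName in new[] { "graphs", "runtimeTraces" })
    {
        if (manifest["reachability"]?[arrayName] is not JsonArray entries) continue;
        string? previous = null;
        foreach (JsonNode? entry in entries)
        {
            // Every hash-bearing entry must declare its algorithm.
            if (entry?["hash"] is not null && entry["hashAlg"] is null)
                codes.Add("REPLAY_MANIFEST_MISSING_HASH_ALG");
            // Adjacent casUri values must be in ordinal (assumed) order.
            string? casUri = entry?["casUri"]?.GetValue<string>();
            if (casUri is not null && previous is not null &&
                string.CompareOrdinal(previous, casUri) > 0)
                codes.Add("REPLAY_MANIFEST_UNSORTED_ENTRIES");
            previous = casUri ?? previous;
        }
    }
    return codes;
}

var sample = JsonNode.Parse("{\"schemaVersion\": \"1.0\"}")!;
Console.WriteLine(string.Join(", ", CheckManifest(sample, requireV2: true)));
```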
---
## 5. Migration Path
### 5.1 v1 → v2 Upgrade
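```csharp
using System;
using System.Linq;

public static ReplayManifest UpgradeToV2(ReplayManifest v1)
{
    // StringComparer.Ordinal keeps the order culture-independent; the default
    // comparer used by OrderBy for strings is culture-sensitive.
    return v1 with
    {
        SchemaVersion = "2.0",
        Reachability = v1.Reachability with
        {
            Graphs = v1.Reachability.Graphs
                .Select(g => g with { HashAlg = InferHashAlg(g.Hash) })
                .OrderBy(g => g.CasUri, StringComparer.Ordinal)
                .ToList(),
            RuntimeTraces = v1.Reachability.RuntimeTraces
                .Select(t => t with { HashAlg = InferHashAlg(t.Hash) })
                .OrderBy(t => t.CasUri, StringComparer.Ordinal)
                .ToList()
        }
        // Section 2.3 also requires inputs.feeds[] and inputs.tools[] to be
        // sorted by name; that step is not shown in this excerpt.
    };
}
```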
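`InferHashAlg` is referenced above but not defined. Below is a minimal sketch that assumes the prefix-to-algorithm mapping shown in section 2.2 (`blake3:` → `blake3-256`, `sha256:` → `sha256`). It rejects unknown prefixes rather than guessing. The helper is hypothetical.

```csharp
using System;

// Hypothetical helper: derives hashAlg from the digest prefix.
static string InferHashAlg(string hash)
{
    if (hash.StartsWith("blake3:", StringComparison.Ordinal)) return "blake3-256";
    if (hash.StartsWith("sha256:", StringComparison.Ordinal)) return "sha256";
    // Fail loudly: an unknown prefix should block the upgrade, not be guessed.
    throw new ArgumentException($"Cannot infer hashAlg from '{hash}'.", nameof(hash));
}

Console.WriteLine(InferHashAlg("sha256:feedface")); // sha256
```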
### 5.2 Backward Compatibility
- v2 readers MUST accept v1 manifests, emitting a warning.
- v2 writers MUST always emit the v2 format.
- v1 writers are deprecated after 2026-03-01.
---
## 6. Test Fixture Locations
```
tests/Replay/
fixtures/
manifest-v2-minimal.json
manifest-v2-with-runtime.json
manifest-v2-sorted.json
manifest-v2-code-id-coverage.json
invalid/
manifest-missing-version.json
manifest-unsorted.json
manifest-missing-hashalg.json
golden/
manifest-v2-canonical.golden.json
manifest-v2-hash.golden.txt
```
---
## 7. Implementation Checklist
- [ ] Update `ReplayManifest` record with v2 fields
- [ ] Add `hashAlg` to all hash-bearing types
- [ ] Implement sorting in `ReachabilityReplayWriter`
- [ ] Add CAS registration validation
- [ ] Create test fixtures
- [ ] Update `DETERMINISTIC_REPLAY.md` section 3
- [ ] Wire into RecordModeService
---
_Last updated: 2025-12-13. See Sprint 0401 GAP-REP-004 for implementation._

# Replay Retention Schema Freeze - 2025-12-10
## Why
- Unblock EvidenceLocker replay ingestion tasks (EVID-REPLAY-187-001) and downstream CLI/runbook work by freezing a retention declaration schema.
- Keep outputs deterministic and tenant-scoped while remaining offline/air-gap friendly.
## Scope & Decisions
- Schema path: `docs/schemas/replay-retention.schema.json`.
- Fields:
- `retention_policy_id` (string, stable ID for policy version).
- `tenant_id` (string, required).
- `dataset` (string; e.g., evidence_bundle, replay_log, advisory_payload).
- `bundle_type` (enum: portable_bundle, sealed_bundle, replay_log, advisory_payload).
- `retention_days` (int 1-3650).
- `legal_hold` (bool).
- `purge_after` (ISO-8601 UTC; derived from ingest + retention_days unless legal_hold=true).
- `checksum` (algorithm: sha256/sha512, value hex).
- `created_at` (ISO-8601 UTC).
- Determinism: `additionalProperties: false`; checksum recorded for audit; UTC timestamps only.
- Tenant isolation: tenant_id mandatory; policy IDs may be per-tenant.
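Below is a minimal sketch of the `purge_after` derivation. It makes two assumptions: the ingest timestamp is available to the caller, and no purge date is produced while `legal_hold` is true. `ComputePurgeAfter` is a hypothetical helper, not the EvidenceLocker implementation.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: purge_after = ingest time + retention_days (UTC).
// Assumption: returns null (no purge date) while legal_hold is true.
static DateTimeOffset? ComputePurgeAfter(DateTimeOffset ingestedAt, int retentionDays, bool legalHold)
{
    // Enforce the schema's 1-3650 range before deriving anything.
    if (retentionDays is < 1 or > 3650)
        throw new ArgumentOutOfRangeException(nameof(retentionDays), "retention_days must be between 1 and 3650.");
    if (legalHold) return null;
    return ingestedAt.ToUniversalTime().AddDays(retentionDays);
}

var ingested = new DateTimeOffset(2025, 12, 10, 0, 0, 0, TimeSpan.Zero);
DateTimeOffset? purge = ComputePurgeAfter(ingested, 90, legalHold: false);
// Format as ISO-8601 UTC with a literal 'Z', independent of the current culture.
Console.WriteLine(purge?.UtcDateTime.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture)); // 2026-03-10T00:00:00Z
```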
## Impacted Tasks
- EVID-REPLAY-187-001, CLI-REPLAY-187-002, RUNBOOK-REPLAY-187-004 are unblocked on retention shape; implementation still required in corresponding modules.
## Next Steps
- Wire schema validation in EvidenceLocker ingest and CLI replay commands.
- Document retention defaults and legal-hold overrides in `docs/operations/runbooks/replay_ops.md`.

# Replay PostgreSQL Schema
Status: draft · applies to net10 replay pipeline (Sprint 0185)
## Tables
### replay_runs
- **id**: scan UUID (string, primary key)
- **manifest_hash**: `sha256:<hex>` (unique)
- **status**: `pending|verified|failed|replayed`
- **created_at / updated_at**: UTC ISO-8601
- **signatures**: JSONB `[{ profile, verified }]` (multi-profile DSSE verification)
- **outputs**: JSONB `{ sbom, findings, vex?, log? }` (all SHA-256 digests)
**Indexes**
- `runs_manifest_hash_unique`: `(manifest_hash)` (unique)
- `runs_status_created_at`: `(status, created_at DESC)`
### replay_bundles
- **id**: bundle digest hex (no `sha256:` prefix)
- **type**: `input|output|rootpack|reachability`
- **size**: bytes
- **location**: CAS URI `cas://replay/<prefix>/<digest>.tar.zst`
- **created_at**: UTC ISO-8601
**Indexes**
- `bundles_type`: `(type, created_at DESC)`
- `bundles_location`: `(location)`
### replay_subjects
- **id**: OCI image digest (`sha256:<hex>`)
- **layers**: JSONB `[{ layer_digest, merkle_root, leaf_count }]`
**Indexes**
- `subjects_layer_digest`: GIN index on `layers` for layer_digest lookups
## Determinism & constraints
- All timestamps stored as UTC.
- Digests are lowercase hex; CAS URIs must follow `cas://<prefix>/<shard>/<digest>.tar.zst` where `<shard>` = first two hex chars.
- No external references; embed only minimal metadata (feed and policy hashes live in the replay manifest).
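Below is a sketch of the CAS URI rule. It makes three assumptions: digests are 64-character SHA-256 hex, the `<prefix>` segment of `replay_bundles.location` is the two-character shard, and the allowed prefix characters are a guess. `BuildBundleUri` and `IsValidCasUri` are hypothetical helpers.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical builder: lowercase hex digest, shard = first two hex chars.
static string BuildBundleUri(string digestHex)
{
    if (!Regex.IsMatch(digestHex, @"^[0-9a-f]{64}\z"))
        throw new ArgumentException("Expected a lowercase 64-char SHA-256 hex digest.", nameof(digestHex));
    return $"cas://replay/{digestHex[..2]}/{digestHex}.tar.zst";
}

// Hypothetical validator for cas://<prefix>/<shard>/<digest>.tar.zst.
static bool IsValidCasUri(string uri)
{
    Match m = Regex.Match(uri, @"^cas://[a-z0-9_-]+/([0-9a-f]{2})/([0-9a-f]{64})\.tar\.zst\z");
    // The shard segment must equal the first two characters of the digest.
    return m.Success && m.Groups[2].Value.StartsWith(m.Groups[1].Value, StringComparison.Ordinal);
}

string sample = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
Console.WriteLine(BuildBundleUri(sample));
```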
## Client models
- Implemented in `src/__Libraries/StellaOps.Replay.Core/ReplayPostgresModels.cs` with matching index name constants (`ReplayIndexes`).
- Serialization uses System.Text.Json with snake_case property naming; field names match table schema above.
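The snake_case mapping can be reproduced with `JsonNamingPolicy.SnakeCaseLower`, which has been available since .NET 8. The anonymous object below is an illustrative stand-in for a `replay_runs` row. It is not the actual model in `ReplayPostgresModels.cs`.

```csharp
using System;
using System.Text.Json;

// Illustrative only: SnakeCaseLower maps ManifestHash -> manifest_hash,
// CreatedAt -> created_at, and so on.
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower };
var row = new
{
    Id = "scan-test-001",
    ManifestHash = "sha256:" + new string('0', 64),
    Status = "verified",
    CreatedAt = new DateTime(2025, 12, 13, 10, 0, 0, DateTimeKind.Utc), // stored as UTC
};
string json = JsonSerializer.Serialize(row, options);
Console.WriteLine(json);
```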