# Findings Ledger Replay & Determinism Harness (LEDGER-29-008)

**Audience:** Findings Ledger Guild · QA Guild · Policy Guild
**Purpose:** Define the reproducible harness for the 5M-findings-per-tenant replay tests and determinism validation required by LEDGER-29-008.

## 1. Goals

- Reproduce ledger and projection state from canonical event fixtures with byte-for-byte determinism.
- Stress-test writer/projector throughput at ≥5M findings per tenant, capturing CPU/memory/latency profiles.
- Produce signed (DSSE) reports that CI and auditors can review before shipping.

## 2. Architecture

```
Fixtures (.ndjson) → Harness Runner → Ledger Writer API → Postgres Ledger DB
                                     ↘ Projector (same DB) ↘ Metrics snapshot
```

- **Fixtures:** `fixtures/ledger/*.ndjson`, sorted by `sequence_no`, containing canonical JSON envelopes with precomputed hashes.
- **Runner:** `tools/LedgerReplayHarness` (console app) feeds events, waits for the projector to catch up, and verifies projection hashes.
- **Validation:** After replay, the runner re-reads the ledger and projection tables, recomputes hashes, and compares them to fixture expectations.
- **Reporting:** Generates `harness-report.json` with metrics (latency histogram, insertion throughput, projection lag) plus a DSSE signature.
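The hash-recomputation half of the validation step can be sketched as follows. This is a minimal illustration, not the harness's actual code: the `payload` and `event_hash` field names and the canonicalization (UTF-8 JSON, sorted keys, no whitespace, SHA-256) are assumptions.

```python
import hashlib
import json

def recompute_event_hash(event: dict) -> str:
    # Assumed canonical form: UTF-8 JSON with sorted keys and no whitespace.
    payload = json.dumps(event["payload"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def validate_fixture(path: str) -> None:
    # Fixtures are NDJSON: one canonical event envelope per line.
    with open(path, encoding="utf-8") as fixture:
        for line_no, line in enumerate(fixture, start=1):
            event = json.loads(line)
            if recompute_event_hash(event) != event["event_hash"]:
                raise ValueError(f"hash mismatch at line {line_no}")
```

The same routine runs twice: once over the fixture before replay, and once over the re-read ledger rows afterwards, so any non-determinism surfaces as a hash mismatch.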

## 3. CLI usage

```shell
dotnet run --project tools/LedgerReplayHarness \
  -- --fixture fixtures/ledger/tenant-a.ndjson \
     --connection "Host=postgres;Username=stellaops;Password=***;Database=findings_ledger" \
     --tenant tenant-a \
     --maxParallel 8 \
     --report out/harness/tenant-a-report.json
```

Options:

| Option | Description |
| --- | --- |
| `--fixture` | Path to an NDJSON fixture file (may be repeated for multiple fixtures). |
| `--connection` | Postgres connection string (shared by writer and projector). |
| `--tenant` | Tenant identifier; the harness ensures partitions exist. |
| `--maxParallel` | Batch concurrency (default: 4). |
| `--report` | Output path for the report JSON; a `.sig` file is generated alongside. |
| `--metrics-endpoint` | Optional Prometheus scrape URI for a live metrics snapshot. |

## 4. Verification steps

1. **Hash validation:** Recompute `event_hash` for each appended event and ensure it matches the fixture.
2. **Sequence integrity:** Confirm gapless sequences per chain; the harness aborts on any mismatch.
3. **Projection determinism:** Compare the projector-derived `cycle_hash` with the expected value from fixture metadata.
4. **Performance:** Capture P50/P95 latencies for `ledger_write_latency_seconds` and ensure targets (<120 ms P95) are met.
5. **Resource usage:** Sample CPU/memory via `dotnet-counters` or `kubectl top` and store the samples in the report.
6. **Merkle root check:** Rebuild the Merkle tree from events and ensure the root equals the database `ledger_merkle_roots` entry.
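Steps 2 and 6 above can be sketched together. This is a hedged illustration: it assumes sequence numbers are consecutive integers per chain and a binary Merkle tree built by pairwise SHA-256 concatenation with odd nodes promoted unchanged; the harness's actual tree layout may differ.

```python
import hashlib

def check_gapless(sequence_nos: list[int]) -> bool:
    # A chain is gapless when its sequence numbers are consecutive
    # starting from the first observed number.
    if not sequence_nos:
        return True
    start = sequence_nos[0]
    return sequence_nos == list(range(start, start + len(sequence_nos)))

def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    # Pairwise SHA-256 over adjacent nodes; an unpaired node is carried
    # up to the next level unchanged (layout is an assumption).
    level = leaf_hashes
    while len(level) > 1:
        nxt = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

After replay, the recomputed root is compared byte-for-byte with the stored `ledger_merkle_roots` entry; any divergence fails the run.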

## 5. Output report schema

```json
{
  "tenant": "tenant-a",
  "fixtures": ["fixtures/ledger/tenant-a.ndjson"],
  "eventsWritten": 5123456,
  "durationSeconds": 1422.4,
  "latencyP95Ms": 108.3,
  "projectionLagMaxSeconds": 18.2,
  "cpuPercentMax": 72.5,
  "memoryMbMax": 3580,
  "merkleRoot": "3f1a…",
  "status": "pass",
  "timestamp": "2025-11-13T11:45:00Z"
}
```

The harness writes `harness-report.json` plus `harness-report.json.sig` (DSSE) and `metrics-snapshot.prom` for archival.
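A CI gate consuming this report could look like the following sketch. The field names follow the example schema above and the threshold mirrors the <120 ms P95 target in section 4; the gate logic itself is illustrative, not the pipeline's actual implementation.

```python
import json

P95_THRESHOLD_MS = 120.0  # matches the <120 ms P95 target in section 4

def gate(report_path: str) -> None:
    """Fail the pipeline if the harness report misses its targets."""
    with open(report_path, encoding="utf-8") as f:
        report = json.load(f)
    if report["status"] != "pass":
        raise SystemExit(f"harness status is {report['status']!r}, blocking merge")
    if report["latencyP95Ms"] >= P95_THRESHOLD_MS:
        raise SystemExit(
            f"P95 latency {report['latencyP95Ms']} ms exceeds {P95_THRESHOLD_MS} ms"
        )
```

In practice the gate would also verify the DSSE signature (`harness-report.json.sig`) before trusting the report's contents.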

## 6. CI integration

- A new pipeline job, `ledger-replay-harness`, runs nightly with a reduced dataset (1M findings) to detect regressions quickly.
- The full 5M run executes weekly and before releases; artifacts are uploaded to `out/qa/findings-ledger/`.
- Gates: merges are blocked if the harness status is not `pass` or latencies exceed their thresholds.

## 7. Air-gapped execution

- Include the fixtures and harness binaries in the Offline Kit under `offline/ledger/replay/`.
- Provide a `run-harness.sh` script that sets environment variables, executes the runner, and exports the reports.
- Operators attach the signed reports to audit trails, verifying fixture hashes before import.
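Fixture verification before import can be sketched as below. The manifest format (a mapping of fixture path to expected SHA-256 digest) is a hypothetical convention for illustration; the Offline Kit's actual manifest layout may differ.

```python
import hashlib

def verify_fixture(path: str, expected_sha256: str) -> bool:
    # Stream the file in 1 MiB chunks so multi-GB fixtures
    # are never loaded into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

An operator would run this check against every fixture listed in the kit's manifest and refuse to import any file whose digest does not match.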

Draft prepared 2025-11-13 for LEDGER-29-008. Update when CLI options or thresholds change.