# Findings Ledger Replay & Determinism Harness (LEDGER-29-008)
> **Audience:** Findings Ledger Guild · QA Guild · Policy Guild
> **Purpose:** Define the reproducible harness for 5M findings/tenant replay tests and determinism validation required by LEDGER-29-008.
## 1. Goals
- Reproduce ledger + projection state from canonical event fixtures with byte-for-byte determinism.
- Stress test writer/projector throughput at ≥5M findings per tenant, capturing CPU/memory/latency profiles.
- Produce signed reports (DSSE) that CI and auditors can review before shipping.
## 2. Architecture
```
Fixtures (.ndjson) → Harness Runner → Ledger Writer API → Postgres Ledger DB
↘ Projector (same DB) ↘ Metrics snapshot
```
- **Fixtures:** `fixtures/ledger/*.ndjson`, sorted by `sequence_no`, containing canonical JSON envelopes with precomputed hashes.
- **Runner:** `tools/LedgerReplayHarness` (console app) feeds events, waits for projector catch-up, and verifies projection hashes.
- **Validation:** After replay, the runner re-reads ledger/projection tables, recomputes hashes, and compares to fixture expectations.
- **Reporting:** Generates `harness-report.json` with metrics (latency histogram, insertion throughput, projection lag) plus a DSSE signature.
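
The fixture contract above (one canonical JSON envelope per line, sorted by `sequence_no`) can be checked before a replay starts. The following is a minimal Python sketch, not the harness's actual code: the helper name `read_fixture` is hypothetical, and it assumes `sequence_no` is a top-level integer field in each envelope.

```python
import io
import json
from typing import Iterator, TextIO


def read_fixture(stream: TextIO) -> Iterator[dict]:
    """Yield envelopes from an NDJSON stream, failing fast on ordering errors.

    Assumption: each non-blank line is a JSON object carrying an integer
    `sequence_no` at the top level. Equal values are tolerated because the
    file may interleave several chains.
    """
    previous = None
    for line_no, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        envelope = json.loads(line)
        seq = envelope["sequence_no"]
        if previous is not None and seq < previous:
            raise ValueError(
                f"line {line_no}: sequence_no {seq} appears after {previous}"
            )
        previous = seq
        yield envelope


# Illustrative in-memory fixture; a real run would open a .ndjson file.
sample = io.StringIO('{"sequence_no": 1}\n{"sequence_no": 2}\n')
events = list(read_fixture(sample))
```

Streaming line by line keeps memory use flat, which matters at the 5M-findings scale the harness targets.
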
## 3. CLI usage
```bash
dotnet run --project tools/LedgerReplayHarness \
  -- --fixture fixtures/ledger/tenant-a.ndjson \
  --connection "Host=postgres;Username=stellaops;Password=***;Database=findings_ledger" \
  --tenant tenant-a \
  --maxParallel 8 \
  --report out/harness/tenant-a-report.json
```
Options:
| Option | Description |
| --- | --- |
| `--fixture` | Path to an NDJSON fixture file (multiple fixtures are supported). |
| `--connection` | Postgres connection string, shared by the writer and the projector. |
| `--tenant` | Tenant identifier; the harness ensures the tenant's partitions exist. |
| `--maxParallel` | Batch concurrency (default 4). |
| `--report` | Output path for the report JSON; a `.sig` file is generated alongside it. |
| `--metrics-endpoint` | Optional Prometheus scrape URI for live metrics snapshot. |
## 4. Verification steps
1. **Hash validation:** Recompute `event_hash` for each appended event and ensure it matches the fixture.
2. **Sequence integrity:** Confirm that sequences are gapless within each chain; the harness aborts on a mismatch.
3. **Projection determinism:** Compare the projector-derived `cycle_hash` with the expected value from the fixture metadata.
4. **Performance:** Capture P50/P95 latencies for `ledger_write_latency_seconds` and ensure the target (P95 < 120 ms) is met.
5. **Resource usage:** Sample CPU/memory via `dotnet-counters` or `kubectl top` and store the results in the report.
6. **Merkle root check:** Rebuild the Merkle tree from the events and ensure its root equals the database `ledger_merkle_roots` entry.
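
Steps 1, 2, and 6 can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the harness's implementation: the document does not specify the canonicalization rules, the hash function, or the Merkle construction. Here they are assumed to be sorted-key compact JSON, SHA-256, and a pairwise tree that duplicates the last node on odd levels. The field names `chain_id` and `payload`, and the rule that each chain starts at sequence 1, are likewise hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def canonical_hash(payload: dict) -> str:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def verify_events(events: list[dict]) -> None:
    """Steps 1 and 2: recompute hashes and check gapless per-chain sequences."""
    last_seen: dict[str, int] = defaultdict(int)  # assumed: chains start at 1
    for event in events:
        if canonical_hash(event["payload"]) != event["event_hash"]:
            raise ValueError(f"hash mismatch at sequence {event['sequence_no']}")
        expected = last_seen[event["chain_id"]] + 1
        if event["sequence_no"] != expected:
            raise ValueError(
                f"chain {event['chain_id']}: expected sequence {expected}, "
                f"got {event['sequence_no']}"
            )
        last_seen[event["chain_id"]] = expected


def merkle_root(leaf_hashes: list[str]) -> str:
    """Step 6: pairwise SHA-256 tree; an odd node is paired with itself."""
    if not leaf_hashes:
        raise ValueError("cannot build a Merkle tree without leaves")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed padding rule for odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Illustrative data only: three events on one hypothetical chain.
events = []
for seq in (1, 2, 3):
    payload = {"finding": f"f-{seq}"}
    events.append({
        "chain_id": "c1",
        "sequence_no": seq,
        "payload": payload,
        "event_hash": canonical_hash(payload),
    })

verify_events(events)
root = merkle_root([e["event_hash"] for e in events])
```

The computed `root` would then be compared with the stored `ledger_merkle_roots` entry; any difference in canonicalization or tree layout from the real ledger would change the value, so these choices must mirror the writer exactly.
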
## 5. Output report schema
```json
{
"tenant": "tenant-a",
"fixtures": ["fixtures/ledger/tenant-a.ndjson"],
"eventsWritten": 5123456,
"durationSeconds": 1422.4,
"latencyP95Ms": 108.3,
"projectionLagMaxSeconds": 18.2,
"cpuPercentMax": 72.5,
"memoryMbMax": 3580,
"merkleRoot": "3f1a…",
"status": "pass",
"timestamp": "2025-11-13T11:45:00Z"
}
```
The harness writes `harness-report.json` plus `harness-report.json.sig` (DSSE) and `metrics-snapshot.prom` for archival.
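
As an illustration of how a consumer might read the report, the Python sketch below parses a subset of the fields shown above and derives insertion throughput from `eventsWritten` and `durationSeconds`. The `HarnessReport` class and its `events_per_second` property are hypothetical helpers, not part of the harness.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessReport:
    """Typed view over a subset of the report fields (hypothetical helper)."""

    tenant: str
    events_written: int
    duration_seconds: float
    latency_p95_ms: float
    status: str

    @classmethod
    def from_json(cls, text: str) -> "HarnessReport":
        raw = json.loads(text)
        return cls(
            tenant=raw["tenant"],
            events_written=int(raw["eventsWritten"]),
            duration_seconds=float(raw["durationSeconds"]),
            latency_p95_ms=float(raw["latencyP95Ms"]),
            status=raw["status"],
        )

    @property
    def events_per_second(self) -> float:
        # Derived metric: average insertion throughput over the whole run.
        return self.events_written / self.duration_seconds


# Values copied from the example report above.
report = HarnessReport.from_json(
    '{"tenant": "tenant-a", "eventsWritten": 5123456, "durationSeconds": 1422.4,'
    ' "latencyP95Ms": 108.3, "status": "pass"}'
)
print(f"{report.events_per_second:.0f} events/s")  # prints "3602 events/s"
```
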
## 6. CI integration
- A new pipeline job, `ledger-replay-harness`, runs nightly with a reduced dataset (1M findings) to detect regressions quickly.
- The full 5M run executes weekly and before releases; artifacts are uploaded to `out/qa/findings-ledger/`.
- Gates: merges are blocked if the harness reports `status != pass` or latencies exceed thresholds.
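
A gate of this kind could be sketched in Python as follows. The function name `gate_failures` is hypothetical, and only the P95 target from section 4 (120 ms) is checked because it is the one threshold this document states; a real pipeline step would presumably load the report from disk and exit non-zero when the returned list is non-empty.

```python
P95_THRESHOLD_MS = 120.0  # target from section 4 (P95 < 120 ms)


def gate_failures(report: dict, p95_threshold_ms: float = P95_THRESHOLD_MS) -> list[str]:
    """Return human-readable reasons to block a merge; empty means pass."""
    failures = []
    if report.get("status") != "pass":
        failures.append(f"harness status is {report.get('status')!r}, expected 'pass'")
    p95 = report.get("latencyP95Ms")
    if not isinstance(p95, (int, float)):
        failures.append("latencyP95Ms is missing or not numeric")
    elif p95 >= p95_threshold_ms:
        failures.append(f"latencyP95Ms {p95} is not below {p95_threshold_ms}")
    return failures


# Illustrative reports: the first mirrors the example in section 5.
ok = gate_failures({"status": "pass", "latencyP95Ms": 108.3})
blocked = gate_failures({"status": "fail", "latencyP95Ms": 131.0})
```

Returning every reason, rather than stopping at the first, gives reviewers the full picture from a single failed run.
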
## 7. Air-gapped execution
- Include the fixtures and harness binaries in the Offline Kit under `offline/ledger/replay/`.
- Provide a `run-harness.sh` script that sets environment variables, executes the runner, and exports reports.
- Operators verify fixture hashes before import and attach the signed reports to audit trails.
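
The fixture check before import might look like the Python sketch below. The document does not define how fixture hashes are published, so the manifest shape used here (a mapping from relative path to SHA-256 hex digest) and the helper names `sha256_file` and `verify_fixtures` are assumptions for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte fixtures never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixtures(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest does not match."""
    bad = []
    for relative, expected in manifest.items():
        candidate = root / relative
        if not candidate.is_file() or sha256_file(candidate) != expected.lower():
            bad.append(relative)
    return bad


# Self-contained demonstration with a throwaway fixture file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tenant-a.ndjson").write_bytes(b'{"sequence_no": 1}\n')
    manifest = {"tenant-a.ndjson": sha256_file(root / "tenant-a.ndjson")}
    mismatches = verify_fixtures(root, manifest)
```

An empty `mismatches` list means every listed fixture is present and unmodified; anything else should stop the import.
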
---
*Draft prepared 2025-11-13 for LEDGER-29-008. Update when CLI options or thresholds change.*