Implement ledger metrics for observability and add tests for Ruby packages endpoints
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added `LedgerMetrics` class to record write latency and total events for ledger operations.
- Created comprehensive tests for Ruby packages endpoints, covering missing inventory, successful retrieval, and identifier handling.
- Introduced `TestSurfaceSecretsScope` for managing environment variables during tests.
- Developed `ProvenanceMongoExtensions` for attaching DSSE provenance and trust information to event documents.
- Implemented `EventProvenanceWriter` and `EventWriter` classes for managing event provenance in MongoDB.
- Established MongoDB indexes for efficient querying of events by provenance and trust.
- Added models and JSON parsing logic for DSSE provenance and trust information.
docs/modules/findings-ledger/airgap-provenance.md (new file, 61 lines)
@@ -0,0 +1,61 @@
# Findings Ledger — Air-Gap Provenance Extensions (LEDGER-AIRGAP-56/57/58)

> **Scope:** How ledger events capture mirror bundle provenance, staleness metrics, evidence snapshots, and sealed-mode timeline events for air-gapped deployments.

## 1. Requirements recap
- **LEDGER-AIRGAP-56-001:** Record mirror bundle metadata (`bundle_id`, `merkle_root`, `time_anchor`, `source_region`) whenever advisories/VEX/policies are imported offline. Tie import provenance to each affected ledger event.
- **LEDGER-AIRGAP-56-002:** Surface staleness metrics and block risk-critical exports when imported data exceeds freshness SLAs; emit remediation guidance.
- **LEDGER-AIRGAP-57-001:** Link findings evidence snapshots (portable bundles) so cross-enclave verification can attest to the same ledger hash.
- **LEDGER-AIRGAP-58-001:** Emit sealed-mode timeline events describing bundle impacts (new findings, remediation deltas) for Console and Notify.

## 2. Schema additions

| Entity | Field | Type | Notes |
| --- | --- | --- | --- |
| `ledger_events.event_body` | `airgap.bundle` | object | `{ "bundleId", "merkleRoot", "timeAnchor", "sourceRegion", "importedAt", "importOperator" }` recorded on import events. |
| `ledger_events.event_body` | `airgap.evidenceSnapshot` | object | `{ "bundleUri", "dsseDigest", "expiresAt" }` for findings evidence bundles. |
| `ledger_projection` | `airgap.stalenessSeconds` | integer | Age of the newest data feeding the finding projection. |
| `ledger_projection` | `airgap.bundleId` | string | Last bundle influencing the projection row. |
| `timeline_events` (new view) | `airgapImpact` | object | Materials needed for the LEDGER-AIRGAP-58-001 timeline feed (finding counts, severity deltas). |

Canonical JSON must sort object keys (`bundleId`, `importOperator`, …) to keep hashes deterministic, as illustrated below.

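The following sketch shows what a canonically serialized `airgap.bundle` and `airgap.evidenceSnapshot` pair could look like. Field names come from the table above; every value is a placeholder.

```json
{
  "airgap": {
    "bundle": {
      "bundleId": "sha256:…",
      "importOperator": "ops@enclave-a",
      "importedAt": "2025-10-30T11:02:17Z",
      "merkleRoot": "3f1a…",
      "sourceRegion": "enclave-a",
      "timeAnchor": "2025-10-30T11:00:00Z"
    },
    "evidenceSnapshot": {
      "bundleUri": "file://offline/evidence/sha256:….tar",
      "dsseDigest": "sha256:…",
      "expiresAt": "2026-01-28T00:00:00Z"
    }
  }
}
```

Keys here are sorted by code unit (RFC 8785-style canonicalization), which is why `importOperator` precedes `importedAt`; the exact collation is whatever the ledger's canonicalizer defines.
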
## 3. Import workflow
1. **Mirror bundle validation:** The AirGap controller verifies the bundle signature/manifest before ingest and saves the metadata for ledger enrichment.
2. **Event enrichment:** The importer populates `airgap.bundle` fields on each event produced from the bundle. `bundleId` equals the manifest digest (SHA-256); `merkleRoot` is the bundle manifest's Merkle root; `timeAnchor` is the authoritative timestamp from the bundle.
3. **Anchoring:** Merkle batching includes bundle metadata; anchor references in `ledger_merkle_roots.anchor_reference` use the format `airgap::<bundleId>` when not externally anchored (see the sketch after this list).
4. **Projection staleness:** The projector updates `airgap.stalenessSeconds` by comparing the current time with `bundle.timeAnchor` per artifact scope; the CLI and Console read the value to display freshness indicators.

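As a rough illustration of steps 2–3, the identifiers can be derived from the bundle manifest like this (the manifest path is hypothetical; only the SHA-256 digest rule and the `airgap::<bundleId>` format come from the steps above):

```bash
# Hypothetical location of the manifest inside an extracted mirror bundle.
MANIFEST=offline/bundles/mirror-2025-10-30/manifest.json

# Step 2: bundleId is the SHA-256 digest of the bundle manifest.
BUNDLE_ID="sha256:$(sha256sum "$MANIFEST" | cut -d' ' -f1)"

# Step 3: anchor reference used when the batch is not externally anchored.
ANCHOR_REFERENCE="airgap::${BUNDLE_ID}"

echo "bundleId=${BUNDLE_ID}"
echo "anchor_reference=${ANCHOR_REFERENCE}"
```
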
## 4. Staleness enforcement
- The config option `AirGapPolicies:FreshnessThresholdSeconds` (default 604800 = 7 days) sets the allowable age (see the excerpt after this list).
- Export workflows check `airgap.stalenessSeconds`; when the value exceeds the threshold, the service raises `ERR_AIRGAP_STALE` and supplies a remediation message referencing the last bundle (`bundleId`, `timeAnchor`, `importOperator`).
- The metric `ledger_airgap_staleness_seconds` tracks the distribution per tenant for dashboards.

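A minimal `appsettings` excerpt for the threshold might look like the following; only the `AirGapPolicies:FreshnessThresholdSeconds` key is taken from above, and the surrounding layout is an assumption:

```json
{
  "AirGapPolicies": {
    "FreshnessThresholdSeconds": 604800
  }
}
```

In .NET configuration the colon-delimited key maps to nested sections, which is why the value sits under an `AirGapPolicies` object.
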
## 5. Evidence snapshots
- Evidence bundles (`airgap.evidenceSnapshot`) reference portable DSSE packages stored in the Evidence Locker (`bundleUri` like `file://offline/evidence/<bundleId>.tar`).
- The CLI command `stella ledger evidence link` attaches evidence snapshots to findings after bundle generation; the ledger event records both the DSSE digest and the expiration.
- Timeline entries and Console detail views display “Evidence snapshot available” with download instructions suited to sealed environments.

## 6. Timeline events (LEDGER-AIRGAP-58-001)
- The new derived view `timeline_airgap_impacts` emits JSON objects such as:

```json
{
  "tenant": "tenant-a",
  "bundleId": "bundle-sha256:…",
  "newFindings": 42,
  "resolvedFindings": 18,
  "criticalDelta": 5,
  "timeAnchor": "2025-10-30T11:00:00Z",
  "sealedMode": true
}
```

- Console and Notify subscribe to `ledger.airgap.timeline` events to show sealed-mode summaries.

## 7. Offline kit considerations
- Include the bundle provenance schema, staleness policy config, CLI scripts (`stella airgap bundle import`, `stella ledger evidence link`), and sample manifests.
- Provide the validation script `scripts/ledger/validate-airgap-bundle.sh`, which verifies manifest signatures, timestamps, and ledger enrichment before ingest.
- Document sealed-mode toggles ensuring no external egress occurs when importing bundles.

---

*Draft 2025-11-13 for LEDGER-AIRGAP-56/57/58 planning.*

docs/modules/findings-ledger/deployment.md (new file, 129 lines)
@@ -0,0 +1,129 @@
# Findings Ledger Deployment & Operations Guide

> **Applies to:** `StellaOps.Findings.Ledger` writer + projector services (Sprint 120).
> **Audience:** Platform/DevOps engineers bringing up Findings Ledger across dev/stage/prod and air-gapped sites.

## 1. Prerequisites

| Component | Requirement |
| --- | --- |
| Database | PostgreSQL 14+ with `citext`, `uuid-ossp`, `pgcrypto`, and `pg_partman`. Provision a dedicated database/user per environment. |
| Storage | Minimum 200 GB SSD per production environment (ledger + projection + Merkle tables). |
| TLS & identity | Authority reachable for service-to-service JWTs; mTLS optional but recommended. |
| Secrets | Store the DB connection string, encryption keys (`LEDGER__ATTACHMENTS__ENCRYPTIONKEY`), and signing credentials for Merkle anchoring in a secrets manager. |
| Observability | OTLP collector endpoint (or Loki/Prometheus endpoints) configured; see `docs/modules/findings-ledger/observability.md`. |

## 2. Docker Compose deployment

1. **Create env files**

   ```bash
   cp deploy/compose/env/ledger.env.example ledger.env
   cp etc/secrets/ledger.postgres.secret.example ledger.postgres.env
   # Populate LEDGER__DB__CONNECTIONSTRING, LEDGER__ATTACHMENTS__ENCRYPTIONKEY, etc.
   ```

2. **Add ledger service overlay** (append to the Compose file in use, e.g. `docker-compose.prod.yaml`):

   ```yaml
   services:
     findings-ledger:
       image: stellaops/findings-ledger:${STELLA_VERSION:-2025.11.0}
       restart: unless-stopped
       env_file:
         - ledger.env
         - ledger.postgres.env
       environment:
         ASPNETCORE_URLS: http://0.0.0.0:8080
         LEDGER__DB__CONNECTIONSTRING: ${LEDGER__DB__CONNECTIONSTRING}
         LEDGER__OBSERVABILITY__ENABLED: "true"
         LEDGER__MERKLE__ANCHORINTERVAL: "00:05:00"
       ports:
         - "8188:8080"
       depends_on:
         - postgres
       volumes:
         - ./etc/ledger/appsettings.json:/app/appsettings.json:ro
   ```

3. **Run migrations, then start the service**

   ```bash
   dotnet run --project src/Findings/StellaOps.Findings.Ledger.Migrations \
     -- --connection "$LEDGER__DB__CONNECTIONSTRING"

   docker compose --env-file ledger.env --env-file ledger.postgres.env \
     -f deploy/compose/docker-compose.prod.yaml up -d findings-ledger
   ```

4. **Smoke test**

   ```bash
   curl -sf http://localhost:8188/health/ready
   curl -sf http://localhost:8188/metrics | grep ledger_write_latency_seconds
   ```

## 3. Helm deployment

1. **Create secret**

   ```bash
   kubectl create secret generic findings-ledger-secrets \
     --from-literal=LEDGER__DB__CONNECTIONSTRING="$CONN_STRING" \
     --from-literal=LEDGER__ATTACHMENTS__ENCRYPTIONKEY="$ENC_KEY" \
     --dry-run=client -o yaml | kubectl apply -f -
   ```

2. **Helm values excerpt**

   ```yaml
   services:
     findingsLedger:
       enabled: true
       image:
         repository: stellaops/findings-ledger
         tag: 2025.11.0
       envFromSecrets:
         - name: findings-ledger-secrets
       env:
         LEDGER__OBSERVABILITY__ENABLED: "true"
         LEDGER__MERKLE__ANCHORINTERVAL: "00:05:00"
       resources:
         requests: { cpu: "500m", memory: "1Gi" }
         limits: { cpu: "2", memory: "4Gi" }
       probes:
         readinessPath: /health/ready
         livenessPath: /health/live
   ```

3. **Install/upgrade**

   ```bash
   helm upgrade --install stellaops deploy/helm/stellaops \
     -f deploy/helm/stellaops/values-prod.yaml
   ```

4. **Verify**

   ```bash
   kubectl logs deploy/stellaops-findings-ledger | grep "Ledger started"
   kubectl port-forward svc/stellaops-findings-ledger 8080 &
   curl -sf http://127.0.0.1:8080/metrics | head
   ```

## 4. Backups & restores

| Task | Command / guidance |
| --- | --- |
| Online backup | `pg_dump -Fc --dbname="$LEDGER_DB" --file ledger-$(date -u +%Y%m%d).dump` (daily full dumps plus hourly WAL archiving). |
| Point-in-time recovery | Enable WAL archiving; document the target `recovery_target_time`. |
| Projection rebuild | After a restore, run `dotnet run --project tools/LedgerReplayHarness -- --connection "$LEDGER_DB" --tenant all` to regenerate projections and verify hashes. |
| Evidence bundles | Store Merkle root anchors + replay DSSE bundles alongside DB backups for audit parity. |

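A minimal cron-style wrapper for the daily dump could look like the following sketch; only the `pg_dump` invocation comes from the table above, while the backup directory and retention window are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical daily backup wrapper for the Findings Ledger database.
set -euo pipefail

LEDGER_DB="${LEDGER_DB:?set to the ledger database name or connection string}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/findings-ledger}"   # assumed location
RETENTION_DAYS="${RETENTION_DAYS:-14}"                     # assumed retention

mkdir -p "$BACKUP_DIR"

# Full custom-format dump, as in the table above.
pg_dump -Fc --dbname="$LEDGER_DB" \
  --file "$BACKUP_DIR/ledger-$(date -u +%Y%m%d).dump"

# Prune dumps older than the retention window.
find "$BACKUP_DIR" -name 'ledger-*.dump' -mtime +"$RETENTION_DAYS" -delete
```
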
## 5. Offline / air-gapped workflow

- Use `stella ledger observability snapshot --out offline/ledger/metrics.tar.gz` before exporting Offline Kits. Include:
  - `ledger_write_latency_seconds` summaries
  - `ledger_merkle_anchor_duration_seconds` histogram
  - The latest `ledger_merkle_roots` rows (export via `psql \copy`; see the sketch after this list)
- Package ledger service binaries + migrations using `ops/offline-kit/build_offline_kit.py --include ledger`.
- Document sealed-mode restrictions: disable outbound attachments unless egress policy allows Evidence Locker endpoints; set `LEDGER__ATTACHMENTS__ALLOWEGRESS=false`.

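For the Merkle root export, a `psql \copy` along these lines would work; the column used for ordering and the row limit are assumptions, since only the `ledger_merkle_roots` table name appears in this guide:

```bash
# Export recent Merkle roots to CSV for the offline snapshot.
# "anchored_at" is an assumed column name; adjust to the actual schema.
psql "$LEDGER__DB__CONNECTIONSTRING" \
  -c "\copy (SELECT * FROM ledger_merkle_roots ORDER BY anchored_at DESC LIMIT 1000) TO 'offline/ledger/ledger_merkle_roots.csv' WITH CSV HEADER"
```
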
## 6. Post-deploy checklist

- [ ] Health + metrics endpoints respond.
- [ ] Merkle anchors writing to `ledger_merkle_roots`.
- [ ] Projection lag < 30 s (`ledger_projection_lag_seconds`).
- [ ] Grafana dashboards imported under “Findings Ledger”.
- [ ] Backups scheduled + restore playbook tested.
- [ ] Offline snapshot taken (air-gapped sites).

---

*Draft prepared 2025-11-13 for LEDGER-29-009/LEDGER-AIRGAP-56-001 planning. Update once Compose/Helm overlays are merged.*

docs/modules/findings-ledger/implementation_plan.md (new file, 45 lines)
@@ -0,0 +1,45 @@
# Implementation Plan — Findings Ledger (Sprint 120)

## Phase 1 – Observability baselines (LEDGER-29-007)
- Instrument the writer/projector with the metrics listed in `observability.md` (`ledger_write_latency_seconds`, `ledger_events_total`, `ledger_projection_lag_seconds`, etc.).
- Emit structured logs (Serilog JSON) including chain/sequence/hash metadata.
- Wire OTLP exporters and ensure the `/metrics` endpoint exposes histogram buckets with exemplars.
- Publish Grafana dashboards + alert rules (Policy SLO pack).
- Deliver doc updates + sample Grafana JSON in the repo (`docs/observability/dashboards/findings-ledger/`).

## Phase 2 – Determinism harness (LEDGER-29-008)
- Finalize NDJSON fixtures for ≥5 M findings/tenant (per tenant/test scenario).
- Implement the `tools/LedgerReplayHarness` CLI as specified in `replay-harness.md`.
- Add GitHub/Gitea pipeline job(s) running the nightly (1 M) and weekly (5 M) harness plus DSSE signing.
- Capture CPU/memory/latency metrics and commit signed reports for validation.
- Provide a runbook for QA + Ops to rerun the harness in their environments.

## Phase 3 – Deployment & backup collateral (LEDGER-29-009)
- Integrate the ledger service into Compose (`docker-compose.prod.yaml`) and Helm values.
- Automate PostgreSQL migrations (DatabaseMigrator invocation pre-start).
- Document backup cadence (pg_dump + WAL archiving) and the projection rebuild process (via the replay harness).
- Ensure Offline Kit packaging pulls binaries, migrations, the harness, and default dashboards.

## Phase 4 – Provenance & air-gap extensions
- LEDGER-34-101: ingest orchestrator run export metadata, index by artifact hash, expose an audit endpoint.
- LEDGER-AIRGAP-56/57/58: extend ledger events to capture bundle provenance, staleness metrics, and timeline events.
- LEDGER-ATTEST-73-001: store attestation pointers (DSSE IDs, Rekor metadata) for explainability.
- For each extension, update the schema doc + workflow inference doc to describe newly recorded fields and tenant-safe defaults.

## Dependencies & sequencing
1. AdvisoryAI Sprint 110.A completion (raw findings parity).
2. Observability schema approval (Nov 15) to unblock Phase 1 instrumentation.
3. QA lab capacity for the 5 M replay (Nov 18 checkpoint).
4. DevOps review of Compose/Helm overlays (Nov 20).
5. Orchestrator export schema freeze (Nov 25) for provenance linkage.

## Deliverables checklist
- [ ] Metrics/logging/tracing implementation merged, dashboards exported.
- [ ] Harness CLI + fixtures + signed reports committed.
- [ ] Compose/Helm overlays + backup/restore runbooks validated.
- [ ] Air-gap provenance fields documented + implemented.
- [ ] Sprint tracker and release notes updated after each phase.

---

*Draft: 2025-11-13. Update when sequencing or dependencies change.*

docs/modules/findings-ledger/observability.md (new file, 65 lines)
@@ -0,0 +1,65 @@
# Findings Ledger Observability Profile (Sprint 120)

> **Audience:** Findings Ledger Guild · Observability Guild · DevOps · AirGap Controller Guild
> **Scope:** Metrics, logs, traces, dashboards, and alert contracts required by LEDGER-29-007/008/009. Complements the schema spec and workflow docs.

## 1. Telemetry stack & conventions
- **Export path:** .NET OpenTelemetry SDK → OTLP → shared collector → Prometheus/Tempo/Loki. Enable via `observability.enabled=true` in `appsettings` (see the fragment after this list).
- **Namespace prefix:** `ledger.*` for metrics, `Ledger.*` for logs/traces. Labels follow `tenant`, `chain`, `policy`, `status`, `reason`, `anchor`.
- **Time provenance:** All timestamps are emitted in UTC ISO-8601. When metrics/logs include monotonic durations, they must derive from `TimeProvider`.

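A minimal `appsettings` fragment for the flag above might look like this; only the `observability.enabled` key is documented here, and the exact section layout is an assumption:

```json
{
  "observability": {
    "enabled": true
  }
}
```

The Compose and Helm overlays in the deployment guide toggle the same switch via the `LEDGER__OBSERVABILITY__ENABLED: "true"` environment variable.
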
## 2. Metrics

| Metric | Type | Labels | Description / target |
| --- | --- | --- | --- |
| `ledger_write_latency_seconds` | Histogram | `tenant`, `event_type` | End-to-end append latency (API ingress → persisted). P95 ≤ 120 ms. |
| `ledger_events_total` | Counter | `tenant`, `event_type`, `source` (`policy`, `workflow`, `orchestrator`) | Incremented per committed event. Mirrors the Merkle leaf count. |
| `ledger_ingest_backlog_events` | Gauge | `tenant` | Number of events buffered in the writer queue. Alert when >5 000 for 5 min. |
| `ledger_projection_lag_seconds` | Gauge | `tenant` | Wall-clock difference between the latest ledger event and the projection tail. Target <30 s. |
| `ledger_projection_rebuild_seconds` | Histogram | `tenant` | Duration of replay/rebuild operations triggered by the LEDGER-29-008 harness. |
| `ledger_merkle_anchor_duration_seconds` | Histogram | `tenant` | Time to batch + anchor events. Target <60 s per 10k events. |
| `ledger_merkle_anchor_failures_total` | Counter | `tenant`, `reason` (`db`, `signing`, `network`) | Alerts at >0 within 15 min. |
| `ledger_attachments_encryption_failures_total` | Counter | `tenant`, `stage` (`encrypt`, `sign`, `upload`) | Ensures the secure attachment pipeline stays healthy. |
| `ledger_db_connections_active` | Gauge | `role` (`writer`, `projector`) | Helps tune pool size. |
| `ledger_app_version_info` | Gauge | `version`, `git_sha` | Static metric for fleet observability. |

### Derived dashboards
- **Writer health:** `ledger_write_latency_seconds` (P50/P95/P99), backlog gauge, event throughput. Example queries follow.
- **Projection health:** `ledger_projection_lag_seconds`, rebuild durations, conflict counts (from logs).
- **Anchoring:** Anchor duration histogram, failure counter, root hash timeline.

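A few Prometheus queries one might use for those panels, assuming the histogram metrics expose the usual `_bucket` series; the 5-minute windows are illustrative:

```promql
# Writer health: P95 append latency per tenant.
histogram_quantile(
  0.95,
  sum by (tenant, le) (rate(ledger_write_latency_seconds_bucket[5m]))
)

# Writer health: committed-event throughput per tenant.
sum by (tenant) (rate(ledger_events_total[5m]))

# Projection health: current projection lag per tenant.
max by (tenant) (ledger_projection_lag_seconds)
```
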
## 3. Logs & traces
- **Log structure:** Serilog JSON with fields `tenant`, `chainId`, `sequence`, `eventId`, `eventType`, `actorId`, `policyVersion`, `hash`, `merkleRoot`.
- **Log levels:** `Information` for success summaries (sampled), `Warning` for retried operations, `Error` for failed writes/anchors.
- **Correlation:** Each API request includes `requestId` + `traceId` logged with events. Projector logs capture `replayId` and `rebuildReason`.
- **Secrets:** Ensure `event_body` is never logged; log only metadata/hashes. An illustrative log line follows.

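For illustration, a successful append might be logged roughly like this. The field names come from the bullets above; the Serilog envelope keys (`@t`, `@l`, `@m`) and every value are assumptions:

```json
{
  "@t": "2025-11-13T11:45:02.481Z",
  "@l": "Information",
  "@m": "Ledger event committed",
  "tenant": "tenant-a",
  "chainId": "chain-7f3c…",
  "sequence": 18421,
  "eventId": "evt-01JC…",
  "eventType": "finding.status_changed",
  "actorId": "policy-engine",
  "policyVersion": "2025.11.0",
  "hash": "sha256:…",
  "merkleRoot": "3f1a…",
  "requestId": "req-4821…",
  "traceId": "0af7651916cd43dd8448eb211c80319c"
}
```
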
## 4. Alerts

| Alert | Condition | Response |
| --- | --- | --- |
| **LedgerWriteSLA** | `ledger_write_latency_seconds` P95 > 0.12 s for 3 intervals | Check DB contention, review queue backlog, scale the writer. |
| **LedgerBacklogGrowing** | `ledger_ingest_backlog_events` > 5 000 for 5 min | Inspect upstream policy runs; ensure the projector is keeping up. |
| **ProjectionLag** | `ledger_projection_lag_seconds` > 60 s | Trigger a rebuild, verify change streams. |
| **AnchorFailure** | `ledger_merkle_anchor_failures_total` increase > 0 | Collect logs, rerun the anchor, verify the signing service. |
| **AttachmentSecurityError** | `ledger_attachments_encryption_failures_total` increase > 0 | Audit the attachments pipeline; check key material and storage endpoints. |

Alerts integrate with the Notifier channel `ledger.alerts`. For air-gapped deployments, emit to local syslog + CLI incident scripts.

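A sketch of how the first two rows could be expressed as Prometheus alerting rules; the thresholds come from the table, while the group name, `for:` windows, and label/annotation text are assumptions:

```yaml
groups:
  - name: findings-ledger            # assumed group name
    rules:
      - alert: LedgerWriteSLA
        expr: |
          histogram_quantile(
            0.95,
            sum by (tenant, le) (rate(ledger_write_latency_seconds_bucket[5m]))
          ) > 0.12
        for: 15m                     # "3 intervals" approximated as 15 minutes
        labels:
          severity: warning
        annotations:
          summary: "Ledger write P95 latency above 120 ms for {{ $labels.tenant }}"

      - alert: LedgerBacklogGrowing
        expr: ledger_ingest_backlog_events > 5000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Writer backlog above 5 000 events for {{ $labels.tenant }}"
```
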
## 5. Testing & determinism harness
- **Replay harness:** The CLI `dotnet run --project tools/LedgerReplayHarness` executes deterministic replays at 5 M findings/tenant. Metrics emitted: `ledger_projection_rebuild_seconds` with a `scenario` label.
- **Property tests:** Seeded tests ensure `ledger_events_total` and Merkle leaf counts match after replay.
- **CI gating:** `LEDGER-29-008` requires harness output uploaded as signed JSON (`harness-report.json` + DSSE) and referenced in sprint notes.

## 6. Offline & air-gap guidance
- Collect metrics/log snapshots via `stella ledger observability snapshot --out offline/ledger/metrics.tar.gz`. Include the `ledger_write_latency_seconds` summary, anchor root history, and projection lag samples.
- Include default Grafana JSON under `offline/telemetry/dashboards/ledger/*.json`. Dashboards use the metrics above; filter by `tenant`.
- Ensure the sealed-mode doc (`docs/modules/findings-ledger/schema.md` §3.3) references `ledger_attachments_encryption_failures_total` so Ops can confirm encryption pipeline health without remote telemetry.

## 7. Runbook pointers
- **Anchoring issues:** Refer to `docs/modules/findings-ledger/schema.md` §3 for the root structure and `ops/devops/telemetry/package_offline_bundle.py` for diagnostics.
- **Projection rebuilds:** See `docs/modules/findings-ledger/workflow-inference.md` for chain rules and `scripts/ledger/replay.sh` (LEDGER-29-008 deliverable) for deterministic replays.

---

*Draft compiled 2025-11-13 for LEDGER-29-007/008 planning. Update when metrics or alerts change.*

docs/modules/findings-ledger/replay-harness.md (new file, 86 lines)
@@ -0,0 +1,86 @@
# Findings Ledger Replay & Determinism Harness (LEDGER-29-008)

> **Audience:** Findings Ledger Guild · QA Guild · Policy Guild
> **Purpose:** Define the reproducible harness for 5 M findings/tenant replay tests and the determinism validation required by LEDGER-29-008.

## 1. Goals
- Reproduce ledger + projection state from canonical event fixtures with byte-for-byte determinism.
- Stress test writer/projector throughput at ≥5 M findings per tenant, capturing CPU/memory/latency profiles.
- Produce signed reports (DSSE) that CI and auditors can review before shipping.

## 2. Architecture

```
Fixtures (.ndjson) → Harness Runner → Ledger Writer API → Postgres Ledger DB
                                          ↘ Projector (same DB) ↘ Metrics snapshot
```

- **Fixtures:** `fixtures/ledger/*.ndjson`, sorted by `sequence_no`, containing canonical JSON envelopes with precomputed hashes.
- **Runner:** `tools/LedgerReplayHarness` (console app) feeds events, waits for projector catch-up, and verifies projection hashes.
- **Validation:** After replay, the runner re-reads the ledger/projection tables, recomputes hashes, and compares them to fixture expectations.
- **Reporting:** Generates `harness-report.json` with metrics (latency histogram, insertion throughput, projection lag) plus a DSSE signature.

## 3. CLI usage

```bash
dotnet run --project tools/LedgerReplayHarness \
  -- --fixture fixtures/ledger/tenant-a.ndjson \
  --connection "Host=postgres;Username=stellaops;Password=***;Database=findings_ledger" \
  --tenant tenant-a \
  --maxParallel 8 \
  --report out/harness/tenant-a-report.json
```

Options:

| Option | Description |
| --- | --- |
| `--fixture` | Path to an NDJSON file (supports multiple). |
| `--connection` | Postgres connection string (shared by writer + projector). |
| `--tenant` | Tenant identifier; the harness ensures partitions exist. |
| `--maxParallel` | Batch concurrency (default 4). |
| `--report` | Output path for the report JSON; a `.sig` is generated alongside. |
| `--metrics-endpoint` | Optional Prometheus scrape URI for a live metrics snapshot. |

## 4. Verification steps

1. **Hash validation:** Recompute `event_hash` for each appended event and ensure it matches the fixture.
2. **Sequence integrity:** Confirm gapless sequences per chain; the harness aborts on mismatch.
3. **Projection determinism:** Compare the projector-derived `cycle_hash` with the expected value from fixture metadata.
4. **Performance:** Capture P50/P95 latencies for `ledger_write_latency_seconds` and ensure targets (<120 ms P95) are met.
5. **Resource usage:** Sample CPU/memory via `dotnet-counters` or `kubectl top` and store in the report.
6. **Merkle root check:** Rebuild the Merkle tree from events and ensure the root equals the database `ledger_merkle_roots` entry.

## 5. Output report schema

```json
{
  "tenant": "tenant-a",
  "fixtures": ["fixtures/ledger/tenant-a.ndjson"],
  "eventsWritten": 5123456,
  "durationSeconds": 1422.4,
  "latencyP95Ms": 108.3,
  "projectionLagMaxSeconds": 18.2,
  "cpuPercentMax": 72.5,
  "memoryMbMax": 3580,
  "merkleRoot": "3f1a…",
  "status": "pass",
  "timestamp": "2025-11-13T11:45:00Z"
}
```

The harness writes `harness-report.json` plus `harness-report.json.sig` (DSSE) and `metrics-snapshot.prom` for archival.

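As a sketch of the gate described in the next section, CI could validate the report roughly like this; the 120 ms threshold comes from §4, while the jq expressions and file layout are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical CI gate: fail the job unless the harness report passed and met latency targets.
set -euo pipefail

REPORT="${1:-out/harness/tenant-a-report.json}"

STATUS=$(jq -r '.status' "$REPORT")
P95_MS=$(jq -r '.latencyP95Ms' "$REPORT")

if [[ "$STATUS" != "pass" ]]; then
  echo "Harness status is '$STATUS' — blocking merge." >&2
  exit 1
fi

# Compare floating-point latency against the 120 ms P95 target from §4.
if awk -v p95="$P95_MS" 'BEGIN { exit !(p95 > 120) }'; then
  echo "P95 latency ${P95_MS} ms exceeds the 120 ms target — blocking merge." >&2
  exit 1
fi

echo "Harness report OK: status=$STATUS latencyP95Ms=$P95_MS"
```
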
## 6. CI integration
- A new pipeline job `ledger-replay-harness` runs nightly with a reduced dataset (1 M findings) to detect regressions quickly.
- The full 5 M run executes weekly and before releases; artifacts are uploaded to `out/qa/findings-ledger/`.
- Gates: merge is blocked if the harness `status != pass` or latencies exceed thresholds.

## 7. Air-gapped execution
- Include fixtures + harness binaries inside the Offline Kit under `offline/ledger/replay/`.
- Provide a `run-harness.sh` script that sets env vars, executes the runner, and exports reports (see the sketch after this list).
- Operators attach signed reports to audit trails, verifying hashed fixtures before import.

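A minimal shape for that wrapper, assuming the Offline Kit ships the harness alongside the fixtures; all paths and variable names are illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical offline wrapper around the replay harness (paths are illustrative).
set -euo pipefail

FIXTURE="${FIXTURE:-offline/ledger/replay/fixtures/tenant-a.ndjson}"
TENANT="${TENANT:-tenant-a}"
CONNECTION="${LEDGER__DB__CONNECTIONSTRING:?export the ledger connection string first}"
REPORT_DIR="${REPORT_DIR:-out/harness}"

mkdir -p "$REPORT_DIR"

dotnet run --project tools/LedgerReplayHarness \
  -- --fixture "$FIXTURE" \
  --connection "$CONNECTION" \
  --tenant "$TENANT" \
  --report "$REPORT_DIR/${TENANT}-report.json"

# Collect the report and its DSSE signature for the audit trail.
tar -czf "$REPORT_DIR/${TENANT}-harness-artifacts.tar.gz" \
  "$REPORT_DIR/${TENANT}-report.json" \
  "$REPORT_DIR/${TENANT}-report.json.sig"
```
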
---

*Draft prepared 2025-11-13 for LEDGER-29-008. Update when CLI options or thresholds change.*