# Findings Ledger Schema (Sprint 120) > **Owners:** Findings Ledger Guild • Vuln Explorer Guild > **Status:** Draft schema delivered 2025-11-03 for LEDGER-29-001 ## 1. Storage profile | Concern | Decision | Notes | |---------|----------|-------| | Engine | PostgreSQL 14+ with UTF-8, `jsonb`, and partitioning support | Aligns with shared data plane; deterministic ordering enforced via primary keys. | | Tenancy | Range/list partition on `tenant_id` for ledger + projection tables | Simplifies retention and cross-tenant anchoring. | | Time zone | All timestamps stored as `timestamptz` UTC | Canonical JSON uses ISO-8601 (`yyyy-MM-ddTHH:mm:ss.fffZ`). | | Hashing | SHA-256 (lower-case hex) over canonical JSON | Implemented client-side and verified by DB constraint. | | Migrations | SQL files under `src/Findings/StellaOps.Findings.Ledger/migrations` | Applied via DatabaseMigrator (part of platform toolchain). | ## 2. Ledger event model Events are immutable append-only records representing every workflow change. Records capture the original event payload, cryptographic hashes, and actor metadata. ### 2.1 `ledger_events` | Column | Type | Description | |--------|------|-------------| | `tenant_id` | `text` | Tenant partition key. | | `chain_id` | `uuid` | Logical chain grouping (per tenant/policy combination). | | `sequence_no` | `bigint` | Monotonic sequence within a chain (gapless). | | `event_id` | `uuid` | Globally unique event identifier. | | `event_type` | `ledger_event_type` | Enumerated type (see §2.2). | | `policy_version` | `text` | Policy digest (e.g., SHA-256). | | `finding_id` | `text` | Stable finding identity `(artifactId + vulnId + policyVersion)`. | | `artifact_id` | `text` | Asset identifier (image digest, SBOM id, etc.). | | `source_run_id` | `uuid` | Policy run that produced the event (nullable). | | `actor_id` | `text` | Operator/service initiating the mutation. | | `actor_type` | `text` | `system`, `operator`, `integration`. | | `occurred_at` | `timestamptz` | Domain timestamp supplied by source. | | `recorded_at` | `timestamptz` | Ingestion timestamp (defaults to `now()`). | | `event_body` | `jsonb` | Canonical payload (see §2.3). | | `event_hash` | `char(64)` | SHA-256 over canonical payload envelope. | | `previous_hash` | `char(64)` | Hash of prior event in chain (all zeroes for first). | | `merkle_leaf_hash` | `char(64)` | Leaf hash used for Merkle anchoring (hash over `event_hash || sequence_no`). | **Constraints & indexes** ``` PRIMARY KEY (tenant_id, chain_id, sequence_no); UNIQUE (tenant_id, event_id); UNIQUE (tenant_id, chain_id, event_hash); CHECK (event_hash ~ '^[0-9a-f]{64}$'); CHECK (previous_hash ~ '^[0-9a-f]{64}$'); CREATE INDEX ix_ledger_events_finding ON ledger_events (tenant_id, finding_id, policy_version); CREATE INDEX ix_ledger_events_type ON ledger_events (tenant_id, event_type, recorded_at DESC); ``` Partitions: top-level partitioned by `tenant_id` (list) with a default partition. Optional sub-partition by month on `recorded_at` for large tenants. PostgreSQL requires the partition key in unique constraints; global uniqueness for `event_id` is enforced as `(tenant_id, event_id)` with application-level guards maintaining cross-tenant uniqueness. ### 2.2 Event types ``` CREATE TYPE ledger_event_type AS ENUM ( 'finding.created', 'finding.status_changed', 'finding.severity_changed', 'finding.tag_updated', 'finding.comment_added', 'finding.assignment_changed', 'finding.accepted_risk', 'finding.remediation_plan_added', 'finding.attachment_added', 'finding.closed' ); ``` Additional types can be appended via migrations; canonical JSON must include `event_type` key. ### 2.3 Canonical ledger JSON Canonical payload envelope (before hashing): ```json { "event": { "id": "3ac1f4ef-3c26-4b0d-91d4-6a6d3a5bde10", "type": "finding.status_changed", "tenant": "tenant-a", "chainId": "5fa2b970-9da2-4ef4-9a63-463c5d98d3cc", "sequence": 42, "policyVersion": "sha256:5f38...", "finding": { "id": "artifact:sha256:abc|pkg:cpe:/o:vendor:product", "artifactId": "sha256:abc", "vulnId": "CVE-2025-1234" }, "actor": { "id": "user:alice@tenant", "type": "operator" }, "occurredAt": "2025-11-03T15:12:05.123Z", "payload": { "previousStatus": "affected", "status": "triaged", "justification": "Ticket SEC-1234 created", "ticket": { "id": "SEC-1234", "url": "https://tracker/sec-1234" } } } } ``` Canonicalisation rules: 1. Serialize using UTF-8, no BOM. 2. Sort object keys lexicographically at every level. 3. Represent enums/flags as lower-case strings. 4. Timestamps formatted as `yyyy-MM-ddTHH:mm:ss.fffZ` (millisecond precision, UTC). 5. Numbers use decimal notation; omit trailing zeros. 6. Arrays maintain supplied order. Hash pipeline: ``` canonical_json = CanonicalJsonSerializer.Serialize(envelope) sha256_bytes = SHA256(canonical_json) event_hash = HexLower(sha256_bytes) ``` `merkle_leaf_hash = HexLower(SHA256(event_hash || '-' || sequence_no)).` ## 3. Merkle anchoring Anchoring batches events per tenant across fixed windows (default: 1,000 events or 15 minutes). Anchors are stored in `ledger_merkle_roots`. | Column | Type | Description | |--------|------|-------------| | `tenant_id` | `text` | Tenant key. | | `anchor_id` | `uuid` | Anchor identifier. | | `window_start` | `timestamptz` | Inclusive start of batch. | | `window_end` | `timestamptz` | Exclusive end. | | `sequence_start` | `bigint` | First sequence included. | | `sequence_end` | `bigint` | Last sequence included. | | `root_hash` | `char(64)` | Merkle root (SHA-256). | | `leaf_count` | `integer` | Number of events aggregated. | | `anchored_at` | `timestamptz` | Timestamp root stored/signed. | | `anchor_reference` | `text` | Optional reference to external ledger (e.g., Rekor UUID). | Indexes: `PRIMARY KEY (tenant_id, anchor_id)`, `UNIQUE (tenant_id, root_hash)`, `INDEX ix_merkle_sequences ON ledger_merkle_roots (tenant_id, sequence_end DESC)`. ## 4. Projection tables ### 4.1 `findings_projection` Stores the latest verdict/state per finding. | Column | Type | Description | |--------|------|-------------| | `tenant_id` | `text` | Partition key. | | `finding_id` | `text` | Matches ledger payload. | | `policy_version` | `text` | Active policy digest. | | `status` | `text` | e.g., `affected`, `triaged`, `accepted_risk`, `resolved`. | | `severity` | `numeric(6,3)` | Normalised severity score (0-10). | | `labels` | `jsonb` | Key-value metadata (tags, KEV flag, runtime signals). | | `current_event_id` | `uuid` | Ledger event that produced this state. | | `explain_ref` | `text` | Reference to explain bundle or object storage key. | | `policy_rationale` | `jsonb` | Array of policy rationale references (explain bundle IDs, remediation notes). | | `updated_at` | `timestamptz` | Last projection update. | | `cycle_hash` | `char(64)` | Deterministic hash of projection record (used in export bundles). | Primary key: `(tenant_id, finding_id, policy_version)`. Indexes: - `ix_projection_status` on `(tenant_id, status, severity DESC)`. - `ix_projection_labels_gin` using `labels` GIN for KEV/runtime filters. ### 4.2 `finding_history` Delta view derived from ledger events for quick UI queries. | Column | Type | Description | |--------|------|-------------| | `tenant_id` | `text` | Partition key. | | `finding_id` | `text` | Finding identity. | | `policy_version` | `text` | Policy digest. | | `event_id` | `uuid` | Ledger event ID. | | `status` | `text` | Status after event. | | `severity` | `numeric(6,3)` | Severity after event (nullable). | | `actor_id` | `text` | Actor performing change. | | `comment` | `text` | Optional summary/message. | | `occurred_at` | `timestamptz` | Domain event timestamp. | Materialized view or table updated by projector. Indexed by `(tenant_id, finding_id, occurred_at DESC)`. ### 4.3 `triage_actions` Audit table for operator actions needing tailored queries. | Column | Type | Description | |--------|------|-------------| | `tenant_id` | `text` | Partition key. | | `action_id` | `uuid` | Primary key. | | `event_id` | `uuid` | Source ledger event. | | `finding_id` | `text` | Finding identity. | | `action_type` | `ledger_action_type` | e.g., `assign`, `comment`, `attach_evidence`, `link_ticket`. | | `payload` | `jsonb` | Structured action body (canonical stored separately). | | `created_at` | `timestamptz` | Timestamp stored. | | `created_by` | `text` | Actor ID. | `ledger_action_type` enum mirrors CLI/UX operations. ``` CREATE TYPE ledger_action_type AS ENUM ( 'assign', 'comment', 'attach_evidence', 'link_ticket', 'remediation_plan', 'status_change', 'accept_risk', 'reopen', 'close' ); ### 4.4 `ledger_projection_offsets` Checkpoint store for the projection background worker. Ensures idempotent replays across restarts. | Column | Type | Description | |--------|------|-------------| | `worker_id` | `text` | Logical worker identifier (defaults to `default`). | | `last_recorded_at` | `timestamptz` | Timestamp of the last projected ledger event. | | `last_event_id` | `uuid` | Event identifier paired with `last_recorded_at` for deterministic ordering. | | `updated_at` | `timestamptz` | Last time the checkpoint was persisted. | Seed row inserted on migration ensures catch-up from epoch (`1970-01-01T00:00:00Z` with empty GUID). ## 5. Hashing & verification 1. Canonical serialize the envelope (§2.3). 2. Compute `event_hash` and store along with `previous_hash`. 3. Build Merkle tree per anchoring window using leaf hash `SHA256(event_hash || '-' || sequence_no)`. 4. Persist root in `ledger_merkle_roots` and, when configured, submit to external transparency log (Rekor v2). Store receipt/UUID in `anchor_reference`. 5. Projection rows compute `cycle_hash = SHA256(canonical_projection_json)` where canonical projection includes fields `{tenant_id, finding_id, policy_version, status, severity, labels, current_event_id}` with sorted keys. Verification flow for auditors: - Fetch event, recompute canonical hash, validate `previous_hash` chain. - Reconstruct Merkle path from stored leaf hash; verify matches recorded root. - Cross-check projection `cycle_hash` matches ledger state derived from last event. ## 6. Fixtures & migrations - Initial migration script: `src/Findings/StellaOps.Findings.Ledger/migrations/001_initial.sql`. - Sample canonical event: `seed-data/findings-ledger/fixtures/ledger-event.sample.json` (includes pre-computed `eventHash`, `previousHash`, and `merkleLeafHash` values). - Sample projection row: `seed-data/findings-ledger/fixtures/finding-projection.sample.json` (includes canonical `cycleHash` for replay validation). Fixtures follow canonical key ordering and include precomputed hashes to validate tooling. ## 7. Projection worker - `LedgerProjectionWorker` consumes ledger events via `PostgresLedgerEventStream`, applying deterministic reductions with `LedgerProjectionReducer`. - Checkpoint state is stored in `ledger_projection_offsets`, allowing replay from any point in time. - Batch processing is configurable via `findings:ledger:projection` (`batchSize`, `idleDelay`). - Each event writes: - `findings_projection` (upserted current state with `cycle_hash`). - `finding_history` (timeline entry keyed by event ID). - `triage_actions` when applicable (status change, comment, assignment, remediation, attachment, accept-risk, close). ## 8. Next steps - Integrate Policy Engine batch evaluation with the projector (`LEDGER-29-004`). - Align Vulnerability Explorer queries with the new projection state and timeline endpoints. - Externalise Merkle anchor publishing to transparency log once anchoring cadence is finalised. | | | Array of policy rationale references (explain bundle IDs, remediation notes). |