
# Findings Ledger Schema (Sprint 120)

Owners: Findings Ledger Guild • Vuln Explorer Guild
Status: Draft schema delivered 2025-11-03 for LEDGER-29-001

## 1. Storage profile

| Concern | Decision | Notes |
|---------|----------|-------|
| Engine | PostgreSQL 14+ with UTF-8, `jsonb`, and partitioning support | Aligns with shared data plane; deterministic ordering enforced via primary keys. |
| Tenancy | Range/list partition on `tenant_id` for ledger + projection tables | Simplifies retention and cross-tenant anchoring. |
| Time zone | All timestamps stored as `timestamptz` UTC | Canonical JSON uses ISO-8601 (`yyyy-MM-ddTHH:mm:ss.fffZ`). |
| Hashing | SHA-256 (lower-case hex) over canonical JSON | Implemented client-side and verified by DB constraint. |
| Migrations | SQL files under `src/Findings/StellaOps.Findings.Ledger/migrations` | Applied via `DatabaseMigrator` (part of platform toolchain). |

## 2. Ledger event model

Events are immutable append-only records representing every workflow change. Records capture the original event payload, cryptographic hashes, and actor metadata.

### 2.1 `ledger_events`

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant partition key. |
| `chain_id` | `uuid` | Logical chain grouping (per tenant/policy combination). |
| `sequence_no` | `bigint` | Monotonic sequence within a chain (gapless). |
| `event_id` | `uuid` | Globally unique event identifier. |
| `event_type` | `ledger_event_type` | Enumerated type (see §2.2). |
| `policy_version` | `text` | Policy digest (e.g., SHA-256). |
| `finding_id` | `text` | Stable finding identity (`artifactId + vulnId + policyVersion`). |
| `artifact_id` | `text` | Asset identifier (image digest, SBOM id, etc.). |
| `source_run_id` | `uuid` | Policy run that produced the event (nullable). |
| `actor_id` | `text` | Operator/service initiating the mutation. |
| `actor_type` | `text` | `system`, `operator`, `integration`. |
| `occurred_at` | `timestamptz` | Domain timestamp supplied by source. |
| `recorded_at` | `timestamptz` | Ingestion timestamp (defaults to `now()`). |
| `event_body` | `jsonb` | Canonical payload (see §2.3). |
| `event_hash` | `char(64)` | SHA-256 over canonical payload envelope. |
| `previous_hash` | `char(64)` | Hash of prior event in chain (all zeroes for first). |
| `merkle_leaf_hash` | `char(64)` | Leaf hash used for Merkle anchoring (hash over `event_hash \|\| '-' \|\| sequence_no`). |

Constraints & indexes

```sql
PRIMARY KEY (tenant_id, chain_id, sequence_no);
UNIQUE (tenant_id, event_id);
UNIQUE (tenant_id, chain_id, event_hash);
CHECK (event_hash ~ '^[0-9a-f]{64}$');
CHECK (previous_hash ~ '^[0-9a-f]{64}$');
CREATE INDEX ix_ledger_events_finding ON ledger_events (tenant_id, finding_id, policy_version);
CREATE INDEX ix_ledger_events_type ON ledger_events (tenant_id, event_type, recorded_at DESC);
```

Partitions: the table is list-partitioned at the top level by `tenant_id`, with a default partition. Large tenants can optionally be sub-partitioned by month on `recorded_at`. Because PostgreSQL requires the partition key in every unique constraint on a partitioned table, `event_id` uniqueness is enforced as `(tenant_id, event_id)`, and application-level guards maintain uniqueness across tenants.

### 2.2 Event types

```sql
CREATE TYPE ledger_event_type AS ENUM (
  'finding.created',
  'finding.status_changed',
  'finding.severity_changed',
  'finding.tag_updated',
  'finding.comment_added',
  'finding.assignment_changed',
  'finding.accepted_risk',
  'finding.remediation_plan_added',
  'finding.attachment_added',
  'finding.closed'
);
```

Additional types can be appended via migrations; the canonical JSON must include the `event_type` key.

### 2.3 Canonical ledger JSON

Canonical payload envelope (before hashing):

```json
{
  "event": {
    "id": "3ac1f4ef-3c26-4b0d-91d4-6a6d3a5bde10",
    "type": "finding.status_changed",
    "tenant": "tenant-a",
    "chainId": "5fa2b970-9da2-4ef4-9a63-463c5d98d3cc",
    "sequence": 42,
    "policyVersion": "sha256:5f38...",
    "finding": {
      "id": "artifact:sha256:abc|pkg:cpe:/o:vendor:product",
      "artifactId": "sha256:abc",
      "vulnId": "CVE-2025-1234"
    },
    "actor": {
      "id": "user:alice@tenant",
      "type": "operator"
    },
    "occurredAt": "2025-11-03T15:12:05.123Z",
    "payload": {
      "previousStatus": "affected",
      "status": "triaged",
      "justification": "Ticket SEC-1234 created",
      "ticket": {
        "id": "SEC-1234",
        "url": "https://tracker/sec-1234"
      }
    }
  }
}
```

Canonicalisation rules:

  1. Serialize using UTF-8, no BOM.
  2. Sort object keys lexicographically at every level.
  3. Represent enums/flags as lower-case strings.
  4. Timestamps formatted as yyyy-MM-ddTHH:mm:ss.fffZ (millisecond precision, UTC).
  5. Numbers use decimal notation; omit trailing zeros.
  6. Arrays maintain supplied order.
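
The rules above can be approximated in a few lines of Python. This is a minimal sketch, not the project's `CanonicalJsonSerializer`: the function names `format_timestamp` and `canonicalize` are hypothetical, the compact separators (no insignificant whitespace) are an assumption the rules do not state, and sub-millisecond digits are truncated rather than rounded. Rules 3 and 5 (enum casing, number formatting) are left to the caller.

```python
import json
from datetime import datetime, timezone


def format_timestamp(value: datetime) -> str:
    """Rule 4: render a timezone-aware datetime as UTC with millisecond precision."""
    utc = value.astimezone(timezone.utc)
    # Assumption: truncate (not round) microseconds down to milliseconds.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def canonicalize(envelope: dict) -> bytes:
    """Rules 1, 2 and 6: UTF-8 without BOM, sorted keys, arrays left in order."""
    text = json.dumps(
        envelope,
        sort_keys=True,         # sorts keys at every nesting level
        separators=(",", ":"),  # assumption: no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters as UTF-8, not \u escapes
    )
    return text.encode("utf-8")  # the "utf-8" codec does not emit a BOM
```

Because key order is normalised, two envelopes that differ only in the order their keys were written serialize to the same bytes, which is what makes the downstream hash reproducible.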

Hash pipeline:

```text
canonical_json = CanonicalJsonSerializer.Serialize(envelope)
sha256_bytes = SHA256(canonical_json)
event_hash = HexLower(sha256_bytes)
```

`merkle_leaf_hash = HexLower(SHA256(event_hash || '-' || sequence_no))`.
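
As a concrete illustration, the pipeline might look like the following in Python. The function names are hypothetical, and two readings of the formula are assumptions: `||` is taken as string concatenation, and `sequence_no` is rendered as base-10 text before being appended to the lower-case hex `event_hash`.

```python
import hashlib


def compute_event_hash(canonical_json: bytes) -> str:
    """SHA-256 over the canonical envelope bytes, as lower-case hex."""
    return hashlib.sha256(canonical_json).hexdigest()  # hexdigest() is lower-case


def compute_merkle_leaf_hash(event_hash: str, sequence_no: int) -> str:
    """SHA-256 over "<event_hash>-<sequence_no>", as lower-case hex."""
    # Assumption: the hex string (not the raw digest) is concatenated,
    # and the sequence number is written in decimal.
    leaf_input = f"{event_hash}-{sequence_no}".encode("utf-8")
    return hashlib.sha256(leaf_input).hexdigest()
```

Both results are 64 lower-case hex characters, so they satisfy the `^[0-9a-f]{64}$` pattern used by the `CHECK` constraints in §2.1.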

## 3. Merkle anchoring

Anchoring batches events per tenant across fixed windows (default: 1,000 events or 15 minutes). Anchors are stored in `ledger_merkle_roots`.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant key. |
| `anchor_id` | `uuid` | Anchor identifier. |
| `window_start` | `timestamptz` | Inclusive start of batch. |
| `window_end` | `timestamptz` | Exclusive end. |
| `sequence_start` | `bigint` | First sequence included. |
| `sequence_end` | `bigint` | Last sequence included. |
| `root_hash` | `char(64)` | Merkle root (SHA-256). |
| `leaf_count` | `integer` | Number of events aggregated. |
| `anchored_at` | `timestamptz` | Timestamp root stored/signed. |
| `anchor_reference` | `text` | Optional reference to external ledger (e.g., Rekor UUID). |

Indexes: `PRIMARY KEY (tenant_id, anchor_id)`, `UNIQUE (tenant_id, root_hash)`, `INDEX ix_merkle_sequences ON ledger_merkle_roots (tenant_id, sequence_end DESC)`.
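
This document does not specify how the tree is built, so the sketch below only shows one plausible way to derive a `root_hash` from a window's leaf hashes. Every construction detail is an assumption: parents are `SHA256(left || right)` over the raw 32-byte digests, an unpaired node is promoted unchanged to the next level, and the function name `merkle_root` is hypothetical.

```python
import hashlib


def merkle_root(leaf_hashes: list) -> str:
    """Fold an ordered list of hex leaf hashes into a single hex root."""
    if not leaf_hashes:
        raise ValueError("an anchoring window must contain at least one event")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        next_level = []
        # Hash adjacent pairs: (0,1), (2,3), ...
        for i in range(0, len(level) - 1, 2):
            next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        # Assumption: an odd node out is carried up as-is rather than duplicated.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0].hex()
```

Leaf order matters, so the leaves would need to be supplied in a stable order (for example by `sequence_no`) for an auditor to reproduce the stored root.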

## 4. Projection tables

### 4.1 `findings_projection`

Stores the latest verdict/state per finding.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Matches ledger payload. |
| `policy_version` | `text` | Active policy digest. |
| `status` | `text` | e.g., `affected`, `triaged`, `accepted_risk`, `resolved`. |
| `severity` | `numeric(6,3)` | Normalised severity score (0-10). |
| `labels` | `jsonb` | Key-value metadata (tags, KEV flag, runtime signals). |
| `current_event_id` | `uuid` | Ledger event that produced this state. |
| `explain_ref` | `text` | Reference to explain bundle or object storage key. |
| `policy_rationale` | `jsonb` | Array of policy rationale references (explain bundle IDs, remediation notes). |
| `updated_at` | `timestamptz` | Last projection update. |
| `cycle_hash` | `char(64)` | Deterministic hash of projection record (used in export bundles). |

Primary key: `(tenant_id, finding_id, policy_version)`.

Indexes:

- `ix_projection_status` on `(tenant_id, status, severity DESC)`.
- `ix_projection_labels_gin`, a GIN index on `labels` for KEV/runtime filters.

### 4.2 `finding_history`

Delta view derived from ledger events for quick UI queries.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Finding identity. |
| `policy_version` | `text` | Policy digest. |
| `event_id` | `uuid` | Ledger event ID. |
| `status` | `text` | Status after event. |
| `severity` | `numeric(6,3)` | Severity after event (nullable). |
| `actor_id` | `text` | Actor performing change. |
| `comment` | `text` | Optional summary/message. |
| `occurred_at` | `timestamptz` | Domain event timestamp. |

Implemented as a materialized view or as a table updated by the projector. Indexed by `(tenant_id, finding_id, occurred_at DESC)`.

### 4.3 `triage_actions`

Audit table for operator actions needing tailored queries.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `action_id` | `uuid` | Primary key. |
| `event_id` | `uuid` | Source ledger event. |
| `finding_id` | `text` | Finding identity. |
| `action_type` | `ledger_action_type` | e.g., `assign`, `comment`, `attach_evidence`, `link_ticket`. |
| `payload` | `jsonb` | Structured action body (canonical stored separately). |
| `created_at` | `timestamptz` | Timestamp stored. |
| `created_by` | `text` | Actor ID. |

The `ledger_action_type` enum mirrors CLI/UX operations.

```sql
CREATE TYPE ledger_action_type AS ENUM (
  'assign',
  'comment',
  'attach_evidence',
  'link_ticket',
  'remediation_plan',
  'status_change',
  'accept_risk',
  'reopen',
  'close'
);
```

### 4.4 `ledger_projection_offsets`

Checkpoint store for the projection background worker. Ensures idempotent replays across restarts.

| Column | Type | Description |
|--------|------|-------------|
| `worker_id` | `text` | Logical worker identifier (defaults to `default`). |
| `last_recorded_at` | `timestamptz` | Timestamp of the last projected ledger event. |
| `last_event_id` | `uuid` | Event identifier paired with `last_recorded_at` for deterministic ordering. |
| `updated_at` | `timestamptz` | Last time the checkpoint was persisted. |

Seed row inserted on migration ensures catch-up from epoch (`1970-01-01T00:00:00Z` with empty GUID).
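
To show why the checkpoint pairs a timestamp with an event ID, here is a minimal sketch of resuming from a checkpoint. It assumes events are compared as `(recorded_at, event_id)` tuples, so that events sharing a timestamp still have a total order; the in-memory event dictionaries and the names `EPOCH_CHECKPOINT`, `events_after`, and `advance` are hypothetical and stand in for a database query.

```python
from datetime import datetime, timezone
from uuid import UUID

# Mirrors the seed row: Unix epoch paired with the empty (all-zero) GUID.
EPOCH_CHECKPOINT = (datetime(1970, 1, 1, tzinfo=timezone.utc), UUID(int=0))


def events_after(checkpoint, events):
    """Return events strictly after the checkpoint, in (recorded_at, event_id) order."""
    pending = [e for e in events if (e["recorded_at"], e["event_id"]) > checkpoint]
    return sorted(pending, key=lambda e: (e["recorded_at"], e["event_id"]))


def advance(checkpoint, batch):
    """Move the checkpoint to the last event of a processed batch."""
    if not batch:
        return checkpoint  # nothing processed, keep the old position
    last = batch[-1]
    return (last["recorded_at"], last["event_id"])
```

With a timestamp-only checkpoint, a restart in the middle of a group of events sharing the same `recorded_at` would either skip or re-project some of them; the tie-breaking `event_id` avoids that.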

## 5. Hashing & verification

1. Canonically serialize the envelope (§2.3).
2. Compute `event_hash` and store along with `previous_hash`.
3. Build Merkle tree per anchoring window using leaf hash `SHA256(event_hash || '-' || sequence_no)`.
4. Persist root in `ledger_merkle_roots` and, when configured, submit to external transparency log (Rekor v2). Store receipt/UUID in `anchor_reference`.
5. Projection rows compute `cycle_hash = SHA256(canonical_projection_json)` where canonical projection includes fields `{tenant_id, finding_id, policy_version, status, severity, labels, current_event_id}` with sorted keys.
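
Step 5 can be sketched as follows. The field list comes from the step itself; everything else is an assumption, including the function name `compute_cycle_hash`, the compact JSON separators, and the requirement that the caller has already converted `severity` and `current_event_id` to JSON-serializable values.

```python
import hashlib
import json

# Fields named in step 5; other columns (e.g. updated_at) are excluded.
CYCLE_HASH_FIELDS = (
    "tenant_id", "finding_id", "policy_version",
    "status", "severity", "labels", "current_event_id",
)


def compute_cycle_hash(row: dict) -> str:
    """SHA-256 (lower-case hex) over the canonical projection JSON."""
    projection = {name: row[name] for name in CYCLE_HASH_FIELDS}
    canonical = json.dumps(
        projection,
        sort_keys=True,         # sorted keys, including inside `labels`
        separators=(",", ":"),  # assumption: no insignificant whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Since `updated_at` is not part of the hashed field set, replaying the same events at a later time should yield the same `cycle_hash`.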

Verification flow for auditors:

- Fetch the event, recompute its canonical hash, and validate the `previous_hash` chain.
- Reconstruct the Merkle path from the stored leaf hash and verify that it matches the recorded root.
- Cross-check that the projection `cycle_hash` matches the ledger state derived from the last event.
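
The first check in this flow could be sketched as below. It assumes each row carries the original envelope alongside its stored `event_hash` and `previous_hash`, that rows are supplied in `sequence_no` order for a single chain, and that the canonical form is compact, key-sorted JSON; `verify_chain` and `GENESIS_HASH` are hypothetical names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # previous_hash of the first event in a chain


def verify_chain(rows: list) -> bool:
    """Recompute every event hash and check that previous_hash links are intact."""
    expected_previous = GENESIS_HASH
    for row in rows:
        canonical = json.dumps(
            row["envelope"], sort_keys=True, separators=(",", ":"), ensure_ascii=False
        )
        recomputed = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if recomputed != row["event_hash"]:
            return False  # payload was altered after hashing
        if row["previous_hash"] != expected_previous:
            return False  # an event was removed, inserted, or reordered
        expected_previous = row["event_hash"]
    return True
```

A single altered payload fails the first comparison, while a deleted or reordered event fails the second, because every later `previous_hash` depends on the hash before it.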

## 6. Fixtures & migrations

- Initial migration script: `src/Findings/StellaOps.Findings.Ledger/migrations/001_initial.sql`.
- Sample canonical event: `seed-data/findings-ledger/fixtures/ledger-event.sample.json` (includes pre-computed `eventHash`, `previousHash`, and `merkleLeafHash` values).
- Sample projection row: `seed-data/findings-ledger/fixtures/finding-projection.sample.json` (includes canonical `cycleHash` for replay validation).

Fixtures follow canonical key ordering and include precomputed hashes to validate tooling.

## 7. Projection worker

- `LedgerProjectionWorker` consumes ledger events via `PostgresLedgerEventStream`, applying deterministic reductions with `LedgerProjectionReducer`.
- Checkpoint state is stored in `ledger_projection_offsets`, allowing replay from any point in time.
- Batch processing is configurable via `findings:ledger:projection` (`batchSize`, `idleDelay`).
- Each event writes:
  - `findings_projection` (upserted current state with `cycle_hash`).
  - `finding_history` (timeline entry keyed by event ID).
  - `triage_actions` when applicable (status change, comment, assignment, remediation, attachment, accept-risk, close).
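
A deterministic reduction can be pictured as a pure function from `(state, event)` to a new state. The sketch below is illustrative only and is not the real `LedgerProjectionReducer`: it handles just `finding.status_changed`, reads the `id`, `type`, and `payload.status` fields shown in the §2.3 envelope, and the names `reduce_event` and `project` are hypothetical.

```python
from typing import Optional


def reduce_event(state: Optional[dict], event: dict) -> dict:
    """Apply one ledger event to the current projection state and return the new state."""
    next_state = dict(state or {})  # copy, so the input state is never mutated
    payload = event.get("payload", {})
    if event["type"] == "finding.status_changed":
        next_state["status"] = payload["status"]
    # Whatever its type, the latest event becomes the row's current_event_id.
    next_state["current_event_id"] = event["id"]
    return next_state


def project(events: list) -> dict:
    """Fold an ordered event list into the latest state for one finding."""
    state: Optional[dict] = None
    for event in events:
        state = reduce_event(state, event)
    return state or {}
```

Because the reducer has no side effects and depends only on its inputs, replaying the same ordered events from a checkpoint produces the same projection every time.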

## 8. Next steps

- Integrate Policy Engine batch evaluation with the projector (`LEDGER-29-004`).
- Align Vulnerability Explorer queries with the new projection state and timeline endpoints.
- Externalise Merkle anchor publishing to transparency log once anchoring cadence is finalised.