- Implement `SbomIngestServiceCollectionExtensionsTests` to verify the SBOM ingestion pipeline exports snapshots correctly. - Create `SbomIngestTransformerTests` to ensure the transformation produces expected nodes and edges, including deduplication of license nodes and normalization of timestamps. - Add `SbomSnapshotExporterTests` to test the export functionality for manifest, adjacency, nodes, and edges. - Introduce `VexOverlayTransformerTests` to validate the transformation of VEX nodes and edges. - Set up project file for the test project with necessary dependencies and configurations. - Include JSON fixture files for testing purposes.
12 KiB
Findings Ledger Schema (Sprint 120)
Owners: Findings Ledger Guild • Vuln Explorer Guild
Status: Draft schema delivered 2025-11-03 for LEDGER-29-001
1. Storage profile
| Concern | Decision | Notes |
|---|---|---|
| Engine | PostgreSQL 14+ with UTF-8, jsonb, and partitioning support |
Aligns with shared data plane; deterministic ordering enforced via primary keys. |
| Tenancy | Range/list partition on tenant_id for ledger + projection tables |
Simplifies retention and cross-tenant anchoring. |
| Time zone | All timestamps stored as timestamptz UTC |
Canonical JSON uses ISO-8601 (yyyy-MM-ddTHH:mm:ss.fffZ). |
| Hashing | SHA-256 (lower-case hex) over canonical JSON | Implemented client-side and verified by DB constraint. |
| Migrations | SQL files under src/Findings/StellaOps.Findings.Ledger/migrations |
Applied via DatabaseMigrator (part of platform toolchain). |
2. Ledger event model
Events are immutable append-only records representing every workflow change. Records capture the original event payload, cryptographic hashes, and actor metadata.
2.1 ledger_events
| Column | Type | Description |
|---|---|---|
tenant_id |
text |
Tenant partition key. |
chain_id |
uuid |
Logical chain grouping (per tenant/policy combination). |
sequence_no |
bigint |
Monotonic sequence within a chain (gapless). |
event_id |
uuid |
Globally unique event identifier. |
event_type |
ledger_event_type |
Enumerated type (see §2.2). |
policy_version |
text |
Policy digest (e.g., SHA-256). |
finding_id |
text |
Stable finding identity (artifactId + vulnId + policyVersion). |
artifact_id |
text |
Asset identifier (image digest, SBOM id, etc.). |
source_run_id |
uuid |
Policy run that produced the event (nullable). |
actor_id |
text |
Operator/service initiating the mutation. |
actor_type |
text |
system, operator, integration. |
occurred_at |
timestamptz |
Domain timestamp supplied by source. |
recorded_at |
timestamptz |
Ingestion timestamp (defaults to now()). |
event_body |
jsonb |
Canonical payload (see §2.3). |
event_hash |
char(64) |
SHA-256 over canonical payload envelope. |
previous_hash |
char(64) |
Hash of prior event in chain (all zeroes for first). |
merkle_leaf_hash |
char(64) |
Leaf hash used for Merkle anchoring (hash over `event_hash |
Constraints & indexes
PRIMARY KEY (tenant_id, chain_id, sequence_no);
UNIQUE (tenant_id, event_id);
UNIQUE (tenant_id, chain_id, event_hash);
CHECK (event_hash ~ '^[0-9a-f]{64}$');
CHECK (previous_hash ~ '^[0-9a-f]{64}$');
CREATE INDEX ix_ledger_events_finding ON ledger_events (tenant_id, finding_id, policy_version);
CREATE INDEX ix_ledger_events_type ON ledger_events (tenant_id, event_type, recorded_at DESC);
Partitions: top-level partitioned by tenant_id (list) with a default partition. Optional sub-partition by month on recorded_at for large tenants. PostgreSQL requires the partition key in unique constraints; global uniqueness for event_id is enforced as (tenant_id, event_id) with application-level guards maintaining cross-tenant uniqueness.
2.2 Event types
CREATE TYPE ledger_event_type AS ENUM (
'finding.created',
'finding.status_changed',
'finding.severity_changed',
'finding.tag_updated',
'finding.comment_added',
'finding.assignment_changed',
'finding.accepted_risk',
'finding.remediation_plan_added',
'finding.attachment_added',
'finding.closed'
);
Additional types can be appended via migrations; canonical JSON must include event_type key.
2.3 Canonical ledger JSON
Canonical payload envelope (before hashing):
{
"event": {
"id": "3ac1f4ef-3c26-4b0d-91d4-6a6d3a5bde10",
"type": "finding.status_changed",
"tenant": "tenant-a",
"chainId": "5fa2b970-9da2-4ef4-9a63-463c5d98d3cc",
"sequence": 42,
"policyVersion": "sha256:5f38...",
"finding": {
"id": "artifact:sha256:abc|pkg:cpe:/o:vendor:product",
"artifactId": "sha256:abc",
"vulnId": "CVE-2025-1234"
},
"actor": {
"id": "user:alice@tenant",
"type": "operator"
},
"occurredAt": "2025-11-03T15:12:05.123Z",
"payload": {
"previousStatus": "affected",
"status": "triaged",
"justification": "Ticket SEC-1234 created",
"ticket": {
"id": "SEC-1234",
"url": "https://tracker/sec-1234"
}
}
}
}
Canonicalisation rules:
- Serialize using UTF-8, no BOM.
- Sort object keys lexicographically at every level.
- Represent enums/flags as lower-case strings.
- Timestamps formatted as
yyyy-MM-ddTHH:mm:ss.fffZ(millisecond precision, UTC). - Numbers use decimal notation; omit trailing zeros.
- Arrays maintain supplied order.
Hash pipeline:
canonical_json = CanonicalJsonSerializer.Serialize(envelope)
sha256_bytes = SHA256(canonical_json)
event_hash = HexLower(sha256_bytes)
merkle_leaf_hash = HexLower(SHA256(event_hash || '-' || sequence_no)).
3. Merkle anchoring
Anchoring batches events per tenant across fixed windows (default: 1,000 events or 15 minutes). Anchors are stored in ledger_merkle_roots.
| Column | Type | Description |
|---|---|---|
tenant_id |
text |
Tenant key. |
anchor_id |
uuid |
Anchor identifier. |
window_start |
timestamptz |
Inclusive start of batch. |
window_end |
timestamptz |
Exclusive end. |
sequence_start |
bigint |
First sequence included. |
sequence_end |
bigint |
Last sequence included. |
root_hash |
char(64) |
Merkle root (SHA-256). |
leaf_count |
integer |
Number of events aggregated. |
anchored_at |
timestamptz |
Timestamp root stored/signed. |
anchor_reference |
text |
Optional reference to external ledger (e.g., Rekor UUID). |
Indexes: PRIMARY KEY (tenant_id, anchor_id), UNIQUE (tenant_id, root_hash), INDEX ix_merkle_sequences ON ledger_merkle_roots (tenant_id, sequence_end DESC).
4. Projection tables
4.1 findings_projection
Stores the latest verdict/state per finding.
| Column | Type | Description |
|---|---|---|
tenant_id |
text |
Partition key. |
finding_id |
text |
Matches ledger payload. |
policy_version |
text |
Active policy digest. |
status |
text |
e.g., affected, triaged, accepted_risk, resolved. |
severity |
numeric(6,3) |
Normalised severity score (0-10). |
labels |
jsonb |
Key-value metadata (tags, KEV flag, runtime signals). |
current_event_id |
uuid |
Ledger event that produced this state. |
explain_ref |
text |
Reference to explain bundle or object storage key. |
policy_rationale |
jsonb |
Array of policy rationale references (explain bundle IDs, remediation notes). |
updated_at |
timestamptz |
Last projection update. |
cycle_hash |
char(64) |
Deterministic hash of projection record (used in export bundles). |
Primary key: (tenant_id, finding_id, policy_version).
Indexes:
ix_projection_statuson(tenant_id, status, severity DESC).ix_projection_labels_ginusinglabelsGIN for KEV/runtime filters.
4.2 finding_history
Delta view derived from ledger events for quick UI queries.
| Column | Type | Description |
|---|---|---|
tenant_id |
text |
Partition key. |
finding_id |
text |
Finding identity. |
policy_version |
text |
Policy digest. |
event_id |
uuid |
Ledger event ID. |
status |
text |
Status after event. |
severity |
numeric(6,3) |
Severity after event (nullable). |
actor_id |
text |
Actor performing change. |
comment |
text |
Optional summary/message. |
occurred_at |
timestamptz |
Domain event timestamp. |
Materialized view or table updated by projector. Indexed by (tenant_id, finding_id, occurred_at DESC).
4.3 triage_actions
Audit table for operator actions needing tailored queries.
| Column | Type | Description |
|---|---|---|
tenant_id |
text |
Partition key. |
action_id |
uuid |
Primary key. |
event_id |
uuid |
Source ledger event. |
finding_id |
text |
Finding identity. |
action_type |
ledger_action_type |
e.g., assign, comment, attach_evidence, link_ticket. |
payload |
jsonb |
Structured action body (canonical stored separately). |
created_at |
timestamptz |
Timestamp stored. |
created_by |
text |
Actor ID. |
ledger_action_type enum mirrors CLI/UX operations.
CREATE TYPE ledger_action_type AS ENUM (
'assign',
'comment',
'attach_evidence',
'link_ticket',
'remediation_plan',
'status_change',
'accept_risk',
'reopen',
'close'
);
### 4.4 `ledger_projection_offsets`
Checkpoint store for the projection background worker. Ensures idempotent replays across restarts.
| Column | Type | Description |
|--------|------|-------------|
| `worker_id` | `text` | Logical worker identifier (defaults to `default`). |
| `last_recorded_at` | `timestamptz` | Timestamp of the last projected ledger event. |
| `last_event_id` | `uuid` | Event identifier paired with `last_recorded_at` for deterministic ordering. |
| `updated_at` | `timestamptz` | Last time the checkpoint was persisted. |
Seed row inserted on migration ensures catch-up from epoch (`1970-01-01T00:00:00Z` with empty GUID).
## 5. Hashing & verification
1. Canonical serialize the envelope (§2.3).
2. Compute `event_hash` and store along with `previous_hash`.
3. Build Merkle tree per anchoring window using leaf hash `SHA256(event_hash || '-' || sequence_no)`.
4. Persist root in `ledger_merkle_roots` and, when configured, submit to external transparency log (Rekor v2). Store receipt/UUID in `anchor_reference`.
5. Projection rows compute `cycle_hash = SHA256(canonical_projection_json)` where canonical projection includes fields `{tenant_id, finding_id, policy_version, status, severity, labels, current_event_id}` with sorted keys.
Verification flow for auditors:
- Fetch event, recompute canonical hash, validate `previous_hash` chain.
- Reconstruct Merkle path from stored leaf hash; verify matches recorded root.
- Cross-check projection `cycle_hash` matches ledger state derived from last event.
## 6. Fixtures & migrations
- Initial migration script: `src/Findings/StellaOps.Findings.Ledger/migrations/001_initial.sql`.
- Sample canonical event: `seed-data/findings-ledger/fixtures/ledger-event.sample.json` (includes pre-computed `eventHash`, `previousHash`, and `merkleLeafHash` values).
- Sample projection row: `seed-data/findings-ledger/fixtures/finding-projection.sample.json` (includes canonical `cycleHash` for replay validation).
Fixtures follow canonical key ordering and include precomputed hashes to validate tooling.
## 7. Projection worker
- `LedgerProjectionWorker` consumes ledger events via `PostgresLedgerEventStream`, applying deterministic reductions with `LedgerProjectionReducer`.
- Checkpoint state is stored in `ledger_projection_offsets`, allowing replay from any point in time.
- Batch processing is configurable via `findings:ledger:projection` (`batchSize`, `idleDelay`).
- Each event writes:
- `findings_projection` (upserted current state with `cycle_hash`).
- `finding_history` (timeline entry keyed by event ID).
- `triage_actions` when applicable (status change, comment, assignment, remediation, attachment, accept-risk, close).
## 8. Next steps
- Integrate Policy Engine batch evaluation with the projector (`LEDGER-29-004`).
- Align Vulnerability Explorer queries with the new projection state and timeline endpoints.
- Externalise Merkle anchor publishing to transparency log once anchoring cadence is finalised.
| | | Array of policy rationale references (explain bundle IDs, remediation notes). |