# Findings Ledger Schema (Sprint 120)

Owners: Findings Ledger Guild • Vuln Explorer Guild
Status: Draft schema delivered 2025-11-03 for LEDGER-29-001

## 1. Storage profile

| Concern | Decision | Notes |
|---------|----------|-------|
| Engine | PostgreSQL 14+ with UTF-8, `jsonb`, and partitioning support | Aligns with shared data plane; deterministic ordering enforced via primary keys. |
| Tenancy | Range/list partition on `tenant_id` for ledger + projection tables | Simplifies retention and cross-tenant anchoring. |
| Time zone | All timestamps stored as `timestamptz` UTC | Canonical JSON uses ISO-8601 (`yyyy-MM-ddTHH:mm:ss.fffZ`). |
| Hashing | SHA-256 (lower-case hex) over canonical JSON | Implemented client-side and verified by DB constraint. |
| Migrations | SQL files under `src/Findings/StellaOps.Findings.Ledger/migrations` | Applied via DatabaseMigrator (part of platform toolchain). |

## 2. Ledger event model

Events are immutable append-only records representing every workflow change. Records capture the original event payload, cryptographic hashes, and actor metadata.

### 2.1 `ledger_events`

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant partition key. |
| `chain_id` | `uuid` | Logical chain grouping (per tenant/policy combination). |
| `sequence_no` | `bigint` | Monotonic sequence within a chain (gapless). |
| `event_id` | `uuid` | Globally unique event identifier. |
| `event_type` | `ledger_event_type` | Enumerated type (see §2.2). |
| `policy_version` | `text` | Policy digest (e.g., SHA-256). |
| `finding_id` | `text` | Stable finding identity `(artifactId + vulnId + policyVersion)`. |
| `artifact_id` | `text` | Asset identifier (image digest, SBOM id, etc.). |
| `source_run_id` | `uuid` | Policy run that produced the event (nullable). |
| `actor_id` | `text` | Operator/service initiating the mutation. |
| `actor_type` | `text` | `system`, `operator`, `integration`. |
| `occurred_at` | `timestamptz` | Domain timestamp supplied by source. |
| `recorded_at` | `timestamptz` | Ingestion timestamp (defaults to `now()`). |
| `event_body` | `jsonb` | Canonical payload (see §2.3). |
| `event_hash` | `char(64)` | SHA-256 over canonical payload envelope. |
| `previous_hash` | `char(64)` | Hash of prior event in chain (all zeroes for first). |
| `merkle_leaf_hash` | `char(64)` | Leaf hash used for Merkle anchoring (hash over `event_hash`, a `-` separator, and `sequence_no`; see §2.3). |

Constraints & indexes

PRIMARY KEY (tenant_id, chain_id, sequence_no);
UNIQUE (tenant_id, event_id);
UNIQUE (tenant_id, chain_id, event_hash);
CHECK (event_hash ~ '^[0-9a-f]{64}$');
CHECK (previous_hash ~ '^[0-9a-f]{64}$');
CREATE INDEX ix_ledger_events_finding ON ledger_events (tenant_id, finding_id, policy_version);
CREATE INDEX ix_ledger_events_type ON ledger_events (tenant_id, event_type, recorded_at DESC);

Partitions: top-level partitioned by `tenant_id` (list) with a default partition. Optional sub-partition by month on `recorded_at` for large tenants. Because PostgreSQL requires every unique constraint on a partitioned table to include the partition key, `event_id` uniqueness is enforced per tenant as `(tenant_id, event_id)`, and application-level guards maintain cross-tenant uniqueness.

### 2.2 Event types

CREATE TYPE ledger_event_type AS ENUM (
  'finding.created',
  'finding.status_changed',
  'finding.severity_changed',
  'finding.tag_updated',
  'finding.comment_added',
  'finding.assignment_changed',
  'finding.accepted_risk',
  'finding.remediation_plan_added',
  'finding.attachment_added',
  'finding.closed'
);

Additional types can be appended via migrations; the canonical JSON must always include the event type (the `type` key in the §2.3 envelope).

### 2.3 Canonical ledger JSON

Canonical payload envelope (before hashing):

{
  "event": {
    "id": "3ac1f4ef-3c26-4b0d-91d4-6a6d3a5bde10",
    "type": "finding.status_changed",
    "tenant": "tenant-a",
    "chainId": "5fa2b970-9da2-4ef4-9a63-463c5d98d3cc",
    "sequence": 42,
    "policyVersion": "sha256:5f38...",
    "finding": {
      "id": "artifact:sha256:abc|pkg:cpe:/o:vendor:product",
      "artifactId": "sha256:abc",
      "vulnId": "CVE-2025-1234"
    },
    "actor": {
      "id": "user:alice@tenant",
      "type": "operator"
    },
    "occurredAt": "2025-11-03T15:12:05.123Z",
    "payload": {
      "previousStatus": "affected",
      "status": "triaged",
      "justification": "Ticket SEC-1234 created",
      "ticket": {
        "id": "SEC-1234",
        "url": "https://tracker/sec-1234"
      }
    }
  }
}

Canonicalisation rules:

- Serialize using UTF-8, no BOM.
- Sort object keys lexicographically at every level.
- Represent enums/flags as lower-case strings.
- Timestamps formatted as `yyyy-MM-ddTHH:mm:ss.fffZ` (millisecond precision, UTC).
- Numbers use decimal notation; omit trailing zeros.
- Arrays maintain supplied order.
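As a rough illustration of these rules, the following Python sketch serializes an envelope and formats a timestamp. It is not the project's `CanonicalJsonSerializer`. The function names are hypothetical, and two details are assumptions the rules above do not fix: compact separators (no insignificant whitespace) and truncation rather than rounding to milliseconds. Number normalisation is deliberately left out.

```python
import json
from datetime import datetime, timedelta, timezone


def format_timestamp(value: datetime) -> str:
    """Render an aware datetime as yyyy-MM-ddTHH:mm:ss.fffZ (UTC, milliseconds)."""
    if value.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = value.astimezone(timezone.utc)
    # Assumption: microseconds are truncated (not rounded) to milliseconds.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def canonical_json(envelope: dict) -> bytes:
    """Sorted keys at every level, arrays in supplied order, UTF-8 without BOM.

    Numbers are NOT normalised here (Python writes 1.0 as "1.0"), so callers
    should pass integers or pre-formatted values.
    """
    text = json.dumps(
        envelope,
        sort_keys=True,         # key order by code point, applied recursively
        separators=(",", ":"),  # assumption: no insignificant whitespace
        ensure_ascii=False,     # keep non-ASCII characters as real UTF-8
    )
    return text.encode("utf-8")  # the utf-8 codec never emits a BOM


ts = datetime(2025, 11, 3, 17, 12, 5, 123999, tzinfo=timezone(timedelta(hours=2)))
print(format_timestamp(ts))  # 2025-11-03T15:12:05.123Z
```

Enum values and timestamps are expected to be converted to strings before the envelope reaches `canonical_json`.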

Hash pipeline:

canonical_json = CanonicalJsonSerializer.Serialize(envelope)
sha256_bytes = SHA256(canonical_json)
event_hash = HexLower(sha256_bytes)

`merkle_leaf_hash = HexLower(SHA256(event_hash || '-' || sequence_no))`.
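The pipeline can be sketched in Python as below. This is a minimal sketch, not the ledger's actual implementation: it assumes compact, sorted-key JSON as the canonical form, and it reads `||` as string concatenation of the hex digest, a hyphen, and the decimal sequence number, encoded as UTF-8 before hashing. The function names are illustrative.

```python
import hashlib
import json


def canonical_json(envelope: dict) -> bytes:
    # Assumption: compact separators; the schema only fixes key order and encoding.
    return json.dumps(
        envelope, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def event_hash(envelope: dict) -> str:
    """Lower-case hex SHA-256 over the canonical envelope."""
    return hashlib.sha256(canonical_json(envelope)).hexdigest()


def merkle_leaf_hash(event_hash_hex: str, sequence_no: int) -> str:
    """SHA-256 over event_hash || '-' || sequence_no.

    Assumption: the inputs are joined as text (hex digest, hyphen, decimal
    sequence number) and hashed as UTF-8 bytes.
    """
    material = f"{event_hash_hex}-{sequence_no}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

Because keys are sorted before hashing, two envelopes that differ only in key order produce the same `event_hash`.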

## 3. Merkle anchoring

Anchoring batches events per tenant across fixed windows (default: 1,000 events or 15 minutes). Anchors are stored in `ledger_merkle_roots`.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant key. |
| `anchor_id` | `uuid` | Anchor identifier. |
| `window_start` | `timestamptz` | Inclusive start of batch. |
| `window_end` | `timestamptz` | Exclusive end. |
| `sequence_start` | `bigint` | First sequence included. |
| `sequence_end` | `bigint` | Last sequence included. |
| `root_hash` | `char(64)` | Merkle root (SHA-256). |
| `leaf_count` | `integer` | Number of events aggregated. |
| `anchored_at` | `timestamptz` | Timestamp root stored/signed. |
| `anchor_reference` | `text` | Optional reference to external ledger (e.g., Rekor UUID). |

Indexes: `PRIMARY KEY (tenant_id, anchor_id)`, `UNIQUE (tenant_id, root_hash)`, `INDEX ix_merkle_sequences ON ledger_merkle_roots (tenant_id, sequence_end DESC)`.
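The schema does not spell out how the Merkle tree is built, so the Python sketch below is only one plausible construction. It assumes that a parent is the SHA-256 of the raw bytes of its two children and that an odd node is paired with itself. `merkle_root` and `build_anchor` are hypothetical helpers; only the column names come from the table above.

```python
import hashlib


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold hex leaf hashes into a single lower-case hex root.

    Assumptions (not fixed by the schema): a parent is SHA-256 over the raw
    bytes of left || right, and an odd node at any level is paired with itself.
    """
    if not leaf_hashes:
        raise ValueError("an anchor needs at least one leaf")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def build_anchor(leaves: list[tuple[int, str]]) -> dict:
    """Summarise one window of (sequence_no, merkle_leaf_hash) pairs."""
    ordered = sorted(leaves)  # leaves are hashed in sequence order
    root = merkle_root([leaf for _, leaf in ordered])
    return {
        "sequence_start": ordered[0][0],
        "sequence_end": ordered[-1][0],
        "leaf_count": len(ordered),
        "root_hash": root,
    }
```

Whatever construction is chosen has to be fixed and documented, since auditors need to rebuild the same tree to check `root_hash`.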

## 4. Projection tables

### 4.1 `findings_projection`

Stores the latest verdict/state per finding.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Matches ledger payload. |
| `policy_version` | `text` | Active policy digest. |
| `status` | `text` | e.g., `affected`, `triaged`, `accepted_risk`, `resolved`. |
| `severity` | `numeric(6,3)` | Normalised severity score (0-10). |
| `labels` | `jsonb` | Key-value metadata (tags, KEV flag, runtime signals). |
| `current_event_id` | `uuid` | Ledger event that produced this state. |
| `explain_ref` | `text` | Reference to explain bundle or object storage key. |
| `policy_rationale` | `jsonb` | Array of policy rationale references (explain bundle IDs, remediation notes). |
| `updated_at` | `timestamptz` | Last projection update. |
| `cycle_hash` | `char(64)` | Deterministic hash of projection record (used in export bundles). |

Primary key: `(tenant_id, finding_id, policy_version)`.

Indexes:

- `ix_projection_status` on `(tenant_id, status, severity DESC)`.
- `ix_projection_labels_gin`, a GIN index on `labels` for KEV/runtime filters.

### 4.2 `finding_history`

Delta view derived from ledger events for quick UI queries.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Finding identity. |
| `policy_version` | `text` | Policy digest. |
| `event_id` | `uuid` | Ledger event ID. |
| `status` | `text` | Status after event. |
| `severity` | `numeric(6,3)` | Severity after event (nullable). |
| `actor_id` | `text` | Actor performing change. |
| `comment` | `text` | Optional summary/message. |
| `occurred_at` | `timestamptz` | Domain event timestamp. |
Implemented as a materialized view or a table maintained by the projector, indexed by `(tenant_id, finding_id, occurred_at DESC)`.

### 4.3 `triage_actions`

Audit table for operator actions needing tailored queries.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `action_id` | `uuid` | Primary key. |
| `event_id` | `uuid` | Source ledger event. |
| `finding_id` | `text` | Finding identity. |
| `action_type` | `ledger_action_type` | e.g., `assign`, `comment`, `attach_evidence`, `link_ticket`. |
| `payload` | `jsonb` | Structured action body (canonical stored separately). |
| `created_at` | `timestamptz` | Timestamp stored. |
| `created_by` | `text` | Actor ID. |

The `ledger_action_type` enum mirrors CLI/UX operations.

CREATE TYPE ledger_action_type AS ENUM (
  'assign',
  'comment',
  'attach_evidence',
  'link_ticket',
  'remediation_plan',
  'status_change',
  'accept_risk',
  'reopen',
  'close'
);

### 4.4 `ledger_projection_offsets`

Checkpoint store for the projection background worker. Ensures idempotent replays across restarts.

| Column | Type | Description |
|--------|------|-------------|
| `worker_id` | `text` | Logical worker identifier (defaults to `default`). |
| `last_recorded_at` | `timestamptz` | Timestamp of the last projected ledger event. |
| `last_event_id` | `uuid` | Event identifier paired with `last_recorded_at` for deterministic ordering. |
| `updated_at` | `timestamptz` | Last time the checkpoint was persisted. |

Seed row inserted on migration ensures catch-up from epoch (`1970-01-01T00:00:00Z` with empty GUID).
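To show why the checkpoint pairs a timestamp with an event ID, here is a small Python sketch of resuming from a checkpoint. It assumes events are ordered by the tuple `(recorded_at, event_id)`. The `events_after` helper and the in-memory event list are illustrative only, standing in for whatever query the worker actually runs.

```python
from datetime import datetime, timezone
from uuid import UUID

# Seed checkpoint: epoch timestamp paired with the empty GUID.
EPOCH_CHECKPOINT = (datetime(1970, 1, 1, tzinfo=timezone.utc), UUID(int=0))


def events_after(checkpoint, events):
    """Return events strictly after the checkpoint, in projection order.

    `events` is an iterable of (recorded_at, event_id) tuples. Comparing the
    pair rather than the timestamp alone keeps replays deterministic when
    several events share the same recorded_at.
    """
    return sorted(e for e in events if e > checkpoint)


t = datetime(2025, 11, 3, 15, 0, tzinfo=timezone.utc)
first, second = (t, UUID(int=1)), (t, UUID(int=2))
# Resuming from `first` skips it but still yields `second`, despite equal timestamps.
print(events_after(first, [second, first]) == [second])  # True
```

A checkpoint on `recorded_at` alone would have to either skip or re-deliver events that share a timestamp with the last projected one.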

## 5. Hashing & verification

1. Canonically serialize the envelope (§2.3).
2. Compute `event_hash` and store along with `previous_hash`.
3. Build Merkle tree per anchoring window using leaf hash `SHA256(event_hash || '-' || sequence_no)`.
4. Persist root in `ledger_merkle_roots` and, when configured, submit to external transparency log (Rekor v2). Store receipt/UUID in `anchor_reference`.
5. Projection rows compute `cycle_hash = SHA256(canonical_projection_json)` where canonical projection includes fields `{tenant_id, finding_id, policy_version, status, severity, labels, current_event_id}` with sorted keys.
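Step 5 can be sketched in Python as follows. The field list is taken from the step itself. The compact JSON form and the idea of passing `severity` as a pre-formatted string (to avoid float formatting differences) are assumptions, and `cycle_hash` is an illustrative helper name.

```python
import hashlib
import json

# Fields that participate in the cycle hash, per step 5.
CYCLE_FIELDS = (
    "tenant_id", "finding_id", "policy_version", "status",
    "severity", "labels", "current_event_id",
)


def cycle_hash(row: dict) -> str:
    """Lower-case hex SHA-256 over the canonical projection JSON.

    Columns outside CYCLE_FIELDS (for example updated_at) are ignored, so
    re-projecting the same ledger state yields the same hash.
    """
    projection = {name: row[name] for name in CYCLE_FIELDS}  # KeyError if missing
    canonical = json.dumps(
        projection, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")  # assumption: compact separators
    return hashlib.sha256(canonical).hexdigest()
```

Leaving `updated_at` out of the hash input is what makes the value usable for replay validation: the hash changes only when the projected state changes.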

Verification flow for auditors:

- Fetch event, recompute canonical hash, validate `previous_hash` chain.
- Reconstruct the Merkle path from the stored leaf hash and verify that it matches the recorded root.
- Cross-check that the projection `cycle_hash` matches the ledger state derived from the last event.
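The first verification step might look like the Python sketch below. It assumes the auditor holds a complete chain starting at its first event, ordered by `sequence_no`, and that the canonical form is compact, sorted-key JSON. The dictionary keys and function names are hypothetical.

```python
import hashlib
import json

ZERO_HASH = "0" * 64  # previous_hash of the first event in a chain


def recompute_hash(envelope: dict) -> str:
    # Assumption: same compact, sorted-key canonical form as at write time.
    data = json.dumps(
        envelope, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def verify_chain(events: list[dict]) -> list[str]:
    """Check one full chain and return a list of problems (empty if valid).

    Each event is a dict with hypothetical keys: sequence_no, envelope,
    event_hash, previous_hash.
    """
    problems = []
    expected_previous = ZERO_HASH
    expected_sequence = None
    for event in events:
        seq = event["sequence_no"]
        if expected_sequence is not None and seq != expected_sequence:
            problems.append(f"sequence gap before {seq}")
        if recompute_hash(event["envelope"]) != event["event_hash"]:
            problems.append(f"event_hash mismatch at {seq}")
        if event["previous_hash"] != expected_previous:
            problems.append(f"previous_hash mismatch at {seq}")
        expected_previous = event["event_hash"]
        expected_sequence = seq + 1
    return problems
```

Returning every problem instead of stopping at the first one lets an audit report show whether a failure is an isolated tampered payload or a break that affects the rest of the chain.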

## 6. Fixtures & migrations

- Initial migration script: `src/Findings/StellaOps.Findings.Ledger/migrations/001_initial.sql`.
- Sample canonical event: `seed-data/findings-ledger/fixtures/ledger-event.sample.json` (includes pre-computed `eventHash`, `previousHash`, and `merkleLeafHash` values).
- Sample projection row: `seed-data/findings-ledger/fixtures/finding-projection.sample.json` (includes canonical `cycleHash` for replay validation).

Fixtures follow canonical key ordering and include precomputed hashes to validate tooling.

## 7. Projection worker

- `LedgerProjectionWorker` consumes ledger events via `PostgresLedgerEventStream`, applying deterministic reductions with `LedgerProjectionReducer`.
- Checkpoint state is stored in `ledger_projection_offsets`, allowing replay from any point in time.
- Batch processing is configurable via `findings:ledger:projection` (`batchSize`, `idleDelay`).
- Each event writes:
  - `findings_projection` (upserted current state with `cycle_hash`).
  - `finding_history` (timeline entry keyed by event ID).
  - `triage_actions` when applicable (status change, comment, assignment, remediation, attachment, accept-risk, close).
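A deterministic reduction can be pictured as a pure function folded over the event stream. The Python sketch below is not `LedgerProjectionReducer`. The event shape, the `severity` payload key, and the handling of only two event types are assumptions made for illustration; the `status` payload key follows the sample envelope in §2.3.

```python
from typing import Optional


def reduce_event(state: Optional[dict], event: dict) -> dict:
    """Fold one ledger event into the projection state of a finding.

    Hypothetical event shape: {"event_id", "event_type", "payload"}. Event
    types not handled here only advance current_event_id.
    """
    new_state = dict(state or {"status": None, "severity": None})
    payload = event.get("payload", {})
    if event["event_type"] == "finding.status_changed":
        new_state["status"] = payload["status"]
    elif event["event_type"] == "finding.severity_changed":
        new_state["severity"] = payload["severity"]  # assumed payload key
    new_state["current_event_id"] = event["event_id"]
    return new_state


def project(events: list[dict]) -> Optional[dict]:
    """Replay events in order; the same input always yields the same state."""
    state = None
    for event in events:
        state = reduce_event(state, event)
    return state
```

Because `reduce_event` never mutates its input and depends only on its arguments, replaying the same events from a checkpoint reproduces the same projection row.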

## 8. Next steps

- Integrate Policy Engine batch evaluation with the projector (`LEDGER-29-004`).
- Align Vulnerability Explorer queries with the new projection state and timeline endpoints.
- Externalise Merkle anchor publishing to transparency log once anchoring cadence is finalised.