Add unit tests for SBOM ingestion and transformation

- Implement `SbomIngestServiceCollectionExtensionsTests` to verify the SBOM ingestion pipeline exports snapshots correctly.
- Create `SbomIngestTransformerTests` to ensure the transformation produces expected nodes and edges, including deduplication of license nodes and normalization of timestamps.
- Add `SbomSnapshotExporterTests` to test the export functionality for manifest, adjacency, nodes, and edges.
- Introduce `VexOverlayTransformerTests` to validate the transformation of VEX nodes and edges.
- Set up project file for the test project with necessary dependencies and configurations.
- Include JSON fixture files for testing purposes.

Commit 2eb6852d34 (parent f72c5c513a) on master, 2025-11-04 07:49:39 +02:00: 491 changed files with 39445 additions and 3917 deletions.

# Findings Ledger Schema (Sprint 120)
> **Owners:** Findings Ledger Guild • Vuln Explorer Guild
>
> **Status:** Draft schema delivered 2025-11-03 for LEDGER-29-001
## 1. Storage profile
| Concern | Decision | Notes |
|---------|----------|-------|
| Engine | PostgreSQL 14+ with UTF-8, `jsonb`, and partitioning support | Aligns with shared data plane; deterministic ordering enforced via primary keys. |
| Tenancy | Range/list partition on `tenant_id` for ledger + projection tables | Simplifies retention and cross-tenant anchoring. |
| Time zone | All timestamps stored as `timestamptz` in UTC | Canonical JSON uses ISO-8601 (`yyyy-MM-ddTHH:mm:ss.fffZ`). |
| Hashing | SHA-256 (lower-case hex) over canonical JSON | Computed client-side; database `CHECK` constraints validate the 64-character lower-case hex format. |
| Migrations | SQL files under `src/Findings/StellaOps.Findings.Ledger/migrations` | Applied via DatabaseMigrator (part of platform toolchain). |
## 2. Ledger event model
Events are immutable append-only records representing every workflow change. Records capture the original event payload, cryptographic hashes, and actor metadata.
### 2.1 `ledger_events`
| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant partition key. |
| `chain_id` | `uuid` | Logical chain grouping (per tenant/policy combination). |
| `sequence_no` | `bigint` | Monotonic sequence within a chain (gapless). |
| `event_id` | `uuid` | Globally unique event identifier. |
| `event_type` | `ledger_event_type` | Enumerated type (see §2.2). |
| `policy_version` | `text` | Policy digest (e.g., SHA-256). |
| `finding_id` | `text` | Stable finding identity `(artifactId + vulnId + policyVersion)`. |
| `artifact_id` | `text` | Asset identifier (image digest, SBOM id, etc.). |
| `source_run_id` | `uuid` | Policy run that produced the event (nullable). |
| `actor_id` | `text` | Operator/service initiating the mutation. |
| `actor_type` | `text` | One of `system`, `operator`, or `integration`. |
| `occurred_at` | `timestamptz` | Domain timestamp supplied by source. |
| `recorded_at` | `timestamptz` | Ingestion timestamp (defaults to `now()`). |
| `event_body` | `jsonb` | Canonical payload (see §2.3). |
| `event_hash` | `char(64)` | SHA-256 over canonical payload envelope. |
| `previous_hash` | `char(64)` | Hash of the prior event in the chain (64 zeros for the first event). |
| `merkle_leaf_hash` | `char(64)` | Leaf hash used for Merkle anchoring: SHA-256 over `event_hash || '-' || sequence_no` (see §2.3). |
**Constraints & indexes**
```sql
-- Constraints shown as ALTER TABLE for readability; the migration may declare them inline.
ALTER TABLE ledger_events
    ADD PRIMARY KEY (tenant_id, chain_id, sequence_no),
    ADD UNIQUE (tenant_id, event_id),
    ADD UNIQUE (tenant_id, chain_id, event_hash),
    ADD CHECK (event_hash ~ '^[0-9a-f]{64}$'),
    ADD CHECK (previous_hash ~ '^[0-9a-f]{64}$');

CREATE INDEX ix_ledger_events_finding ON ledger_events (tenant_id, finding_id, policy_version);
CREATE INDEX ix_ledger_events_type ON ledger_events (tenant_id, event_type, recorded_at DESC);
```
Partitioning: the table is list-partitioned on `tenant_id` with a default partition. Large tenants may optionally be sub-partitioned by month on `recorded_at`. PostgreSQL requires unique constraints on a partitioned table to include the partition key, so uniqueness of `event_id` is enforced as `(tenant_id, event_id)`. Application-level guards maintain cross-tenant uniqueness.
### 2.2 Event types
```sql
CREATE TYPE ledger_event_type AS ENUM (
'finding.created',
'finding.status_changed',
'finding.severity_changed',
'finding.tag_updated',
'finding.comment_added',
'finding.assignment_changed',
'finding.accepted_risk',
'finding.remediation_plan_added',
'finding.attachment_added',
'finding.closed'
);
```
Additional types can be appended in later migrations (`ALTER TYPE ... ADD VALUE`). The canonical JSON must always carry the event type, in the `event.type` field (§2.3).
### 2.3 Canonical ledger JSON
Payload envelope before hashing. Keys are shown here in logical order; canonicalisation sorts them before the hash is computed.
```json
{
"event": {
"id": "3ac1f4ef-3c26-4b0d-91d4-6a6d3a5bde10",
"type": "finding.status_changed",
"tenant": "tenant-a",
"chainId": "5fa2b970-9da2-4ef4-9a63-463c5d98d3cc",
"sequence": 42,
"policyVersion": "sha256:5f38...",
"finding": {
"id": "artifact:sha256:abc|pkg:cpe:/o:vendor:product",
"artifactId": "sha256:abc",
"vulnId": "CVE-2025-1234"
},
"actor": {
"id": "user:alice@tenant",
"type": "operator"
},
"occurredAt": "2025-11-03T15:12:05.123Z",
"payload": {
"previousStatus": "affected",
"status": "triaged",
"justification": "Ticket SEC-1234 created",
"ticket": {
"id": "SEC-1234",
"url": "https://tracker/sec-1234"
}
}
}
}
```
Canonicalisation rules:
1. Serialize using UTF-8, no BOM.
2. Sort object keys lexicographically at every level.
3. Represent enums/flags as lower-case strings.
4. Timestamps formatted as `yyyy-MM-ddTHH:mm:ss.fffZ` (millisecond precision, UTC).
5. Numbers use decimal notation; omit trailing zeros.
6. Arrays maintain supplied order.
Hash pipeline:
```
canonical_json = CanonicalJsonSerializer.Serialize(envelope)
sha256_bytes = SHA256(canonical_json)
event_hash = HexLower(sha256_bytes)
```
The Merkle leaf hash is `merkle_leaf_hash = HexLower(SHA256(event_hash || '-' || sequence_no))`.
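
For illustration, the hash pipeline can be approximated in Python. This is a sketch, not the ledger's `CanonicalJsonSerializer`, and it rests on several assumptions:

- Canonical output has no insignificant whitespace.
- Python's code-point key ordering matches the serializer's lexicographic sort.
- Floats are already normalised. Python writes `1.0` with a trailing zero.
- The leaf-hash input is the UTF-8 string `<event_hash>-<sequence_no>`, with the sequence number in decimal.

The function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def format_timestamp(ts: datetime) -> str:
    """Render an aware datetime as yyyy-MM-ddTHH:mm:ss.fffZ (UTC, millisecond precision)."""
    ts = ts.astimezone(timezone.utc)
    # Truncate microseconds to milliseconds and append the literal 'Z'.
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"


def canonical_json(envelope: dict) -> bytes:
    """Sorted keys at every level, no insignificant whitespace, UTF-8 without BOM.

    Assumption: floats are pre-normalised; json.dumps writes 1.0 as "1.0".
    """
    text = json.dumps(envelope, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")  # the "utf-8" codec never emits a BOM


def event_hash(envelope: dict) -> str:
    """Lower-case hex SHA-256 over the canonical envelope bytes."""
    return hashlib.sha256(canonical_json(envelope)).hexdigest()


def merkle_leaf_hash(event_hash_hex: str, sequence_no: int) -> str:
    """SHA-256 over '<event_hash>-<sequence_no>' (assumed UTF-8, decimal sequence)."""
    return hashlib.sha256(f"{event_hash_hex}-{sequence_no}".encode("utf-8")).hexdigest()
```

For example, `format_timestamp(datetime(2025, 11, 3, 15, 12, 5, 123000, tzinfo=timezone.utc))` yields `2025-11-03T15:12:05.123Z`, the format used by `occurredAt` in the sample envelope.
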
## 3. Merkle anchoring
Anchoring batches events per tenant across fixed windows (default: 1,000 events or 15 minutes). Anchors are stored in `ledger_merkle_roots`.
| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant key. |
| `anchor_id` | `uuid` | Anchor identifier. |
| `window_start` | `timestamptz` | Inclusive start of batch. |
| `window_end` | `timestamptz` | Exclusive end. |
| `sequence_start` | `bigint` | First sequence included. |
| `sequence_end` | `bigint` | Last sequence included. |
| `root_hash` | `char(64)` | Merkle root (SHA-256). |
| `leaf_count` | `integer` | Number of events aggregated. |
| `anchored_at` | `timestamptz` | When the root was stored or signed. |
| `anchor_reference` | `text` | Optional reference to external ledger (e.g., Rekor UUID). |
Keys and indexes: `PRIMARY KEY (tenant_id, anchor_id)`, `UNIQUE (tenant_id, root_hash)`, and `CREATE INDEX ix_merkle_sequences ON ledger_merkle_roots (tenant_id, sequence_end DESC)`.
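
A minimal Python sketch of window closing and root computation follows. The schema does not specify the tree shape, so the sketch makes three assumptions:

- A window closes at whichever default limit is reached first.
- Parents hash the concatenated raw 32-byte children.
- An unpaired node is promoted unchanged to the next level.

`MAX_LEAVES`, `MAX_WINDOW`, `window_should_close` and `merkle_root` are hypothetical names.

```python
import hashlib
from datetime import datetime, timedelta

MAX_LEAVES = 1_000                  # default event count per window (from the text)
MAX_WINDOW = timedelta(minutes=15)  # default window duration (from the text)


def window_should_close(leaf_count: int, window_start: datetime, now: datetime) -> bool:
    """Assumed rule: close once either default limit is reached."""
    return leaf_count >= MAX_LEAVES or now - window_start >= MAX_WINDOW


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold hex leaf hashes pairwise with SHA-256 until a single root remains.

    Assumptions (not specified by the schema): parents hash the concatenated
    raw 32-byte children, and an odd node at the end of a level is promoted
    unchanged to the next level.
    """
    if not leaf_hashes:
        raise ValueError("an anchoring window needs at least one leaf")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # promote the unpaired node
        level = next_level
    return level[0].hex()
```

With this rule, a window holding a single event has that event's leaf hash as its root. A tree that duplicates the last node instead would produce different roots. The chosen rule therefore has to be fixed before anchors are published.
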
## 4. Projection tables
### 4.1 `findings_projection`
Stores the latest verdict/state per finding.
| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Matches ledger payload. |
| `policy_version` | `text` | Active policy digest. |
| `status` | `text` | e.g., `affected`, `triaged`, `accepted_risk`, `resolved`. |
| `severity` | `numeric(6,3)` | Normalised severity score (0-10). |
| `labels` | `jsonb` | Key-value metadata (tags, KEV flag, runtime signals). |
| `current_event_id` | `uuid` | Ledger event that produced this state. |
| `explain_ref` | `text` | Reference to explain bundle or object storage key. |
| `policy_rationale` | `jsonb` | Array of policy rationale references (explain bundle IDs, remediation notes). |
| `updated_at` | `timestamptz` | Last projection update. |
| `cycle_hash` | `char(64)` | Deterministic hash of projection record (used in export bundles). |
Primary key: `(tenant_id, finding_id, policy_version)`.
Indexes:
- `ix_projection_status` on `(tenant_id, status, severity DESC)`.
- `ix_projection_labels_gin`: GIN index on `labels` for KEV/runtime filters.
### 4.2 `finding_history`
Delta view derived from ledger events for quick UI queries.
| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Finding identity. |
| `policy_version` | `text` | Policy digest. |
| `event_id` | `uuid` | Ledger event ID. |
| `status` | `text` | Status after event. |
| `severity` | `numeric(6,3)` | Severity after event (nullable). |
| `actor_id` | `text` | Actor performing change. |
| `comment` | `text` | Optional summary/message. |
| `occurred_at` | `timestamptz` | Domain event timestamp. |
Maintained by the projector as a table or materialized view; indexed on `(tenant_id, finding_id, occurred_at DESC)`.
### 4.3 `triage_actions`
Audit table for operator actions needing tailored queries.
| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `action_id` | `uuid` | Primary key. |
| `event_id` | `uuid` | Source ledger event. |
| `finding_id` | `text` | Finding identity. |
| `action_type` | `ledger_action_type` | e.g., `assign`, `comment`, `attach_evidence`, `link_ticket`. |
| `payload` | `jsonb` | Structured action body (canonical stored separately). |
| `created_at` | `timestamptz` | Timestamp stored. |
| `created_by` | `text` | Actor ID. |
`ledger_action_type` enum mirrors CLI/UX operations.
```sql
CREATE TYPE ledger_action_type AS ENUM (
'assign',
'comment',
'attach_evidence',
'link_ticket',
'remediation_plan',
'status_change',
'accept_risk',
'reopen',
'close'
);
```
### 4.4 `ledger_projection_offsets`
Checkpoint store for the projection background worker. Ensures idempotent replays across restarts.
| Column | Type | Description |
|--------|------|-------------|
| `worker_id` | `text` | Logical worker identifier (defaults to `default`). |
| `last_recorded_at` | `timestamptz` | Timestamp of the last projected ledger event. |
| `last_event_id` | `uuid` | Event identifier paired with `last_recorded_at` for deterministic ordering. |
| `updated_at` | `timestamptz` | Last time the checkpoint was persisted. |
The migration inserts a seed row (`1970-01-01T00:00:00Z` with an all-zero GUID) so that a fresh worker catches up from the epoch.
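
The checkpoint acts as a composite cursor. An event is pending when `(recorded_at, event_id)` sorts after `(last_recorded_at, last_event_id)`; in PostgreSQL this maps to a row-value comparison.

The Python sketch below is hypothetical. It assumes the tie-break on `event_id` follows PostgreSQL's byte-wise `uuid` ordering, which `uuid.UUID` comparison reproduces. Events are modelled as plain `(recorded_at, event_id, payload)` tuples.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EMPTY_GUID = uuid.UUID(int=0)  # 00000000-0000-0000-0000-000000000000


@dataclass(frozen=True)
class Checkpoint:
    """Mirrors a ledger_projection_offsets row; defaults match the seed row."""
    last_recorded_at: datetime = EPOCH
    last_event_id: uuid.UUID = EMPTY_GUID


def is_after(recorded_at: datetime, event_id: uuid.UUID, cp: Checkpoint) -> bool:
    """True when (recorded_at, event_id) sorts strictly after the checkpoint pair."""
    return (recorded_at, event_id) > (cp.last_recorded_at, cp.last_event_id)


def pending(events: list[tuple], cp: Checkpoint) -> list[tuple]:
    """Events not yet projected, in deterministic (recorded_at, event_id) order."""
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    return [e for e in ordered if is_after(e[0], e[1], cp)]
```

Several events can share the same ingestion timestamp. Comparing the pair, rather than `recorded_at` alone, keeps replays idempotent in that case.
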
## 5. Hashing & verification
1. Canonically serialize the envelope (§2.3).
2. Compute `event_hash` and store it alongside `previous_hash`.
3. Build a Merkle tree per anchoring window using the leaf hash `SHA256(event_hash || '-' || sequence_no)`.
4. Persist the root in `ledger_merkle_roots`. When configured, also submit it to an external transparency log (Rekor v2) and store the receipt/UUID in `anchor_reference`.
5. Projection rows compute `cycle_hash = SHA256(canonical_projection_json)`. The canonical projection includes the fields `{tenant_id, finding_id, policy_version, status, severity, labels, current_event_id}` with sorted keys.
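
Step 5 can be sketched in Python as follows. The sketch assumes three things:

- The canonical projection uses the snake_case field names listed above.
- It uses the same whitespace-free, sorted-key JSON as ledger events.
- `severity` is already normalised.

`PROJECTION_FIELDS` and `cycle_hash` are hypothetical names.

```python
import hashlib
import json

# Fields included in the canonical projection (taken from step 5 above).
PROJECTION_FIELDS = ("tenant_id", "finding_id", "policy_version", "status",
                     "severity", "labels", "current_event_id")


def cycle_hash(row: dict) -> str:
    """SHA-256 over sorted-key canonical JSON of the projection fields only."""
    subset = {name: row[name] for name in PROJECTION_FIELDS}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Only the listed fields are hashed. Bookkeeping columns such as `updated_at` can therefore change without altering `cycle_hash`.
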
Verification flow for auditors:
- Fetch the event, recompute its canonical hash, and validate the `previous_hash` chain.
- Reconstruct the Merkle path from the stored leaf hash and verify that it matches the recorded root.
- Cross-check that the projection `cycle_hash` matches the state derived from the last ledger event.
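
The first two auditor checks can be sketched in Python. The following are assumptions:

- Row shapes use the keys `sequence_no`, `envelope`, `event_hash` and `previous_hash`.
- A proof is a list of sibling hashes, each with a left/right flag, ordered from the leaf upwards.
- Merkle parents hash the concatenated raw 32-byte children.

The function names are hypothetical.

```python
import hashlib
import json

ZERO_HASH = "0" * 64  # previous_hash of the first event in a chain


def recompute_event_hash(envelope: dict) -> str:
    """Assumed canonical form: sorted keys, no whitespace, UTF-8."""
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(events: list[dict]) -> bool:
    """Check a full chain given in sequence order (hypothetical row shape)."""
    expected_prev = ZERO_HASH
    for i, ev in enumerate(events):
        if i > 0 and ev["sequence_no"] != events[i - 1]["sequence_no"] + 1:
            return False  # sequence numbers must be gapless
        if ev["previous_hash"] != expected_prev:
            return False  # broken link to the prior event
        if recompute_event_hash(ev["envelope"]) != ev["event_hash"]:
            return False  # payload no longer matches its stored hash
        expected_prev = ev["event_hash"]
    return True


def verify_merkle_path(leaf_hex: str, path: list[tuple[str, bool]], root_hex: str) -> bool:
    """path holds (sibling_hash_hex, sibling_is_left) pairs from the leaf upwards."""
    node = bytes.fromhex(leaf_hex)
    for sibling_hex, sibling_is_left in path:
        sibling = bytes.fromhex(sibling_hex)
        pair = sibling + node if sibling_is_left else node + sibling
        node = hashlib.sha256(pair).digest()
    return node.hex() == root_hex
```
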
## 6. Fixtures & migrations
- Initial migration script: `src/Findings/StellaOps.Findings.Ledger/migrations/001_initial.sql`.
- Sample canonical event: `seed-data/findings-ledger/fixtures/ledger-event.sample.json` (includes pre-computed `eventHash`, `previousHash`, and `merkleLeafHash` values).
- Sample projection row: `seed-data/findings-ledger/fixtures/finding-projection.sample.json` (includes canonical `cycleHash` for replay validation).
Fixtures follow canonical key ordering and include precomputed hashes to validate tooling.
## 7. Projection worker
- `LedgerProjectionWorker` consumes ledger events via `PostgresLedgerEventStream`, applying deterministic reductions with `LedgerProjectionReducer`.
- Checkpoint state is stored in `ledger_projection_offsets`, allowing replay from any point in time.
- Batch processing is configurable via `findings:ledger:projection` (`batchSize`, `idleDelay`).
- Each event writes:
- `findings_projection` (upserted current state with `cycle_hash`).
- `finding_history` (timeline entry keyed by event ID).
- `triage_actions` when applicable (status change, comment, assignment, remediation, attachment, accept-risk, close).
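
A deterministic reducer can be pictured as a pure fold over events. The Python sketch below is hypothetical and is not `LedgerProjectionReducer`. Its assumptions:

- The event-to-action mapping is inferred from the list above.
- The initial status `affected` and the mapping of `finding.closed` to `resolved` are assumptions.
- Only `payload.status` is taken from the sample envelope in §2.3.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical ledger_event_type -> ledger_action_type mapping, inferred from the list above.
ACTION_FOR_EVENT = {
    "finding.status_changed": "status_change",
    "finding.comment_added": "comment",
    "finding.assignment_changed": "assign",
    "finding.remediation_plan_added": "remediation_plan",
    "finding.attachment_added": "attach_evidence",
    "finding.accepted_risk": "accept_risk",
    "finding.closed": "close",
}


@dataclass(frozen=True)
class Projection:
    status: str = "affected"  # assumed initial status
    current_event_id: Optional[str] = None


def reduce_event(state: Projection, event: dict) -> tuple[Projection, Optional[str]]:
    """Apply one event; return the new state and the triage action to record, if any."""
    new_state = replace(state, current_event_id=event["id"])
    if event["type"] == "finding.status_changed":
        new_state = replace(new_state, status=event["payload"]["status"])
    elif event["type"] == "finding.accepted_risk":
        new_state = replace(new_state, status="accepted_risk")
    elif event["type"] == "finding.closed":
        new_state = replace(new_state, status="resolved")  # assumed terminal status
    return new_state, ACTION_FOR_EVENT.get(event["type"])


def project(events: list[dict]) -> tuple[Projection, list[str]]:
    """Fold events in order; the same input always yields the same output."""
    state, actions = Projection(), []
    for event in events:
        state, action = reduce_event(state, event)
        if action is not None:
            actions.append(action)
    return state, actions
```

Because the reducer performs no I/O, replaying the same events from any checkpoint yields the same projection, and therefore the same `cycle_hash`.
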
## 8. Next steps
- Integrate Policy Engine batch evaluation with the projector (`LEDGER-29-004`).
- Align Vulnerability Explorer queries with the new projection state and timeline endpoints.
- Externalise Merkle anchor publishing to transparency log once anchoring cadence is finalised.