Add unit tests for SBOM ingestion and transformation
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Implement `SbomIngestServiceCollectionExtensionsTests` to verify the SBOM ingestion pipeline exports snapshots correctly.
- Create `SbomIngestTransformerTests` to ensure the transformation produces expected nodes and edges, including deduplication of license nodes and normalization of timestamps.
- Add `SbomSnapshotExporterTests` to test the export functionality for manifest, adjacency, nodes, and edges.
- Introduce `VexOverlayTransformerTests` to validate the transformation of VEX nodes and edges.
- Set up project file for the test project with necessary dependencies and configurations.
- Include JSON fixture files for testing purposes.
274
docs/modules/findings-ledger/schema.md
Normal file
@@ -0,0 +1,274 @@
# Findings Ledger Schema (Sprint 120)

> **Owners:** Findings Ledger Guild • Vuln Explorer Guild
> **Status:** Draft schema delivered 2025-11-03 for LEDGER-29-001

## 1. Storage profile

| Concern | Decision | Notes |
|---------|----------|-------|
| Engine | PostgreSQL 14+ with UTF-8, `jsonb`, and partitioning support | Aligns with shared data plane; deterministic ordering enforced via primary keys. |
| Tenancy | Range/list partition on `tenant_id` for ledger + projection tables | Simplifies retention and cross-tenant anchoring. |
| Time zone | All timestamps stored as `timestamptz` UTC | Canonical JSON uses ISO-8601 (`yyyy-MM-ddTHH:mm:ss.fffZ`). |
| Hashing | SHA-256 (lower-case hex) over canonical JSON | Implemented client-side and verified by DB constraint. |
| Migrations | SQL files under `src/Findings/StellaOps.Findings.Ledger/migrations` | Applied via DatabaseMigrator (part of platform toolchain). |

## 2. Ledger event model

Events are immutable append-only records representing every workflow change. Records capture the original event payload, cryptographic hashes, and actor metadata.

### 2.1 `ledger_events`

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant partition key. |
| `chain_id` | `uuid` | Logical chain grouping (per tenant/policy combination). |
| `sequence_no` | `bigint` | Monotonic sequence within a chain (gapless). |
| `event_id` | `uuid` | Globally unique event identifier. |
| `event_type` | `ledger_event_type` | Enumerated type (see §2.2). |
| `policy_version` | `text` | Policy digest (e.g., SHA-256). |
| `finding_id` | `text` | Stable finding identity `(artifactId + vulnId + policyVersion)`. |
| `artifact_id` | `text` | Asset identifier (image digest, SBOM id, etc.). |
| `source_run_id` | `uuid` | Policy run that produced the event (nullable). |
| `actor_id` | `text` | Operator/service initiating the mutation. |
| `actor_type` | `text` | `system`, `operator`, `integration`. |
| `occurred_at` | `timestamptz` | Domain timestamp supplied by source. |
| `recorded_at` | `timestamptz` | Ingestion timestamp (defaults to `now()`). |
| `event_body` | `jsonb` | Canonical payload (see §2.3). |
| `event_hash` | `char(64)` | SHA-256 over canonical payload envelope. |
| `previous_hash` | `char(64)` | Hash of prior event in chain (all zeroes for first). |
| `merkle_leaf_hash` | `char(64)` | Leaf hash used for Merkle anchoring (hash over `event_hash \|\| '-' \|\| sequence_no`, see §2.3). |

**Constraints & indexes**

```
PRIMARY KEY (tenant_id, chain_id, sequence_no);
UNIQUE (tenant_id, event_id);
UNIQUE (tenant_id, chain_id, event_hash);
CHECK (event_hash ~ '^[0-9a-f]{64}$');
CHECK (previous_hash ~ '^[0-9a-f]{64}$');
CREATE INDEX ix_ledger_events_finding ON ledger_events (tenant_id, finding_id, policy_version);
CREATE INDEX ix_ledger_events_type ON ledger_events (tenant_id, event_type, recorded_at DESC);
```

Partitioning: the table is list-partitioned by `tenant_id` at the top level, with a default partition. Large tenants can optionally be sub-partitioned by month on `recorded_at`. PostgreSQL requires the partition key in every unique constraint, so uniqueness of `event_id` is enforced as `(tenant_id, event_id)`, with application-level guards maintaining cross-tenant uniqueness.

### 2.2 Event types

```
CREATE TYPE ledger_event_type AS ENUM (
    'finding.created',
    'finding.status_changed',
    'finding.severity_changed',
    'finding.tag_updated',
    'finding.comment_added',
    'finding.assignment_changed',
    'finding.accepted_risk',
    'finding.remediation_plan_added',
    'finding.attachment_added',
    'finding.closed'
);
```

Additional types can be appended via migrations; the canonical JSON must include the `event_type` key.

### 2.3 Canonical ledger JSON

Canonical payload envelope (before hashing):

```json
{
  "event": {
    "id": "3ac1f4ef-3c26-4b0d-91d4-6a6d3a5bde10",
    "type": "finding.status_changed",
    "tenant": "tenant-a",
    "chainId": "5fa2b970-9da2-4ef4-9a63-463c5d98d3cc",
    "sequence": 42,
    "policyVersion": "sha256:5f38...",
    "finding": {
      "id": "artifact:sha256:abc|pkg:cpe:/o:vendor:product",
      "artifactId": "sha256:abc",
      "vulnId": "CVE-2025-1234"
    },
    "actor": {
      "id": "user:alice@tenant",
      "type": "operator"
    },
    "occurredAt": "2025-11-03T15:12:05.123Z",
    "payload": {
      "previousStatus": "affected",
      "status": "triaged",
      "justification": "Ticket SEC-1234 created",
      "ticket": {
        "id": "SEC-1234",
        "url": "https://tracker/sec-1234"
      }
    }
  }
}
```

Canonicalisation rules:

1. Serialize using UTF-8, no BOM.
2. Sort object keys lexicographically at every level.
3. Represent enums/flags as lower-case strings.
4. Timestamps formatted as `yyyy-MM-ddTHH:mm:ss.fffZ` (millisecond precision, UTC).
5. Numbers use decimal notation; omit trailing zeros.
6. Arrays maintain supplied order.

Hash pipeline:

```
canonical_json = CanonicalJsonSerializer.Serialize(envelope)
sha256_bytes = SHA256(canonical_json)
event_hash = HexLower(sha256_bytes)
```

The Merkle leaf hash is derived as `merkle_leaf_hash = HexLower(SHA256(event_hash || '-' || sequence_no))`.

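The rules and hash pipeline above can be sketched in a few lines of Python. This is an illustration only, not the ledger's `CanonicalJsonSerializer`: the function names are hypothetical, compact separators are an assumption (the rules do not specify whitespace), and Python's `json` module renders floats such as `1.0` with a trailing zero, so rule 5 would need extra handling for non-integer numbers.

```python
import hashlib
import json
from datetime import datetime, timezone


def format_timestamp(value: datetime) -> str:
    """Render a timezone-aware datetime as yyyy-MM-ddTHH:mm:ss.fffZ (UTC, milliseconds)."""
    utc = value.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def canonical_json(envelope: dict) -> bytes:
    """Sort keys at every level and encode as UTF-8 without a BOM.

    Assumption: no insignificant whitespace between tokens.
    """
    text = json.dumps(envelope, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def event_hash(envelope: dict) -> str:
    """Lower-case hex SHA-256 over the canonical envelope."""
    return hashlib.sha256(canonical_json(envelope)).hexdigest()


def merkle_leaf_hash(event_hash_hex: str, sequence_no: int) -> str:
    """Lower-case hex SHA-256 over event_hash, a '-' separator, and the sequence number."""
    return hashlib.sha256(f"{event_hash_hex}-{sequence_no}".encode("utf-8")).hexdigest()
```

Both hash helpers return 64 lower-case hex characters, which is the shape the `CHECK` constraints in §2.1 expect.
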
## 3. Merkle anchoring

Anchoring batches events per tenant across fixed windows (default: 1,000 events or 15 minutes). Anchors are stored in `ledger_merkle_roots`.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Tenant key. |
| `anchor_id` | `uuid` | Anchor identifier. |
| `window_start` | `timestamptz` | Inclusive start of batch. |
| `window_end` | `timestamptz` | Exclusive end. |
| `sequence_start` | `bigint` | First sequence included. |
| `sequence_end` | `bigint` | Last sequence included. |
| `root_hash` | `char(64)` | Merkle root (SHA-256). |
| `leaf_count` | `integer` | Number of events aggregated. |
| `anchored_at` | `timestamptz` | Timestamp root stored/signed. |
| `anchor_reference` | `text` | Optional reference to external ledger (e.g., Rekor UUID). |

Indexes: `PRIMARY KEY (tenant_id, anchor_id)`, `UNIQUE (tenant_id, root_hash)`, `INDEX ix_merkle_sequences ON ledger_merkle_roots (tenant_id, sequence_end DESC)`.

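As an illustration of how a `root_hash` could be derived from the leaf hashes in one window, here is a minimal Python sketch. The schema does not specify the tree construction, so everything about the pairing is an assumption: nodes are combined as raw bytes, an odd node is paired with itself, and no domain-separation prefixes are used. The function name `merkle_root` is hypothetical.

```python
import hashlib


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold hex-encoded leaf hashes (ordered by sequence_no) into a single root.

    Assumptions, not taken from the schema: parents are SHA-256 over the
    concatenated raw bytes of both children, and the last node on an
    odd-sized level is duplicated.
    """
    if not leaf_hashes:
        raise ValueError("an anchoring window needs at least one leaf")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed odd-level handling
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Under these assumptions a window with a single event yields a root equal to that event's leaf hash, and `leaf_count` is simply `len(leaf_hashes)`.
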
## 4. Projection tables

### 4.1 `findings_projection`

Stores the latest verdict/state per finding.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Matches ledger payload. |
| `policy_version` | `text` | Active policy digest. |
| `status` | `text` | e.g., `affected`, `triaged`, `accepted_risk`, `resolved`. |
| `severity` | `numeric(6,3)` | Normalised severity score (0-10). |
| `labels` | `jsonb` | Key-value metadata (tags, KEV flag, runtime signals). |
| `current_event_id` | `uuid` | Ledger event that produced this state. |
| `explain_ref` | `text` | Reference to explain bundle or object storage key. |
| `policy_rationale` | `jsonb` | Array of policy rationale references (explain bundle IDs, remediation notes). |
| `updated_at` | `timestamptz` | Last projection update. |
| `cycle_hash` | `char(64)` | Deterministic hash of projection record (used in export bundles). |

Primary key: `(tenant_id, finding_id, policy_version)`.

Indexes:

- `ix_projection_status` on `(tenant_id, status, severity DESC)`.
- `ix_projection_labels_gin`, a GIN index on `labels` for KEV/runtime filters.

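A `cycle_hash` for one projection row could be computed along the following lines. This is a sketch in Python, using the field list given in §5; the helper name `cycle_hash`, the compact JSON separators, and the representation of `severity` as a plain JSON number are assumptions.

```python
import hashlib
import json

# Fields that participate in the hash, as listed in §5.
CYCLE_FIELDS = (
    "tenant_id",
    "finding_id",
    "policy_version",
    "status",
    "severity",
    "labels",
    "current_event_id",
)


def cycle_hash(row: dict) -> str:
    """Hash only the canonical projection fields, with keys sorted."""
    projection = {name: row[name] for name in CYCLE_FIELDS}
    text = json.dumps(projection, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Because columns such as `updated_at` and `explain_ref` are outside the field list, rewriting a row with the same state produces the same hash, which is what makes the value usable for replay validation.
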
### 4.2 `finding_history`

Delta view derived from ledger events for quick UI queries.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `finding_id` | `text` | Finding identity. |
| `policy_version` | `text` | Policy digest. |
| `event_id` | `uuid` | Ledger event ID. |
| `status` | `text` | Status after event. |
| `severity` | `numeric(6,3)` | Severity after event (nullable). |
| `actor_id` | `text` | Actor performing change. |
| `comment` | `text` | Optional summary/message. |
| `occurred_at` | `timestamptz` | Domain event timestamp. |

Implemented as either a materialized view or a table updated by the projector. Indexed by `(tenant_id, finding_id, occurred_at DESC)`.

### 4.3 `triage_actions`

Audit table for operator actions needing tailored queries.

| Column | Type | Description |
|--------|------|-------------|
| `tenant_id` | `text` | Partition key. |
| `action_id` | `uuid` | Primary key. |
| `event_id` | `uuid` | Source ledger event. |
| `finding_id` | `text` | Finding identity. |
| `action_type` | `ledger_action_type` | e.g., `assign`, `comment`, `attach_evidence`, `link_ticket`. |
| `payload` | `jsonb` | Structured action body (canonical stored separately). |
| `created_at` | `timestamptz` | Timestamp stored. |
| `created_by` | `text` | Actor ID. |

The `ledger_action_type` enum mirrors CLI/UX operations.

```
CREATE TYPE ledger_action_type AS ENUM (
    'assign',
    'comment',
    'attach_evidence',
    'link_ticket',
    'remediation_plan',
    'status_change',
    'accept_risk',
    'reopen',
    'close'
);
```

### 4.4 `ledger_projection_offsets`

Checkpoint store for the projection background worker. Ensures idempotent replays across restarts.

| Column | Type | Description |
|--------|------|-------------|
| `worker_id` | `text` | Logical worker identifier (defaults to `default`). |
| `last_recorded_at` | `timestamptz` | Timestamp of the last projected ledger event. |
| `last_event_id` | `uuid` | Event identifier paired with `last_recorded_at` for deterministic ordering. |
| `updated_at` | `timestamptz` | Last time the checkpoint was persisted. |

Seed row inserted on migration ensures catch-up from epoch (`1970-01-01T00:00:00Z` with empty GUID).

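The pairing of `last_recorded_at` with `last_event_id` works as a compound cursor. The Python sketch below shows one way a worker could select the events still to be projected; the function name `pending_events`, the in-memory event list, and the choice of Python's UUID ordering as tie-breaker are illustrative assumptions, not the behaviour of `PostgresLedgerEventStream`.

```python
from datetime import datetime, timezone
from uuid import UUID

# Seed checkpoint: epoch plus the empty GUID.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EMPTY_GUID = UUID(int=0)


def pending_events(events, last_recorded_at=EPOCH, last_event_id=EMPTY_GUID):
    """Return events strictly after the checkpoint, in deterministic order.

    Each event is a dict with 'recorded_at' (aware datetime) and 'event_id' (UUID).
    Ties on recorded_at are broken by event_id, so a restart neither skips
    nor reprocesses events that share a timestamp.
    """
    checkpoint = (last_recorded_at, last_event_id)
    ordered = sorted(events, key=lambda e: (e["recorded_at"], e["event_id"]))
    return [e for e in ordered if (e["recorded_at"], e["event_id"]) > checkpoint]
```

With the seed checkpoint every event qualifies, which matches the catch-up-from-epoch behaviour described above.
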
## 5. Hashing & verification

1. Canonically serialize the envelope (§2.3).
2. Compute `event_hash` and store it along with `previous_hash`.
3. Build a Merkle tree per anchoring window using the leaf hash `SHA256(event_hash || '-' || sequence_no)`.
4. Persist the root in `ledger_merkle_roots` and, when configured, submit it to an external transparency log (Rekor v2). Store the receipt/UUID in `anchor_reference`.
5. Projection rows compute `cycle_hash = SHA256(canonical_projection_json)`, where the canonical projection includes the fields `{tenant_id, finding_id, policy_version, status, severity, labels, current_event_id}` with sorted keys.

Verification flow for auditors:

- Fetch the event, recompute its canonical hash, and validate the `previous_hash` chain.
- Reconstruct the Merkle path from the stored leaf hash and verify that it matches the recorded root.
- Cross-check that the projection `cycle_hash` matches the ledger state derived from the last event.

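The first auditor step (recompute hashes and walk the `previous_hash` chain) might look like the following Python sketch. It assumes each row carries the original envelope next to the stored `event_hash` and `previous_hash`, that rows arrive ordered by `sequence_no`, and that canonical JSON uses sorted keys with compact separators; the helper names are hypothetical.

```python
import hashlib
import json

ZERO_HASH = "0" * 64  # previous_hash of the first event in a chain


def hash_envelope(envelope: dict) -> str:
    """Lower-case hex SHA-256 over an assumed canonical form of the envelope."""
    text = json.dumps(envelope, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_chain(rows: list[dict]) -> bool:
    """Check every row's event_hash and its link to the preceding row."""
    expected_previous = ZERO_HASH
    for row in rows:
        if hash_envelope(row["envelope"]) != row["event_hash"]:
            return False  # payload no longer matches the recorded hash
        if row["previous_hash"] != expected_previous:
            return False  # chain link broken or event missing
        expected_previous = row["event_hash"]
    return True
```

A failure in either check localises the problem to a single row, which is usually what an auditor wants to report.
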
## 6. Fixtures & migrations

- Initial migration script: `src/Findings/StellaOps.Findings.Ledger/migrations/001_initial.sql`.
- Sample canonical event: `seed-data/findings-ledger/fixtures/ledger-event.sample.json` (includes pre-computed `eventHash`, `previousHash`, and `merkleLeafHash` values).
- Sample projection row: `seed-data/findings-ledger/fixtures/finding-projection.sample.json` (includes canonical `cycleHash` for replay validation).

Fixtures follow canonical key ordering and include precomputed hashes to validate tooling.

## 7. Projection worker

- `LedgerProjectionWorker` consumes ledger events via `PostgresLedgerEventStream`, applying deterministic reductions with `LedgerProjectionReducer`.
- Checkpoint state is stored in `ledger_projection_offsets`, allowing replay from any point in time.
- Batch processing is configurable via `findings:ledger:projection` (`batchSize`, `idleDelay`).
- Each event writes:
  - `findings_projection` (upserted current state with `cycle_hash`).
  - `finding_history` (timeline entry keyed by event ID).
  - `triage_actions` when applicable (status change, comment, assignment, remediation, attachment, accept-risk, close).

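To make the three writes per event concrete, here is a rough Python sketch of a single reduction step. It is not `LedgerProjectionReducer`: the function name `project`, the mapping from event types to `ledger_action_type` values, and the rule that `status` and `severity` are copied from the payload when present are all assumptions made for illustration. The input is the inner `event` object of the envelope in §2.3.

```python
from typing import Optional

# Assumed mapping from ledger_event_type to ledger_action_type.
ACTION_BY_EVENT_TYPE = {
    "finding.status_changed": "status_change",
    "finding.comment_added": "comment",
    "finding.assignment_changed": "assign",
    "finding.remediation_plan_added": "remediation_plan",
    "finding.attachment_added": "attach_evidence",
    "finding.accepted_risk": "accept_risk",
    "finding.closed": "close",
}


def project(current: Optional[dict], event: dict):
    """Return (projection_row, history_row, triage_action_or_None) for one event."""
    payload = event.get("payload", {})
    state = dict(current or {})
    state.update(
        tenant_id=event["tenant"],
        finding_id=event["finding"]["id"],
        policy_version=event["policyVersion"],
        current_event_id=event["id"],
    )
    for field in ("status", "severity"):
        if field in payload:
            state[field] = payload[field]
    history = {
        "event_id": event["id"],
        "status": state.get("status"),
        "severity": state.get("severity"),
        "actor_id": event["actor"]["id"],
        "occurred_at": event["occurredAt"],
    }
    action_type = ACTION_BY_EVENT_TYPE.get(event["type"])
    action = None
    if action_type is not None:
        action = {
            "event_id": event["id"],
            "action_type": action_type,
            "payload": payload,
            "created_by": event["actor"]["id"],
        }
    return state, history, action
```

Keeping the step a pure function of the previous state and the event is one way to get the deterministic, replayable behaviour the worker relies on.
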
## 8. Next steps

- Integrate Policy Engine batch evaluation with the projector (`LEDGER-29-004`).
- Align Vulnerability Explorer queries with the new projection state and timeline endpoints.
- Externalise Merkle anchor publishing to transparency log once anchoring cadence is finalised.