feat(metrics): Implement scan metrics repository and PostgreSQL integration

- Added IScanMetricsRepository interface for scan metrics persistence and retrieval.
- Implemented PostgresScanMetricsRepository for PostgreSQL database interactions, including methods for saving and retrieving scan metrics and execution phases.
- Introduced methods for obtaining TTE statistics and recent scans for tenants.
- Implemented deletion of old metrics for retention purposes.

test(tests): Add SCA Failure Catalogue tests for FC6-FC10

- Created ScaCatalogueDeterminismTests to validate determinism properties of SCA Failure Catalogue fixtures.
- Developed ScaFailureCatalogueTests to ensure correct handling of specific failure modes in the scanner.
- Included tests for manifest validation, file existence, and expected findings across multiple failure cases.

feat(telemetry): Integrate scan completion metrics into the pipeline

- Introduced IScanCompletionMetricsIntegration interface and ScanCompletionMetricsIntegration class to record metrics upon scan completion.
- Implemented proof coverage and TTE metrics recording with logging for scan completion summaries.
This commit is contained in:
master
2025-12-16 14:00:35 +02:00
parent b55d9fa68d
commit 415eff1207
27 changed files with 3620 additions and 35 deletions


@@ -334,6 +334,50 @@ cmd.Parameters.AddWithValue("config", json);
var json = Newtonsoft.Json.JsonConvert.SerializeObject(obj);
```
### 5.3.1 Generated Columns for JSONB Hot Keys
**RULE:** Frequently-queried JSONB fields (>10% of queries) SHOULD be extracted as generated columns.
**When to use generated columns:**
- Field is used in WHERE clauses frequently
- Field is used in JOIN conditions
- Field is used in GROUP BY or ORDER BY
- Query planner needs cardinality statistics
```sql
-- ✓ CORRECT: Generated column for hot JSONB field
ALTER TABLE scheduler.runs
ADD COLUMN finding_count INT GENERATED ALWAYS AS ((stats->>'findingCount')::int) STORED;
CREATE INDEX idx_runs_finding_count ON scheduler.runs(tenant_id, finding_count);
```
**RULE:** Generated column names MUST follow snake_case convention matching the JSON path.
```sql
-- ✓ CORRECT naming
doc->>'bomFormat' bom_format
stats->>'findingCount' finding_count
raw->>'schemaVersion' schema_version
-- ✗ INCORRECT naming
doc->>'bomFormat' bomFormat, format, bf
```
**RULE:** Generated columns MUST be added with concurrent index creation in production.
```sql
-- ✓ CORRECT: CONCURRENTLY avoids blocking writes during index build (note: the ALTER itself still rewrites the table once)
ALTER TABLE scheduler.runs ADD COLUMN finding_count INT GENERATED ALWAYS AS (...) STORED;
CREATE INDEX CONCURRENTLY idx_runs_finding_count ON scheduler.runs(finding_count);
ANALYZE scheduler.runs;
-- ✗ INCORRECT: Blocking migration
CREATE INDEX idx_runs_finding_count ON scheduler.runs(finding_count); -- Blocks table
```
**Reference:** See `SPECIFICATION.md` Section 6.4 for detailed guidelines.
### 5.4 Null Handling
**RULE:** Nullable values MUST use `DBNull.Value` when null.
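A minimal Npgsql-style sketch of this rule (the `surface_id` parameter and `surfaceId` variable are illustrative):
```csharp
// ✓ CORRECT: bind DBNull.Value when the CLR value is null
cmd.Parameters.AddWithValue("surface_id", (object?)surfaceId ?? DBNull.Value);

// ✗ INCORRECT: a bare C# null is not reliably translated to SQL NULL
cmd.Parameters.AddWithValue("surface_id", surfaceId);
```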


@@ -1173,6 +1173,67 @@ CREATE INDEX idx_metadata_active ON scheduler.runs USING GIN (stats)
WHERE state = 'completed';
```
### 6.4 Generated Columns for JSONB Hot Keys
For frequently-queried JSONB fields, use PostgreSQL generated columns to enable efficient B-tree indexing and query planning statistics.
**Problem with expression indexes:**
```sql
-- Expression indexes keep statistics only for the exact indexed expression
CREATE INDEX idx_format ON sbom_docs ((doc->>'bomFormat'));
-- Predicates that don't match the expression get no cardinality estimate, so the planner may choose suboptimal plans
```
**Solution: Generated columns (PostgreSQL 12+):**
```sql
-- Add generated column that extracts JSONB field
ALTER TABLE scanner.sbom_documents
ADD COLUMN bom_format TEXT GENERATED ALWAYS AS ((doc->>'bomFormat')) STORED;
-- Standard B-tree index with full statistics
CREATE INDEX idx_sbom_bom_format ON scanner.sbom_documents(bom_format);
```
**Benefits:**
- **B-tree indexable**: Standard index on generated column
- **Statistics**: `ANALYZE` collects cardinality, MCV, histogram
- **Index-only scans**: the column can be included in covering indexes
- **Zero application changes**: Transparent to ORM/queries
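The statistics benefit can be verified directly (a sketch against the example table above, reading the standard `pg_stats` view):
```sql
-- After ANALYZE, the generated column appears in pg_stats like any plain column
ANALYZE scanner.sbom_documents;

SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE schemaname = 'scanner'
  AND tablename = 'sbom_documents'
  AND attname = 'bom_format';
```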
**When to use generated columns:**
- Field queried in >10% of queries against the table
- Cardinality >100 distinct values (worth collecting stats)
- Field used in JOIN conditions or GROUP BY
- Index-only scans are beneficial
**Naming convention:**
```
<json_path_snake_case>
Examples:
doc->>'bomFormat' → bom_format
raw->>'schemaVersion' → schema_version
stats->>'findingCount' → finding_count
```
**Migration pattern:**
```sql
-- Step 1: Add generated column (note: adding a STORED generated column rewrites the table; schedule during a maintenance window)
ALTER TABLE scheduler.runs
ADD COLUMN finding_count INT GENERATED ALWAYS AS ((stats->>'findingCount')::int) STORED;
-- Step 2: Create index concurrently
CREATE INDEX CONCURRENTLY idx_runs_finding_count
ON scheduler.runs(tenant_id, finding_count);
-- Step 3: Analyze for statistics
ANALYZE scheduler.runs;
```
**Reference implementations:**
- `src/Scheduler/...Storage.Postgres/Migrations/010_generated_columns_runs.sql`
- `src/Excititor/...Storage.Postgres/Migrations/004_generated_columns_vex.sql`
- `src/Concelier/...Storage.Postgres/Migrations/007_generated_columns_advisories.sql`
---
## 7. Partitioning Strategy


@@ -0,0 +1,195 @@
# Scan Metrics Schema
Sprint: `SPRINT_3406_0001_0001_metrics_tables`
Task: `METRICS-3406-013`
Working Directory: `src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/`
## Overview
The scan metrics schema provides relational PostgreSQL tables for tracking Time-to-Evidence (TTE) and scan performance metrics. This is a hybrid approach where metrics are stored in PostgreSQL while replay manifests remain in the document store.
## Tables
### `scanner.scan_metrics`
Primary table for per-scan metrics.
| Column | Type | Description |
|--------|------|-------------|
| `metrics_id` | UUID | Primary key |
| `scan_id` | UUID | Unique scan identifier |
| `tenant_id` | UUID | Tenant identifier |
| `surface_id` | UUID | Optional attack surface identifier |
| `artifact_digest` | TEXT | Artifact content hash |
| `artifact_type` | TEXT | Type: `oci_image`, `tarball`, `directory`, `other` |
| `replay_manifest_hash` | TEXT | Reference to replay manifest in document store |
| `findings_sha256` | TEXT | Findings content hash |
| `vex_bundle_sha256` | TEXT | VEX bundle content hash |
| `proof_bundle_sha256` | TEXT | Proof bundle content hash |
| `sbom_sha256` | TEXT | SBOM content hash |
| `policy_digest` | TEXT | Policy version hash |
| `feed_snapshot_id` | TEXT | Feed snapshot identifier |
| `started_at` | TIMESTAMPTZ | Scan start time |
| `finished_at` | TIMESTAMPTZ | Scan completion time |
| `total_duration_ms` | INT | TTE in milliseconds (generated) |
| `t_ingest_ms` | INT | Ingest phase duration |
| `t_analyze_ms` | INT | Analyze phase duration |
| `t_reachability_ms` | INT | Reachability phase duration |
| `t_vex_ms` | INT | VEX phase duration |
| `t_sign_ms` | INT | Sign phase duration |
| `t_publish_ms` | INT | Publish phase duration |
| `package_count` | INT | Number of packages analyzed |
| `finding_count` | INT | Number of findings |
| `vex_decision_count` | INT | Number of VEX decisions |
| `scanner_version` | TEXT | Scanner version |
| `scanner_image_digest` | TEXT | Scanner container digest |
| `is_replay` | BOOLEAN | Replay mode flag |
| `created_at` | TIMESTAMPTZ | Record creation time |
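The `(generated)` marker on `total_duration_ms` means the value is computed by the database from the two timestamps. A plausible column definition (a sketch, not the actual migration DDL):
```sql
-- EXTRACT(EPOCH FROM interval) is immutable, so it is allowed in a generated column
total_duration_ms INT GENERATED ALWAYS AS (
    (EXTRACT(EPOCH FROM (finished_at - started_at)) * 1000)::int
) STORED
```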
### `scanner.execution_phases`
Detailed phase execution tracking.
| Column | Type | Description |
|--------|------|-------------|
| `id` | BIGSERIAL | Primary key |
| `metrics_id` | UUID | Foreign key to `scan_metrics` |
| `phase_name` | TEXT | Phase: `ingest`, `analyze`, `reachability`, `vex`, `sign`, `publish`, `other` |
| `phase_order` | INT | Execution order |
| `started_at` | TIMESTAMPTZ | Phase start time |
| `finished_at` | TIMESTAMPTZ | Phase completion time |
| `duration_ms` | INT | Duration in milliseconds (generated) |
| `success` | BOOLEAN | Phase success status |
| `error_code` | TEXT | Error code if failed |
| `error_message` | TEXT | Error message if failed |
| `phase_metrics` | JSONB | Phase-specific metrics |
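A hypothetical insert recording one phase row (the `:params`, phase values, and the JSONB payload are illustrative):
```sql
INSERT INTO scanner.execution_phases
    (metrics_id, phase_name, phase_order, started_at, finished_at, success, phase_metrics)
VALUES
    (:metrics_id, 'analyze', 2, :started_at, :finished_at, TRUE,
     '{"package_count": 412}'::jsonb);
```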
## Views
### `scanner.scan_tte`
Time-to-Evidence view with phase breakdowns.
```sql
SELECT
metrics_id,
scan_id,
tte_ms,
tte_seconds,
ingest_percent,
analyze_percent,
reachability_percent,
vex_percent,
sign_percent,
publish_percent
FROM scanner.scan_tte
WHERE tenant_id = :tenant_id;
```
### `scanner.tte_stats`
Hourly TTE statistics with SLO compliance.
```sql
SELECT
hour_bucket,
scan_count,
tte_avg_ms,
tte_p50_ms,
tte_p95_ms,
slo_p50_compliance_percent,
slo_p95_compliance_percent
FROM scanner.tte_stats
WHERE tenant_id = :tenant_id;
```
## Functions
### `scanner.tte_percentile`
Calculate TTE percentile for a tenant.
```sql
SELECT scanner.tte_percentile(
p_tenant_id := :tenant_id,
p_percentile := 0.95,
p_since := NOW() - INTERVAL '7 days'
);
```
## Indexes
| Index | Columns | Purpose |
|-------|---------|---------|
| `idx_scan_metrics_tenant` | `tenant_id` | Tenant queries |
| `idx_scan_metrics_artifact` | `artifact_digest` | Artifact lookups |
| `idx_scan_metrics_started` | `started_at` | Time-range queries |
| `idx_scan_metrics_surface` | `surface_id` | Surface queries |
| `idx_scan_metrics_replay` | `is_replay` | Filter replays |
| `idx_scan_metrics_tenant_started` | `tenant_id, started_at` | Compound tenant+time |
| `idx_execution_phases_metrics` | `metrics_id` | Phase lookups |
| `idx_execution_phases_name` | `phase_name` | Phase filtering |
## SLO Thresholds
Per the Determinism Advisory §13.1:
| Metric | Target |
|--------|--------|
| TTE P50 | < 120 seconds |
| TTE P95 | < 300 seconds |
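The thresholds can also be checked directly against the raw table (a sketch using the `scan_metrics` columns documented above; the `tte_stats` view is the supported path):
```sql
SELECT
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY total_duration_ms) < 120000 AS p50_within_slo,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_duration_ms) < 300000 AS p95_within_slo
FROM scanner.scan_metrics
WHERE tenant_id = :tenant_id
  AND NOT is_replay
  AND started_at >= NOW() - INTERVAL '24 hours';
```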
## Usage Examples
### Get TTE for recent scans
```sql
SELECT scan_id, tte_ms, tte_seconds
FROM scanner.scan_tte
WHERE tenant_id = :tenant_id
AND NOT is_replay
ORDER BY started_at DESC
LIMIT 100;
```
### Check SLO compliance
```sql
SELECT
hour_bucket,
slo_p50_compliance_percent,
slo_p95_compliance_percent
FROM scanner.tte_stats
WHERE tenant_id = :tenant_id
AND hour_bucket >= NOW() - INTERVAL '24 hours';
```
### Phase breakdown analysis
```sql
SELECT
phase_name,
AVG(duration_ms) as avg_ms,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_ms
FROM scanner.execution_phases ep
JOIN scanner.scan_metrics sm ON ep.metrics_id = sm.metrics_id
WHERE sm.tenant_id = :tenant_id
AND sm.started_at >= NOW() - INTERVAL '7 days'
GROUP BY phase_name
ORDER BY MIN(ep.phase_order);
```
## Migration
Migration file: `src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/004_scan_metrics.sql`
Apply with:
```bash
psql -d stellaops -f 004_scan_metrics.sql
```
## Related
- [Database Specification](./SPECIFICATION.md)
- [Determinism Advisory §13.1](../product-advisories/14-Dec-2025%20-%20Determinism%20and%20Reproducibility%20Technical%20Reference.md)
- [Scheduler Schema](./schemas/scheduler.sql)