Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added IScanMetricsRepository interface for scan metrics persistence and retrieval. - Implemented PostgresScanMetricsRepository for PostgreSQL database interactions, including methods for saving and retrieving scan metrics and execution phases. - Introduced methods for obtaining TTE statistics and recent scans for tenants. - Implemented deletion of old metrics for retention purposes. test(tests): Add SCA Failure Catalogue tests for FC6-FC10 - Created ScaCatalogueDeterminismTests to validate determinism properties of SCA Failure Catalogue fixtures. - Developed ScaFailureCatalogueTests to ensure correct handling of specific failure modes in the scanner. - Included tests for manifest validation, file existence, and expected findings across multiple failure cases. feat(telemetry): Integrate scan completion metrics into the pipeline - Introduced IScanCompletionMetricsIntegration interface and ScanCompletionMetricsIntegration class to record metrics upon scan completion. - Implemented proof coverage and TTE metrics recording with logging for scan completion summaries.
5.6 KiB
5.6 KiB
Scan Metrics Schema
Sprint: SPRINT_3406_0001_0001_metrics_tables
Task: METRICS-3406-013
Working Directory: src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/
Overview
The scan metrics schema provides relational PostgreSQL tables for tracking Time-to-Evidence (TTE) and scan performance metrics. This is a hybrid approach where metrics are stored in PostgreSQL while replay manifests remain in the document store.
Tables
scanner.scan_metrics
Primary table for per-scan metrics.
| Column | Type | Description |
|---|---|---|
metrics_id |
UUID | Primary key |
scan_id |
UUID | Unique scan identifier |
tenant_id |
UUID | Tenant identifier |
surface_id |
UUID | Optional attack surface identifier |
artifact_digest |
TEXT | Artifact content hash |
artifact_type |
TEXT | Type: oci_image, tarball, directory, other |
replay_manifest_hash |
TEXT | Reference to replay manifest in document store |
findings_sha256 |
TEXT | Findings content hash |
vex_bundle_sha256 |
TEXT | VEX bundle content hash |
proof_bundle_sha256 |
TEXT | Proof bundle content hash |
sbom_sha256 |
TEXT | SBOM content hash |
policy_digest |
TEXT | Policy version hash |
feed_snapshot_id |
TEXT | Feed snapshot identifier |
started_at |
TIMESTAMPTZ | Scan start time |
finished_at |
TIMESTAMPTZ | Scan completion time |
total_duration_ms |
INT | TTE in milliseconds (generated) |
t_ingest_ms |
INT | Ingest phase duration |
t_analyze_ms |
INT | Analyze phase duration |
t_reachability_ms |
INT | Reachability phase duration |
t_vex_ms |
INT | VEX phase duration |
t_sign_ms |
INT | Sign phase duration |
t_publish_ms |
INT | Publish phase duration |
package_count |
INT | Number of packages analyzed |
finding_count |
INT | Number of findings |
vex_decision_count |
INT | Number of VEX decisions |
scanner_version |
TEXT | Scanner version |
scanner_image_digest |
TEXT | Scanner container digest |
is_replay |
BOOLEAN | Replay mode flag |
created_at |
TIMESTAMPTZ | Record creation time |
scanner.execution_phases
Detailed phase execution tracking.
| Column | Type | Description |
|---|---|---|
id |
BIGSERIAL | Primary key |
metrics_id |
UUID | Foreign key to scan_metrics |
phase_name |
TEXT | Phase: ingest, analyze, reachability, vex, sign, publish, other |
phase_order |
INT | Execution order |
started_at |
TIMESTAMPTZ | Phase start time |
finished_at |
TIMESTAMPTZ | Phase completion time |
duration_ms |
INT | Duration in milliseconds (generated) |
success |
BOOLEAN | Phase success status |
error_code |
TEXT | Error code if failed |
error_message |
TEXT | Error message if failed |
phase_metrics |
JSONB | Phase-specific metrics |
Views
scanner.scan_tte
Time-to-Evidence view with phase breakdowns.
SELECT
metrics_id,
scan_id,
tte_ms,
tte_seconds,
ingest_percent,
analyze_percent,
reachability_percent,
vex_percent,
sign_percent,
publish_percent
FROM scanner.scan_tte
WHERE tenant_id = :tenant_id;
scanner.tte_stats
Hourly TTE statistics with SLO compliance.
SELECT
hour_bucket,
scan_count,
tte_avg_ms,
tte_p50_ms,
tte_p95_ms,
slo_p50_compliance_percent,
slo_p95_compliance_percent
FROM scanner.tte_stats
WHERE tenant_id = :tenant_id;
Functions
scanner.tte_percentile
Calculate TTE percentile for a tenant.
SELECT scanner.tte_percentile(
p_tenant_id := :tenant_id,
p_percentile := 0.95,
p_since := NOW() - INTERVAL '7 days'
);
Indexes
| Index | Columns | Purpose |
|---|---|---|
idx_scan_metrics_tenant |
tenant_id |
Tenant queries |
idx_scan_metrics_artifact |
artifact_digest |
Artifact lookups |
idx_scan_metrics_started |
started_at |
Time-range queries |
idx_scan_metrics_surface |
surface_id |
Surface queries |
idx_scan_metrics_replay |
is_replay |
Filter replays |
idx_scan_metrics_tenant_started |
tenant_id, started_at |
Compound tenant+time |
idx_execution_phases_metrics |
metrics_id |
Phase lookups |
idx_execution_phases_name |
phase_name |
Phase filtering |
SLO Thresholds
Per the advisory section 13.1:
| Metric | Target |
|---|---|
| TTE P50 | < 120 seconds |
| TTE P95 | < 300 seconds |
Usage Examples
Get TTE for recent scans
SELECT scan_id, tte_ms, tte_seconds
FROM scanner.scan_tte
WHERE tenant_id = :tenant_id
AND NOT is_replay
ORDER BY started_at DESC
LIMIT 100;
Check SLO compliance
SELECT
hour_bucket,
slo_p50_compliance_percent,
slo_p95_compliance_percent
FROM scanner.tte_stats
WHERE tenant_id = :tenant_id
AND hour_bucket >= NOW() - INTERVAL '24 hours';
Phase breakdown analysis
SELECT
phase_name,
AVG(duration_ms) as avg_ms,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_ms
FROM scanner.execution_phases ep
JOIN scanner.scan_metrics sm ON ep.metrics_id = sm.metrics_id
WHERE sm.tenant_id = :tenant_id
AND sm.started_at >= NOW() - INTERVAL '7 days'
GROUP BY phase_name
ORDER BY phase_order;
Migration
Migration file: src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/004_scan_metrics.sql
Apply with:
psql -d stellaops -f 004_scan_metrics.sql