documentation cleanse, sprints work and planning. remaining non EF DAL migration to EF

This commit is contained in:
master
2026-02-25 01:24:07 +02:00
parent b07d27772e
commit 4db038123b
9090 changed files with 4836 additions and 2909 deletions


@@ -0,0 +1,225 @@
# Analytics Module
The Analytics module provides a star-schema data warehouse layer for SBOM and attestation data, enabling executive reporting, risk dashboards, and ad-hoc analysis.
## Overview
Stella Ops generates rich data through SBOM ingestion, vulnerability correlation, VEX assessments, and attestations. The Analytics module normalizes this data into a queryable warehouse schema optimized for:
- **Executive dashboards**: Risk posture, vulnerability trends, compliance status
- **Supply chain analysis**: Supplier concentration, license distribution
- **Security metrics**: CVE exposure, VEX effectiveness, MTTR tracking
- **Attestation coverage**: SLSA compliance, provenance gaps
## Key Capabilities
| Capability | Description |
|------------|-------------|
| Unified component registry | Canonical component table with normalized suppliers and licenses |
| Vulnerability correlation | Pre-joined component-vulnerability mapping with EPSS/KEV flags |
| VEX-adjusted exposure | Vulnerability counts that respect active VEX overrides (validity windows applied) |
| Attestation tracking | Provenance and SLSA level coverage by environment/team |
| Time-series rollups | Daily snapshots for trend analysis |
| Materialized views | Pre-computed aggregations for dashboard performance |
## Data Model
### Star Schema Overview
```
                 ┌─────────────────┐
                 │    artifacts    │ (dimension)
                 │  container/app  │
                 └────────┬────────┘
          ┌───────────────┼──────────────┐
          │               │              │
┌─────────▼──────┐ ┌──────▼─────┐ ┌──────▼──────┐
│ artifact_      │ │attestations│ │vex_overrides│
│ components     │ │   (fact)   │ │   (fact)    │
│ (bridge)       │ └────────────┘ └─────────────┘
└─────────┬──────┘
┌─────────▼──────┐
│ components     │ (dimension)
│ unified        │
│ registry       │
└─────────┬──────┘
┌─────────▼──────┐
│ component_     │
│ vulns          │ (fact)
│ (bridge)       │
└────────────────┘
```
### Core Tables
| Table | Type | Purpose |
|-------|------|---------|
| `components` | Dimension | Unified component registry with PURL, supplier, license |
| `artifacts` | Dimension | Container images and applications with SBOM metadata |
| `artifact_components` | Bridge | Links artifacts to their SBOM components |
| `component_vulns` | Fact | Component-to-vulnerability mapping |
| `attestations` | Fact | Attestation metadata (provenance, SBOM, VEX) |
| `vex_overrides` | Fact | VEX status overrides with justifications |
| `raw_sboms` | Audit | Raw SBOM payloads for reprocessing |
| `raw_attestations` | Audit | Raw DSSE envelopes for audit |
| `daily_vulnerability_counts` | Rollup | Daily vuln aggregations |
| `daily_component_counts` | Rollup | Daily component aggregations |
Rollup retention is 90 days in hot storage. `compute_daily_rollups()` prunes
older rows after each run; archival follows operations runbooks.
Platform WebService can automate rollups + materialized view refreshes via
`PlatformAnalyticsMaintenanceService` (see `architecture.md` for schedule and
configuration).
Use `Platform:AnalyticsMaintenance:BackfillDays` to recompute the most recent
N days of rollups on the first maintenance run after downtime (set to `0` to disable).
### Materialized Views
| View | Refresh | Purpose |
|------|---------|---------|
| `mv_supplier_concentration` | Daily | Top suppliers by component count |
| `mv_license_distribution` | Daily | License category distribution |
| `mv_vuln_exposure` | Daily | CVE exposure adjusted by VEX |
| `mv_attestation_coverage` | Daily | Provenance/SLSA coverage by env/team |
Array-valued fields (for example `environments` and `ecosystems`) are ordered
alphabetically to keep analytics outputs deterministic.
## Quick Start
### Day-1 Queries
**Top supplier concentration (supply chain risk, optional environment filter):**
```sql
SELECT analytics.sp_top_suppliers(20, 'prod');
```
**License risk heatmap (optional environment filter):**
```sql
SELECT analytics.sp_license_heatmap('prod');
```
**CVE exposure adjusted by VEX:**
```sql
SELECT analytics.sp_vuln_exposure('prod', 'high');
```
**Fixable vulnerability backlog:**
```sql
SELECT analytics.sp_fixable_backlog('prod');
```
**Attestation coverage gaps:**
```sql
SELECT analytics.sp_attestation_gaps('prod');
```
### API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/analytics/suppliers` | GET | Supplier concentration data |
| `/api/analytics/licenses` | GET | License distribution |
| `/api/analytics/vulnerabilities` | GET | CVE exposure (VEX-adjusted) |
| `/api/analytics/backlog` | GET | Fixable vulnerability backlog |
| `/api/analytics/attestation-coverage` | GET | Attestation gaps |
| `/api/analytics/trends/vulnerabilities` | GET | Vulnerability time-series |
| `/api/analytics/trends/components` | GET | Component time-series |
All analytics endpoints require the `analytics.read` scope.
The platform metadata capability `analytics` reports whether analytics storage is configured.
#### Query Parameters
- `/api/analytics/suppliers`: `limit` (optional, default 20), `environment` (optional)
- `/api/analytics/licenses`: `environment` (optional)
- `/api/analytics/vulnerabilities`: `minSeverity` (optional, default `low`), `environment` (optional)
- `/api/analytics/backlog`: `environment` (optional)
- `/api/analytics/attestation-coverage`: `environment` (optional)
- `/api/analytics/trends/vulnerabilities`: `environment` (optional), `days` (optional, default 30)
- `/api/analytics/trends/components`: `environment` (optional), `days` (optional, default 30)
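As a rough illustration of how these parameters compose, the sketch below builds request URLs with the Python standard library only. The endpoint paths and parameter names come from the list above; the base URL `https://stella.example.internal` and the helper name `analytics_url` are hypothetical, and authentication (the `analytics.read` scope) is left out.

```python
from urllib.parse import urlencode

# Query parameters each endpoint documents (taken from the list above).
ALLOWED_PARAMS = {
    "/api/analytics/suppliers": {"limit", "environment"},
    "/api/analytics/licenses": {"environment"},
    "/api/analytics/vulnerabilities": {"minSeverity", "environment"},
    "/api/analytics/backlog": {"environment"},
    "/api/analytics/attestation-coverage": {"environment"},
    "/api/analytics/trends/vulnerabilities": {"environment", "days"},
    "/api/analytics/trends/components": {"environment", "days"},
}


def analytics_url(base_url, endpoint, **params):
    """Build a GET URL, rejecting parameters the endpoint does not document."""
    unknown = set(params) - ALLOWED_PARAMS[endpoint]
    if unknown:
        raise ValueError(f"unsupported parameters for {endpoint}: {sorted(unknown)}")
    # Omit None values so the server-side defaults apply.
    query = {k: v for k, v in sorted(params.items()) if v is not None}
    url = base_url.rstrip("/") + endpoint
    return f"{url}?{urlencode(query)}" if query else url


# Hypothetical host; only the path and parameters are from this document.
print(analytics_url("https://stella.example.internal",
                    "/api/analytics/vulnerabilities",
                    minSeverity="high", environment="prod"))
```

Sorting the parameters is a choice made here so that equivalent requests produce identical URLs; the API itself is not stated to require any ordering.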
## Ingestion Configuration
Analytics ingestion runs inside the Platform WebService and subscribes to Scanner, Concelier, and Attestor streams. Configure ingestion via `Platform:AnalyticsIngestion`:
```yaml
Platform:
  Storage:
    PostgresConnectionString: "Host=...;Database=analytics;Username=...;Password=..."
  AnalyticsIngestion:
    Enabled: true
    PostgresConnectionString: "" # optional; defaults to Platform:Storage
    AllowedTenants: ["tenant-a", "tenant-b"]
    Streams:
      ScannerStream: "orchestrator:events"
      ConcelierObservationStream: "concelier:advisory.observation.updated:v1"
      ConcelierLinksetStream: "concelier:advisory.linkset.updated:v1"
      AttestorStream: "attestor:events"
      StartFromBeginning: false
    Cas:
      RootPath: "/var/lib/stellaops/cas"
      DefaultBucket: "attestations"
    Attestations:
      BundleUriTemplate: "bundle:{digest}"
```
Bundle URI templates support:
- `{digest}` for the full digest string (for example `sha256:...`).
- `{hash}` for the raw hex digest (no algorithm prefix).
- `bundle:{digest}` which resolves to `cas://<DefaultBucket>/{digest}` by default.
- `file:/path/to/bundles/bundle-{hash}.json` for offline file ingestion.
For offline workflows, verify bundles with `stella bundle verify` before ingesting them.
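The template rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ingestion service's code: the function name `resolve_bundle_uri` is hypothetical, and digests are assumed to have the form `algorithm:hex`.

```python
def resolve_bundle_uri(template, digest, default_bucket="attestations"):
    """Expand a BundleUriTemplate for one attestation digest (illustrative)."""
    # {hash} is the hex part after the algorithm prefix, e.g. after "sha256:".
    raw_hash = digest.split(":", 1)[1] if ":" in digest else digest
    uri = template.replace("{digest}", digest).replace("{hash}", raw_hash)
    # The "bundle:" scheme resolves to the default CAS bucket.
    if uri.startswith("bundle:"):
        return f"cas://{default_bucket}/{uri[len('bundle:'):]}"
    return uri


print(resolve_bundle_uri("bundle:{digest}", "sha256:abc123"))
print(resolve_bundle_uri("file:/path/to/bundles/bundle-{hash}.json", "sha256:abc123"))
```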
## Console UI
SBOM Lake analytics are exposed in the Console under `Analytics > SBOM Lake` (`/analytics/sbom-lake`).
Console access requires `ui.read` plus `analytics.read` scopes.
Key UI features:
- Filters for environment, minimum severity, and time window.
- Panels for suppliers, licenses, vulnerability exposure, and attestation coverage.
- Trend views for vulnerabilities and components.
- Fixable backlog table with CSV export.
See [console.md](./console.md) for operator guidance and filter behavior.
## CLI Access
SBOM lake analytics are exposed via the CLI under `stella analytics sbom-lake`
(requires `analytics.read` scope).
```bash
# Top suppliers
stella analytics sbom-lake suppliers --limit 20
# Vulnerability exposure in prod (high+), CSV export
stella analytics sbom-lake vulnerabilities --environment prod --min-severity high --format csv --output vuln.csv
# 30-day trends for both series
stella analytics sbom-lake trends --days 30 --series all --format json
```
See `docs/modules/cli/guides/commands/analytics.md` for command-level details.
## Architecture
See [architecture.md](./architecture.md) for detailed design decisions, data flow, and normalization rules.
## Schema Reference
See [analytics_schema.sql](../../db/analytics_schema.sql) for complete DDL including:
- Table definitions with indexes
- Normalization functions
- Materialized views
- Stored procedures
- Refresh procedures
## Sprint Reference
Implementation tracked in:
- `docs/implplan/SPRINT_20260120_030_Platform_sbom_analytics_lake.md`
- `docs/implplan/SPRINT_20260120_032_Cli_sbom_analytics_cli.md`


@@ -0,0 +1,298 @@
# Analytics Module Architecture
> **Implementation Note:** Analytics is a cross-cutting feature integrated into the **Platform WebService** (`src/Platform/`). There is no standalone `src/Analytics/` module. Data ingestion pipelines span Scanner, Concelier, and Attestor modules. See [Platform Architecture](../platform/architecture-overview.md) for service-level integration details.
## Design Philosophy
The Analytics module implements a **star-schema data warehouse** pattern optimized for analytical queries rather than transactional workloads. Key design principles:
1. **Separation of concerns**: Analytics schema is isolated from operational schemas (scanner, vex, proof_system)
2. **Pre-computation**: Expensive aggregations computed in advance via materialized views
3. **Audit trail**: Raw payloads preserved for reprocessing and compliance
4. **Determinism**: Normalization functions are immutable and reproducible; array aggregates are ordered for stable outputs
5. **Incremental updates**: Supports both full refresh and incremental ingestion
## Data Flow
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Scanner   │     │  Concelier  │     │  Attestor   │
│   (SBOM)    │     │   (Vuln)    │     │   (DSSE)    │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       │ SBOM Ingested     │ Vuln Updated      │ Attestation Created
       ▼                   ▼                   ▼
┌──────────────────────────────────────────────────────┐
│              AnalyticsIngestionService               │
│  - Normalize components (PURL, supplier, license)    │
│  - Upsert to unified registry                        │
│  - Correlate with vulnerabilities                    │
│  - Store raw payloads                                │
└──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────┐
│                   analytics schema                   │
│ ┌──────────┐ ┌─────────┐ ┌─────────┐ ┌────────────┐  │
│ │components│ │artifacts│ │comp_vuln│ │attestations│  │
│ └──────────┘ └─────────┘ └─────────┘ └────────────┘  │
└──────────────────────────────────────────────────────┘
                           │ Daily refresh
┌──────────────────────────────────────────────────────┐
│                  Materialized Views                  │
│ mv_supplier_concentration | mv_license_distribution  │
│ mv_vuln_exposure          | mv_attestation_coverage  │
└──────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────┐
│                Platform API Endpoints                │
│               (with 5-minute caching)                │
└──────────────────────────────────────────────────────┘
```
## Normalization Rules
### PURL Parsing
Package URLs (PURLs) are the canonical identifier for components. The `parse_purl()` function extracts:
| Field | Example | Notes |
|-------|---------|-------|
| `purl_type` | `maven`, `npm`, `pypi` | Ecosystem identifier |
| `purl_namespace` | `org.apache.logging.log4j` | Group/org/scope (optional) |
| `purl_name` | `log4j-core` | Package name |
| `purl_version` | `2.17.1` | Version string |
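To make the field split concrete, here is a minimal Python sketch of the same extraction. It is not the SQL `parse_purl()` implementation; it assumes well-formed PURLs, ignores qualifiers and subpaths, and reuses the column names from the table above as dictionary keys.

```python
from urllib.parse import unquote


def parse_purl(purl):
    """Split a Package URL into type, namespace, name, and version."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    body = purl[len("pkg:"):]
    # Drop the subpath (#...) and qualifiers (?...); they are not extracted here.
    body = body.split("#", 1)[0].split("?", 1)[0]
    purl_type, _, rest = body.partition("/")
    # Only an "@" after the last "/" separates the version; this keeps
    # scoped names such as "@angular/core" intact.
    at = rest.rfind("@")
    if at > rest.rfind("/"):
        path, version = rest[:at], rest[at + 1:]
    else:
        path, version = rest, None
    segments = [unquote(s) for s in path.split("/") if s]
    if not purl_type or not segments:
        raise ValueError(f"malformed PURL: {purl!r}")
    return {
        "purl_type": purl_type.lower(),
        "purl_namespace": "/".join(segments[:-1]) or None,
        "purl_name": segments[-1],
        "purl_version": unquote(version) if version else None,
    }


print(parse_purl("pkg:maven/org.apache.logging.log4j/log4j-core@2.17.1"))
```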
### Supplier Normalization
The `normalize_supplier()` function standardizes supplier names for consistent grouping:
1. Convert to lowercase
2. Trim whitespace
3. Remove legal suffixes: Inc., LLC, Ltd., Corp., GmbH, B.V., S.A., PLC, Co.
4. Normalize internal whitespace
**Examples:**
- `"Apache Software Foundation, Inc."` → `"apache software foundation"`
- `"Google LLC"` → `"google"`
- `" Microsoft Corp. "` → `"microsoft"`
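The four steps can be sketched in Python as follows. This is an approximation of the rules listed above rather than the SQL function itself; it assumes suffixes are stripped only at the end of the name, together with any separating comma.

```python
import re

# Legal suffixes from the list above, lowercased, without the trailing period.
LEGAL_SUFFIXES = ("inc", "llc", "ltd", "corp", "gmbh", "b.v", "s.a", "plc", "co")

_SUFFIX_RE = re.compile(
    r"[\s,]+(?:" + "|".join(re.escape(s) for s in LEGAL_SUFFIXES) + r")\.?$"
)


def normalize_supplier(name):
    """Lowercase, trim, collapse whitespace, and strip trailing legal suffixes."""
    if name is None:
        return None
    value = re.sub(r"\s+", " ", name.strip().lower())
    # Strip repeatedly so stacked suffixes ("Foo Co., Ltd.") are all removed.
    while True:
        stripped = _SUFFIX_RE.sub("", value)
        if stripped == value:
            break
        value = stripped
    return value.strip(" ,")


for raw in ("Apache Software Foundation, Inc.", "Google LLC", " Microsoft Corp. "):
    print(repr(raw), "->", repr(normalize_supplier(raw)))
```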
### License Categorization
The `categorize_license()` function maps SPDX expressions to risk categories:
| Category | Examples | Risk Level |
|----------|----------|------------|
| `permissive` | MIT, Apache-2.0, BSD-3-Clause, ISC | Low |
| `copyleft-weak` | LGPL-2.1, MPL-2.0, EPL-2.0 | Medium |
| `copyleft-strong` | GPL-3.0, AGPL-3.0, SSPL | High |
| `proprietary` | Proprietary, Commercial | Review Required |
| `unknown` | Unrecognized expressions | Review Required |
**Special handling:**
- GPL with exceptions (e.g., `GPL-2.0 WITH Classpath-exception-2.0`) → `copyleft-weak`
- Dual-licensed (e.g., `MIT OR Apache-2.0`) → uses first match
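A small Python sketch of this mapping is shown below. The prefix rules are an assumption derived from the examples in the table; the SQL function may recognize more identifiers. "First match" is read here as "the first alternative of an `OR` expression", and `AND` expressions are not handled.

```python
import re

# Illustrative prefix rules based on the table above (assumed, not exhaustive).
RULES = [
    ("permissive", ("MIT", "APACHE-", "BSD-", "ISC")),
    ("copyleft-weak", ("LGPL-", "MPL-", "EPL-")),
    ("copyleft-strong", ("GPL-", "AGPL-", "SSPL")),
    ("proprietary", ("PROPRIETARY", "COMMERCIAL")),
]


def categorize_license(expression):
    """Map an SPDX-style expression to a risk category."""
    if not expression or not expression.strip():
        return "unknown"
    # Dual-licensed expressions: only the first alternative is considered.
    first = re.split(r"\s+OR\s+", expression.strip(), flags=re.IGNORECASE)[0]
    first = first.strip("() ").upper()
    license_id, _, exception = first.partition(" WITH ")
    license_id = license_id.strip()
    for category, prefixes in RULES:
        if license_id.startswith(prefixes):
            # A strong-copyleft license with an exception is downgraded to weak.
            if category == "copyleft-strong" and exception.strip():
                return "copyleft-weak"
            return category
    return "unknown"


print(categorize_license("GPL-2.0 WITH Classpath-exception-2.0"))
print(categorize_license("MIT OR Apache-2.0"))
```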
## Component Deduplication
Components are deduplicated by `(purl, hash_sha256)`:
1. If same PURL and hash: existing record updated (last_seen_at, counts)
2. If same PURL but different hash: new record created (version change)
3. If same hash but different PURL: new record (aliased package)
**Upsert pattern:**
```sql
INSERT INTO analytics.components (...)
VALUES (...)
ON CONFLICT (purl, hash_sha256) DO UPDATE SET
last_seen_at = now(),
sbom_count = components.sbom_count + 1,
updated_at = now();
```
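The same keying rule can be mimicked with an in-memory dictionary, which makes the three cases easy to see. This sketch is illustrative only: a `dict` stands in for the `components` table, and starting `sbom_count` at 1 on insert is an assumption.

```python
from datetime import datetime, timezone


def upsert_component(registry, purl, hash_sha256, now=None):
    """Insert or update a component keyed by (purl, hash_sha256)."""
    now = now or datetime.now(timezone.utc)
    key = (purl, hash_sha256)
    record = registry.get(key)
    if record is None:
        # Unseen (purl, hash) pair: create a new row.
        record = {"purl": purl, "hash_sha256": hash_sha256,
                  "last_seen_at": now, "sbom_count": 1, "updated_at": now}
        registry[key] = record
    else:
        # Same PURL and hash: mirror the ON CONFLICT ... DO UPDATE branch.
        record["last_seen_at"] = now
        record["sbom_count"] += 1
        record["updated_at"] = now
    return record


registry = {}
upsert_component(registry, "pkg:npm/lodash@4.17.21", "aaa")
upsert_component(registry, "pkg:npm/lodash@4.17.21", "aaa")  # updates the row
upsert_component(registry, "pkg:npm/lodash@4.17.21", "bbb")  # new row
print(len(registry), registry[("pkg:npm/lodash@4.17.21", "aaa")]["sbom_count"])
```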
## Vulnerability Correlation
When a component is upserted, the `VulnerabilityCorrelationService` queries Concelier for matching advisories:
1. Query by PURL type + namespace + name
2. Filter by version range matching
3. Upsert to `component_vulns` with severity, EPSS, KEV flags
**Version range matching** currently supports semver ranges and exact matches via
`VersionRuleEvaluator`. Non-semver schemes fall back to exact string matches; wildcard
and ecosystem-specific ranges require upstream normalization.
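The semver-or-exact behavior described above might look like the following Python sketch. It does not reproduce `VersionRuleEvaluator`; the rule shape (`introduced`, `fixed`, `exact`) is hypothetical, pre-release tags are not handled, and the range is treated as `introduced <= version < fixed`.

```python
import re

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def _parse(version):
    """Return a comparable (major, minor, patch) tuple, or None if not semver."""
    match = _SEMVER.match(version)
    return tuple(int(part) for part in match.groups()) if match else None


def version_affected(version, introduced=None, fixed=None, exact=None):
    """Evaluate one rule: an exact match, or a semver range."""
    if exact is not None:
        return version == exact  # exact string match works for any scheme
    parsed = _parse(version)
    low = _parse(introduced) if introduced is not None else None
    high = _parse(fixed) if fixed is not None else None
    # Non-semver input cannot be range-checked in this sketch.
    if parsed is None or (introduced is not None and low is None) \
            or (fixed is not None and high is None):
        return False
    if low is not None and parsed < low:
        return False
    if high is not None and parsed >= high:
        return False
    return True


print(version_affected("2.14.1", introduced="2.0.0", fixed="2.17.1"))
```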
## VEX Override Logic
The `mv_vuln_exposure` view implements VEX-adjusted counts:
```sql
-- Effective count excludes artifacts with an active not_affected VEX override
COUNT(DISTINCT ac.artifact_id) FILTER (
    WHERE NOT EXISTS (
        SELECT 1
        FROM analytics.vex_overrides vo
        WHERE vo.artifact_id = ac.artifact_id
          AND vo.vuln_id = cv.vuln_id
          AND vo.status = 'not_affected'
          AND vo.valid_from <= now()
          AND (vo.valid_until IS NULL OR vo.valid_until > now())
    )
) AS effective_artifact_count
```
**Override validity:**
- `valid_from`: When the override became effective
- `valid_until`: Expiration (NULL = no expiration)
- Only `status = 'not_affected'` reduces exposure counts, and only when the override is active in its validity window.
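The validity rules can be restated as a short Python sketch. Field names mirror the `vex_overrides` columns mentioned above; treating a missing `valid_from` as "already effective" is an assumption made here.

```python
from datetime import datetime, timedelta, timezone


def override_is_active(override, now):
    """True if the override is not_affected and inside its validity window."""
    if override["status"] != "not_affected":
        return False
    valid_from = override.get("valid_from")
    valid_until = override.get("valid_until")
    if valid_from is not None and valid_from > now:
        return False  # not yet effective
    return valid_until is None or valid_until > now  # None = no expiration


def effective_artifact_count(artifact_ids, overrides, vuln_id, now):
    """Count affected artifacts that have no active override for vuln_id."""
    suppressed = {o["artifact_id"] for o in overrides
                  if o["vuln_id"] == vuln_id and override_is_active(o, now)}
    return len(set(artifact_ids) - suppressed)


now = datetime(2026, 2, 25, tzinfo=timezone.utc)
overrides = [
    {"artifact_id": "a1", "vuln_id": "CVE-2021-44228", "status": "not_affected",
     "valid_from": now - timedelta(days=1), "valid_until": None},
    {"artifact_id": "a2", "vuln_id": "CVE-2021-44228", "status": "not_affected",
     "valid_from": now - timedelta(days=9), "valid_until": now - timedelta(days=2)},
]
print(effective_artifact_count(["a1", "a2", "a3"], overrides, "CVE-2021-44228", now))
```

In this example only `a1` is suppressed: the override for `a2` has expired, and `a3` has no override at all.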
## Attestation Ingestion
Attestation ingestion consumes Attestor Rekor entry events and expects Sigstore bundles
or raw DSSE envelopes. The ingestion service:
- Resolves bundle URIs using `BundleUriTemplate`; `bundle:{digest}` maps to
`cas://<DefaultBucket>/{digest}` by default.
- Decodes DSSE payloads, computes `dsse_payload_hash`, and records `predicate_uri` plus
Rekor log metadata (`rekor_log_id`, `rekor_log_index`).
- Uses in-toto `subject` digests to link artifacts when reanalysis hints are absent.
- Maps predicate URIs into `analytics_attestation_type` values
(`provenance`, `sbom`, `vex`, `build`, `scan`, `policy`).
- Expands VEX statements into `vex_overrides` rows, one per product reference, and
captures optional validity timestamps when provided.
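As a rough sketch of the decoding step, the Python below computes a payload hash and collects subject digests from a DSSE envelope carrying an in-toto statement. The envelope and statement field names (`payload`, `subject`, `digest`) follow the DSSE and in-toto formats; hashing the decoded payload bytes with SHA-256 and the `sha256:<hex>` output form are assumptions about how `dsse_payload_hash` is defined.

```python
import base64
import hashlib
import json


def dsse_payload_hash(envelope):
    """SHA-256 over the decoded DSSE payload bytes (assumed definition)."""
    payload = base64.b64decode(envelope["payload"])
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def subject_digests(envelope):
    """Collect 'alg:value' digests from the in-toto statement subjects."""
    statement = json.loads(base64.b64decode(envelope["payload"]))
    return sorted(
        f"{alg}:{value}"
        for subject in statement.get("subject", [])
        for alg, value in subject.get("digest", {}).items()
    )


# Minimal made-up envelope for demonstration; signatures are omitted.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "example-image", "digest": {"sha256": "ab12"}}],
    "predicateType": "https://slsa.dev/provenance/v1",
}
envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(json.dumps(statement).encode()).decode(),
    "signatures": [],
}
print(dsse_payload_hash(envelope), subject_digests(envelope))
```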
## Time-Series Rollups
Daily rollups computed by `compute_daily_rollups()`:
**Vulnerability counts** (per environment/team/severity):
- `total_vulns`: All affecting vulnerabilities
- `fixable_vulns`: Vulns with `fix_available = TRUE`
- `vex_mitigated`: Vulns with active `not_affected` override
- `kev_vulns`: Vulns in CISA KEV
- `unique_cves`: Distinct CVE IDs
- `affected_artifacts`: Artifacts containing affected components
- `affected_components`: Components with affecting vulns
**Component counts** (per environment/team/license/type):
- `total_components`: Distinct components
- `unique_suppliers`: Distinct normalized suppliers
**Retention policy:** 90 days in hot storage; `compute_daily_rollups()` prunes older rows and downstream jobs archive to cold storage.
## Materialized View Refresh
All materialized views support `REFRESH ... CONCURRENTLY` for zero-downtime updates. The `refresh_all_views()` helper refreshes them non-concurrently, so run it off-peak:
```sql
-- Refresh all views (non-concurrent; run off-peak)
SELECT analytics.refresh_all_views();
```
**Refresh schedule (recommended):**
- `mv_supplier_concentration`: 02:00 UTC daily
- `mv_license_distribution`: 02:15 UTC daily
- `mv_vuln_exposure`: 02:30 UTC daily
- `mv_attestation_coverage`: 02:45 UTC daily
- `compute_daily_rollups()`: 03:00 UTC daily
Platform WebService can run the daily rollup + refresh loop via
`PlatformAnalyticsMaintenanceService`. Configure the schedule with:
- `Platform:AnalyticsMaintenance:Enabled` (default `true`)
- `Platform:AnalyticsMaintenance:IntervalMinutes` (default `1440`)
- `Platform:AnalyticsMaintenance:RunOnStartup` (default `true`)
- `Platform:AnalyticsMaintenance:ComputeDailyRollups` (default `true`)
- `Platform:AnalyticsMaintenance:RefreshMaterializedViews` (default `true`)
- `Platform:AnalyticsMaintenance:BackfillDays` (default `0`, which disables backfill; a positive value N recomputes the most recent N days of rollups on the first maintenance run)
The hosted service issues concurrent refresh statements directly for each view.
Use a DB scheduler (pg_cron) or external orchestrator if you need the staggered
per-view timing above.
## Performance Considerations
### Indexing Strategy
| Table | Key Indexes | Query Pattern |
|-------|-------------|---------------|
| `components` | `purl`, `supplier_normalized`, `license_category` | Lookup, aggregation |
| `artifacts` | `digest`, `environment`, `team` | Lookup, filtering |
| `component_vulns` | `vuln_id`, `severity`, `fix_available` | Join, filtering |
| `attestations` | `artifact_id`, `predicate_type` | Join, aggregation |
| `vex_overrides` | `(artifact_id, vuln_id)`, `status` | Subquery exists |
### Query Performance Targets
| Query | Target | Notes |
|-------|--------|-------|
| `sp_top_suppliers(20, 'prod')` | < 100ms | Uses materialized view when env is null; env filter reads base tables |
| `sp_license_heatmap('prod')` | < 100ms | Uses materialized view when env is null; env filter reads base tables |
| `sp_vuln_exposure()` | < 200ms | Uses materialized view for global queries; environment filters read base tables |
| `sp_fixable_backlog()` | < 500ms | Live query with indexes |
| `sp_attestation_gaps()` | < 100ms | Uses materialized view |
### Caching Strategy
Platform API endpoints use a 5-minute TTL cache:
- Cache key: endpoint + query parameters
- Invalidation: Time-based only (no event-driven invalidation)
- Storage: Valkey (in-memory)
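The caching behavior can be illustrated with a small in-memory stand-in. This is a sketch, not the Platform implementation: a Python `dict` replaces Valkey, and the class and function names are hypothetical. The 300-second TTL and the "endpoint + query parameters" key follow the bullets above.

```python
import time
from urllib.parse import urlencode


def cache_key(endpoint, params):
    """Key = endpoint + query parameters, sorted so ordering does not matter."""
    return f"{endpoint}?{urlencode(sorted(params.items()))}"


class TtlCache:
    """In-memory stand-in for the Valkey-backed cache (time-based expiry only)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = compute()  # expired or missing: recompute and store
        self._entries[key] = (now + self._ttl, value)
        return value


cache = TtlCache()
key = cache_key("/api/analytics/suppliers", {"limit": 20, "environment": "prod"})
print(key, cache.get_or_compute(key, lambda: ["supplier rows would go here"]))
```

Because invalidation is time-based only, a response can lag the underlying tables by up to the TTL in addition to the materialized view refresh interval.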
## Security Considerations
### Schema Permissions
```sql
-- Read-only role for dashboards
GRANT USAGE ON SCHEMA analytics TO dashboard_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO dashboard_reader;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA analytics TO dashboard_reader;
-- Write role for ingestion service
GRANT USAGE ON SCHEMA analytics TO analytics_writer;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA analytics TO analytics_writer;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA analytics TO analytics_writer;
```
### Data Classification
| Table | Classification | Notes |
|-------|----------------|-------|
| `components` | Internal | Contains package names, versions |
| `artifacts` | Internal | Contains image names, team names |
| `component_vulns` | Internal | Vulnerability data (public CVEs) |
| `vex_overrides` | Confidential | Contains justifications, operator IDs |
| `raw_sboms` | Confidential | Full SBOM payloads |
| `raw_attestations` | Confidential | Signed attestation envelopes |
### Audit Trail
All tables include `created_at` and `updated_at` timestamps. Raw payload tables (`raw_sboms`, `raw_attestations`) are append-only with content hashes for integrity verification.
## Integration Points
### Upstream Dependencies
| Service | Event | Contract | Action |
|---------|-------|----------|--------|
| Scanner | SBOM report ready | `scanner.event.report.ready@1` (`docs/modules/signals/events/orchestrator-scanner-events.md`) | Normalize and upsert components |
| Concelier | Advisory observation/linkset updated | `advisory.observation.updated@1` (`docs/modules/concelier/events/advisory.observation.updated@1.schema.json`), `advisory.linkset.updated@1` (`docs/modules/concelier/events/advisory.linkset.updated@1.md`) | Re-correlate affected components |
| Excititor | VEX statement changes | `vex.statement.*` (`docs/modules/excititor/architecture.md`) | Create/update vex_overrides |
| Attestor | Rekor entry logged | `rekor.entry.logged` (`docs/modules/attestor/architecture.md`) | Upsert attestation record |
### Downstream Consumers
| Consumer | Data | Endpoint |
|----------|------|----------|
| Console UI | Dashboard data | `/api/analytics/*` |
| Export Center | Compliance reports | Direct DB query |
| AdvisoryAI | Risk context | `/api/analytics/vulnerabilities` |
## Future Enhancements
1. **Partitioning**: Partition `daily_*` tables by date for faster queries and archival
2. **Incremental refresh**: Implement incremental materialized view refresh for large datasets
3. **Custom dimensions**: Support user-defined component groupings (business units, cost centers)
4. **Predictive analytics**: Add ML-based risk prediction using historical trends
5. **BI tool integration**: Direct connectors for Tableau, Looker, Metabase


@@ -0,0 +1,64 @@
# Analytics Console (SBOM Lake)
The Console exposes SBOM analytics lake data under `Analytics > SBOM Lake`.
This view is read-only and uses the analytics API endpoints documented in `docs/modules/analytics/README.md`.
## Access
- Route: `/analytics/sbom-lake`
- Required scopes: `ui.read` and `analytics.read`
- Console admin bundles: `role/analytics-viewer`, `role/analytics-operator`, `role/analytics-admin`
- Data freshness: the page surfaces the latest `dataAsOf` timestamp returned by the API.
## Filters
The SBOM Lake page supports three filters that round-trip via URL query parameters:
- Environment: `env` (optional, example: `Prod`)
- Minimum severity: `severity` (optional, example: `high`)
- Time window (days): `days` (optional, example: `90`)
When a filter changes, the Console reloads all panels using the updated parameters.
Supplier and license panels honor the environment filter alongside the other views.
## Panels
The dashboard presents four summary panels:
1. Supplier concentration (top suppliers by component count)
2. License distribution (license categories and counts)
3. Vulnerability exposure (top CVEs after VEX adjustments)
4. Attestation coverage (provenance and SLSA 2+ coverage)
Each panel shows a loading state, empty state, and summary counts.
## Trends
Two trend panels are included:
- Vulnerability trend: net exposure over the selected time window
- Component trend: total components and unique suppliers
The Console aggregates trend points by date and renders a simple bar chart plus a compact list.
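The per-date aggregation can be sketched as below. The point shape is an assumption: field names such as `snapshot_date` and `total_vulns` are borrowed from the rollup tables, and the actual API response format may differ.

```python
from collections import defaultdict


def aggregate_by_date(points, fields):
    """Sum the given numeric fields per snapshot_date, ordered by date."""
    totals = defaultdict(lambda: dict.fromkeys(fields, 0))
    for point in points:
        for field in fields:
            totals[point["snapshot_date"]][field] += point.get(field, 0)
    # ISO dates sort chronologically as strings.
    return [{"snapshot_date": day, **totals[day]} for day in sorted(totals)]


points = [
    {"snapshot_date": "2026-02-02", "total_vulns": 3, "vex_mitigated": 1},
    {"snapshot_date": "2026-02-01", "total_vulns": 5, "vex_mitigated": 0},
    {"snapshot_date": "2026-02-02", "total_vulns": 4, "vex_mitigated": 2},
]
print(aggregate_by_date(points, ["total_vulns", "vex_mitigated"]))
```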
## Fixable Backlog
The fixable backlog table lists vulnerabilities with fixes available, grouped by component and service.
The "Top backlog components" table derives a component summary from the same backlog data.
### CSV Export
The "Export backlog CSV" action downloads a deterministic, ordered CSV with:
- Service
- Component
- Version
- Vulnerability
- Severity
- Environment
- Fixed version
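A deterministic export of this shape can be sketched with Python's `csv` module. The column names come from the list above; the sort order (all columns, left to right) and the `\n` line terminator are assumptions made here to get byte-stable output, not documented Console behavior.

```python
import csv
import io

COLUMNS = ["Service", "Component", "Version", "Vulnerability",
           "Severity", "Environment", "Fixed version"]


def backlog_csv(rows):
    """Render backlog rows as CSV with a fixed column order and row order."""
    # Sort on every column so the output does not depend on input order.
    ordered = sorted(rows, key=lambda r: tuple(str(r.get(c, "")) for c in COLUMNS))
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


rows = [
    {"Service": "web", "Component": "log4j-core", "Version": "2.14.1",
     "Vulnerability": "CVE-2021-44228", "Severity": "critical",
     "Environment": "prod", "Fixed version": "2.17.1"},
    {"Service": "api", "Component": "log4j-core", "Version": "2.14.1",
     "Vulnerability": "CVE-2021-44228", "Severity": "critical",
     "Environment": "prod", "Fixed version": "2.17.1"},
]
print(backlog_csv(rows))
```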
## Troubleshooting
- If panels show "No data", verify that the analytics schema and materialized views are populated.
- If an error banner appears, check the analytics API availability and ensure the tenant has `analytics.read`.


@@ -0,0 +1,422 @@
# Analytics Query Library
This document provides ready-to-use SQL queries for common analytics use cases. All queries are optimized for the analytics star schema.
## Executive Dashboard Queries
### 1. Top Supplier Concentration (Supply Chain Risk)
Identifies suppliers with the highest component footprint, indicating supply chain concentration risk.
```sql
-- Via stored procedure (recommended, optional environment filter)
SELECT analytics.sp_top_suppliers(20, 'prod');
-- Direct query
SELECT
supplier,
component_count,
artifact_count,
team_count,
critical_vuln_count,
high_vuln_count,
environments
FROM analytics.mv_supplier_concentration
ORDER BY component_count DESC
LIMIT 20;
```
**Use case**: Identify vendors that, if compromised, would affect the most artifacts.
### 2. License Risk Heatmap
Shows distribution of components by license category for compliance review.
```sql
-- Via stored procedure (optional environment filter)
SELECT analytics.sp_license_heatmap('prod');
-- Direct query with grouping
SELECT
license_category,
SUM(component_count) AS total_components,
SUM(artifact_count) AS total_artifacts,
COUNT(DISTINCT license_concluded) AS unique_licenses
FROM analytics.mv_license_distribution
GROUP BY license_category
ORDER BY
CASE license_category
WHEN 'copyleft-strong' THEN 1
WHEN 'proprietary' THEN 2
WHEN 'unknown' THEN 3
WHEN 'copyleft-weak' THEN 4
ELSE 5
END;
```
**Use case**: Compliance review, identify components requiring legal review.
### 3. CVE Exposure Adjusted by VEX
Shows true vulnerability exposure after applying VEX mitigations.
```sql
-- Via stored procedure
SELECT analytics.sp_vuln_exposure('prod', 'high');
-- Direct query showing VEX effectiveness (global view; use sp_vuln_exposure for environment filtering)
SELECT
vuln_id,
severity::TEXT,
cvss_score,
epss_score,
kev_listed,
fix_available,
raw_artifact_count AS total_affected,
effective_artifact_count AS actually_affected,
raw_artifact_count - effective_artifact_count AS vex_mitigated,
ROUND(100.0 * (raw_artifact_count - effective_artifact_count) / NULLIF(raw_artifact_count, 0), 1) AS mitigation_rate
FROM analytics.mv_vuln_exposure
WHERE effective_artifact_count > 0
ORDER BY
CASE severity
WHEN 'critical' THEN 1
WHEN 'high' THEN 2
WHEN 'medium' THEN 3
ELSE 4
END,
effective_artifact_count DESC
LIMIT 50;
```
**Use case**: Show executives the "real" risk after VEX assessment.
### 4. Fixable Vulnerability Backlog
Lists vulnerabilities that can be fixed today (fix available, not VEX-mitigated).
```sql
-- Via stored procedure
SELECT analytics.sp_fixable_backlog('prod');
-- Direct query with priority scoring
SELECT
a.name AS service,
a.environment,
a.team,
c.name AS component,
c.version AS current_version,
cv.vuln_id,
cv.severity::TEXT,
cv.cvss_score,
cv.epss_score,
cv.fixed_version,
cv.kev_listed,
-- Priority score: higher = fix first
(
CASE cv.severity
WHEN 'critical' THEN 100
WHEN 'high' THEN 75
WHEN 'medium' THEN 50
ELSE 25
END
+ COALESCE(cv.epss_score * 100, 0)
+ (CASE WHEN cv.kev_listed THEN 50 ELSE 0 END)
)::INT AS priority_score
FROM analytics.component_vulns cv
JOIN analytics.components c ON c.component_id = cv.component_id
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
AND vo.vuln_id = cv.vuln_id
AND vo.status = 'not_affected'
AND vo.valid_from <= now()
AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.affects = TRUE
AND cv.fix_available = TRUE
AND vo.override_id IS NULL
AND a.environment = 'prod'
ORDER BY priority_score DESC, a.name
LIMIT 100;
```
**Use case**: Prioritize remediation work based on risk and fixability.
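For readers who want to check the scoring by hand, the priority expression in the direct query can be restated in Python. The weights are copied from the SQL; the function name is hypothetical, and rounding at exact halves may differ slightly from PostgreSQL's `::INT` cast.

```python
# Severity weights from the CASE expression in the query above.
SEVERITY_WEIGHT = {"critical": 100, "high": 75, "medium": 50}


def priority_score(severity, epss_score=None, kev_listed=False):
    """Severity weight + EPSS * 100 + 50 if KEV-listed; higher = fix first."""
    base = SEVERITY_WEIGHT.get(severity, 25)  # ELSE branch: 25
    epss = (epss_score or 0) * 100            # COALESCE(epss * 100, 0)
    kev = 50 if kev_listed else 0
    return int(round(base + epss + kev))


# A critical, KEV-listed CVE with EPSS 0.97 scores 100 + 97 + 50.
print(priority_score("critical", 0.97, True))
# A medium CVE with no EPSS data and no KEV listing keeps its base weight.
print(priority_score("medium"))
```

With these weights, a KEV-listed high-severity finding (75 + 50) outranks a critical one with a low EPSS score and no KEV listing, which appears to be the intent of the additive design.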
### 5. Build Integrity / Attestation Coverage
Shows attestation gaps by environment and team.
```sql
-- Via stored procedure
SELECT analytics.sp_attestation_gaps('prod');
-- Direct query with gap analysis
SELECT
environment,
team,
total_artifacts,
with_provenance,
total_artifacts - with_provenance AS missing_provenance,
provenance_pct,
slsa_level_2_plus,
slsa2_pct,
with_sbom_attestation,
with_vex_attestation
FROM analytics.mv_attestation_coverage
WHERE environment = 'prod'
ORDER BY provenance_pct ASC;
```
**Use case**: Identify teams/environments not meeting attestation requirements.
## Trend Analysis Queries
### 6. Vulnerability Trend (30 Days)
```sql
SELECT
snapshot_date,
environment,
SUM(total_vulns) AS total_vulns,
SUM(fixable_vulns) AS fixable_vulns,
SUM(vex_mitigated) AS vex_mitigated,
SUM(total_vulns) - SUM(vex_mitigated) AS net_exposure,
SUM(kev_vulns) AS kev_vulns
FROM analytics.daily_vulnerability_counts
WHERE snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY snapshot_date, environment
ORDER BY environment, snapshot_date;
```
### 7. Vulnerability Trend by Severity
```sql
SELECT
snapshot_date,
severity::TEXT,
SUM(total_vulns) AS total_vulns
FROM analytics.daily_vulnerability_counts
WHERE snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
AND environment = 'prod'
GROUP BY snapshot_date, severity
ORDER BY snapshot_date,
CASE severity
WHEN 'critical' THEN 1
WHEN 'high' THEN 2
WHEN 'medium' THEN 3
ELSE 4
END;
```
### 8. Component Growth Trend
```sql
SELECT
snapshot_date,
environment,
SUM(total_components) AS total_components,
SUM(unique_suppliers) AS unique_suppliers
FROM analytics.daily_component_counts
WHERE snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY snapshot_date, environment
ORDER BY environment, snapshot_date;
```
## Deep-Dive Queries
### 9. Component Impact Analysis
Find all artifacts affected by a specific component.
```sql
SELECT
a.name AS artifact,
a.version,
a.environment,
a.team,
ac.depth AS dependency_depth,
ac.introduced_via
FROM analytics.components c
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
WHERE c.purl LIKE 'pkg:maven/org.apache.logging.log4j/log4j-core%'
ORDER BY a.environment, a.name;
```
### 10. CVE Impact Analysis
Find all artifacts affected by a specific CVE.
```sql
SELECT DISTINCT
a.name AS artifact,
a.version,
a.environment,
a.team,
c.name AS component,
c.version AS component_version,
cv.cvss_score,
cv.fixed_version,
CASE
WHEN vo.status = 'not_affected' THEN 'VEX Mitigated'
WHEN cv.fix_available THEN 'Fix Available'
ELSE 'Vulnerable'
END AS status
FROM analytics.component_vulns cv
JOIN analytics.components c ON c.component_id = cv.component_id
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
AND vo.vuln_id = cv.vuln_id
AND vo.valid_from <= now()
AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.vuln_id = 'CVE-2021-44228'
ORDER BY a.environment, a.name;
```
### 11. Supplier Vulnerability Profile
Detailed vulnerability breakdown for a specific supplier.
```sql
SELECT
c.supplier_normalized AS supplier,
c.name AS component,
c.version,
cv.vuln_id,
cv.severity::TEXT,
cv.cvss_score,
cv.kev_listed,
cv.fix_available,
cv.fixed_version
FROM analytics.components c
JOIN analytics.component_vulns cv ON cv.component_id = c.component_id
WHERE c.supplier_normalized = 'apache software foundation'
AND cv.affects = TRUE
ORDER BY
CASE cv.severity
WHEN 'critical' THEN 1
WHEN 'high' THEN 2
ELSE 3
END,
cv.cvss_score DESC;
```
### 12. License Compliance Report
Components with concerning licenses in production.
```sql
SELECT
c.name AS component,
c.version,
c.license_concluded,
c.license_category::TEXT,
c.supplier_normalized AS supplier,
COUNT(DISTINCT a.artifact_id) AS artifact_count,
ARRAY_AGG(DISTINCT a.name ORDER BY a.name) AS affected_artifacts
FROM analytics.components c
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
WHERE c.license_category IN ('copyleft-strong', 'proprietary', 'unknown')
AND a.environment = 'prod'
GROUP BY c.component_id, c.name, c.version, c.license_concluded, c.license_category, c.supplier_normalized
ORDER BY c.license_category, artifact_count DESC;
```
### 13. MTTR Analysis
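Mean time to mitigate by severity, measured from advisory publication (`published_at`) to the start of a currently active `not_affected` VEX override (`valid_from`).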
```sql
SELECT
cv.severity::TEXT,
COUNT(*) AS remediated_vulns,
AVG(EXTRACT(EPOCH FROM (vo.valid_from - cv.published_at)) / 86400)::NUMERIC(10,2) AS avg_days_to_mitigate,
PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (vo.valid_from - cv.published_at)) / 86400
)::NUMERIC(10,2) AS median_days,
PERCENTILE_CONT(0.9) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (vo.valid_from - cv.published_at)) / 86400
)::NUMERIC(10,2) AS p90_days
FROM analytics.component_vulns cv
JOIN analytics.vex_overrides vo ON vo.vuln_id = cv.vuln_id
AND vo.status = 'not_affected'
AND vo.valid_from <= now()
AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.published_at >= now() - INTERVAL '90 days'
AND cv.published_at IS NOT NULL
GROUP BY cv.severity
ORDER BY
CASE cv.severity
WHEN 'critical' THEN 1
WHEN 'high' THEN 2
WHEN 'medium' THEN 3
ELSE 4
END;
```
### 14. Transitive Dependency Risk
Components introduced through transitive dependencies.
```sql
SELECT
c.name AS transitive_component,
c.version,
ac.introduced_via AS direct_dependency,
ac.depth,
COUNT(DISTINCT cv.vuln_id) AS vuln_count,
COUNT(DISTINCT cv.vuln_id) FILTER (WHERE cv.severity = 'critical') AS critical_count,
COUNT(DISTINCT a.artifact_id) AS affected_artifacts
FROM analytics.components c
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.component_vulns cv ON cv.component_id = c.component_id AND cv.affects = TRUE
WHERE ac.depth > 0 -- Transitive only
AND a.environment = 'prod'
GROUP BY c.component_id, c.name, c.version, ac.introduced_via, ac.depth
HAVING COUNT(cv.vuln_id) > 0
ORDER BY critical_count DESC, vuln_count DESC
LIMIT 50;
```
### 15. VEX Effectiveness Report
How effective is the VEX program at reducing noise?
```sql
SELECT
DATE_TRUNC('week', vo.created_at)::DATE AS week,
COUNT(*) AS total_overrides,
COUNT(*) FILTER (WHERE vo.status = 'not_affected') AS not_affected,
COUNT(*) FILTER (WHERE vo.status = 'affected') AS confirmed_affected,
COUNT(*) FILTER (WHERE vo.status = 'under_investigation') AS under_investigation,
COUNT(*) FILTER (WHERE vo.status = 'fixed') AS marked_fixed,
-- Noise reduction rate
ROUND(100.0 * COUNT(*) FILTER (WHERE vo.status = 'not_affected') / NULLIF(COUNT(*), 0), 1) AS noise_reduction_pct
FROM analytics.vex_overrides vo
WHERE vo.created_at >= now() - INTERVAL '90 days'
GROUP BY DATE_TRUNC('week', vo.created_at)
ORDER BY week;
```
## Performance Tips
1. **Use materialized views**: Views prefixed with `mv_` are pre-computed and fast
2. **Add environment filter**: Most queries benefit from `WHERE environment = 'prod'`
3. **Use stored procedures**: `sp_*` functions return JSON and handle caching
4. **Limit results**: Always use `LIMIT` for large result sets
5. **Check refresh times**: Views are refreshed daily; data may be up to 24h stale
## Query Parameters
Common filter parameters:
| Parameter | Type | Example | Notes |
|-----------|------|---------|-------|
| `environment` | TEXT | `'prod'`, `'stage'` | Filter by deployment environment |
| `team` | TEXT | `'platform'` | Filter by owning team |
| `severity` | TEXT | `'critical'`, `'high'` | Minimum severity level |
| `days` | INT | `30`, `90` | Lookback period |
| `limit` | INT | `20`, `100` | Max results |