Files
git.stella-ops.org/docs/modules/analytics/queries.md
2026-01-22 19:08:46 +02:00

12 KiB

Analytics Query Library

This document provides ready-to-use SQL queries for common analytics use cases. All queries are optimized for the analytics star schema.

Executive Dashboard Queries

1. Top Supplier Concentration (Supply Chain Risk)

Identifies suppliers with the highest component footprint, indicating supply chain concentration risk.

-- Via stored procedure (recommended, optional environment filter)
SELECT analytics.sp_top_suppliers(20, 'prod');

-- Direct query
SELECT
  supplier,
  component_count,
  artifact_count,
  team_count,
  critical_vuln_count,
  high_vuln_count,
  environments
FROM analytics.mv_supplier_concentration
ORDER BY component_count DESC
LIMIT 20;

Use case: Identify vendors that, if compromised, would affect the most artifacts.

2. License Risk Heatmap

Shows distribution of components by license category for compliance review.

-- Via stored procedure (optional environment filter)
SELECT analytics.sp_license_heatmap('prod');

-- Direct query with grouping
SELECT
  license_category,
  SUM(component_count) AS total_components,
  SUM(artifact_count) AS total_artifacts,
  COUNT(DISTINCT license_concluded) AS unique_licenses
FROM analytics.mv_license_distribution
GROUP BY license_category
ORDER BY
  CASE license_category
    WHEN 'copyleft-strong' THEN 1
    WHEN 'proprietary' THEN 2
    WHEN 'unknown' THEN 3
    WHEN 'copyleft-weak' THEN 4
    ELSE 5
  END;

Use case: Compliance review, identify components requiring legal review.

3. CVE Exposure Adjusted by VEX

Shows true vulnerability exposure after applying VEX mitigations.

-- Via stored procedure
SELECT analytics.sp_vuln_exposure('prod', 'high');

-- Direct query showing VEX effectiveness (global view; use sp_vuln_exposure for environment filtering)
SELECT
  vuln_id,
  severity::TEXT,
  cvss_score,
  epss_score,
  kev_listed,
  fix_available,
  raw_artifact_count AS total_affected,
  effective_artifact_count AS actually_affected,
  raw_artifact_count - effective_artifact_count AS vex_mitigated,
  ROUND(100.0 * (raw_artifact_count - effective_artifact_count) / NULLIF(raw_artifact_count, 0), 1) AS mitigation_rate
FROM analytics.mv_vuln_exposure
WHERE effective_artifact_count > 0
ORDER BY
  CASE severity
    WHEN 'critical' THEN 1
    WHEN 'high' THEN 2
    WHEN 'medium' THEN 3
    ELSE 4
  END,
  effective_artifact_count DESC
LIMIT 50;

Use case: Show executives the "real" risk after VEX assessment.

4. Fixable Vulnerability Backlog

Lists vulnerabilities that can be fixed today (fix available, not VEX-mitigated).

-- Via stored procedure
SELECT analytics.sp_fixable_backlog('prod');

-- Direct query with priority scoring
SELECT
  a.name AS service,
  a.environment,
  a.team,
  c.name AS component,
  c.version AS current_version,
  cv.vuln_id,
  cv.severity::TEXT,
  cv.cvss_score,
  cv.epss_score,
  cv.fixed_version,
  cv.kev_listed,
  -- Priority score: higher = fix first
  (
    CASE cv.severity
      WHEN 'critical' THEN 100
      WHEN 'high' THEN 75
      WHEN 'medium' THEN 50
      ELSE 25
    END
    + COALESCE(cv.epss_score * 100, 0)
    + (CASE WHEN cv.kev_listed THEN 50 ELSE 0 END)
  )::INT AS priority_score
FROM analytics.component_vulns cv
JOIN analytics.components c ON c.component_id = cv.component_id
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
  AND vo.vuln_id = cv.vuln_id
  AND vo.status = 'not_affected'
  AND vo.valid_from <= now()
  AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.affects = TRUE
  AND cv.fix_available = TRUE
  AND vo.override_id IS NULL
  AND a.environment = 'prod'
ORDER BY priority_score DESC, a.name
LIMIT 100;

Use case: Prioritize remediation work based on risk and fixability.

5. Build Integrity / Attestation Coverage

Shows attestation gaps by environment and team.

-- Via stored procedure
SELECT analytics.sp_attestation_gaps('prod');

-- Direct query with gap analysis
SELECT
  environment,
  team,
  total_artifacts,
  with_provenance,
  total_artifacts - with_provenance AS missing_provenance,
  provenance_pct,
  slsa_level_2_plus,
  slsa2_pct,
  with_sbom_attestation,
  with_vex_attestation
FROM analytics.mv_attestation_coverage
WHERE environment = 'prod'
ORDER BY provenance_pct ASC;

Use case: Identify teams/environments not meeting attestation requirements.

Trend Analysis Queries

6. Vulnerability Trend (30 Days)

SELECT
  snapshot_date,
  environment,
  SUM(total_vulns) AS total_vulns,
  SUM(fixable_vulns) AS fixable_vulns,
  SUM(vex_mitigated) AS vex_mitigated,
  SUM(total_vulns) - SUM(vex_mitigated) AS net_exposure,
  SUM(kev_vulns) AS kev_vulns
FROM analytics.daily_vulnerability_counts
WHERE snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY snapshot_date, environment
ORDER BY environment, snapshot_date;

7. Vulnerability Trend by Severity

SELECT
  snapshot_date,
  severity::TEXT,
  SUM(total_vulns) AS total_vulns
FROM analytics.daily_vulnerability_counts
WHERE snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
  AND environment = 'prod'
GROUP BY snapshot_date, severity
ORDER BY snapshot_date,
  CASE severity
    WHEN 'critical' THEN 1
    WHEN 'high' THEN 2
    WHEN 'medium' THEN 3
    ELSE 4
  END;

8. Component Growth Trend

SELECT
  snapshot_date,
  environment,
  SUM(total_components) AS total_components,
  SUM(unique_suppliers) AS unique_suppliers
FROM analytics.daily_component_counts
WHERE snapshot_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY snapshot_date, environment
ORDER BY environment, snapshot_date;

Deep-Dive Queries

9. Component Impact Analysis

Find all artifacts affected by a specific component.

SELECT
  a.name AS artifact,
  a.version,
  a.environment,
  a.team,
  ac.depth AS dependency_depth,
  ac.introduced_via
FROM analytics.components c
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
WHERE c.purl LIKE 'pkg:maven/org.apache.logging.log4j/log4j-core%'
ORDER BY a.environment, a.name;

10. CVE Impact Analysis

Find all artifacts affected by a specific CVE.

SELECT DISTINCT
  a.name AS artifact,
  a.version,
  a.environment,
  a.team,
  c.name AS component,
  c.version AS component_version,
  cv.cvss_score,
  cv.fixed_version,
  CASE
    WHEN vo.status = 'not_affected' THEN 'VEX Mitigated'
    WHEN cv.fix_available THEN 'Fix Available'
    ELSE 'Vulnerable'
  END AS status
FROM analytics.component_vulns cv
JOIN analytics.components c ON c.component_id = cv.component_id
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
  AND vo.vuln_id = cv.vuln_id
  AND vo.valid_from <= now()
  AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.vuln_id = 'CVE-2021-44228'
ORDER BY a.environment, a.name;

11. Supplier Vulnerability Profile

Detailed vulnerability breakdown for a specific supplier.

SELECT
  c.supplier_normalized AS supplier,
  c.name AS component,
  c.version,
  cv.vuln_id,
  cv.severity::TEXT,
  cv.cvss_score,
  cv.kev_listed,
  cv.fix_available,
  cv.fixed_version
FROM analytics.components c
JOIN analytics.component_vulns cv ON cv.component_id = c.component_id
WHERE c.supplier_normalized = 'apache software foundation'
  AND cv.affects = TRUE
ORDER BY
  CASE cv.severity
    WHEN 'critical' THEN 1
    WHEN 'high' THEN 2
    ELSE 3
  END,
  cv.cvss_score DESC;

12. License Compliance Report

Components with concerning licenses in production.

SELECT
  c.name AS component,
  c.version,
  c.license_concluded,
  c.license_category::TEXT,
  c.supplier_normalized AS supplier,
  COUNT(DISTINCT a.artifact_id) AS artifact_count,
  ARRAY_AGG(DISTINCT a.name ORDER BY a.name) AS affected_artifacts
FROM analytics.components c
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
WHERE c.license_category IN ('copyleft-strong', 'proprietary', 'unknown')
  AND a.environment = 'prod'
GROUP BY c.component_id, c.name, c.version, c.license_concluded, c.license_category, c.supplier_normalized
ORDER BY c.license_category, artifact_count DESC;

13. MTTR Analysis

Mean time to remediate by severity.

SELECT
  cv.severity::TEXT,
  COUNT(*) AS remediated_vulns,
  AVG(EXTRACT(EPOCH FROM (vo.valid_from - cv.published_at)) / 86400)::NUMERIC(10,2) AS avg_days_to_mitigate,
  PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (vo.valid_from - cv.published_at)) / 86400
  )::NUMERIC(10,2) AS median_days,
  PERCENTILE_CONT(0.9) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (vo.valid_from - cv.published_at)) / 86400
  )::NUMERIC(10,2) AS p90_days
FROM analytics.component_vulns cv
JOIN analytics.vex_overrides vo ON vo.vuln_id = cv.vuln_id
  AND vo.status = 'not_affected'
  AND vo.valid_from <= now()
  AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.published_at >= now() - INTERVAL '90 days'
  AND cv.published_at IS NOT NULL
GROUP BY cv.severity
ORDER BY
  CASE cv.severity
    WHEN 'critical' THEN 1
    WHEN 'high' THEN 2
    WHEN 'medium' THEN 3
    ELSE 4
  END;

14. Transitive Dependency Risk

Components introduced through transitive dependencies.

SELECT
  c.name AS transitive_component,
  c.version,
  ac.introduced_via AS direct_dependency,
  ac.depth,
  COUNT(DISTINCT cv.vuln_id) AS vuln_count,
  SUM(CASE WHEN cv.severity = 'critical' THEN 1 ELSE 0 END) AS critical_count,
  COUNT(DISTINCT a.artifact_id) AS affected_artifacts
FROM analytics.components c
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.component_vulns cv ON cv.component_id = c.component_id AND cv.affects = TRUE
WHERE ac.depth > 0  -- Transitive only
  AND a.environment = 'prod'
GROUP BY c.component_id, c.name, c.version, ac.introduced_via, ac.depth
HAVING COUNT(cv.vuln_id) > 0
ORDER BY critical_count DESC, vuln_count DESC
LIMIT 50;

15. VEX Effectiveness Report

How effective is the VEX program at reducing noise?

SELECT
  DATE_TRUNC('week', vo.created_at)::DATE AS week,
  COUNT(*) AS total_overrides,
  COUNT(*) FILTER (WHERE vo.status = 'not_affected') AS not_affected,
  COUNT(*) FILTER (WHERE vo.status = 'affected') AS confirmed_affected,
  COUNT(*) FILTER (WHERE vo.status = 'under_investigation') AS under_investigation,
  COUNT(*) FILTER (WHERE vo.status = 'fixed') AS marked_fixed,
  -- Noise reduction rate
  ROUND(100.0 * COUNT(*) FILTER (WHERE vo.status = 'not_affected') / NULLIF(COUNT(*), 0), 1) AS noise_reduction_pct
FROM analytics.vex_overrides vo
WHERE vo.created_at >= now() - INTERVAL '90 days'
GROUP BY DATE_TRUNC('week', vo.created_at)
ORDER BY week;

Performance Tips

  1. Use materialized views: Queries prefixed with mv_ are pre-computed and fast
  2. Add environment filter: Most queries benefit from WHERE environment = 'prod'
  3. Use stored procedures: sp_* functions return JSON and handle caching
  4. Limit results: Always use LIMIT for large result sets
  5. Check refresh times: Views are refreshed daily; data may be up to 24h stale

Query Parameters

Common filter parameters:

Parameter Type Example Notes
environment TEXT 'prod', 'stage' Filter by deployment environment
team TEXT 'platform' Filter by owning team
severity TEXT 'critical', 'high' Minimum severity level
days INT 30, 90 Lookback period
limit INT 20, 100 Max results