
Vuln Explorer observability runbook (demo snapshot · 2025-11-29)

Dashboards (offline-friendly)

  • Grafana JSON: docs/modules/vuln-explorer/runbooks/dashboards/vuln-explorer-observability.json (import locally; no external data sources assumed).
  • Ops dashboard (CI/staging): ops/devops/vuln/dashboards/vuln-explorer.json adds panels for API latency p95, projection lag, error rate, and query budget enforcement.
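On hosts where the Grafana UI import is inconvenient, the dashboard JSON can also be pushed through Grafana's HTTP API (POST /api/dashboards/db). The following is a minimal sketch that assumes two things: a reachable local Grafana, and an API token with dashboard write access. The commit message and optional folder handling are illustrative only. The sketch does not remap data source references, so panels may still need to be pointed at the local Prometheus data source after import.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_import_payload(dashboard: Dict[str, Any],
                         folder_uid: Optional[str] = None) -> Dict[str, Any]:
    """Wrap a dashboard model in the body Grafana's POST /api/dashboards/db accepts."""
    model = dict(dashboard)  # shallow copy so the caller's dict is left untouched
    model["id"] = None       # null id: Grafana creates it, or matches by uid when overwriting
    payload: Dict[str, Any] = {"dashboard": model, "overwrite": True,
                               "message": "offline import"}  # message text is illustrative
    if folder_uid is not None:
        payload["folderUid"] = folder_uid
    return payload


def import_dashboard(grafana_url: str, token: str, dashboard: Dict[str, Any]) -> Dict[str, Any]:
    """POST the payload to a local Grafana (URL and token are assumptions) and return its reply."""
    body = json.dumps(build_import_payload(dashboard)).encode("utf-8")
    req = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: load docs/modules/vuln-explorer/runbooks/dashboards/vuln-explorer-observability.json with json.loads, then call import_dashboard with your local Grafana URL and token. Because overwrite is true, re-importing a dashboard with the same uid replaces the previous copy instead of creating a duplicate.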

Key metrics

  • vuln_projection_lag_seconds{tenant}: seconds between the latest ledger event and the projector head.
  • vuln_findings_open_total{severity,tenant}: count of open findings by severity.
  • vuln_export_duration_seconds_bucket: histogram of export job runtime.
  • vuln_projection_backlog_total: queued events awaiting projection.
  • vuln_triage_actions_total{type}: immutable triage actions (assign, comment, risk_accept, remediation_note).
  • vuln_api_request_duration_seconds_bucket{route}: API latency for GET /v1/findings* and POST /v1/reports.
  • vuln_query_hashes_total{tenant,query_hash}: hashed query shapes (no PII), used to observe cache effectiveness.
  • vuln_api_payload_bytes_bucket{direction}: request/response size histograms for spotting oversized payloads.
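The latency metrics are cumulative histogram buckets, so the dashboard p95 and the alert p99 are estimates rather than exact values. The sketch below mirrors the linear interpolation that Prometheus's histogram_quantile() applies within a bucket. It assumes the bucket counts have already been aggregated, for example as rates summed by le and route. It is meant for reasoning about offline scrape snapshots, not as a replacement for PromQL.

```python
import math
from typing import List, Tuple

# (upper_bound_le, cumulative_count) pairs, e.g. from vuln_api_request_duration_seconds_bucket.
# Roughly equivalent PromQL:
#   histogram_quantile(0.99, sum by (le, route) (rate(vuln_api_request_duration_seconds_bucket[5m])))
Bucket = Tuple[float, float]


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile assuming observations are spread linearly inside each bucket."""
    if not 0.0 <= q <= 1.0:
        return math.nan
    buckets = sorted(buckets, key=lambda b: b[0])
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan  # need at least one finite bucket plus the +Inf bucket
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank; +Inf always qualifies.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        return buckets[-2][0]  # rank lands in +Inf: report the largest finite bound
    upper, count_upper = buckets[idx]
    lower, count_lower = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    in_bucket = count_upper - count_lower
    if in_bucket == 0:
        return lower
    return lower + (upper - lower) * (rank - count_lower) / in_bucket
```

For example, the cumulative buckets [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)] give an estimated p95 of 0.5 s and p99 of 0.9 s. The p99 would exceed the 800ms alert hint. The estimate is only as sharp as the bucket boundaries around the threshold, so a boundary near 0.8 s makes that alert more precise.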

Logs & traces

  • Correlate logs and traces by correlationId and findingId. Structured log fields: tenant, advisoryKey, policyVersion, projectId, route.
  • Query PII guardrail: request filters are hashed (SHA-256 with a deployment salt), and raw filters are not logged. Strings longer than 128 characters are truncated, and known PII fields (email, userId) are dropped before logging.
  • Trace exemplar anchors: traceparent headers are copied into logs. Exporters are disabled by default for air-gapped deployments; to enable them, set Telemetry:ExportEnabled=true and point the exporter at an on-prem Tempo or Jaeger endpoint.
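The logging guardrail can be sketched as follows. The runbook only specifies "SHA-256 with a deployment salt," so several details here are assumptions:

- the salt is prefixed to a canonical, key-sorted JSON encoding of the filters (HMAC-SHA256 keyed by the salt is a common alternative);
- the queryHash, traceId, and spanId field names are hypothetical;
- the PII set beyond email and userId is hypothetical.

Canonicalizing first keeps equivalent filters on the same hash, which is what lets vuln_query_hashes_total observe cache effectiveness. Traceparent parsing follows the W3C Trace Context version 00 layout.

```python
import hashlib
import json
import re
from typing import Any, Dict, Optional

MAX_LEN = 128                     # truncation limit from this runbook
PII_FIELDS = {"email", "userId"}  # PII fields named in this runbook
# W3C Trace Context: version-traceid-parentid-flags as lowercase hex.
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def hash_query_filters(filters: Dict[str, Any], salt: bytes) -> str:
    """Salted SHA-256 over canonical JSON (ASSUMPTION: salt is a simple prefix)."""
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(salt + canonical).hexdigest()


def parse_traceparent(header: Optional[str]) -> Optional[Dict[str, str]]:
    """Extract ids from a version-00 traceparent header; None if malformed."""
    m = TRACEPARENT_RE.fullmatch((header or "").strip())
    if not m or m.group(1) != "00":  # this sketch only accepts version 00
        return None
    _, trace_id, span_id, flags = m.groups()
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # all-zero ids are invalid per the spec
    return {"traceId": trace_id, "spanId": span_id, "traceFlags": flags}


def build_log_record(fields: Dict[str, Any], filters: Dict[str, Any],
                     salt: bytes, traceparent: Optional[str] = None) -> Dict[str, Any]:
    """Drop PII fields, truncate long strings, and log only the hash of the filters."""
    record: Dict[str, Any] = {}
    for key, value in fields.items():
        if key in PII_FIELDS:
            continue  # dropped entirely, never logged
        if isinstance(value, str) and len(value) > MAX_LEN:
            value = value[:MAX_LEN]
        record[key] = value
    record["queryHash"] = hash_query_filters(filters, salt)  # raw filters are never stored
    trace = parse_traceparent(traceparent)
    if trace:
        record.update(trace)
    return record
```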

Health/diagnostics

  • /health/liveness and /health/readiness (HTTP 200 expected; readiness also checks Mongo and cache reachability).
  • /status returns build version, git commit, and enabled features; it is safe to fetch anonymously in sealed environments.
  • Ledger replay check: GET /v1/findings?projectionMode=verify returns an X-Vuln-Projection-Head response header for quick consistency probes.
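The diagnostics checks above can be scripted with the Python standard library. This sketch assumes the endpoints answer plain GET requests. Where /v1/findings requires a tenant or bearer header, pass it through extra_headers, which is a parameter of this sketch and not of the service. HTTP error codes are reported as failed checks, while connection failures still raise urllib.error.URLError.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Optional, Tuple

PROJECTION_HEAD_HEADER = "X-Vuln-Projection-Head"  # header name taken from this runbook


def _get(url: str, headers: Dict[str, str], timeout: float) -> Tuple[int, Any, bytes]:
    """GET a URL and return (status, headers, body) without raising on 4xx/5xx."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.headers, err.read()


def probe(base_url: str, extra_headers: Optional[Dict[str, str]] = None,
          timeout: float = 5.0) -> Dict[str, Any]:
    """Run the liveness/readiness, /status, and projection-head checks once."""
    base = base_url.rstrip("/")
    hdrs = dict(extra_headers or {})
    report: Dict[str, Any] = {}
    for path in ("/health/liveness", "/health/readiness"):
        status, _, _ = _get(base + path, hdrs, timeout)
        report[path] = status == 200  # the runbook expects HTTP 200 from both
    status, _, body = _get(base + "/status", hdrs, timeout)
    report["status"] = json.loads(body) if status == 200 else None
    status, headers, _ = _get(base + "/v1/findings?projectionMode=verify", hdrs, timeout)
    report["projectionHead"] = headers.get(PROJECTION_HEAD_HEADER) if status == 200 else None
    return report
```

Usage: probe(os.environ["BASEURL"]) returns a dict that can be printed or diffed between runs. Recording projectionHead on two replicas in quick succession gives a cheap consistency comparison.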

Alert hints (wire to local Alertmanager or watchdog)

  • Projection lag > 120s for any tenant.
  • API p99 latency > 800ms for GET /v1/findings or POST /v1/reports.
  • Export failure rate > 2% over a 10m window.
  • Accepted risk expiring within 7d (emit Notify event vuln.accepted_risk.expiring).
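The alert thresholds can be exercised offline before they are wired into Alertmanager. The sketch below evaluates a hypothetical snapshot of already-aggregated values against the four hints. The Snapshot field names, route labels, and export success/failure counts are illustrative, because the runbook does not name an export failure metric. In Prometheus itself, the lag rule would be an expression along the lines of vuln_projection_lag_seconds > 120.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Thresholds copied from this runbook's alert hints.
PROJECTION_LAG_MAX_S = 120.0
API_P99_MAX_S = 0.800
EXPORT_FAILURE_MAX_RATIO = 0.02
ACCEPTED_RISK_WARN = timedelta(days=7)
WATCHED_ROUTES = ("GET /v1/findings", "POST /v1/reports")  # hypothetical route label values


@dataclass
class Snapshot:
    """Hypothetical pre-aggregated inputs; field names are illustrative only."""
    projection_lag_s: Dict[str, float]  # tenant -> vuln_projection_lag_seconds
    api_p99_s: Dict[str, float]         # route -> estimated p99 latency in seconds
    exports_total_10m: int              # export jobs finished in the last 10m
    exports_failed_10m: int             # of which failed
    accepted_risk_expiry: Dict[str, datetime] = field(default_factory=dict)  # findingId -> expiry


def evaluate(s: Snapshot, now: datetime) -> List[str]:
    """Return the alerts that would fire for this snapshot, in a stable order."""
    fired: List[str] = []
    for tenant, lag in sorted(s.projection_lag_s.items()):
        if lag > PROJECTION_LAG_MAX_S:
            fired.append(f"projection_lag:{tenant}")
    for route in WATCHED_ROUTES:
        if s.api_p99_s.get(route, 0.0) > API_P99_MAX_S:
            fired.append(f"api_p99:{route}")
    if s.exports_total_10m and s.exports_failed_10m / s.exports_total_10m > EXPORT_FAILURE_MAX_RATIO:
        fired.append("export_failure_rate")
    for finding_id, expiry in sorted(s.accepted_risk_expiry.items()):
        if now <= expiry <= now + ACCEPTED_RISK_WARN:
            # Per the runbook, this one is emitted as the Notify event vuln.accepted_risk.expiring.
            fired.append(f"vuln.accepted_risk.expiring:{finding_id}")
    return fired
```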

Offline verification steps

  1. Import the Grafana JSON locally and point its Prometheus data source at the vuln-explorer scrape job.
  2. Run stella vuln export --format json --manifest out/manifest.json, then list the expected digests with jq -r '.files[].sha256' out/manifest.json and compare them with SHA-256 digests computed over the generated bundle files.
  3. Run curl -s "$BASEURL/status" | jq '{commit,version,features}' and confirm that the reported build metadata matches the exported bundle manifest.
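The digest comparison in step 2 can also be done in Python when jq or sha256sum is unavailable. The sketch assumes each entry under files carries a relative path field next to sha256; only .files[].sha256 is confirmed by the jq command in step 2. It also tolerates an optional "sha256:" prefix. It reports missing files, mismatched digests, and entries that resolve outside the bundle directory.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large bundles need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, bundle_root: Path) -> List[str]:
    """Return a list of problems; an empty list means every listed digest matched."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    root = bundle_root.resolve()
    problems: List[str] = []
    for entry in manifest.get("files", []):
        rel = entry.get("path")  # ASSUMPTION: field name for the file's relative path
        expected = str(entry.get("sha256", "")).lower().removeprefix("sha256:")
        if not rel:
            problems.append("entry without a path")
            continue
        target = (root / rel).resolve()
        if root not in target.parents:
            problems.append(f"outside bundle: {rel}")  # guards against ../ traversal
        elif not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(target) != expected:
            problems.append(f"digest mismatch: {rel}")
    return problems
```

Usage: verify_manifest(Path("out/manifest.json"), Path("out")) should return an empty list for an intact bundle. Any entry in the list is a reason to re-export before trusting the bundle.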

Evidence locations

  • Sprint alignment: docs/implplan/SPRINT_0334_0001_0001_docs_modules_vuln_explorer.md.
  • API contract draft: docs/modules/vuln-explorer/api.md, with the OpenAPI spec at docs/modules/vuln-explorer/openapi/vuln-explorer.v1.yaml.
  • Schema references: docs/modules/vuln-explorer/architecture.md (ledger model, VEX decision schemas).