consolidation of some of the modules, localization fixes, product advisories work, qa work
@@ -0,0 +1,6 @@
{
  "_note": "Placeholder Grafana dashboard stub for offline import. Populate with panel definitions when metrics endpoints are available; see runbooks/observability.md for expected panels.",
  "schemaVersion": 39,
  "title": "Vuln Explorer Observability (stub)",
  "panels": []
}
@@ -0,0 +1,41 @@
# Vuln Explorer observability runbook (demo snapshot · 2025-11-29)

## Dashboards (offline-friendly)
- Grafana JSON: `docs/modules/vuln-explorer/runbooks/dashboards/vuln-explorer-observability.json` (import locally; no external data sources assumed).
- Ops dashboards: `ops/devops/vuln/dashboards/vuln-explorer.json` (CI/staging) adds API latency p95, projection lag, error rate, query budget enforcement.

## Key metrics
- `vuln_projection_lag_seconds{tenant}` – seconds between latest ledger event and projector head.
- `vuln_findings_open_total{severity,tenant}` – count of open findings by severity.
- `vuln_export_duration_seconds_bucket` – histogram for export job runtime.
- `vuln_projection_backlog_total` – queued events awaiting projection.
- `vuln_triage_actions_total{type}` – immutable triage actions (assign, comment, risk_accept, remediation_note).
- `vuln_api_request_duration_seconds_bucket{route}` – API latency for `GET /v1/findings*` and `POST /v1/reports`.
- `vuln_query_hashes_total{tenant,query_hash}` – hashed query shapes (no PII) to observe cache effectiveness.
- `vuln_api_payload_bytes_bucket{direction}` – request/response size histograms to spot oversized payloads.
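
The `_bucket` series listed here are cumulative histograms, so latency percentiles such as p95 or p99 are estimated from bucket counts rather than read directly. The sketch below shows one way to do that by linear interpolation inside the matching bucket, similar in spirit to PromQL's `histogram_quantile`. The function name and the sample bucket values are illustrative assumptions, not taken from the service.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    mirroring how `*_bucket{le="..."}` series are exposed.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up sample: 100 requests, bucket bounds in seconds.
sample = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, sample)
```

The estimate is only as fine as the bucket layout, so bucket bounds near the alert thresholds matter more than the total number of buckets.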

## Logs & traces
- Correlate by `correlationId` and `findingId`. Structured fields: `tenant`, `advisoryKey`, `policyVersion`, `projectId`, `route`.
- Query PII guardrail: request filters are hashed (SHA-256 with deployment salt); raw filters are not logged. Strings longer than 128 chars are truncated; known PII fields (`email`, `userId`) are dropped before logging.
- Trace exemplar anchors: `traceparent` headers are copied into logs; exporters stay disabled by default for air-gap. Enable by setting `Telemetry:ExportEnabled=true` and pointing to on-prem Tempo/Jaeger.
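
The PII guardrail can be sketched as a small pre-logging step. This is an illustration under assumptions, not the service's code: the runbook does not say how the salt is combined with the filters, so prepending it to a canonical JSON encoding is a guess, and `filterHash`, `hash_filters`, `sanitize_fields`, and `build_log_record` are hypothetical names.

```python
import hashlib
import json

PII_FIELDS = {"email", "userId"}  # fields the runbook says are dropped
MAX_STRING_LENGTH = 128           # truncation limit from the runbook


def hash_filters(filters, salt):
    """Return a salted SHA-256 hex digest of a canonicalised filter dict.

    Assumption: salt bytes are prepended to sorted-key JSON. The real
    service may combine them differently (for example with HMAC).
    """
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(salt + canonical.encode("utf-8")).hexdigest()


def sanitize_fields(fields):
    """Drop known PII fields and truncate long strings before logging."""
    clean = {}
    for key, value in fields.items():
        if key in PII_FIELDS:
            continue
        if isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
            value = value[:MAX_STRING_LENGTH]
        clean[key] = value
    return clean


def build_log_record(fields, filters, salt):
    """Combine sanitised fields with the filter hash; raw filters never appear."""
    record = sanitize_fields(fields)
    record["filterHash"] = hash_filters(filters, salt)  # hypothetical field name
    return record
```

Sorting keys before hashing keeps the digest stable for equivalent filters, which is what makes a metric like `vuln_query_hashes_total` useful for spotting repeated query shapes.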

## Health/diagnostics
- `/health/liveness` and `/health/readiness` (HTTP 200 expected; readiness checks PostgreSQL + cache reachability).
- `/status` returns build version, git commit, and enabled features; safe for anonymous fetch in sealed environments.
- Ledger replay check: `GET /v1/findings?projectionMode=verify` emits `X-Vuln-Projection-Head` for quick consistency probes.
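
A minimal probe over these endpoints might look like the sketch below, using only the Python standard library. The `probe` and `http_get` names and the shape of the result dict are assumptions. The findings endpoint probably needs tenant or auth headers in a real deployment, which this sketch omits, and connection failures (`urllib.error.URLError`) are left to propagate.

```python
import json
import urllib.error
import urllib.request


def http_get(url, timeout=5.0):
    """Return (status, headers, body) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items()), resp.read()
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code worth reporting.
        return err.code, dict(err.headers.items()), err.read()


def probe(base_url, fetch=http_get):
    """Check liveness/readiness, read /status, and capture the projection head."""
    results = {}
    for path in ("/health/liveness", "/health/readiness"):
        status, _, _ = fetch(base_url + path)
        results[path] = status == 200
    status, _, body = fetch(base_url + "/status")
    results["/status"] = json.loads(body) if status == 200 else None
    _, headers, _ = fetch(base_url + "/v1/findings?projectionMode=verify")
    lowered = {name.lower(): value for name, value in headers.items()}
    results["projection_head"] = lowered.get("x-vuln-projection-head")
    return results
```

Passing `fetch` as a parameter lets the probe be exercised against a stub in sealed environments where no live instance is reachable.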

## Alert hints (wire to local Alertmanager or watchdog)
- Projection lag > 120s for any tenant.
- API p99 latency > 800ms for `GET /v1/findings` or `POST /v1/reports`.
- Export failure rate > 2% over 10m window.
- Accepted-risk approaching expiry within 7d (emit Notify event `vuln.accepted_risk.expiring`).
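
For a local watchdog without Alertmanager, the four hints can be expressed as plain threshold checks. The sketch below is illustrative only: the `Snapshot` structure and its field names are hypothetical, and how the values are collected (for instance from Prometheus queries) is left out. Only the thresholds come from the list above.

```python
from dataclasses import dataclass

# Thresholds copied from the alert hints.
MAX_PROJECTION_LAG_SECONDS = 120
MAX_API_P99_SECONDS = 0.8
MAX_EXPORT_FAILURE_RATIO = 0.02
RISK_EXPIRY_WARNING_DAYS = 7


@dataclass
class Snapshot:
    """Hypothetical point-in-time view of the values the hints refer to."""
    projection_lag_by_tenant: dict  # tenant -> seconds
    api_p99_by_route: dict          # route -> seconds
    exports_total_10m: int
    exports_failed_10m: int
    accepted_risk_days_left: dict   # finding id -> days until expiry


def evaluate(snapshot):
    """Return human-readable alert messages for every breached threshold."""
    alerts = []
    for tenant, lag in sorted(snapshot.projection_lag_by_tenant.items()):
        if lag > MAX_PROJECTION_LAG_SECONDS:
            alerts.append(f"projection lag {lag:.0f}s for tenant {tenant}")
    for route, p99 in sorted(snapshot.api_p99_by_route.items()):
        if p99 > MAX_API_P99_SECONDS:
            alerts.append(f"p99 latency {p99 * 1000:.0f}ms on {route}")
    if snapshot.exports_total_10m > 0:
        ratio = snapshot.exports_failed_10m / snapshot.exports_total_10m
        if ratio > MAX_EXPORT_FAILURE_RATIO:
            alerts.append(f"export failure rate {ratio:.1%} over 10m")
    for finding_id, days in sorted(snapshot.accepted_risk_days_left.items()):
        if 0 <= days <= RISK_EXPIRY_WARNING_DAYS:
            alerts.append(f"accepted risk on {finding_id} expires in {days}d")
    return alerts
```

The export check is skipped when no exports ran in the window, which avoids a division by zero and a misleading 0% rate.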

## Offline verification steps
1) Import Grafana JSON locally and point to Prometheus scrape job `vuln-explorer`.
2) Run `stella vuln export --format json --manifest out/manifest.json` and validate hashes using `jq -r '.files[].sha256'` against generated bundle.
3) Use `curl -s "$BASEURL/status" | jq '{commit,version,features}'` to confirm expected build metadata matches the exported bundle manifest.
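
The hash validation in step 2 can also be scripted. The sketch below assumes the manifest has a top-level `files` array whose entries carry `sha256` (as the `jq` expression implies) plus a `path` relative to the bundle directory. The `path` key and both function names are assumptions, so adjust them to the real manifest layout.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large bundle members do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, bundle_dir):
    """Return a list of (path, problem) tuples; an empty list means all match."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        relative = entry["path"]  # assumed key, not confirmed by the runbook
        target = Path(bundle_dir) / relative
        if not target.is_file():
            problems.append((relative, "missing"))
        elif sha256_file(target) != entry["sha256"].lower():
            problems.append((relative, "hash mismatch"))
    return problems
```

A call such as `verify_manifest("out/manifest.json", "out")` would then report any missing or altered files, with the bundle directory here being a guess at where the export lands.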

## Evidence locations
- Sprint alignment: `docs/implplan/SPRINT_0334_0001_0001_docs_modules_vuln_explorer.md`.
- API contract draft: `docs/modules/vuln-explorer/api.md` and OpenAPI at `docs/modules/vuln-explorer/openapi/vuln-explorer.v1.yaml`.
- Schema references: `docs/modules/vuln-explorer/architecture.md` (ledger model, VEX decision schemas).