# Zastava observability runbook (stub · 2025-11-29 demo) ## Dashboards (offline import) - Grafana JSON: `docs/modules/zastava/operations/dashboards/zastava-observability.json` (import locally; no external data sources assumed). - Planned panels: admission decision rate, webhook latency p95/p99, cache freshness (Surface.FS), Surface.Env key misses, Secrets fetch failures, policy violation counts, and drift events. ## Key metrics - `zastava_admission_latency_seconds_bucket{webhook}` — admission webhook latency. - `zastava_admission_decisions_total{result}` — allow/deny counts. - `zastava_surface_env_miss_total` — Surface.Env key misses. - `zastava_surface_secrets_failures_total{reason}` — secret retrieval failures. - `zastava_surface_fs_cache_freshness_seconds` — cache age vs Scanner surface metadata. - `zastava_drift_events_total{type}` — drift detections by category. ## Logs & traces - Correlate by `correlationId`, `tenant`, `cluster`, and `admissionId`. Include `policyVersion`, `surfaceEnvProfile`, and `secretsProvider` fields. - Traces disabled by default for air-gap; enable via `Telemetry:ExportEnabled=true` pointing to on-prem collector. ## Health/diagnostics - `/health/liveness` and `/health/readiness` (webhook + observer) check cache reachability, Secrets provider connectivity, and policy fetch. - `/status` exposes build version, commit, feature flags; verify against offline bundle manifest. - Cache probe: `GET /surface/fs/cache/status` returns freshness and hash for cached surfaces. ## Alert hints - Admission latency p99 > 800ms. - Deny rate spike > 5% over 10m without policy change. - Surface.Env miss rate > 1% or Secrets failure > 0 over 10m. - Cache freshness > 10m behind Scanner surface metadata. ## Offline verification steps 1) Import Grafana JSON locally; point to Prometheus scrape labeled `zastava`. 2) Replay a sealed admission bundle and verify `/status` + cache probe hashes match the manifest in the offline kit. 3) Run webhook smoke (`kubectl apply --dry-run=server -f samples/admission-request.yaml`) and confirm metrics increment locally. ## Evidence locations - Sprint tracker: `docs/implplan/SPRINT_0335_0001_0001_docs_modules_zastava.md`. - Module docs: `README.md`, `architecture.md`, `implementation_plan.md`. - Dashboard stub: `operations/dashboards/zastava-observability.json`.