Zastava observability runbook (stub · 2025-11-29 demo)
Dashboards (offline import)
- Grafana JSON: `docs/modules/zastava/operations/dashboards/zastava-observability.json` (import locally; no external data sources assumed).
- Planned panels: admission decision rate, webhook latency p95/p99, cache freshness (Surface.FS), Surface.Env key misses, Secrets fetch failures, policy violation counts, and drift events.
Key metrics
- `zastava_admission_latency_seconds_bucket{webhook}`: admission webhook latency.
- `zastava_admission_decisions_total{result}`: allow/deny counts.
- `zastava_surface_env_miss_total`: Surface.Env key misses.
- `zastava_surface_secrets_failures_total{reason}`: secret retrieval failures.
- `zastava_surface_fs_cache_freshness_seconds`: cache age vs Scanner surface metadata.
- `zastava_drift_events_total{type}`: drift detections by category.
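Because the latency metric is a histogram (`_bucket`), p95/p99 values are estimated from cumulative bucket counts rather than read directly. The sketch below illustrates that estimation in Python, using the same linear interpolation within a bucket that Prometheus' `histogram_quantile` applies. The bucket bounds and counts are made-up sample data, not real Zastava output, and the function name is hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative (upper_bound, count) pairs.

    The last bucket is expected to have an upper bound of +Inf, as in
    Prometheus histograms.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Falls in the +Inf bucket (or an empty one): report the
                # highest finite bound seen so far.
                return prev_bound
            # Interpolate linearly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up cumulative counts for one webhook (bounds in seconds).
sample = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
p99 = histogram_quantile(0.99, sample)
print(f"p99 ~ {p99:.3f}s")
```

With buckets this coarse, the estimate can only be as precise as the bucket boundaries, so a bound close to any latency threshold of interest makes the resulting panel or alert more meaningful.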
Logs & traces
- Correlate by `correlationId`, `tenant`, `cluster`, and `admissionId`. Include `policyVersion`, `surfaceEnvProfile`, and `secretsProvider` fields.
- Traces disabled by default for air-gap; enable via `Telemetry:ExportEnabled=true` pointing to on-prem collector.
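As an illustration of the correlation fields listed above, here is a minimal Python sketch that emits a JSON log line carrying them and reports which correlation keys are missing. It is not the Zastava logging implementation; the formatter class, helper names, and sample values are hypothetical.

```python
import json
import logging

CORRELATION_FIELDS = ("correlationId", "tenant", "cluster", "admissionId")
CONTEXT_FIELDS = ("policyVersion", "surfaceEnvProfile", "secretsProvider")


class JsonFormatter(logging.Formatter):
    """Render a log record as one JSON object with the runbook's fields."""

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in CORRELATION_FIELDS + CONTEXT_FIELDS:
            # Fields that were not supplied are emitted as null.
            payload[field] = getattr(record, field, None)
        return json.dumps(payload, sort_keys=True)


def format_admission_log(message, **fields):
    """Build a record with the given fields and format it as JSON."""
    record = logging.makeLogRecord(
        {"msg": message, "levelname": "INFO", **fields}
    )
    return JsonFormatter().format(record)


def missing_fields(line):
    """Return the correlation keys that are absent or empty in a log line."""
    payload = json.loads(line)
    return [f for f in CORRELATION_FIELDS if not payload.get(f)]


line = format_admission_log(
    "admission denied",
    correlationId="c-1", tenant="t-1", cluster="k-1", admissionId="a-1",
    policyVersion="v1", surfaceEnvProfile="default", secretsProvider="file",
)
print(line)
print(missing_fields(line))
```

A check like `missing_fields` can be run over a sample of exported logs to spot lines that cannot be correlated across webhook and observer.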
Health/diagnostics
- `/health/liveness` and `/health/readiness` (webhook + observer) check cache reachability, Secrets provider connectivity, and policy fetch.
- `/status` exposes build version, commit, feature flags; verify against offline bundle manifest.
- Cache probe: `GET /surface/fs/cache/status` returns freshness and hash for cached surfaces.
Alert hints
- Admission latency p99 > 800ms.
- Deny rate spike > 5% over 10m without policy change.
- Surface.Env miss rate > 1% or Secrets failure > 0 over 10m.
- Cache freshness > 10m behind Scanner surface metadata.
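The thresholds above can be expressed as a small evaluation function, sketched here in Python for local experimentation. The `Snapshot` fields and alert names are hypothetical, and reading "deny rate spike" as an increase in the deny ratio over the 10-minute window is an assumption; adjust it if the intended rule differs.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical 10-minute summary of the metrics used by the alert hints."""
    latency_p99_s: float        # admission latency p99, in seconds
    deny_rate_delta_10m: float  # assumed: rise in deny ratio over 10m (0.05 = 5%)
    policy_changed: bool        # whether policy changed in the window
    env_miss_rate: float        # Surface.Env miss ratio (0.01 = 1%)
    secrets_failures_10m: int   # Secrets failures in the window
    cache_age_s: float          # cache lag behind Scanner surface metadata


def firing_alerts(s):
    """Return the names of alert hints whose thresholds are exceeded."""
    alerts = []
    if s.latency_p99_s > 0.8:
        alerts.append("admission-latency-p99")
    if s.deny_rate_delta_10m > 0.05 and not s.policy_changed:
        alerts.append("deny-rate-spike")
    if s.env_miss_rate > 0.01 or s.secrets_failures_10m > 0:
        alerts.append("surface-env-or-secrets")
    if s.cache_age_s > 600:
        alerts.append("cache-freshness")
    return alerts


healthy = Snapshot(0.3, 0.0, False, 0.0, 0, 120)
degraded = Snapshot(0.95, 0.08, False, 0.0, 2, 900)
print(firing_alerts(healthy))
print(firing_alerts(degraded))
```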
Offline verification steps
- Import Grafana JSON locally; point to Prometheus scrape labeled `zastava`.
- Replay a sealed admission bundle and verify `/status` + cache probe hashes match the manifest in the offline kit.
- Run webhook smoke (`kubectl apply --dry-run=server -f samples/admission-request.yaml`) and confirm metrics increment locally.
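The manifest comparison in the second step could be scripted roughly as follows. This is a sketch only: the runbook does not define the `/status`, cache probe, or manifest payload shapes, so the keys and values below (`version`, `commit`, `surfaceHash`) are hypothetical placeholders.

```python
def diff_against_manifest(expected, observed):
    """Return {key: (expected, observed)} for each manifest entry that differs."""
    return {
        key: (want, observed.get(key))
        for key, want in expected.items()
        if observed.get(key) != want
    }


# Hypothetical values; real ones would come from the offline kit manifest,
# the /status endpoint, and the cache probe.
manifest = {"version": "1.0.0", "commit": "abc123", "surfaceHash": "sha256:aaa"}
observed = {"version": "1.0.0", "commit": "abc123", "surfaceHash": "sha256:bbb"}

mismatches = diff_against_manifest(manifest, observed)
if mismatches:
    for key, (want, got) in mismatches.items():
        print(f"MISMATCH {key}: expected {want}, got {got}")
else:
    print("status and cache probe match the manifest")
```

Only keys present in the manifest are compared, so extra fields in the observed output do not cause a failure.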
Evidence locations
- Sprint tracker: `docs/implplan/SPRINT_0335_0001_0001_docs_modules_zastava.md`.
- Module docs: `README.md`, `architecture.md`, `implementation_plan.md`.
- Dashboard stub: `operations/dashboards/zastava-observability.json`.