# Zastava Webhook · Wave 0 Implementation Notes

> Authored 2025-10-19 by Zastava Webhook Guild.

## ZASTAVA-WEBHOOK-12-101 — Admission Controller Host (TLS bootstrap + Authority auth)

**Objectives**
- Provide a deterministic, restart-safe .NET 10 host that exposes a Kubernetes ValidatingAdmissionWebhook endpoint.
- Load serving certificates at start-up only (per restart-time plug-in rule) and surface reload guidance via documentation rather than hot-reload.
- Authenticate outbound calls to Authority/Scanner using OpTok + DPoP as defined in `docs/modules/zastava/ARCHITECTURE.md`.

**Plan**
1. **Project scaffolding**
   - Create `StellaOps.Zastava.Webhook` project with minimal API pipeline (`Program.cs`, `Startup` equivalent via extension methods).
   - Reference shared helpers once `ZASTAVA-CORE-12-201/202` land; temporarily stub interfaces behind `IZastavaAdmissionRequest`/`IZastavaAdmissionResult`.
2. **TLS bootstrap**
   - Support two certificate sources:
     1. Mounted secret path (`/var/run/secrets/zastava-webhook/tls.{crt,key}`) with optional CA bundle.
     2. CSR workflow: generate CSR + private key, submit to Kubernetes Certificates API when `admission.tls.autoApprove` enabled; persist signed cert/key to mounted emptyDir for reuse across replicas.
   - Validate cert/key pair on boot; abort start-up if invalid to preserve deterministic behavior.
   - Configure Kestrel for mutual TLS off (API Server already provides client auth) but enforce minimum TLS 1.3, strong cipher suite list, HTTP/2 disabled (K8s uses HTTP/1.1).
3. **Authority auth**
   - Bootstrap Authority client via shared runtime core (`AddZastavaRuntimeCore` + `IZastavaAuthorityTokenProvider`) so webhook reuses multitenant OpTok caching and guardrails.
   - Implement DPoP proof generator bound to webhook host keypair (prefer Ed25519) with configurable rotation period (default 24h, triggered at restart).
   - Add background health check verifying token freshness and surfacing metrics (`zastava.authority_token_renew_failures_total`).
4. **Hosting concerns**
   - Configure structured logging with correlation id from AdmissionReview UID.
   - Expose `/healthz` (reads cert expiry, Authority token status) and `/metrics` (Prometheus).
   - Add readiness gate that requires initial TLS and Authority bootstrap to succeed.

**Deliverables**
- Compilable host project with integration tests covering TLS load (mounted files + CSR mock) and Authority token acquisition.
- Documentation snippet for deploy charts describing secret/CSR wiring.

**Open Questions**
- Need confirmation from Core guild on DTO naming (`AdmissionReviewEnvelope`, `AdmissionDecision`) to avoid rework.
- Determine whether CSR auto-approval is acceptable for air-gapped clusters without Kubernetes cert-manager; may require fallback manual cert import path.

## ZASTAVA-WEBHOOK-12-102 — Backend policy query & digest resolution

**Objectives**
- Resolve all images within AdmissionReview to immutable digests before policy evaluation.
- Call Scanner WebService `/api/v1/scanner/policy/runtime` with namespace/labels/images payload, enforce verdicts with deterministic error messaging.

**Plan**
1. **Image resolution**
   - Implement resolver service with pluggable strategies:
     - Use existing digest if present.
     - Resolve tags via registry HEAD (respecting `admission.resolveTags` flag); fallback to Observer-provided digest once core DTOs available.
   - Cache per-registry auth to minimise latency; adhere to allow/deny lists from configuration.
2. **Scanner client**
   - Define typed request/response models mirroring `docs/modules/zastava/ARCHITECTURE.md` structure (`ttlSeconds`, `results[digest] -> { signed, hasSbom, policyVerdict, reasons, rekor }`).
   - Implement retry policy (3 attempts, exponential backoff) and map HTTP errors to webhook fail-open/closed depending on namespace configuration.
   - Instrument latency (`zastava.backend_latency_seconds`) and failure counts.
3. **Verdict enforcement**
   - Evaluate per-image results: if any `policyVerdict != pass` (or `warn` when `enforceWarnings=false`), deny with aggregated reasons.
   - Attach `ttlSeconds` to admission response annotations for auditing.
   - Record structured logs with namespace, pod, image digest, decision, reasons, backend latency.
4. **Contract coordination**
   - Schedule joint review with Scanner WebService guild once SCANNER-RUNTIME-12-302 schema stabilises; track in TASKS sub-items.
   - Provide sample payload fixtures for CLI team (`CLI-RUNTIME-13-005`) to validate table output; ensure field names stay aligned.

**Deliverables**
- Registry resolver unit tests (tag->digest) with deterministic fixtures.
- HTTP client integration tests using Scanner stub returning varied verdict combinations.
- Documentation update summarising contract and failure handling.

**Open Questions**
- Confirm expected policy verdict enumeration (`pass|warn|fail|error`?) and textual reason codes.
- Need TTL behaviour: should webhook reduce TTL when backend returns > configured max?

## ZASTAVA-WEBHOOK-12-103 — Caching, fail-open/closed toggles, metrics/logging

**Objectives**
- Provide deterministic caching layer respecting backend TTL while ensuring eviction on policy mutation.
- Allow namespace-scoped fail-open behaviour with explicit metrics and alerts.
- Surface actionable metrics/logging aligned with Architecture doc.

**Plan**
1. **Cache design**
   - In-memory LRU keyed by image digest; value carries verdict payload + expiry timestamp.
   - Support optional persistent seed (read-only) to prime hot digests for offline clusters (config: `admission.cache.seedPath`).
   - On startup, load seed file and emit metric `zastava.cache_seed_entries_total`.
   - Evict entries on TTL or when `policyRevision` annotation in AdmissionReview changes (requires hook from Core DTO).
2. **Fail-open/closed toggles**
   - Configuration: global default + namespace overrides through `admission.failOpenNamespaces`, `admission.failClosedNamespaces`.
   - Decision matrix:
     - Backend success + verdict PASS → allow.
     - Backend success + non-pass → deny unless namespace override says warn allowed.
     - Backend failure → allow if namespace fail-open, deny otherwise; annotate response with `zastava.ops/fail-open=true`.
   - Implement policy change event hook (future) to clear cache if observer signals revocation.
3. **Metrics & logging**
   - Counters: `zastava.admission_requests_total{decision}`, `zastava.cache_hits_total{result=hit|miss}`, `zastava.fail_open_total`, `zastava.backend_failures_total{stage}`.
   - Histograms: `zastava.admission_latency_seconds` (overall), `zastava.resolve_latency_seconds`.
   - Logs: structured JSON with `decision`, `namespace`, `pod`, `imageDigest`, `reasons`, `cacheStatus`, `failMode`.
   - Optionally emit OpenTelemetry span for admission path with attributes capturing backend latency + cache path.
4. **Testing & ops hooks**
   - Unit tests for cache TTL, namespace override logic, fail-open metric increments.
   - Integration test simulating backend outage ensuring fail-open/closed behaviour matches config.
   - Document runbook snippet describing interpreting metrics and toggling namespaces.

**Open Questions**
- Confirm whether cache entries should include `policyRevision` to detect backend policy updates; requires coordination with Policy guild.
- Need guidance on maximum cache size (default suggestions: 5k entries per replica?) to avoid memory blow-up.
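
**Illustrative sketch (non-normative)**

The cache design and decision matrix in the ZASTAVA-WEBHOOK-12-103 plan can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the .NET host code. The names `VerdictCache`, `Decision`, `decide`, and `warn_allowed_namespaces` are hypothetical, the verdict strings follow the still-unconfirmed `pass|warn|fail|error` enumeration, and two behaviours are assumptions rather than settled design: fail-closed overrides take precedence over fail-open ones, and the `zastava.ops/fail-open` annotation is attached only when a request is actually allowed because of fail-open.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Optional


@dataclass(frozen=True)
class CachedVerdict:
    verdict: str        # assumed enumeration: "pass", "warn", "fail", "error"
    expires_at: float   # absolute expiry computed from the backend ttlSeconds


class VerdictCache:
    """In-memory LRU keyed by image digest; entries also expire per backend TTL."""

    def __init__(self, max_entries: int = 5000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: "OrderedDict[str, CachedVerdict]" = OrderedDict()
        self._max_entries = max_entries
        self._clock = clock  # injectable so TTL tests stay deterministic

    def put(self, digest: str, verdict: str, ttl_seconds: float) -> None:
        self._entries[digest] = CachedVerdict(verdict, self._clock() + ttl_seconds)
        self._entries.move_to_end(digest)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def get(self, digest: str) -> Optional[str]:
        entry = self._entries.get(digest)
        if entry is None:
            return None
        if entry.expires_at <= self._clock():
            del self._entries[digest]  # TTL eviction on read
            return None
        self._entries.move_to_end(digest)  # mark as recently used
        return entry.verdict


@dataclass(frozen=True)
class Decision:
    allowed: bool
    annotations: Dict[str, str] = field(default_factory=dict)


def decide(verdict: Optional[str], namespace: str, *,
           default_fail_open: bool = False,
           fail_open_namespaces: FrozenSet[str] = frozenset(),
           fail_closed_namespaces: FrozenSet[str] = frozenset(),
           warn_allowed_namespaces: FrozenSet[str] = frozenset()) -> Decision:
    """Apply the decision matrix; verdict=None models a backend failure."""
    if verdict is None:
        # Assumption: an explicit fail-closed override wins over fail-open.
        if namespace in fail_closed_namespaces:
            fail_open = False
        elif namespace in fail_open_namespaces:
            fail_open = True
        else:
            fail_open = default_fail_open
        if fail_open:
            return Decision(True, {"zastava.ops/fail-open": "true"})
        return Decision(False)
    if verdict == "pass":
        return Decision(True)
    if verdict == "warn" and namespace in warn_allowed_namespaces:
        return Decision(True)  # hypothetical "warn allowed" namespace override
    return Decision(False)
```

Injecting the clock keeps the TTL unit tests listed under "Testing & ops hooks" deterministic, and modelling a backend failure as `None` keeps the fail-open branch separate from ordinary non-pass verdicts.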