Zastava Webhook · Wave 0 Implementation Notes
Authored 2025-10-19 by Zastava Webhook Guild.
ZASTAVA-WEBHOOK-12-101 — Admission Controller Host (TLS bootstrap + Authority auth)
Objectives
- Provide a deterministic, restart-safe .NET 10 host that exposes a Kubernetes ValidatingAdmissionWebhook endpoint.
- Load serving certificates at start-up only (per restart-time plug-in rule) and surface reload guidance via documentation rather than hot-reload.
- Authenticate outbound calls to Authority/Scanner using OpTok + DPoP as defined in docs/modules/zastava/ARCHITECTURE.md.
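As an illustration of the "load once at start-up, fail fast" objective, here is a minimal Python sketch (not the .NET host itself). The names `StartupError` and `build_server_tls_context` are hypothetical; the TLS 1.3 floor, HTTP/1.1-only ALPN, and "no client certificate" settings mirror the TLS bootstrap items in the plan.

```python
import ssl


class StartupError(RuntimeError):
    """Hypothetical error type: the host must refuse to start."""


def build_server_tls_context(cert_path: str, key_path: str) -> ssl.SSLContext:
    """Load the serving cert/key exactly once; any failure aborts start-up."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Enforce the minimum protocol version called for in the plan.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # Mutual TLS off: do not request a client certificate.
    ctx.verify_mode = ssl.CERT_NONE
    # Advertise HTTP/1.1 only via ALPN.
    ctx.set_alpn_protocols(["http/1.1"])
    try:
        # Raises if a file is missing, unparsable, or the key does not match.
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (OSError, ssl.SSLError) as exc:
        raise StartupError(f"invalid serving certificate: {exc}") from exc
    return ctx
```

Because the context is built once and never reloaded, rotating the certificate means restarting the process, which matches the restart-time rule above.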
Plan
- Project scaffolding
  - Create StellaOps.Zastava.Webhook project with minimal API pipeline (Program.cs, Startup equivalent via extension methods).
  - Reference shared helpers once ZASTAVA-CORE-12-201/202 land; temporarily stub interfaces behind IZastavaAdmissionRequest/IZastavaAdmissionResult.
- TLS bootstrap
  - Support two certificate sources:
    - Mounted secret path (/var/run/secrets/zastava-webhook/tls.{crt,key}) with optional CA bundle.
    - CSR workflow: generate CSR + private key, submit to Kubernetes Certificates API when admission.tls.autoApprove is enabled; persist signed cert/key to mounted emptyDir for reuse across replicas.
  - Validate cert/key pair on boot; abort start-up if invalid to preserve deterministic behavior.
  - Configure Kestrel with mutual TLS off (API Server already provides client auth) but enforce minimum TLS 1.3, a strong cipher suite list, and HTTP/2 disabled (K8s uses HTTP/1.1).
- Authority auth
  - Bootstrap Authority client via shared runtime core (AddZastavaRuntimeCore + IZastavaAuthorityTokenProvider) so webhook reuses multitenant OpTok caching and guardrails.
  - Implement DPoP proof generator bound to webhook host keypair (prefer Ed25519) with configurable rotation period (default 24h, triggered at restart).
  - Add background health check verifying token freshness and surfacing metrics (zastava.authority_token_renew_failures_total).
- Hosting concerns
  - Configure structured logging with correlation id from AdmissionReview UID.
  - Expose /healthz (reads cert expiry, Authority token status) and /metrics (Prometheus).
  - Add readiness gate that requires initial TLS and Authority bootstrap to succeed.
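The readiness gate and /healthz behaviour from the hosting items could be sketched as below, in Python for illustration only. `BootstrapState`, `is_ready`, and `healthz`, along with the field names in the returned payload, are hypothetical; the clock is passed in so the result is deterministic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BootstrapState:
    """Hypothetical snapshot of what the host learned during start-up."""
    tls_loaded: bool
    cert_not_after: Optional[datetime]
    authority_token_expires: Optional[datetime]


def is_ready(state: BootstrapState) -> bool:
    """Readiness gate: TLS and Authority bootstrap must both have succeeded."""
    return state.tls_loaded and state.authority_token_expires is not None


def healthz(state: BootstrapState, now: datetime) -> dict:
    """Report cert expiry and Authority token status for the health endpoint."""
    cert_ok = state.cert_not_after is not None and state.cert_not_after > now
    token_ok = (
        state.authority_token_expires is not None
        and state.authority_token_expires > now
    )
    return {
        "status": "ok" if cert_ok and token_ok else "degraded",
        "certExpired": not cert_ok,
        "authorityTokenFresh": token_ok,
    }
```

Keeping readiness (did bootstrap succeed?) separate from health (is anything about to expire?) lets an orchestrator hold traffic until both bootstraps finish while still surfacing later degradation.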
Deliverables
- Compilable host project with integration tests covering TLS load (mounted files + CSR mock) and Authority token acquisition.
- Documentation snippet for deploy charts describing secret/CSR wiring.
Open Questions
- Need confirmation from Core guild on DTO naming (AdmissionReviewEnvelope, AdmissionDecision) to avoid rework.
- Determine whether CSR auto-approval is acceptable for air-gapped clusters without Kubernetes cert-manager; may require fallback manual cert import path.
ZASTAVA-WEBHOOK-12-102 — Backend policy query & digest resolution
Objectives
- Resolve all images within AdmissionReview to immutable digests before policy evaluation.
- Call Scanner WebService /api/v1/scanner/policy/runtime with namespace/labels/images payload, enforce verdicts with deterministic error messaging.
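To make the first objective concrete, the sketch below (Python, illustrative only) collects every image reference from an AdmissionReview and checks whether it is already digest-pinned. It assumes the admitted object is a Pod, so the spec sits at `request.object.spec`; workload kinds such as Deployments nest the pod spec deeper. The function names are hypothetical.

```python
from typing import List, Optional


def collect_images(admission_review: dict) -> List[str]:
    """Return unique image references in a stable order (assumes a Pod object)."""
    pod_spec = admission_review["request"]["object"]["spec"]
    images: List[str] = []
    for field in ("initContainers", "containers", "ephemeralContainers"):
        for container in pod_spec.get(field) or []:
            image = container.get("image")
            if image and image not in images:
                images.append(image)
    return images


def pinned_digest(image: str) -> Optional[str]:
    """Return the digest if the reference is already pinned, else None."""
    _, sep, digest = image.partition("@")
    # A digest looks like "<algorithm>:<hex>", for example "sha256:...".
    return digest if sep and ":" in digest else None
```

References for which `pinned_digest` returns None are the ones that would need tag resolution before the policy query.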
Plan
- Image resolution
  - Implement resolver service with pluggable strategies:
    - Use existing digest if present.
    - Resolve tags via registry HEAD (respecting admission.resolveTags flag); fall back to Observer-provided digest once core DTOs are available.
  - Cache per-registry auth to minimise latency; adhere to allow/deny lists from configuration.
- Scanner client
  - Define typed request/response models mirroring docs/modules/zastava/ARCHITECTURE.md structure (ttlSeconds, results[digest] -> { signed, hasSbom, policyVerdict, reasons, rekor }).
  - Implement retry policy (3 attempts, exponential backoff) and map HTTP errors to webhook fail-open/closed depending on namespace configuration.
  - Instrument latency (zastava.backend_latency_seconds) and failure counts.
- Verdict enforcement
  - Evaluate per-image results: if any policyVerdict != pass (or warn when enforceWarnings=false), deny with aggregated reasons.
  - Attach ttlSeconds to admission response annotations for auditing.
  - Record structured logs with namespace, pod, image digest, decision, reasons, backend latency.
- Contract coordination
  - Schedule joint review with Scanner WebService guild once SCANNER-RUNTIME-12-302 schema stabilises; track in TASKS sub-items.
  - Provide sample payload fixtures for CLI team (CLI-RUNTIME-13-005) to validate table output; ensure field names stay aligned.
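The verdict enforcement step could look roughly like the following Python sketch. It is illustrative only: the function name `decide`, the 403 status code, and carrying the TTL in the AdmissionReview `auditAnnotations` field under a `ttlSeconds` key are assumptions, and the verdict strings follow the tentative `pass|warn|fail|error` enumeration still listed as an open question.

```python
from typing import Optional


def decide(uid: str, results: dict, enforce_warnings: bool = True,
           ttl_seconds: Optional[int] = None) -> dict:
    """Build an AdmissionReview response from per-digest backend results."""
    # "warn" is tolerated only when warnings are not enforced.
    allowed_verdicts = {"pass"} if enforce_warnings else {"pass", "warn"}

    reasons = []
    for digest in sorted(results):  # sorted so the message is deterministic
        result = results[digest]
        verdict = result["policyVerdict"]
        if verdict not in allowed_verdicts:
            detail = ", ".join(result.get("reasons") or ["no reason given"])
            reasons.append(f"{digest}: {verdict} ({detail})")

    response = {"uid": uid, "allowed": not reasons}
    if reasons:
        response["status"] = {"code": 403, "message": "; ".join(reasons)}
    if ttl_seconds is not None:
        # Assumption: the TTL travels as an audit annotation (string values only).
        response["auditAnnotations"] = {"ttlSeconds": str(ttl_seconds)}

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Sorting digests before aggregating keeps the denial message identical across replicas and retries, which is what "deterministic error messaging" needs in practice.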
Deliverables
- Registry resolver unit tests (tag->digest) with deterministic fixtures.
- HTTP client integration tests using Scanner stub returning varied verdict combinations.
- Documentation update summarising contract and failure handling.
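For the failure-handling write-up and the stub-based client tests, the retry policy from the plan (3 attempts, exponential backoff) can be pictured as below. This is a Python sketch under assumptions: `TransientError`, `BackendUnavailable`, and the base delay are hypothetical, and the sleep function is injected so tests stay deterministic.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable backend failures (timeouts, 5xx)."""


class BackendUnavailable(Exception):
    """Raised once all attempts are exhausted; the caller applies fail-open/closed."""


def call_with_retry(call: Callable[[], T], attempts: int = 3,
                    base_delay: float = 0.2,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Try `call` up to `attempts` times, doubling the delay between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except TransientError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise BackendUnavailable(str(last_error)) from last_error
```

Only `BackendUnavailable` should reach the fail-open/closed decision; non-transient errors propagate unchanged so they are not silently retried.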
Open Questions
- Confirm expected policy verdict enumeration (pass|warn|fail|error?) and textual reason codes.
- Need TTL behaviour: should webhook reduce TTL when backend returns > configured max?
ZASTAVA-WEBHOOK-12-103 — Caching, fail-open/closed toggles, metrics/logging
Objectives
- Provide deterministic caching layer respecting backend TTL while ensuring eviction on policy mutation.
- Allow namespace-scoped fail-open behaviour with explicit metrics and alerts.
- Surface actionable metrics/logging aligned with Architecture doc.
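One way to picture a cache that respects the backend TTL and drops entries on policy mutation is an LRU keyed by image digest, sketched here in Python. `VerdictCache` is a hypothetical name, the 5000-entry default echoes the tentative figure in the open questions, and the clock is passed in explicitly so behaviour is reproducible.

```python
from collections import OrderedDict
from typing import Optional


class VerdictCache:
    """LRU keyed by digest; entries expire by TTL or on policy revision change."""

    def __init__(self, max_entries: int = 5000):
        # digest -> (verdict, expires_at, policy_revision)
        self._entries: OrderedDict = OrderedDict()
        self._max = max_entries

    def get(self, digest: str, now: float, policy_revision: str) -> Optional[dict]:
        entry = self._entries.get(digest)
        if entry is None:
            return None
        verdict, expires_at, revision = entry
        if now >= expires_at or revision != policy_revision:
            # Stale by TTL or written under a different policy revision.
            del self._entries[digest]
            return None
        self._entries.move_to_end(digest)  # mark as most recently used
        return verdict

    def put(self, digest: str, verdict: dict, ttl_seconds: float,
            now: float, policy_revision: str) -> None:
        self._entries[digest] = (verdict, now + ttl_seconds, policy_revision)
        self._entries.move_to_end(digest)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
```

Storing the policy revision alongside each entry makes eviction lazy: nothing has to sweep the cache when policy changes, since stale entries are rejected on their next lookup.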
Plan
- Cache design
  - In-memory LRU keyed by image digest; value carries verdict payload + expiry timestamp.
  - Support optional persistent seed (read-only) to prime hot digests for offline clusters (config: admission.cache.seedPath).
  - On startup, load seed file and emit metric zastava.cache_seed_entries_total.
  - Evict entries on TTL or when policyRevision annotation in AdmissionReview changes (requires hook from Core DTO).
- Fail-open/closed toggles
  - Configuration: global default + namespace overrides through admission.failOpenNamespaces, admission.failClosedNamespaces.
  - Decision matrix:
    - Backend success + verdict PASS → allow.
    - Backend success + non-pass → deny unless namespace override says warn allowed.
    - Backend failure → allow if namespace fail-open, deny otherwise; annotate response with zastava.ops/fail-open=true.
  - Implement policy change event hook (future) to clear cache if observer signals revocation.
- Metrics & logging
  - Counters: zastava.admission_requests_total{decision}, zastava.cache_hits_total{result=hit|miss}, zastava.fail_open_total, zastava.backend_failures_total{stage}.
  - Histograms: zastava.admission_latency_seconds (overall), zastava.resolve_latency_seconds.
  - Logs: structured JSON with decision, namespace, pod, imageDigest, reasons, cacheStatus, failMode.
  - Optionally emit OpenTelemetry span for admission path with attributes capturing backend latency + cache path.
- Testing & ops hooks
  - Unit tests for cache TTL, namespace override logic, fail-open metric increments.
  - Integration test simulating backend outage to confirm fail-open/closed behaviour matches config.
  - Document runbook snippet describing how to interpret the metrics and toggle namespaces.
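The decision matrix in the plan translates almost directly into code. The Python sketch below is illustrative only: the function names are hypothetical, letting the fail-closed list win when a namespace appears in both lists is an assumption, and so is attaching the zastava.ops/fail-open annotation only when a request is actually admitted because of fail-open.

```python
from typing import Dict, Set, Tuple


def namespace_fail_open(namespace: str, default_fail_open: bool,
                        fail_open_ns: Set[str], fail_closed_ns: Set[str]) -> bool:
    """Resolve the effective mode: namespace overrides beat the global default."""
    if namespace in fail_closed_ns:  # assumption: fail-closed wins on conflict
        return False
    if namespace in fail_open_ns:
        return True
    return default_fail_open


def admit(backend_ok: bool, verdict: str, fail_open: bool,
          warn_allowed: bool = False) -> Tuple[bool, Dict[str, str]]:
    """Return (allowed, annotations) following the decision matrix."""
    if not backend_ok:
        if fail_open:
            return True, {"zastava.ops/fail-open": "true"}
        return False, {}
    if verdict == "pass":
        return True, {}
    if verdict == "warn" and warn_allowed:
        return True, {}
    return False, {}
```

For example, with a fail-closed global default and `admission.failOpenNamespaces` containing `dev`, a backend outage admits pods in `dev` (annotated) and denies them everywhere else.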
Open Questions
- Confirm whether cache entries should include policyRevision to detect backend policy updates; requires coordination with Policy guild.
- Need guidance on maximum cache size (default suggestions: 5k entries per replica?) to avoid memory blow-up.