Zastava Webhook · Wave 0 Implementation Notes

Authored 2025-10-19 by Zastava Webhook Guild.

ZASTAVA-WEBHOOK-12-101 — Admission Controller Host (TLS bootstrap + Authority auth)

Objectives

  • Provide a deterministic, restart-safe .NET 10 host that exposes a Kubernetes ValidatingAdmissionWebhook endpoint.
  • Load serving certificates at start-up only (per restart-time plug-in rule) and surface reload guidance via documentation rather than hot-reload.
  • Authenticate outbound calls to Authority/Scanner using OpTok + DPoP as defined in docs/modules/zastava/ARCHITECTURE.md.

Plan

  1. Project scaffolding
    • Create StellaOps.Zastava.Webhook project with minimal API pipeline (Program.cs, Startup equivalent via extension methods).
    • Reference shared helpers once ZASTAVA-CORE-12-201/202 land; temporarily stub interfaces behind IZastavaAdmissionRequest/IZastavaAdmissionResult.
  2. TLS bootstrap
    • Support two certificate sources:
      1. Mounted secret path (/var/run/secrets/zastava-webhook/tls.{crt,key}) with optional CA bundle.
      2. CSR workflow: generate CSR + private key, submit to Kubernetes Certificates API when admission.tls.autoApprove is enabled; persist signed cert/key to a mounted emptyDir for reuse across container restarts (an emptyDir is pod-scoped, so each replica obtains its own certificate).
    • Validate cert/key pair on boot; abort start-up if invalid to preserve deterministic behavior.
    • Configure Kestrel with mutual TLS disabled (API Server already provides client auth), but enforce minimum TLS 1.3, a strong cipher suite list, and HTTP/2 disabled (K8s uses HTTP/1.1).
  3. Authority auth
    • Bootstrap Authority client via shared runtime core (AddZastavaRuntimeCore + IZastavaAuthorityTokenProvider) so webhook reuses multitenant OpTok caching and guardrails.
    • Implement DPoP proof generator bound to webhook host keypair (prefer Ed25519) with configurable rotation period (default 24h, triggered at restart).
    • Add background health check verifying token freshness and surfacing metrics (zastava.authority_token_renew_failures_total).
  4. Hosting concerns
    • Configure structured logging with correlation id from AdmissionReview UID.
    • Expose /healthz (reads cert expiry, Authority token status) and /metrics (Prometheus).
    • Add readiness gate that requires initial TLS and Authority bootstrap to succeed.
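
The readiness gate in step 4 can be sketched as follows. This is a minimal illustration in Python rather than the planned .NET host code; the class name `ReadinessGate` and the step names `tls` and `authority` are assumptions made for the example, not identifiers from the codebase.

```python
from dataclasses import dataclass, field


@dataclass
class ReadinessGate:
    """Reports ready only after every required bootstrap step has succeeded."""

    # Hypothetical step names standing in for TLS and Authority bootstrap.
    required: frozenset = frozenset({"tls", "authority"})
    _done: set = field(default_factory=set)

    def mark_succeeded(self, step: str) -> None:
        """Record that a bootstrap step finished successfully."""
        if step not in self.required:
            raise ValueError(f"unknown bootstrap step: {step}")
        self._done.add(step)

    @property
    def ready(self) -> bool:
        """True once all required steps have been marked as succeeded."""
        return self._done >= self.required

    def pending(self) -> list:
        """Steps still outstanding, sorted for deterministic output."""
        return sorted(self.required - self._done)
```

A readiness probe handler would return a failure status while `pending()` is non-empty, which keeps the pod out of the Service endpoints until both bootstraps complete.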

Deliverables

  • Compilable host project with integration tests covering TLS load (mounted files + CSR mock) and Authority token acquisition.
  • Documentation snippet for deploy charts describing secret/CSR wiring.

Open Questions

  • Need confirmation from Core guild on DTO naming (AdmissionReviewEnvelope, AdmissionDecision) to avoid rework.
  • Determine whether CSR auto-approval is acceptable for air-gapped clusters that run without cert-manager; may require a fallback manual cert import path.

ZASTAVA-WEBHOOK-12-102 — Backend policy query & digest resolution

Objectives

  • Resolve all images within AdmissionReview to immutable digests before policy evaluation.
  • Call Scanner WebService /api/v1/scanner/policy/runtime with a namespace/labels/images payload and enforce verdicts with deterministic error messaging.

Plan

  1. Image resolution
    • Implement resolver service with pluggable strategies:
      • Use existing digest if present.
      • Resolve tags via registry HEAD (respecting admission.resolveTags flag); fall back to Observer-provided digest once core DTOs are available.
    • Cache per-registry auth to minimise latency; adhere to allow/deny lists from configuration.
  2. Scanner client
    • Define typed request/response models mirroring docs/modules/zastava/ARCHITECTURE.md structure (ttlSeconds, results[digest] -> { signed, hasSbom, policyVerdict, reasons, rekor }).
    • Implement retry policy (3 attempts, exponential backoff) and map HTTP errors to webhook fail-open/closed depending on namespace configuration.
    • Instrument latency (zastava.backend_latency_seconds) and failure counts.
  3. Verdict enforcement
    • Evaluate per-image results: deny with aggregated reasons if any policyVerdict is other than pass (warn is also admitted when enforceWarnings=false).
    • Attach ttlSeconds to admission response annotations for auditing.
    • Record structured logs with namespace, pod, image digest, decision, reasons, backend latency.
  4. Contract coordination
    • Schedule joint review with Scanner WebService guild once SCANNER-RUNTIME-12-302 schema stabilises; track in TASKS sub-items.
    • Provide sample payload fixtures for CLI team (CLI-RUNTIME-13-005) to validate table output; ensure field names stay aligned.
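
The verdict enforcement in step 3 might look like the sketch below, written in Python for brevity rather than as the .NET implementation. The response shape (`uid`, `allowed`, `status`, `auditAnnotations`) follows the `admission.k8s.io/v1` AdmissionReview contract; the function name, the `ttl-seconds` annotation key, and the 403 status code are assumptions for illustration.

```python
def build_admission_response(uid, results, ttl_seconds, enforce_warnings=True):
    """Build an admission.k8s.io/v1 AdmissionReview response as a dict.

    results maps image digest -> {"policyVerdict": str, "reasons": [str]}.
    """
    # With enforce_warnings disabled, "warn" is admitted alongside "pass".
    allowed_verdicts = {"pass"} if enforce_warnings else {"pass", "warn"}

    denials = []
    for digest in sorted(results):  # sorted so the message is deterministic
        item = results[digest]
        if item["policyVerdict"] not in allowed_verdicts:
            reasons = ", ".join(item.get("reasons", [])) or "no reason given"
            denials.append(f"{digest}: {item['policyVerdict']} ({reasons})")

    response = {
        "uid": uid,  # must echo the request UID
        "allowed": not denials,
        # Hypothetical annotation key carrying the backend TTL for auditing.
        "auditAnnotations": {"ttl-seconds": str(ttl_seconds)},
    }
    if denials:
        response["status"] = {"code": 403, "message": "; ".join(denials)}

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Sorting by digest before aggregating keeps the denial message stable across retries, which matches the deterministic error messaging objective.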

Deliverables

  • Registry resolver unit tests (tag->digest) with deterministic fixtures.
  • HTTP client integration tests using Scanner stub returning varied verdict combinations.
  • Documentation update summarising contract and failure handling.
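
For the failure-handling part of that documentation, the retry policy from the plan (3 attempts, exponential backoff) could be illustrated with a sketch like this one. It is Python rather than the host's .NET code, and the names `call_with_retry` and `BackendUnavailable` as well as the 0.2 s base delay are assumptions.

```python
import time


class BackendUnavailable(Exception):
    """Raised when every attempt against the backend has failed."""


def call_with_retry(operation, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff between attempts.

    The sleep function is injectable so tests can avoid real delays.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # a real client would catch only transient errors
            last_error = exc
            if attempt < attempts - 1:
                # Delay doubles each time: base_delay, 2 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    # The caller maps this to allow or deny using the namespace fail mode.
    raise BackendUnavailable(str(last_error)) from last_error
```

Raising a single exception type after the final attempt gives the caller one place to apply the fail-open or fail-closed decision for the namespace.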

Open Questions

  • Confirm expected policy verdict enumeration (pass|warn|fail|error?) and textual reason codes.
  • TTL behaviour needs a decision: should the webhook clamp the TTL when the backend returns a value above the configured maximum?

ZASTAVA-WEBHOOK-12-103 — Caching, fail-open/closed toggles, metrics/logging

Objectives

  • Provide deterministic caching layer respecting backend TTL while ensuring eviction on policy mutation.
  • Allow namespace-scoped fail-open behaviour with explicit metrics and alerts.
  • Surface actionable metrics/logging aligned with Architecture doc.

Plan

  1. Cache design
    • In-memory LRU keyed by image digest; value carries verdict payload + expiry timestamp.
    • Support optional persistent seed (read-only) to prime hot digests for offline clusters (config: admission.cache.seedPath).
    • On startup, load seed file and emit metric zastava.cache_seed_entries_total.
    • Evict entries on TTL or when policyRevision annotation in AdmissionReview changes (requires hook from Core DTO).
  2. Fail-open/closed toggles
    • Configuration: global default + namespace overrides through admission.failOpenNamespaces, admission.failClosedNamespaces.
    • Decision matrix:
      • Backend success + verdict PASS → allow.
      • Backend success + non-pass → deny, unless the verdict is warn and the namespace override allows warnings.
      • Backend failure → allow if the namespace is fail-open and annotate the response with zastava.ops/fail-open=true; deny otherwise.
    • Implement policy change event hook (future) to clear cache if observer signals revocation.
  3. Metrics & logging
    • Counters: zastava.admission_requests_total{decision}, zastava.cache_hits_total{result=hit|miss}, zastava.fail_open_total, zastava.backend_failures_total{stage}.
    • Histograms: zastava.admission_latency_seconds (overall), zastava.resolve_latency_seconds.
    • Logs: structured JSON with decision, namespace, pod, imageDigest, reasons, cacheStatus, failMode.
    • Optionally emit OpenTelemetry span for admission path with attributes capturing backend latency + cache path.
  4. Testing & ops hooks
    • Unit tests for cache TTL, namespace override logic, fail-open metric increments.
    • Integration test simulating backend outage ensuring fail-open/closed behaviour matches config.
    • Document a runbook snippet describing how to interpret the metrics and toggle namespaces.
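
The decision matrix in step 2 can be expressed as a small pure function, sketched here in Python rather than the host's .NET code. Two points are assumptions and not part of the plan: `warn_allowed_namespaces` stands in for the namespace override that permits warnings, and an explicit fail-closed entry is taken to win when a namespace appears in both lists.

```python
FAIL_OPEN_ANNOTATION = {"zastava.ops/fail-open": "true"}


def decide(namespace, backend_ok, verdict=None, *,
           default_fail_open=False,
           fail_open_namespaces=frozenset(),
           fail_closed_namespaces=frozenset(),
           warn_allowed_namespaces=frozenset()):
    """Return (allowed, annotations) for one admission request."""
    if backend_ok:
        if verdict == "pass":
            return True, {}
        if verdict == "warn" and namespace in warn_allowed_namespaces:
            return True, {}
        return False, {}

    # Backend failure: explicit fail-closed wins over fail-open (assumption).
    if namespace in fail_closed_namespaces:
        fail_open = False
    elif namespace in fail_open_namespaces:
        fail_open = True
    else:
        fail_open = default_fail_open

    if fail_open:
        return True, dict(FAIL_OPEN_ANNOTATION)
    return False, {}
```

Keeping the matrix free of I/O makes the namespace override unit tests in step 4 straightforward table-driven cases.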

Open Questions

  • Confirm whether cache entries should include policyRevision to detect backend policy updates; requires coordination with Policy guild.
  • Need guidance on maximum cache size (suggested default: 5k entries per replica?) to avoid unbounded memory growth.
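
To make the cache size question concrete, a bounded LRU cache with per-entry TTL could be sketched as below. This is a Python illustration, not the planned .NET implementation: `VerdictCache` is a hypothetical name, the 5000-entry default only mirrors the suggestion above, and seed loading and policyRevision eviction are left out.

```python
import time
from collections import OrderedDict


class VerdictCache:
    """In-memory LRU keyed by image digest, with a per-entry expiry."""

    def __init__(self, max_entries=5000, clock=time.monotonic):
        self._entries = OrderedDict()  # digest -> (expires_at, verdict)
        self._max = max_entries
        self._clock = clock  # injectable so tests can control time

    def put(self, digest, verdict, ttl_seconds):
        """Store a verdict and evict the least recently used entries if full."""
        self._entries[digest] = (self._clock() + ttl_seconds, verdict)
        self._entries.move_to_end(digest)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)

    def get(self, digest):
        """Return the cached verdict, or None on a miss or an expired entry."""
        entry = self._entries.get(digest)
        if entry is None:
            return None
        expires_at, verdict = entry
        if self._clock() >= expires_at:
            del self._entries[digest]
            return None
        self._entries.move_to_end(digest)  # mark as most recently used
        return verdict
```

With a hard entry limit, worst-case memory per replica is roughly the limit multiplied by the size of one verdict payload, which gives the guilds a figure to size against.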