Zastava Webhook · Wave 0 Implementation Notes

Authored 2025-10-19 by Zastava Webhook Guild.

ZASTAVA-WEBHOOK-12-101 — Admission Controller Host (TLS bootstrap + Authority auth)

Objectives

Provide a deterministic, restart-safe .NET 10 host that exposes a Kubernetes ValidatingAdmissionWebhook endpoint.
Load serving certificates at start-up only (per restart-time plug-in rule) and surface reload guidance via documentation rather than hot-reload.
Authenticate outbound calls to Authority/Scanner using OpTok + DPoP as defined in docs/modules/zastava/ARCHITECTURE.md.

Plan

Project scaffolding
- Create StellaOps.Zastava.Webhook project with minimal API pipeline (Program.cs, Startup equivalent via extension methods).
- Reference shared helpers once ZASTAVA-CORE-12-201/202 land; temporarily stub interfaces behind IZastavaAdmissionRequest/IZastavaAdmissionResult.
TLS bootstrap
- Support two certificate sources:
  1. Mounted secret path (/var/run/secrets/zastava-webhook/tls.{crt,key}) with optional CA bundle.
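  2. CSR workflow: generate a CSR + private key and submit it to the Kubernetes Certificates API when admission.tls.autoApprove is enabled; persist the signed cert/key to a mounted emptyDir for reuse across container restarts (an emptyDir is scoped to a single pod, so sharing across replicas would need a different volume type).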
- Validate cert/key pair on boot; abort start-up if invalid to preserve deterministic behavior.
- Configure Kestrel with mutual TLS disabled (the API server already provides client auth), but enforce a minimum of TLS 1.3 and a strong cipher suite list, and disable HTTP/2 (K8s uses HTTP/1.1).
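The boot-time checks above can be sketched as follows. This is a minimal illustration using Python's standard ssl module, not the Kestrel configuration the host will use; StartupError, base_server_context and load_serving_pair are hypothetical names introduced here.

```python
import ssl


class StartupError(RuntimeError):
    """Hypothetical error type: raised when the serving pair cannot be used."""


def base_server_context() -> ssl.SSLContext:
    """Server-side TLS settings matching the bullets above."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older
    ctx.verify_mode = ssl.CERT_NONE               # mutual TLS off
    ctx.set_alpn_protocols(["http/1.1"])          # do not offer HTTP/2
    return ctx


def load_serving_pair(ctx: ssl.SSLContext, cert_path: str, key_path: str) -> ssl.SSLContext:
    """Load cert + key once at start-up; abort on any problem.

    load_cert_chain raises if a file is missing, unparsable, or if the
    private key does not match the certificate (ssl.SSLError is an OSError).
    """
    try:
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError as exc:
        raise StartupError(f"invalid serving certificate: {exc}") from exc
    return ctx
```

A host would call load_serving_pair(base_server_context(), "/var/run/secrets/zastava-webhook/tls.crt", "/var/run/secrets/zastava-webhook/tls.key") exactly once and let StartupError terminate the process, so a bad pair never reaches the listener.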
Authority auth
- Bootstrap Authority client via shared runtime core (AddZastavaRuntimeCore + IZastavaAuthorityTokenProvider) so webhook reuses multitenant OpTok caching and guardrails.
- Implement DPoP proof generator bound to webhook host keypair (prefer Ed25519) with configurable rotation period (default 24h, triggered at restart).
- Add background health check verifying token freshness and surfacing metrics (zastava.authority_token_renew_failures_total).
Hosting concerns
- Configure structured logging with correlation id from AdmissionReview UID.
- Expose /healthz (reads cert expiry, Authority token status) and /metrics (Prometheus).
- Add readiness gate that requires initial TLS and Authority bootstrap to succeed.

Deliverables

Compilable host project with integration tests covering TLS load (mounted files + CSR mock) and Authority token acquisition.
Documentation snippet for deploy charts describing secret/CSR wiring.

Open Questions

Need confirmation from Core guild on DTO naming (AdmissionReviewEnvelope, AdmissionDecision) to avoid rework.
Determine whether CSR auto-approval is acceptable for air-gapped clusters that do not run cert-manager; a fallback manual cert import path may be required.

ZASTAVA-WEBHOOK-12-102 — Backend policy query & digest resolution

Objectives

Resolve all images within AdmissionReview to immutable digests before policy evaluation.
Call Scanner WebService /api/v1/scanner/policy/runtime with a namespace/labels/images payload, and enforce the returned verdicts with deterministic error messaging.

Plan

Image resolution
- Implement resolver service with pluggable strategies:
  - Use existing digest if present.
  - Resolve tags via registry HEAD (respecting the admission.resolveTags flag); fall back to the Observer-provided digest once core DTOs are available.
- Cache per-registry auth to minimise latency; adhere to allow/deny lists from configuration.
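The pluggable strategies could be chained roughly as below. This is a sketch under assumptions: the function names are hypothetical, and the registry HEAD call is injected as head_lookup so the example stays offline. Registries generally report the manifest digest in the Docker-Content-Digest response header, which is what a real lookup would read.

```python
from typing import Callable, Iterable, Optional

# A strategy takes an image reference and returns a digest or None.
Strategy = Callable[[str], Optional[str]]


def existing_digest(image: str) -> Optional[str]:
    """Strategy 1: the reference is already pinned (repo@sha256:<hex>)."""
    _, sep, digest = image.partition("@")
    return digest if sep and digest.startswith("sha256:") else None


def make_tag_strategy(head_lookup: Strategy, resolve_tags: bool) -> Strategy:
    """Strategy 2: resolve a tag via registry HEAD, gated by admission.resolveTags.

    head_lookup is a stand-in for the real registry client (assumption).
    """
    def strategy(image: str) -> Optional[str]:
        if not resolve_tags:
            return None
        return head_lookup(image)
    return strategy


def resolve(image: str, strategies: Iterable[Strategy]) -> Optional[str]:
    """Return the first digest any strategy produces, or None if all decline."""
    for strategy in strategies:
        digest = strategy(image)
        if digest:
            return digest
    return None
```

Keeping each strategy a plain callable means the Observer-provided fallback can be appended to the list later without touching resolve.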
Scanner client
- Define typed request/response models mirroring docs/modules/zastava/ARCHITECTURE.md structure (ttlSeconds, results[digest] -> { signed, hasSbom, policyVerdict, reasons, rekor }).
- Implement retry policy (3 attempts, exponential backoff) and map HTTP errors to webhook fail-open/closed depending on namespace configuration.
- Instrument latency (zastava.backend_latency_seconds) and failure counts.
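A minimal sketch of the retry policy, assuming a hypothetical BackendUnavailable exception for retryable failures and a 0.2 s base delay (the notes fix the attempt count at 3 but do not specify delays). The sleep function is injectable so tests stay deterministic.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BackendUnavailable(Exception):
    """Hypothetical marker for retryable Scanner failures (timeouts, 5xx)."""


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,  # assumed value, not taken from the notes
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation up to `attempts` times with exponential backoff.

    Delays double after each failure: base_delay, 2*base_delay, ...
    The last error is re-raised so the caller can apply fail-open/closed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return operation()
        except BackendUnavailable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

The caller catches the final BackendUnavailable and consults the namespace configuration to choose between allowing and denying the request.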
Verdict enforcement
- Evaluate per-image results: deny with aggregated reasons if any policyVerdict is not pass; a warn verdict counts as a denial only when enforceWarnings=true.
- Attach ttlSeconds to admission response annotations for auditing.
- Record structured logs with namespace, pod, image digest, decision, reasons, backend latency.
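The per-image evaluation might look like the sketch below. It assumes the verdict values pass, warn and fail (the enumeration is still an open question) and the results[digest] shape quoted above; the evaluate function is hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate(results: Dict[str, dict], enforce_warnings: bool) -> Tuple[bool, List[str]]:
    """Return (allowed, aggregated_reasons) for one AdmissionReview.

    Digests are visited in sorted order so the denial message is identical
    for identical inputs, regardless of how the backend ordered them.
    """
    reasons: List[str] = []
    for digest in sorted(results):
        entry = results[digest]
        verdict = entry["policyVerdict"]
        if verdict == "pass":
            continue
        if verdict == "warn" and not enforce_warnings:
            continue  # warnings tolerated in this configuration
        detail = ", ".join(entry.get("reasons", [])) or "no reason given"
        reasons.append(f"{digest}: {verdict} ({detail})")
    return (not reasons, reasons)
```

Any verdict the function does not recognise is treated as a denial, which keeps the behaviour conservative until the enumeration is confirmed.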
Contract coordination
- Schedule joint review with Scanner WebService guild once SCANNER-RUNTIME-12-302 schema stabilises; track in TASKS sub-items.
- Provide sample payload fixtures for CLI team (CLI-RUNTIME-13-005) to validate table output; ensure field names stay aligned.

Deliverables

Registry resolver unit tests (tag->digest) with deterministic fixtures.
HTTP client integration tests using Scanner stub returning varied verdict combinations.
Documentation update summarising contract and failure handling.

Open Questions

Confirm expected policy verdict enumeration (pass|warn|fail|error?) and textual reason codes.
Clarify TTL behaviour: should the webhook clamp the TTL when the backend returns a value above the configured maximum?

ZASTAVA-WEBHOOK-12-103 — Caching, fail-open/closed toggles, metrics/logging

Objectives

Provide deterministic caching layer respecting backend TTL while ensuring eviction on policy mutation.
Allow namespace-scoped fail-open behaviour with explicit metrics and alerts.
Surface actionable metrics/logging aligned with Architecture doc.

Plan

Cache design
- In-memory LRU keyed by image digest; value carries verdict payload + expiry timestamp.
- Support optional persistent seed (read-only) to prime hot digests for offline clusters (config: admission.cache.seedPath).
- On startup, load seed file and emit metric zastava.cache_seed_entries_total.
- Evict entries on TTL expiry or when the policyRevision annotation in AdmissionReview changes (requires hook from Core DTO).
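A rough sketch of the cache design, assuming a hypothetical VerdictCache class and that a changed policyRevision flushes every entry (the notes leave the exact eviction scope open). The clock is injectable so expiry can be tested without waiting.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class VerdictCache:
    """In-memory LRU keyed by image digest, with per-entry expiry."""

    def __init__(self, max_entries: int, clock: Callable[[], float] = time.monotonic):
        self._max = max_entries
        self._clock = clock
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()
        self._revision: Optional[str] = None

    def observe_revision(self, revision: str) -> None:
        """Flush everything when the policy revision changes (assumed scope)."""
        if self._revision is not None and revision != self._revision:
            self._entries.clear()
        self._revision = revision

    def put(self, digest: str, verdict: Any, ttl_seconds: float) -> None:
        self._entries[digest] = (verdict, self._clock() + ttl_seconds)
        self._entries.move_to_end(digest)      # mark as most recently used
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop least recently used

    def get(self, digest: str) -> Optional[Any]:
        entry = self._entries.get(digest)
        if entry is None:
            return None
        verdict, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[digest]          # expired: treat as a miss
            return None
        self._entries.move_to_end(digest)
        return verdict
```

Expiry is checked lazily on read, so no background sweeper is needed; the size bound alone keeps memory in check.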
Fail-open/closed toggles
- Configuration: global default + namespace overrides through admission.failOpenNamespaces, admission.failClosedNamespaces.
- Decision matrix:
  - Backend success + verdict PASS → allow.
  - Backend success + non-pass → deny, unless the verdict is warn and the namespace override allows warnings.
  - Backend failure → allow if the namespace is fail-open, deny otherwise; annotate fail-open admissions with zastava.ops/fail-open=true.
- Implement policy change event hook (future) to clear cache if observer signals revocation.
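The decision matrix can be written down as a small pure function. This is a sketch: fail_open_for and decide are hypothetical names, and letting the fail-closed list win when a namespace appears in both lists is an assumption, since the notes do not define precedence.

```python
from typing import Dict, Optional, Set, Tuple


def fail_open_for(
    namespace: str,
    default_fail_open: bool,
    fail_open_namespaces: Set[str],
    fail_closed_namespaces: Set[str],
) -> bool:
    """Global default plus namespace overrides (fail-closed wins: assumption)."""
    if namespace in fail_closed_namespaces:
        return False
    if namespace in fail_open_namespaces:
        return True
    return default_fail_open


def decide(
    verdict: Optional[str],
    fail_open: bool,
    warn_allowed: bool = False,
) -> Tuple[bool, Dict[str, str]]:
    """Return (allowed, response_annotations).

    verdict is None when the backend call failed after all retries.
    """
    if verdict is None:
        if fail_open:
            return True, {"zastava.ops/fail-open": "true"}
        return False, {}
    verdict = verdict.lower()  # the notes write both PASS and pass
    if verdict == "pass":
        return True, {}
    if verdict == "warn" and warn_allowed:
        return True, {}
    return False, {}
```

Because both functions are side-effect free, the namespace override tests listed under Testing & ops hooks can exercise every row of the matrix without a running backend.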
Metrics & logging
- Counters: zastava.admission_requests_total{decision}, zastava.cache_hits_total{result=hit|miss}, zastava.fail_open_total, zastava.backend_failures_total{stage}.
- Histograms: zastava.admission_latency_seconds (overall), zastava.resolve_latency_seconds.
- Logs: structured JSON with decision, namespace, pod, imageDigest, reasons, cacheStatus, failMode.
- Optionally emit OpenTelemetry span for admission path with attributes capturing backend latency + cache path.
Testing & ops hooks
- Unit tests for cache TTL, namespace override logic, fail-open metric increments.
- Integration test simulating backend outage ensuring fail-open/closed behaviour matches config.
- Document a runbook snippet describing how to interpret the metrics and toggle namespaces.

Open Questions

Confirm whether cache entries should include policyRevision to detect backend policy updates; requires coordination with Policy guild.
Need guidance on maximum cache size (default suggestions: 5k entries per replica?) to avoid memory blow-up.
