Files
git.stella-ops.org/docs/ARCHITECTURE_ZASTAVA.md

452 lines
17 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# component_architecture_zastava.md — **StellaOps Zastava** (2025Q4)
> **Scope.** Implementationready architecture for **Zastava**: the **runtime inspector/enforcer** that watches real workloads, detects drift from the scanned baseline, verifies image/SBOM/attestation posture, and (optionally) **admits/blocks** deployments. Includes Kubernetes & plainDocker topologies, data contracts, APIs, security posture, performance targets, test matrices, and failure modes.
---
## 0) Mission & boundaries
**Mission.** Give operators **groundtruth** from running environments and a **fast guardrail** before workloads land:
* **Observer:** inventory containers, entrypoints actually executed, and DSOs actually loaded; verify **image signature**, **SBOM referrers**, and **attestation** presence; detect **drift** (unexpected processes/paths) and **policy violations**; publish **runtime events** to Scanner.WebService.
* **Admission (optional):** Kubernetes ValidatingAdmissionWebhook that enforces minimal posture (signed images, SBOM availability, known base images, policy PASS) **preflight**.
**Boundaries.**
* Zastava **does not** compute SBOMs and does not sign; it **consumes** Scanner/WebService outputs and **enforces** backend policy verdicts.
* Zastava can **request** a delta scan when the baseline is missing/stale, but scanning is done by **Scanner.Worker**.
* On nonK8s Docker hosts, Zastava runs as a host service with **observeronly** features.
---
## 1) Topology & processes
### 1.1 Components (Kubernetes)
```
stellaops/zastava-observer # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
```
### 1.2 Components (Docker/VM)
```
stellaops/zastava-agent # System service; watch Docker events; observer only
```
### 1.3 Dependencies
* **Authority** (OIDC): short OpToks (DPoP/mTLS) for API calls to Scanner.WebService.
* **Scanner.WebService**: `/runtime/events` ingestion; `/policy/runtime` fetch.
* **OCI Registry** (optional): for direct referrers/sig checks if not delegated to backend.
* **Container runtime**: containerd/CRIO/Docker (read interfaces only).
* **Kubernetes API** (watch Pods in cluster; validating webhook).
* **Host mounts** (K8s DaemonSet): `/proc`, `/var/lib/containerd` (or CRIO), `/run/containerd/containerd.sock` (optional, readonly).
---
## 2) Data contracts
### 2.1 Runtime event (observer → Scanner.WebService)
```json
{
"eventId": "9f6a…",
"when": "2025-10-17T12:34:56Z",
"kind": "CONTAINER_START|CONTAINER_STOP|DRIFT|POLICY_VIOLATION|ATTESTATION_STATUS",
"tenant": "tenant-01",
"node": "ip-10-0-1-23",
"runtime": { "engine": "containerd", "version": "1.7.19" },
"workload": {
"platform": "kubernetes",
"namespace": "payments",
"pod": "api-7c9fbbd8b7-ktd84",
"container": "api",
"containerId": "containerd://...",
"imageRef": "ghcr.io/acme/api@sha256:abcd…",
"owner": { "kind": "Deployment", "name": "api" }
},
"process": {
"pid": 12345,
"entrypoint": ["/entrypoint.sh", "--serve"],
"entryTrace": [
{"file":"/entrypoint.sh","line":3,"op":"exec","target":"/usr/bin/python3"},
{"file":"<argv>","op":"python","target":"/opt/app/server.py"}
]
},
"loadedLibs": [
{ "path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "…"},
{ "path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "…"}
],
"posture": {
"imageSigned": true,
"sbomReferrer": "present|missing",
"attestation": { "uuid": "rekor-uuid", "verified": true }
},
"delta": {
"baselineImageDigest": "sha256:abcd…",
"changedFiles": ["/opt/app/server.py"], // optional quick signal
"newBinaries": [{ "path":"/usr/local/bin/helper","sha256":"…" }]
},
"evidence": [
{"signal":"procfs.maps","value":"/lib/.../libssl.so.3@0x7f..."},
{"signal":"cri.task.inspect","value":"pid=12345"},
{"signal":"registry.referrers","value":"sbom: application/vnd.cyclonedx+json"}
]
}
```
### 2.2 Admission decision (webhook → API server)
```json
{
"admissionId": "…",
"namespace": "payments",
"podSpecDigest": "sha256:…",
"images": [
{
"name": "ghcr.io/acme/api:1.2.3",
"resolved": "ghcr.io/acme/api@sha256:abcd…",
"signed": true,
"hasSbomReferrers": true,
"policyVerdict": "pass|warn|fail",
"reasons": ["unsigned base image", "missing SBOM"]
}
],
"decision": "Allow|Deny",
"ttlSeconds": 300
}
```
---
## 3) Observer — node agent (DaemonSet)
### 3.1 Responsibilities
* **Watch** container lifecycle (start/stop) via CRI (`/run/containerd/containerd.sock` gRPC readonly) or `/var/log/containers/*.log` tail fallback.
* **Resolve** container → image digest, mount point rootfs.
* **Trace entrypoint**: attach **shortlived** nsenter/exec to PID 1 in container, parse shell for `exec` chain (bounded depth), record **terminal program**.
* **Sample loaded libs**: read `/proc/<pid>/maps` and `exe` symlink to collect **actually loaded** DSOs; compute **sha256** for each mapped file (bounded count/size).
* **Posture check** (cheap):
* Image signature presence (if cosign policies are local; else ask backend).
* SBOM **referrers** presence (HEAD to registry, optional).
* Rekor UUID known (query Scanner.WebService by image digest).
* **Publish runtime events** to Scanner.WebService `/runtime/events` (batch & compress).
* **Request delta scan** if: no SBOM in catalog OR base differs from known baseline.
### 3.2 Privileges & mounts (K8s)
* **SecurityContext:** `runAsUser: 0`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`.
* **Capabilities:** `CAP_SYS_PTRACE` (optional if using nsenter trace), `CAP_DAC_READ_SEARCH`.
* **Host mounts (readonly):**
* `/proc` (host) → `/host/proc`
* `/run/containerd/containerd.sock` (or CRIO socket)
* `/var/lib/containerd/io.containerd.runtime.v2.task` (rootfs paths & pids)
* **Networking:** clusterinternal egress to Scanner.WebService only.
* **Rate limits:** hard caps for bytes hashed and file count per container to avoid noisy tenants.
### 3.3 Event batching
* Buffer NDJSON; flush by **N events** or **2s**.
* Backpressure: local disk ring buffer (50MB default) if Scanner is temporarily unavailable; drop oldest after cap with **metrics** and **warning** event.
---
## 4) Admission Webhook (Kubernetes)
### 4.1 Gate criteria
Configurable policy (fetched from backend and cached):
* **Image signature**: must be cosignverifiable to configured key(s) or keyless identities.
* **SBOM availability**: at least one **CycloneDX** referrer or **Scanner.WebService** catalog entry.
* **Scanner policy verdict**: backend `PASS` required for namespaces/labels matching rules; allow `WARN` if configured.
* **Registry allowlists/denylists**.
* **Tag bans** (e.g., `:latest`).
* **Base image allowlists** (by digest).
### 4.2 Flow
```mermaid
sequenceDiagram
autonumber
participant K8s as API Server
participant WH as Zastava Webhook
participant SW as Scanner.WebService
K8s->>WH: AdmissionReview(Pod)
WH->>WH: Resolve images to digests (remote HEAD/pull if needed)
WH->>SW: POST /policy/runtime { digests, namespace, labels }
SW-->>WH: { per-image: {signed, hasSbom, verdict, reasons}, ttl }
alt All pass
WH-->>K8s: AdmissionResponse(Allow, ttl)
else Any fail (enforce=true)
WH-->>K8s: AdmissionResponse(Deny, message)
end
```
**Caching:** Perdigest result cached `ttlSeconds` (default 300 s). **Failopen** or **failclosed** is configurable per namespace.
### 4.3 TLS & HA
* Webhook has its own **serving cert** signed by cluster CA (or custom cert + CA bundle on configuration).
* Deployment ≥ 2 replicas; **leaderless**; stateless.
---
## 5) Backend integration (Scanner.WebService)
### 5.1 Ingestion endpoint
`POST /api/v1/scanner/runtime/events` *(OpTok + DPoP/mTLS)*
* Validates event schema; enforces rate caps by tenant/node; persists to **Mongo** (`runtime.events` capped collection or regular with TTL).
* Performs **correlation**:
* Attach nearest **image SBOM** (inventory/usage) and **BOMIndex** if known.
* If unknown/missing, schedule **delta scan** and return `202 Accepted`.
* Emits **derived signals** (usedByEntrypoint per component based on `/proc/<pid>/maps`).
### 5.2 Policy decision API (for webhook)
`POST /api/v1/scanner/policy/runtime`
Request:
```json
{
"namespace": "payments",
"labels": { "app": "api", "env": "prod" },
"images": ["ghcr.io/acme/api@sha256:...", "ghcr.io/acme/nginx@sha256:..."]
}
```
Response:
```json
{
"ttlSeconds": 300,
"results": {
"ghcr.io/acme/api@sha256:...": {
"signed": true,
"hasSbom": true,
"policyVerdict": "pass",
"reasons": [],
"rekor": { "uuid": "..." }
},
"ghcr.io/acme/nginx@sha256:...": {
"signed": false,
"hasSbom": false,
"policyVerdict": "fail",
"reasons": ["unsigned", "missing SBOM"]
}
}
}
```
---
## 6) Configuration (YAML)
```yaml
zastava:
mode:
observer: true
webhook: true
authority:
issuer: "https://authority.internal"
aud: ["scanner","zastava"] # tokens for backend and self-id
backend:
url: "https://scanner-web.internal"
connectTimeoutMs: 500
requestTimeoutMs: 1500
retry: { attempts: 3, backoffMs: 200 }
runtime:
engine: "auto" # containerd|cri-o|docker|auto
procfs: "/host/proc"
collect:
entryTrace: true
loadedLibs: true
maxLibs: 256
maxHashBytesPerContainer: 64_000_000
maxDepth: 48
admission:
enforce: true
failOpenNamespaces: ["dev", "test"]
verify:
imageSignature: true
sbomReferrer: true
scannerPolicyPass: true
cacheTtlSeconds: 300
resolveTags: true # do remote digest resolution for tag-only images
limits:
eventsPerSecond: 50
burst: 200
perNodeQueue: 10_000
security:
mounts:
containerdSock: "/run/containerd/containerd.sock:ro"
proc: "/proc:/host/proc:ro"
runtimeState: "/var/lib/containerd:ro"
```
---
## 7) Security posture
* **AuthN/Z**: Authority OpToks (DPoP preferred) to backend; webhook does **not** require client auth from API server (K8s handles).
* **Least privileges**: readonly host mounts; optional `CAP_SYS_PTRACE`; **no** host networking; **no** write mounts.
* **Isolation**: never exec untrusted code; nsenter only to **read** `/proc/<pid>`.
* **Data minimization**: do not exfiltrate env vars or command arguments unless policy explicitly enables diagnostic mode.
* **Rate limiting**: pernode caps; pertenant caps at backend.
* **Hard caps**: bytes hashed, files inspected, depth of shell parsing.
---
## 8) Metrics, logs, tracing
**Observer**
* `zastava.events_emitted_total{kind}`
* `zastava.proc_maps_samples_total{result}`
* `zastava.entrytrace_depth{p99}`
* `zastava.hash_bytes_total`
* `zastava.buffer_drops_total`
**Webhook**
* `zastava.admission_requests_total{decision}`
* `zastava.admission_latency_seconds`
* `zastava.cache_hits_total`
* `zastava.backend_failures_total`
**Logs** (structured): node, pod, image digest, decision, reasons.
**Tracing**: spans for observe→batch→post; webhook request→resolve→respond.
---
## 9) Performance & scale targets
* **Observer**: ≤ **30ms** to sample `/proc/<pid>/maps` and compute quick hashes for ≤ 64 files; ≤ **200ms** for full library set (256 libs).
* **Webhook**: P95 ≤ **8ms** with warm cache; ≤ **50ms** with one backend roundtrip.
* **Throughput**: 1k admission requests/min/replica; 5k runtime events/min/node with batching.
---
## 10) Drift detection model
**Signals**
* **Process drift**: terminal program differs from **EntryTrace** baseline.
* **Library drift**: loaded DSOs not present in **Usage** SBOM view.
* **Filesystem drift**: new executable files under `/usr/local/bin`, `/opt`, `/app` with **mtime** after image creation.
* **Network drift** (optional): listening sockets on unexpected ports (from policy).
**Action**
* Emit `DRIFT` event with evidence; backend can **autoqueue** a delta scan; policy may **escalate** to alert/block (Admission cannot block alreadyrunning pods; rely on K8s policies/PodSecurity or operator action).
---
## 11) Test matrix
* **Engines**: containerd, CRIO, Docker; ensure PID resolution and rootfs mapping.
* **EntryTrace**: bash features (case, if, runparts, `.`/`source`), language launchers (python/node/java).
* **Procfs**: multiple arches, musl/glibc images; static binaries (maps minimal).
* **Admission**: unsigned images, missing SBOM referrers, tagonly images, digest resolution, backend latency, cache TTL.
* **Perf/soak**: 500 Pods/node churn; webhook under HPA growth.
* **Security**: attempt privilege escalation disabled, readonly mounts enforced, ratelimit abuse.
* **Failure injection**: backend down (observer buffers, webhook failopen/closed), registry throttling, containerd socket unavailable.
---
## 12) Failure modes & responses
| Condition | Observer behavior | Webhook behavior |
| ------------------------------- | ---------------------------------------------- | ------------------------------------------------------ |
| Backend unreachable | Buffer to disk; drop after cap; emit metric | **Failopen/closed** per namespace config |
| PID vanished midsample | Retry once; emit partial evidence | N/A |
| CRI socket missing | Fallback to K8s events only (reduced fidelity) | N/A |
| Registry digest resolve blocked | Defer to backend; mark `resolve=unknown` | Deny or allow per `resolveTags` & `failOpenNamespaces` |
| Excessive events | Apply local rate limit, coalesce | N/A |
---
## 13) Deployment notes (K8s)
**DaemonSet (snippet):**
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata: { name: zastava-observer, namespace: stellaops }
spec:
template:
spec:
serviceAccountName: zastava
hostPID: true
containers:
- name: observer
image: stellaops/zastava-observer:2.3
securityContext:
runAsUser: 0
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
capabilities: { add: ["SYS_PTRACE","DAC_READ_SEARCH"] }
volumeMounts:
- { name: proc, mountPath: /host/proc, readOnly: true }
- { name: containerd-sock, mountPath: /run/containerd/containerd.sock, readOnly: true }
- { name: containerd-state, mountPath: /var/lib/containerd, readOnly: true }
volumes:
- { name: proc, hostPath: { path: /proc } }
- { name: containerd-sock, hostPath: { path: /run/containerd/containerd.sock } }
- { name: containerd-state, hostPath: { path: /var/lib/containerd } }
```
**Webhook (snippet):**
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
webhooks:
- name: gate.zastava.stella-ops.org
admissionReviewVersions: ["v1"]
sideEffects: None
failurePolicy: Ignore # or Fail
rules:
- operations: ["CREATE","UPDATE"]
apiGroups: [""]
apiVersions: ["v1"]
resources: ["pods"]
clientConfig:
service:
namespace: stellaops
name: zastava-webhook
path: /admit
caBundle: <base64 CA>
```
---
## 14) Implementation notes
* **Language**: Rust (observer) for lowlatency `/proc` parsing; Go/.NET viable too. Webhook can be .NET 10 for parity with backend.
* **CRI drivers**: pluggable (`containerd`, `cri-o`, `docker`). Prefer CRI over parsing logs.
* **Shell parser**: reuse Scanner.EntryTrace grammar for consistent results (compile to WASM if observer is Rust/Go).
* **Hashing**: `BLAKE3` for speed locally, then convert to `sha256` (or compute `sha256` directly when budget allows).
* **Resilience**: never block container start; observer is **passive**; only webhook decides allow/deny.
---
## 15) Roadmap
* **eBPF** option for syscall/library load tracing (kernellevel, optin).
* **Windows containers** support (ETW providers, loaded modules).
* **Network posture** checks: listening ports vs policy.
* **Live **usedbyentrypoint** synthesis**: send compact bitset diff to backend to tighten Usage view.
* **Admission dryrun** dashboards (simulate block lists before enforcing).