# Runtime Posture and Observation with Zastava

**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical

This advisory defines the product rationale, observation model, and implementation strategy for the Zastava module, covering runtime inspection, admission control, drift detection, and posture verification.

---

## 1. Executive Summary

Zastava is the **runtime inspector and enforcer** that provides ground truth from running environments. Key capabilities:

- **Runtime Observation** - Inventory containers, track entrypoints, monitor loaded DSOs
- **Admission Control** - Kubernetes ValidatingAdmissionWebhook for pre-flight gates
- **Drift Detection** - Identify unexpected processes, libraries, and file changes
- **Posture Verification** - Validate signatures, SBOM referrers, attestations
- **Build-ID Tracking** - Correlate binaries to debug symbols and source

---

## 2. Market Drivers

### 2.1 Target Segments

| Segment | Runtime Requirements | Use Case |
|---------|---------------------|----------|
| **Enterprise Security** | Runtime visibility | Post-deploy monitoring |
| **Platform Engineering** | Admission gates | Policy enforcement |
| **Compliance Teams** | Continuous verification | Runtime attestation |
| **DevSecOps** | Drift detection | Configuration management |

### 2.2 Competitive Positioning

Most vulnerability scanners focus on build-time analysis. Stella Ops differentiates with:

- **Runtime ground truth** from actual container execution
- **DSO tracking** - which libraries are actually loaded
- **Entrypoint tracing** - what programs actually run
- **Native Kubernetes admission** with policy integration
- **Build-ID correlation** for symbol resolution

---

## 3. Architecture Overview

### 3.1 Component Topology

**Kubernetes Deployment:**

```
stellaops/zastava-observer   # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook    # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
```

**Docker/VM Deployment:**

```
stellaops/zastava-agent      # System service; watches Docker events; observer only
```

### 3.2 Dependencies

| Dependency | Purpose |
|------------|---------|
| Authority | OpToks (DPoP/mTLS) for API calls |
| Scanner.WebService | Event ingestion, policy decisions |
| OCI Registry | Referrer/signature checks |
| Container Runtime | containerd/CRI-O/Docker interfaces |
| Kubernetes API | Pod watching, admission webhook |
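The component topology in §3.1 maps onto standard Kubernetes objects. The following is a minimal, illustrative DaemonSet sketch for the observer; the namespace, labels, image tag, and socket path are assumptions, not the shipped manifest (see the deployment guide in §14 for the authoritative version).

```yaml
# Illustrative only - object names, namespace, image tag, and socket paths are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: zastava-observer
  namespace: stellaops-system
spec:
  selector:
    matchLabels: { app: zastava-observer }
  template:
    metadata:
      labels: { app: zastava-observer }
    spec:
      hostPID: true                          # needed to resolve container PIDs on the node
      containers:
        - name: observer
          image: stellaops/zastava-observer:1.0
          volumeMounts:
            - name: host-proc
              mountPath: /host/proc          # matches runtime.procfs in the configuration (§10)
              readOnly: true
            - name: cri-sock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
      volumes:
        - name: host-proc
          hostPath: { path: /proc }
        - name: cri-sock
          hostPath: { path: /run/containerd/containerd.sock, type: Socket }
```

All host mounts are read-only, consistent with the security posture in §11.

---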
## 4. Runtime Event Model

### 4.1 Event Types

| Kind | Trigger | Payload |
|------|---------|---------|
| `CONTAINER_START` | Container lifecycle | Image, entrypoint, namespace |
| `CONTAINER_STOP` | Container termination | Exit code, duration |
| `DRIFT` | Unexpected change | Changed files, new binaries |
| `POLICY_VIOLATION` | Rule breach | Reason, severity |
| `ATTESTATION_STATUS` | Verification result | Signed, SBOM present |

### 4.2 Event Envelope

```json
{
  "eventId": "uuid",
  "when": "2025-11-29T12:00:00Z",
  "kind": "CONTAINER_START",
  "tenant": "acme-corp",
  "node": "worker-node-01",
  "runtime": { "engine": "containerd", "version": "1.7.19" },
  "workload": {
    "platform": "kubernetes",
    "namespace": "production",
    "pod": "api-7c9fbbd8b7-ktd84",
    "container": "api",
    "containerId": "containerd://abc123...",
    "imageRef": "ghcr.io/acme/api@sha256:def456...",
    "owner": { "kind": "Deployment", "name": "api" }
  },
  "process": {
    "pid": 12345,
    "entrypoint": ["/entrypoint.sh", "--serve"],
    "entryTrace": [
      {"file": "/entrypoint.sh", "line": 3, "op": "exec", "target": "/usr/bin/python3"},
      {"file": "", "op": "python", "target": "/opt/app/server.py"}
    ],
    "buildId": "9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1"
  },
  "loadedLibs": [
    {"path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "..."},
    {"path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "..."}
  ],
  "posture": {
    "imageSigned": true,
    "sbomReferrer": "present",
    "attestation": { "uuid": "rekor-uuid", "verified": true }
  }
}
```

---

## 5. Observer Capabilities

### 5.1 Container Lifecycle Tracking

- Watch container start/stop via the CRI socket
- Resolve each container to its image digest
- Map mount points and rootfs paths
- Track container metadata (labels, annotations)

### 5.2 Entrypoint Tracing

- Attach a short-lived `nsenter` to container PID 1
- Parse shell scripts to follow the exec chain
- Record the terminal program (the binary that actually runs)
- Bounded depth to prevent infinite loops

### 5.3 Loaded Library Sampling

- Read `/proc/<pid>/maps` for loaded DSOs
- Compute SHA-256 for each mapped file
- Track GNU build-IDs for symbol correlation
- Rate limits prevent resource exhaustion

### 5.4 Posture Verification

- Image signature presence (cosign policies)
- SBOM referrers check (registry HEAD)
- Rekor attestation lookup via Scanner.WebService
- Policy verdict from the backend

---

## 6. Admission Control

### 6.1 Gate Criteria

| Criterion | Description | Configurable |
|-----------|-------------|--------------|
| Image Signature | Cosign-verifiable to configured keys | Yes |
| SBOM Availability | CycloneDX referrer or catalog entry | Yes |
| Policy Verdict | Backend PASS required | Yes |
| Registry Allowlist | Permitted registries | Yes |
| Tag Bans | Reject `:latest`, etc. | Yes |
| Base Image Allowlist | Permitted base digests | Yes |

### 6.2 Decision Flow

```mermaid
sequenceDiagram
    participant K8s as API Server
    participant WH as Zastava Webhook
    participant SW as Scanner.WebService
    K8s->>WH: AdmissionReview(Pod)
    WH->>WH: Resolve images to digests
    WH->>SW: POST /policy/runtime
    SW-->>WH: {signed, hasSbom, verdict, reasons}
    alt All pass
        WH-->>K8s: Allow
    else Any fail
        WH-->>K8s: Deny (with reasons)
    end
```

### 6.3 Response Caching

- Per-digest results cached for a TTL (default 300s)
- Fail-open or fail-closed per namespace
- Cache invalidation on policy updates
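For orientation, a webhook registration along the following lines would wire the gate into the API server. This is a hedged sketch, not the shipped manifest: the webhook name, service name, namespace, and path are assumptions. Note that `failurePolicy` only governs what the API server does when the webhook itself is unreachable; the per-namespace fail-open behaviour above is applied inside the webhook service using the `failOpenNamespaces` setting shown in §10.

```yaml
# Illustrative registration - webhook name, service coordinates, and path are assumptions.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: zastava-admission
webhooks:
  - name: admission.zastava.stellaops.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    timeoutSeconds: 5              # bound admission latency if the backend is slow
    failurePolicy: Fail            # API-server behaviour when the webhook is unreachable
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        namespace: stellaops-system
        name: zastava-webhook
        path: /admission
        port: 443
      # caBundle: <PEM-encoded CA for the webhook's serving certificate>
```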
---

## 7. Drift Detection

### 7.1 Signal Types

| Signal | Detection Method | Action |
|--------|-----------------|--------|
| Process Drift | Terminal program differs from EntryTrace baseline | Alert |
| Library Drift | Loaded DSOs not in Usage SBOM | Alert, delta scan |
| Filesystem Drift | New executables with mtime after image creation | Alert |
| Network Drift | Unexpected listening ports | Alert (optional) |

### 7.2 Drift Event

```json
{
  "kind": "DRIFT",
  "delta": {
    "baselineImageDigest": "sha256:abc...",
    "changedFiles": ["/opt/app/server.py"],
    "newBinaries": [
      {"path": "/usr/local/bin/helper", "sha256": "..."}
    ]
  },
  "evidence": [
    {"signal": "procfs.maps", "value": "/lib/.../libssl.so.3@0x7f..."},
    {"signal": "cri.task.inspect", "value": "pid=12345"}
  ]
}
```

---

## 8. Build-ID Workflow

### 8.1 Capture

1. Observer extracts `NT_GNU_BUILD_ID` from `/proc/<pid>/exe`
2. Normalize to lower-case hex
3. Include in the runtime event as `process.buildId`

### 8.2 Correlation

1. Scanner.WebService persists the observation
2. Policy responses include a `buildIds` list
3. Debug files are matched via the `.build-id/<nn>/<rest>.debug` layout

### 8.3 Symbol Resolution

```bash
# Via CLI
stella runtime policy test --image sha256:abc123... | jq '.buildIds'

# Via debuginfod
debuginfod-find debuginfo 9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1
```

---

## 9. Implementation Strategy

### 9.1 Phase 1: Observer Core (Complete)

- [x] CRI socket integration
- [x] Container lifecycle tracking
- [x] Entrypoint tracing
- [x] Loaded library sampling
- [x] Event batching and compression

### 9.2 Phase 2: Admission Webhook (Complete)

- [x] ValidatingAdmissionWebhook
- [x] Image digest resolution
- [x] Policy integration
- [x] Response caching
- [x] Fail-open/closed modes

### 9.3 Phase 3: Drift Detection (In Progress)

- [x] Process drift detection
- [x] Library drift detection
- [ ] Filesystem drift monitoring (ZASTAVA-DRIFT-50-001)
- [ ] Network posture checks (ZASTAVA-NET-51-001)

### 9.4 Phase 4: Advanced Features (Planned)

- [ ] eBPF syscall tracing (optional)
- [ ] Windows container support
- [ ] Live used-by-entrypoint synthesis
- [ ] Admission dry-run dashboards

---

## 10. Configuration

```yaml
zastava:
  mode:
    observer: true
    webhook: true
  backend:
    baseAddress: "https://scanner-web.internal"
    policyPath: "/api/v1/scanner/policy/runtime"
    requestTimeoutSeconds: 5
  runtime:
    authority:
      issuer: "https://authority.internal"
      clientId: "zastava-observer"
      audience: ["scanner", "zastava"]
      scopes: ["api:scanner.runtime.write"]
      requireDpop: true
      requireMutualTls: true
    tenant: "acme-corp"
    engine: "auto"                 # containerd|cri-o|docker|auto
    procfs: "/host/proc"
    collect:
      entryTrace: true
      loadedLibs: true
      maxLibs: 256
      maxHashBytesPerContainer: 64000000
  admission:
    enforce: true
    failOpenNamespaces: ["dev", "test"]
    verify:
      imageSignature: true
      sbomReferrer: true
      scannerPolicyPass: true
    cacheTtlSeconds: 300
  limits:
    eventsPerSecond: 50
    burst: 200
    perNodeQueue: 10000
```

---

## 11. Security Posture

### 11.1 Privileges

| Capability | Purpose | Mode |
|------------|---------|------|
| `CAP_SYS_PTRACE` | nsenter trace | Optional |
| `CAP_DAC_READ_SEARCH` | Read /proc | Required |
| Host PID namespace | Container PIDs | Required |
| Read-only mounts | /proc, sockets | Required |

### 11.2 Least Privilege

- No write mounts
- No host networking
- No privilege escalation
- Read-only rootfs

### 11.3 Data Minimization

- No env var exfiltration
- No command-argument logging (unless in diagnostic mode)
- Rate limits prevent abuse
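As a minimal sketch of how the privilege model in §11.1 and §11.2 might appear in the observer's pod template (field values are illustrative assumptions, not the shipped defaults):

```yaml
# Fragment of the observer DaemonSet pod template; values are illustrative.
spec:
  hostPID: true                        # required: resolve container PIDs
  hostNetwork: false                   # no host networking
  containers:
    - name: observer
      image: stellaops/zastava-observer:1.0
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
          add:
            - DAC_READ_SEARCH          # required: read /host/proc entries across users
            # - SYS_PTRACE             # optional: only when nsenter entry tracing is enabled
```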
---

## 12. Observability

### 12.1 Observer Metrics

- `zastava.runtime.events.total{kind}`
- `zastava.runtime.backend.latency.ms{endpoint}`
- `zastava.proc_maps.samples.total{result}`
- `zastava.entrytrace.depth{p99}`
- `zastava.hash.bytes.total`
- `zastava.buffer.drops.total`

### 12.2 Webhook Metrics

- `zastava.admission.decisions.total{decision}`
- `zastava.admission.cache.hits.total`
- `zastava.backend.failures.total`

---

## 13. Performance Targets

| Operation | Target |
|-----------|--------|
| `/proc/<pid>/maps` sampling | < 30ms (64 files) |
| Full library hash set | < 200ms (256 libs) |
| Admission with warm cache | < 8ms p95 |
| Admission with backend call | < 50ms p95 |
| Event throughput | 5k events/min/node |

---

## 14. Related Documentation

| Resource | Location |
|----------|----------|
| Zastava architecture | `docs/modules/zastava/architecture.md` |
| Runtime event schema | `docs/modules/zastava/event-schema.md` |
| Admission configuration | `docs/modules/zastava/admission-config.md` |
| Deployment guide | `docs/modules/zastava/deployment.md` |

---

## 15. Sprint Mapping

- **Primary Sprint:** SPRINT_0144_0001_0001_zastava_runtime_signals.md
- **Related Sprints:**
  - SPRINT_0140_0001_0001_runtime_signals.md
  - SPRINT_0143_0000_0001_signals.md

**Key Task IDs:**

- `ZASTAVA-OBS-40-001` - Observer core (DONE)
- `ZASTAVA-ADM-41-001` - Admission webhook (DONE)
- `ZASTAVA-DRIFT-50-001` - Filesystem drift (IN PROGRESS)
- `ZASTAVA-NET-51-001` - Network posture (TODO)
- `ZASTAVA-EBPF-60-001` - eBPF integration (FUTURE)

---

## 16. Success Metrics

| Metric | Target |
|--------|--------|
| Event capture rate | 99.9% of container starts |
| Admission latency | < 50ms p95 |
| Drift detection rate | 100% of runtime changes |
| False positive rate | < 1% of drift alerts |
| Node resource usage | < 2% CPU, < 100MB RAM |

---

*Last updated: 2025-11-29*