Runtime Posture and Observation with Zastava
Version: 1.0
Date: 2025-11-29
Status: Canonical
This advisory defines the product rationale, observation model, and implementation strategy for the Zastava module, covering runtime inspection, admission control, drift detection, and posture verification.
1. Executive Summary
Zastava is the runtime inspector and enforcer that provides ground truth from running environments. Key capabilities:
- Runtime Observation - Inventory containers, track entrypoints, monitor loaded DSOs
- Admission Control - Kubernetes ValidatingAdmissionWebhook for pre-flight gates
- Drift Detection - Identify unexpected processes, libraries, and file changes
- Posture Verification - Validate signatures, SBOM referrers, attestations
- Build-ID Tracking - Correlate binaries to debug symbols and source
2. Market Drivers
2.1 Target Segments
| Segment | Runtime Requirements | Use Case |
|---|---|---|
| Enterprise Security | Runtime visibility | Post-deploy monitoring |
| Platform Engineering | Admission gates | Policy enforcement |
| Compliance Teams | Continuous verification | Runtime attestation |
| DevSecOps | Drift detection | Configuration management |
2.2 Competitive Positioning
Most vulnerability scanners focus on build-time analysis. Stella Ops differentiates with:
- Runtime ground truth from actual container execution
- DSO tracking - which libraries are actually loaded
- Entrypoint tracing - what programs actually run
- Native Kubernetes admission with policy integration
- Build-ID correlation for symbol resolution
3. Architecture Overview
3.1 Component Topology
Kubernetes Deployment:
stellaops/zastava-observer # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
Docker/VM Deployment:
stellaops/zastava-agent # System service; watch Docker events; observer only
3.2 Dependencies
| Dependency | Purpose |
|---|---|
| Authority | OpToks (DPoP/mTLS) for API calls |
| Scanner.WebService | Event ingestion, policy decisions |
| OCI Registry | Referrer/signature checks |
| Container Runtime | containerd/CRI-O/Docker interfaces |
| Kubernetes API | Pod watching, admission webhook |
4. Runtime Event Model
4.1 Event Types
| Kind | Trigger | Payload |
|---|---|---|
| CONTAINER_START | Container lifecycle | Image, entrypoint, namespace |
| CONTAINER_STOP | Container termination | Exit code, duration |
| DRIFT | Unexpected change | Changed files, new binaries |
| POLICY_VIOLATION | Rule breach | Reason, severity |
| ATTESTATION_STATUS | Verification result | Signed, SBOM present |
4.2 Event Envelope
{
  "eventId": "uuid",
  "when": "2025-11-29T12:00:00Z",
  "kind": "CONTAINER_START",
  "tenant": "acme-corp",
  "node": "worker-node-01",
  "runtime": {
    "engine": "containerd",
    "version": "1.7.19"
  },
  "workload": {
    "platform": "kubernetes",
    "namespace": "production",
    "pod": "api-7c9fbbd8b7-ktd84",
    "container": "api",
    "containerId": "containerd://abc123...",
    "imageRef": "ghcr.io/acme/api@sha256:def456...",
    "owner": {
      "kind": "Deployment",
      "name": "api"
    }
  },
  "process": {
    "pid": 12345,
    "entrypoint": ["/entrypoint.sh", "--serve"],
    "entryTrace": [
      {"file": "/entrypoint.sh", "line": 3, "op": "exec", "target": "/usr/bin/python3"},
      {"file": "<argv>", "op": "python", "target": "/opt/app/server.py"}
    ],
    "buildId": "9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1"
  },
  "loadedLibs": [
    {"path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "..."},
    {"path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "..."}
  ],
  "posture": {
    "imageSigned": true,
    "sbomReferrer": "present",
    "attestation": {
      "uuid": "rekor-uuid",
      "verified": true
    }
  }
}
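A consumer may want to sanity-check envelopes before forwarding them. The sketch below checks the fields shown in the sample above. The function name validate_envelope and the choice of which fields count as mandatory are illustrative assumptions, not part of the Zastava schema.

```python
from datetime import datetime

# Event kinds listed in section 4.1.
KNOWN_KINDS = {
    "CONTAINER_START", "CONTAINER_STOP", "DRIFT",
    "POLICY_VIOLATION", "ATTESTATION_STATUS",
}

# Assumption: these top-level fields are treated as mandatory for illustration.
REQUIRED_FIELDS = ("eventId", "when", "kind", "tenant", "node")


def validate_envelope(event: dict) -> list:
    """Return a list of problems; an empty list means the envelope looks sane."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]
    kind = event.get("kind")
    if kind is not None and kind not in KNOWN_KINDS:
        problems.append(f"unknown kind: {kind}")
    when = event.get("when")
    if when is not None:
        try:
            # fromisoformat() before Python 3.11 rejects a trailing "Z".
            datetime.fromisoformat(when.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"bad timestamp: {when}")
    image_ref = event.get("workload", {}).get("imageRef", "")
    if image_ref and "@sha256:" not in image_ref:
        problems.append("imageRef is not pinned to a digest")
    return problems
```

An empty result means the envelope passed these illustrative checks. It does not imply the backend would accept the event.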
5. Observer Capabilities
5.1 Container Lifecycle Tracking
- Watch container start/stop via CRI socket
- Resolve container to image digest
- Map mount points and rootfs paths
- Track container metadata (labels, annotations)
5.2 Entrypoint Tracing
- Attach short-lived nsenter to container PID 1
- Parse shell scripts for exec chain
- Record terminal program (actual binary)
- Bounded depth to prevent infinite loops
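The bounded exec-chain walk can be sketched as follows. This is a simplified illustration: scripts are supplied as an in-memory map (a stand-in for reading files from the container rootfs), only plain `exec <target>` lines are recognised, and both the function name trace_entrypoint and the default bound of 8 are assumptions rather than documented behaviour.

```python
import shlex

MAX_DEPTH = 8  # assumption: illustrative bound, not a documented default


def trace_entrypoint(entry: str, scripts: dict, max_depth: int = MAX_DEPTH):
    """Follow `exec <target>` lines through shell scripts.

    `scripts` maps a path to its shell source; a path absent from the map is
    treated as the terminal program. Returns (trace, terminal_program).
    """
    trace = []
    current = entry
    seen = set()
    for _ in range(max_depth):
        source = scripts.get(current)
        if source is None or current in seen:
            break  # reached a non-script (terminal binary) or a cycle
        seen.add(current)
        target = None
        for lineno, line in enumerate(source.splitlines(), start=1):
            try:
                tokens = shlex.split(line, comments=True)
            except ValueError:
                continue  # unbalanced quotes; skip lines we cannot tokenize
            if len(tokens) >= 2 and tokens[0] == "exec":
                target = tokens[1]
                trace.append({"file": current, "line": lineno, "op": "exec", "target": target})
                break
        if target is None:
            break  # script never execs; it is itself the terminal program
        current = target
    return trace, current
```

The `seen` set and the `max_depth` bound are independent guards: the first stops scripts that exec each other in a cycle, the second caps long but acyclic chains.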
5.3 Loaded Library Sampling
- Read /proc/<pid>/maps for loaded DSOs
- Compute SHA-256 for each mapped file
- Track GNU build-IDs for symbol correlation
- Rate limits prevent resource exhaustion
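The sampling steps above can be sketched as a parser for the text of /proc/<pid>/maps plus a hashing pass with a byte budget. The `.so` filename heuristic, the function names, and the skip-when-over-budget behaviour are assumptions for illustration. A real observer would also resolve paths through the container rootfs, which is omitted here.

```python
import hashlib
import os


def parse_loaded_libs(maps_text: str, max_libs: int = 256) -> list:
    """Extract unique file-backed shared objects from /proc/<pid>/maps text."""
    libs = {}
    for line in maps_text.splitlines():
        # Line format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        inode, path = int(fields[4]), fields[5]
        if inode == 0 or not path.startswith("/"):
            continue  # pseudo-entries such as [heap], [stack], [vdso]
        if ".so" not in path.rsplit("/", 1)[-1]:
            continue  # heuristic: keep only shared objects
        libs.setdefault(path, {"path": path, "inode": inode})
        if len(libs) >= max_libs:
            break
    return list(libs.values())


def hash_files(paths: list, max_bytes: int) -> dict:
    """SHA-256 each file, skipping any that would exceed the byte budget."""
    digests = {}
    remaining = max_bytes
    for path in paths:
        size = os.path.getsize(path)
        if size > remaining:
            continue
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        digests[path] = digest.hexdigest()
        remaining -= size
    return digests
```

Deduplicating by path matters because a single library normally appears as several mappings (text, read-only data, writable data) in the maps file.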
5.4 Posture Verification
- Image signature presence (cosign policies)
- SBOM referrers check (registry HEAD)
- Rekor attestation lookup via Scanner.WebService
- Policy verdict from backend
6. Admission Control
6.1 Gate Criteria
| Criterion | Description | Configurable |
|---|---|---|
| Image Signature | Cosign-verifiable to configured keys | Yes |
| SBOM Availability | CycloneDX referrer or catalog entry | Yes |
| Policy Verdict | Backend PASS required | Yes |
| Registry Allowlist | Permitted registries | Yes |
| Tag Bans | Reject :latest, etc. | Yes |
| Base Image Allowlist | Permitted base digests | Yes |
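Two of the criteria above, the registry allowlist and tag bans, can be evaluated from the image reference alone. The sketch below assumes fully qualified references (registry host first) and treats a reference with neither tag nor digest as :latest. The function name check_image_ref is hypothetical.

```python
def check_image_ref(image_ref: str, allowed_registries: set, banned_tags: set) -> list:
    """Return denial reasons for one image reference (empty list = passes)."""
    reasons = []
    name, _, digest = image_ref.partition("@")
    # Assumption: the reference is fully qualified, so the first path
    # component is the registry host (optionally with a port).
    registry = name.split("/", 1)[0]
    if registry not in allowed_registries:
        reasons.append(f"registry not allowed: {registry}")
    # A tag is whatever follows ':' in the last path segment.
    last_segment = name.rsplit("/", 1)[-1]
    tag = last_segment.partition(":")[2]
    if not tag and not digest:
        tag = "latest"  # untagged, undigested references resolve to :latest
    if tag in banned_tags:
        reasons.append(f"banned tag: {tag}")
    return reasons
```

Taking the tag from the last path segment, rather than the first colon in the string, keeps registries with ports such as localhost:5000 from being misread as tags.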
6.2 Decision Flow
sequenceDiagram
  participant K8s as API Server
  participant WH as Zastava Webhook
  participant SW as Scanner.WebService
  K8s->>WH: AdmissionReview(Pod)
  WH->>WH: Resolve images to digests
  WH->>SW: POST /policy/runtime
  SW-->>WH: {signed, hasSbom, verdict, reasons}
  alt All pass
    WH-->>K8s: Allow
  else Any fail
    WH-->>K8s: Deny (with reasons)
  end
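The final step of the flow, turning per-image policy results into an allow or deny, might look like the sketch below. The response envelope follows the Kubernetes admission.k8s.io/v1 AdmissionReview shape. The exact value of the verdict field is not specified in this document, so the comparison against "pass" is an assumption, as is the function name.

```python
def build_admission_response(review: dict, policy_results: dict) -> dict:
    """Build an AdmissionReview response from per-image policy results.

    `policy_results` maps an image digest to a dict with the keys
    signed, hasSbom, verdict, reasons (as returned by the backend).
    """
    failures = []
    for image, result in policy_results.items():
        passed = (
            bool(result.get("signed"))
            and bool(result.get("hasSbom"))
            # Assumption: a passing verdict is spelled "pass" (case-insensitive).
            and str(result.get("verdict", "")).lower() == "pass"
        )
        if not passed:
            detail = "; ".join(result.get("reasons", [])) or "policy check failed"
            failures.append(f"{image}: {detail}")
    response = {"uid": review["request"]["uid"], "allowed": not failures}
    if failures:
        response["status"] = {"code": 403, "message": " | ".join(failures)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Echoing the request uid back in the response is required by the API server. The status message is what a user sees when kubectl reports the denial.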
6.3 Response Caching
- Per-digest results cached for TTL (default 300s)
- Fail-open or fail-closed per namespace
- Cache invalidation on policy updates
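The caching and failure-mode rules above can be sketched as a per-digest TTL cache combined with a namespace-aware fallback. The class and function names are hypothetical, the clock is injectable for testing, and query_backend stands in for the call to Scanner.WebService.

```python
import time


class VerdictCache:
    """Per-digest allow/deny cache with a fixed TTL."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # digest -> (expires_at, allowed)

    def get(self, digest: str):
        entry = self._entries.get(digest)
        if entry is None:
            return None
        expires_at, allowed = entry
        if self._clock() >= expires_at:
            del self._entries[digest]  # expired
            return None
        return allowed

    def put(self, digest: str, allowed: bool) -> None:
        self._entries[digest] = (self._clock() + self._ttl, allowed)

    def invalidate_all(self) -> None:
        """Call on policy updates so stale verdicts are not served."""
        self._entries.clear()


def decide(digest, namespace, cache, query_backend, fail_open_namespaces) -> bool:
    """Return True to admit. Falls back to the namespace mode on backend errors."""
    cached = cache.get(digest)
    if cached is not None:
        return cached
    try:
        allowed = bool(query_backend(digest))
    except Exception:
        # Backend unreachable: fail open only for explicitly listed namespaces.
        return namespace in fail_open_namespaces
    cache.put(digest, allowed)
    return allowed
```

Fallback decisions are deliberately not cached, so a recovered backend is consulted on the next request rather than after the TTL expires.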
7. Drift Detection
7.1 Signal Types
| Signal | Detection Method | Action |
|---|---|---|
| Process Drift | Terminal program differs from EntryTrace baseline | Alert |
| Library Drift | Loaded DSOs not in Usage SBOM | Alert, delta scan |
| Filesystem Drift | New executables with mtime after image creation | Alert |
| Network Drift | Unexpected listening ports | Alert (optional) |
7.2 Drift Event
{
  "kind": "DRIFT",
  "delta": {
    "baselineImageDigest": "sha256:abc...",
    "changedFiles": ["/opt/app/server.py"],
    "newBinaries": [
      {"path": "/usr/local/bin/helper", "sha256": "..."}
    ]
  },
  "evidence": [
    {"signal": "procfs.maps", "value": "/lib/.../libssl.so.3@0x7f..."},
    {"signal": "cri.task.inspect", "value": "pid=12345"}
  ]
}
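A drift delta of this shape can be derived by comparing observed files against a baseline. The sketch below assumes the Usage SBOM is available as a path-to-SHA-256 mapping, which is a simplification for illustration. The function name and the omission of the evidence array are likewise assumptions.

```python
from typing import Optional


def build_drift_event(baseline_digest: str, usage_sbom: dict, observed: list) -> Optional[dict]:
    """Compare observed files against a baseline (path -> sha256).

    `observed` is a list of {"path": ..., "sha256": ...} entries.
    Returns a DRIFT event, or None when nothing deviates from the baseline.
    """
    changed, new = [], []
    for item in observed:
        expected = usage_sbom.get(item["path"])
        if expected is None:
            # Path unknown to the baseline: report as a new binary.
            new.append({"path": item["path"], "sha256": item["sha256"]})
        elif expected != item["sha256"]:
            # Known path whose content hash no longer matches.
            changed.append(item["path"])
    if not changed and not new:
        return None
    return {
        "kind": "DRIFT",
        "delta": {
            "baselineImageDigest": baseline_digest,
            "changedFiles": sorted(changed),
            "newBinaries": sorted(new, key=lambda entry: entry["path"]),
        },
    }
```

Sorting both lists keeps the event stable across runs, which helps when events are deduplicated or diffed downstream.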
8. Build-ID Workflow
8.1 Capture
- Observer extracts NT_GNU_BUILD_ID from /proc/<pid>/exe
- Normalize to lower-case hex
- Include in runtime event as process.buildId
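The extraction step can be sketched as a scan over an ELF note blob. This covers only the note parsing: locating the PT_NOTE segment by walking the ELF program headers of /proc/<pid>/exe is omitted, and the function name and little-endian default are assumptions.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type for the GNU build-ID


def _align4(n: int) -> int:
    """Round up to the 4-byte alignment used for note name and descriptor."""
    return (n + 3) & ~3


def build_id_from_notes(notes: bytes, little_endian: bool = True):
    """Scan a note blob and return the GNU build-ID as lower-case hex, or None."""
    header = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(notes):
        namesz, descsz, note_type = struct.unpack_from(header, notes, offset)
        name_start = offset + 12
        desc_start = name_start + _align4(namesz)
        desc_end = desc_start + descsz
        if desc_end > len(notes):
            break  # truncated note; stop rather than read past the buffer
        name = notes[name_start:name_start + namesz]
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return notes[desc_start:desc_end].hex()  # hex() is lower-case
        offset = desc_start + _align4(descsz)
    return None
```

Checking the owner name as well as the type matters because note type numbers are only unique per owner.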
8.2 Correlation
- Scanner.WebService persists observation
- Policy responses include buildIds list
- Debug files matched via .build-id/<aa>/<rest>.debug
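The debug-file layout above splits the build-ID after its first two hex characters. A minimal helper, assuming the conventional /usr/lib/debug root (the root a given deployment uses may differ, and the function name is hypothetical):

```python
def debug_file_path(build_id: str, root: str = "/usr/lib/debug") -> str:
    """Map a build-ID to its .build-id/<aa>/<rest>.debug path."""
    build_id = build_id.strip().lower()
    if len(build_id) < 3 or any(ch not in "0123456789abcdef" for ch in build_id):
        raise ValueError(f"not a hex build-ID: {build_id!r}")
    return f"{root}/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
```

Lower-casing before splitting mirrors the normalization applied at capture time, so observed and stored IDs resolve to the same path.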
8.3 Symbol Resolution
# Via CLI
stella runtime policy test --image sha256:abc123... | jq '.buildIds'
# Via debuginfod
debuginfod-find debuginfo 9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1
9. Implementation Strategy
9.1 Phase 1: Observer Core (Complete)
- CRI socket integration
- Container lifecycle tracking
- Entrypoint tracing
- Loaded library sampling
- Event batching and compression
9.2 Phase 2: Admission Webhook (Complete)
- ValidatingAdmissionWebhook
- Image digest resolution
- Policy integration
- Response caching
- Fail-open/closed modes
9.3 Phase 3: Drift Detection (In Progress)
- Process drift detection
- Library drift detection
- Filesystem drift monitoring (ZASTAVA-DRIFT-50-001)
- Network posture checks (ZASTAVA-NET-51-001)
9.4 Phase 4: Advanced Features (Planned)
- eBPF syscall tracing (optional)
- Windows container support
- Live used-by-entrypoint synthesis
- Admission dry-run dashboards
10. Configuration
zastava:
  mode:
    observer: true
    webhook: true
  backend:
    baseAddress: "https://scanner-web.internal"
    policyPath: "/api/v1/scanner/policy/runtime"
    requestTimeoutSeconds: 5
  runtime:
    authority:
      issuer: "https://authority.internal"
      clientId: "zastava-observer"
      audience: ["scanner", "zastava"]
      scopes: ["api:scanner.runtime.write"]
      requireDpop: true
      requireMutualTls: true
    tenant: "acme-corp"
    engine: "auto" # containerd|cri-o|docker|auto
    procfs: "/host/proc"
    collect:
      entryTrace: true
      loadedLibs: true
      maxLibs: 256
      maxHashBytesPerContainer: 64000000
  admission:
    enforce: true
    failOpenNamespaces: ["dev", "test"]
    verify:
      imageSignature: true
      sbomReferrer: true
      scannerPolicyPass: true
    cacheTtlSeconds: 300
  limits:
    eventsPerSecond: 50
    burst: 200
    perNodeQueue: 10000
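The eventsPerSecond and burst limits read naturally as token-bucket parameters. The sketch below shows that interpretation. Whether Zastava actually uses a token bucket is an assumption, and the class name is hypothetical.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float = 50.0, burst: int = 200, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; return False when the caller should drop or queue."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False
```

With the defaults shown in the configuration, a node could emit up to 200 events at once after an idle period and then sustain 50 per second.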
11. Security Posture
11.1 Privileges
| Capability | Purpose | Mode |
|---|---|---|
| CAP_SYS_PTRACE | nsenter trace | Optional |
| CAP_DAC_READ_SEARCH | Read /proc | Required |
| Host PID namespace | Container PIDs | Required |
| Read-only mounts | /proc, sockets | Required |
11.2 Least Privilege
- No write mounts
- No host networking
- No privilege escalation
- Read-only rootfs
11.3 Data Minimization
- No env var exfiltration
- No command argument logging (unless diagnostic mode)
- Rate limits prevent abuse
12. Observability
12.1 Observer Metrics
- zastava.runtime.events.total{kind}
- zastava.runtime.backend.latency.ms{endpoint}
- zastava.proc_maps.samples.total{result}
- zastava.entrytrace.depth{p99}
- zastava.hash.bytes.total
- zastava.buffer.drops.total
12.2 Webhook Metrics
- zastava.admission.decisions.total{decision}
- zastava.admission.cache.hits.total
- zastava.backend.failures.total
13. Performance Targets
| Operation | Target |
|---|---|
| /proc/<pid>/maps sampling | < 30ms (64 files) |
| Full library hash set | < 200ms (256 libs) |
| Admission with warm cache | < 8ms p95 |
| Admission with backend call | < 50ms p95 |
| Event throughput | 5k events/min/node |
14. Related Documentation
| Resource | Location |
|---|---|
| Zastava architecture | docs/modules/zastava/architecture.md |
| Runtime event schema | docs/modules/zastava/event-schema.md |
| Admission configuration | docs/modules/zastava/admission-config.md |
| Deployment guide | docs/modules/zastava/deployment.md |
15. Sprint Mapping
- Primary Sprint: SPRINT_0144_0001_0001_zastava_runtime_signals.md
- Related Sprints:
- SPRINT_0140_0001_0001_runtime_signals.md
- SPRINT_0143_0000_0001_signals.md
Key Task IDs:
- ZASTAVA-OBS-40-001 - Observer core (DONE)
- ZASTAVA-ADM-41-001 - Admission webhook (DONE)
- ZASTAVA-DRIFT-50-001 - Filesystem drift (IN PROGRESS)
- ZASTAVA-NET-51-001 - Network posture (TODO)
- ZASTAVA-EBPF-60-001 - eBPF integration (FUTURE)
16. Success Metrics
| Metric | Target |
|---|---|
| Event capture rate | 99.9% of container starts |
| Admission latency | < 50ms p95 |
| Drift detection rate | 100% of runtime changes |
| False positive rate | < 1% of drift alerts |
| Node resource usage | < 2% CPU, < 100MB RAM |
Last updated: 2025-11-29