Runtime Posture and Observation with Zastava

Version: 1.0
Date: 2025-11-29
Status: Canonical

This advisory defines the product rationale, observation model, and implementation strategy for the Zastava module, covering runtime inspection, admission control, drift detection, and posture verification.


1. Executive Summary

Zastava is the runtime inspector and enforcer that provides ground truth from running environments. Key capabilities:

  • Runtime Observation - Inventory containers, track entrypoints, monitor loaded DSOs
  • Admission Control - Kubernetes ValidatingAdmissionWebhook for pre-flight gates
  • Drift Detection - Identify unexpected processes, libraries, and file changes
  • Posture Verification - Validate signatures, SBOM referrers, attestations
  • Build-ID Tracking - Correlate binaries to debug symbols and source

2. Market Drivers

2.1 Target Segments

| Segment | Runtime Requirements | Use Case |
|---|---|---|
| Enterprise Security | Runtime visibility | Post-deploy monitoring |
| Platform Engineering | Admission gates | Policy enforcement |
| Compliance Teams | Continuous verification | Runtime attestation |
| DevSecOps | Drift detection | Configuration management |

2.2 Competitive Positioning

Most vulnerability scanners focus on build-time analysis. Stella Ops differentiates itself with:

  • Runtime ground-truth from actual container execution
  • DSO tracking - which libraries are actually loaded
  • Entrypoint tracing - what programs actually run
  • Native Kubernetes admission with policy integration
  • Build-ID correlation for symbol resolution

3. Architecture Overview

3.1 Component Topology

Kubernetes Deployment:

stellaops/zastava-observer    # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook     # ValidatingAdmissionWebhook (Deployment, 2+ replicas)

Docker/VM Deployment:

stellaops/zastava-agent       # System service; watch Docker events; observer only

3.2 Dependencies

| Dependency | Purpose |
|---|---|
| Authority | OpToks (DPoP/mTLS) for API calls |
| Scanner.WebService | Event ingestion, policy decisions |
| OCI Registry | Referrer/signature checks |
| Container Runtime | containerd/CRI-O/Docker interfaces |
| Kubernetes API | Pod watching, admission webhook |

4. Runtime Event Model

4.1 Event Types

| Kind | Trigger | Payload |
|---|---|---|
| CONTAINER_START | Container lifecycle | Image, entrypoint, namespace |
| CONTAINER_STOP | Container termination | Exit code, duration |
| DRIFT | Unexpected change | Changed files, new binaries |
| POLICY_VIOLATION | Rule breach | Reason, severity |
| ATTESTATION_STATUS | Verification result | Signed, SBOM present |

4.2 Event Envelope

{
  "eventId": "uuid",
  "when": "2025-11-29T12:00:00Z",
  "kind": "CONTAINER_START",
  "tenant": "acme-corp",
  "node": "worker-node-01",
  "runtime": {
    "engine": "containerd",
    "version": "1.7.19"
  },
  "workload": {
    "platform": "kubernetes",
    "namespace": "production",
    "pod": "api-7c9fbbd8b7-ktd84",
    "container": "api",
    "containerId": "containerd://abc123...",
    "imageRef": "ghcr.io/acme/api@sha256:def456...",
    "owner": {
      "kind": "Deployment",
      "name": "api"
    }
  },
  "process": {
    "pid": 12345,
    "entrypoint": ["/entrypoint.sh", "--serve"],
    "entryTrace": [
      {"file": "/entrypoint.sh", "line": 3, "op": "exec", "target": "/usr/bin/python3"},
      {"file": "<argv>", "op": "python", "target": "/opt/app/server.py"}
    ],
    "buildId": "9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1"
  },
  "loadedLibs": [
    {"path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "..."},
    {"path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "..."}
  ],
  "posture": {
    "imageSigned": true,
    "sbomReferrer": "present",
    "attestation": {
      "uuid": "rekor-uuid",
      "verified": true
    }
  }
}
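A consumer of this envelope will usually want a cheap structural check before doing anything else. The sketch below is one way to do that in Python. The set of required top-level fields and the rule that `imageRef` must be digest-pinned are assumptions made for illustration; the advisory does not state which fields are mandatory, and `validate_envelope` is a hypothetical helper, not part of Zastava.

```python
import json
from datetime import datetime

# Assumption: these top-level fields are treated as mandatory.
REQUIRED_TOP_LEVEL = ("eventId", "when", "kind", "tenant", "node")
# Event kinds listed in section 4.1.
KNOWN_KINDS = {
    "CONTAINER_START", "CONTAINER_STOP", "DRIFT",
    "POLICY_VIOLATION", "ATTESTATION_STATUS",
}

def validate_envelope(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the envelope looks well-formed."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return ["envelope must be a JSON object"]

    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if not event.get(key)]

    if event.get("kind") not in KNOWN_KINDS:
        problems.append(f"unknown kind: {event.get('kind')!r}")

    when = event.get("when")
    if isinstance(when, str):
        try:
            # fromisoformat() on older Pythons rejects a trailing "Z", so normalize it.
            datetime.fromisoformat(when.replace("Z", "+00:00"))
        except ValueError:
            problems.append("when is not an ISO 8601 timestamp")

    workload = event.get("workload")
    image_ref = workload.get("imageRef", "") if isinstance(workload, dict) else ""
    if image_ref and "@sha256:" not in image_ref:
        problems.append("imageRef is not pinned to a digest")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a rejected event in a single pass.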

5. Observer Capabilities

5.1 Container Lifecycle Tracking

  • Watch container start/stop via CRI socket
  • Resolve container to image digest
  • Map mount points and rootfs paths
  • Track container metadata (labels, annotations)

5.2 Entrypoint Tracing

  • Attach short-lived nsenter to container PID 1
  • Parse shell scripts for exec chain
  • Record terminal program (actual binary)
  • Bounded depth to prevent infinite loops
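The exec-chain walk above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the observer's tracer: it only recognizes a literal `exec <target>` statement, treats any file without a shebang as the terminal program, and takes a hypothetical `read_text` callback in place of reading the container rootfs. The default depth of 8 is likewise an assumption.

```python
import shlex

MAX_DEPTH = 8  # assumption: the real bound is not stated in this advisory

def trace_entrypoint(entry, read_text, max_depth=MAX_DEPTH):
    """Follow `exec` statements through shell scripts.

    read_text(path) returns the script text, or None for binaries and
    unreadable files. Returns (trace, terminal); terminal is None when the
    depth bound is hit or a cycle is detected.
    """
    trace, seen, current = [], set(), entry
    for _ in range(max_depth):
        text = read_text(current)
        if text is None or not text.startswith("#!"):
            return trace, current          # not a script: this is the terminal program
        if current in seen:
            break                          # script execs itself, directly or indirectly
        seen.add(current)

        target = None
        for lineno, line in enumerate(text.splitlines(), start=1):
            try:
                words = shlex.split(line, comments=True)
            except ValueError:
                continue                   # unbalanced quotes etc.; skip the line
            if len(words) >= 2 and words[0] == "exec":
                target = words[1]
                trace.append({"file": current, "line": lineno, "op": "exec", "target": target})
                break
        if target is None:
            return trace, current          # script never execs; it is the terminal itself
        current = target
    return trace, None
```

A real tracer would also need to handle variable expansion such as `exec "$@"`, `exec` flags, and interpreter arguments, all of which this sketch ignores.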

5.3 Loaded Library Sampling

  • Read /proc/<pid>/maps for loaded DSOs
  • Compute SHA-256 for each mapped file
  • Track GNU build-IDs for symbol correlation
  • Rate limits prevent resource exhaustion
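As a rough illustration of the sampling step, the Python sketch below parses the text of a `/proc/<pid>/maps` file and hashes the resulting libraries under a byte budget. The filtering rule (executable, file-backed, `.so` in the file name) and the `read_bytes` callback are assumptions for the example; the default limits mirror `maxLibs` and `maxHashBytesPerContainer` from the configuration in section 10.

```python
import hashlib

def parse_maps(maps_text, max_libs=256):
    """Extract unique executable, file-backed mappings that look like shared objects."""
    libs, seen = [], set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue                       # anonymous mapping, no pathname
        _addr, perms, _offset, _dev, inode, path = fields
        path = path.strip()
        if "x" not in perms or not path.startswith("/"):
            continue                       # skip data segments and [stack], [vdso], ...
        if ".so" not in path.rsplit("/", 1)[-1]:
            continue                       # the main executable is not a DSO
        if path in seen:
            continue
        seen.add(path)
        libs.append({"path": path, "inode": int(inode)})
        if len(libs) >= max_libs:
            break
    return libs

def hash_within_budget(libs, read_bytes, budget=64_000_000):
    """Attach a sha256 to each library until the per-container byte budget is spent.

    read_bytes(path) is a hypothetical callback returning file content or None.
    Libraries that would exceed the budget get sha256=None instead of a digest.
    """
    spent = 0
    for lib in libs:
        data = read_bytes(lib["path"])
        if data is None or spent + len(data) > budget:
            lib["sha256"] = None
            continue
        spent += len(data)
        lib["sha256"] = hashlib.sha256(data).hexdigest()
    return libs
```

Reading each file fully into memory keeps the sketch short; a production sampler would stream in chunks and handle paths marked `(deleted)`.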

5.4 Posture Verification

  • Image signature presence (cosign policies)
  • SBOM referrers check (registry HEAD)
  • Rekor attestation lookup via Scanner.WebService
  • Policy verdict from backend

6. Admission Control

6.1 Gate Criteria

| Criterion | Description | Configurable |
|---|---|---|
| Image Signature | Cosign-verifiable to configured keys | Yes |
| SBOM Availability | CycloneDX referrer or catalog entry | Yes |
| Policy Verdict | Backend PASS required | Yes |
| Registry Allowlist | Permitted registries | Yes |
| Tag Bans | Reject :latest, etc. | Yes |
| Base Image Allowlist | Permitted base digests | Yes |
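Two of these criteria, the registry allowlist and tag bans, can be evaluated from the image reference alone. The Python sketch below shows one possible shape for such a check. `split_image_ref` and `local_gate` are hypothetical names; the registry-detection rule follows the common Docker convention (a first path component containing `.` or `:`, or equal to `localhost`, is a registry host, otherwise `docker.io` is implied). Treating an untagged, undigested reference as `latest`, and letting a digest override a banned tag, are assumptions of this sketch.

```python
def split_image_ref(ref):
    """Split an image reference into (registry, tag, digest)."""
    name, _, digest = ref.partition("@")
    tag = None
    # A colon after the last slash separates the tag; an earlier one is a port.
    if name.rfind(":") > name.rfind("/"):
        name, _, tag = name.rpartition(":")
    first, sep, _rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry = first
    else:
        registry = "docker.io"
    return registry, tag, (digest or None)

def local_gate(ref, allowed_registries, banned_tags=("latest",)):
    """Return a list of denial reasons; an empty list means the local checks pass."""
    registry, tag, digest = split_image_ref(ref)
    reasons = []
    if registry not in allowed_registries:
        reasons.append(f"registry '{registry}' is not on the allowlist")
    if digest is None:
        effective_tag = tag if tag is not None else "latest"  # implicit default tag
        if effective_tag in banned_tags:
            reasons.append(f"tag '{effective_tag}' is banned")
    return reasons
```

For example, `local_gate("nginx", {"ghcr.io"})` reports both a registry and a tag problem, because the bare name resolves to `docker.io` with an implicit `latest` tag.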

6.2 Decision Flow

sequenceDiagram
  participant K8s as API Server
  participant WH as Zastava Webhook
  participant SW as Scanner.WebService

  K8s->>WH: AdmissionReview(Pod)
  WH->>WH: Resolve images to digests
  WH->>SW: POST /policy/runtime
  SW-->>WH: {signed, hasSbom, verdict, reasons}
  alt All pass
    WH-->>K8s: Allow
  else Any fail
    WH-->>K8s: Deny (with reasons)
  end
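The allow/deny branch at the end of this flow amounts to a conjunction over every image in the pod. A minimal Python sketch follows, assuming the backend reply carries the four fields shown in the diagram (`signed`, `hasSbom`, `verdict`, `reasons`) per image digest. The `PolicyResult` type and `decide` function are illustrative names, not the webhook's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class PolicyResult:
    """One backend answer for one image digest (field names follow the diagram)."""
    signed: bool
    has_sbom: bool
    verdict: str
    reasons: list = field(default_factory=list)

def decide(results):
    """results maps image digest -> PolicyResult. Returns (allowed, denial_reasons)."""
    denials = []
    for digest, result in results.items():
        if not result.signed:
            denials.append(f"{digest}: image is not signed")
        if not result.has_sbom:
            denials.append(f"{digest}: no SBOM available")
        if result.verdict.upper() != "PASS":
            detail = "; ".join(result.reasons) or "no reason given"
            denials.append(f"{digest}: policy verdict {result.verdict} ({detail})")
    # Allow only when every container image passed every check.
    return (not denials, denials)
```

Collecting all denial reasons, rather than stopping at the first failure, lets the webhook return a complete message to the API server in one rejection.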

6.3 Response Caching

  • Per-digest results cached for TTL (default 300s)
  • Fail-open or fail-closed per namespace
  • Cache invalidation on policy updates
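These three behaviours fit together as sketched below in Python. The class and function names are hypothetical, and the choice not to cache a fail-open or fail-closed fallback decision is an assumption of this sketch rather than documented behaviour.

```python
import time

class VerdictCache:
    """Per-digest admission verdicts with a fixed time-to-live."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # digest -> (expires_at, allowed)

    def get(self, digest):
        entry = self._entries.get(digest)
        if entry is None:
            return None
        expires_at, allowed = entry
        if self._clock() >= expires_at:
            del self._entries[digest]
            return None
        return allowed

    def put(self, digest, allowed):
        self._entries[digest] = (self._clock() + self._ttl, allowed)

    def invalidate_all(self):
        """Call when the backend signals a policy update."""
        self._entries.clear()

def admit(digest, namespace, cache, fetch_verdict, fail_open_namespaces):
    """Return True to allow. fetch_verdict(digest) may raise if the backend is down."""
    cached = cache.get(digest)
    if cached is not None:
        return cached
    try:
        allowed = fetch_verdict(digest)
    except Exception:
        # Backend unreachable: fall back per namespace and do not cache the guess.
        return namespace in fail_open_namespaces
    cache.put(digest, allowed)
    return allowed
```

Using a monotonic clock avoids premature or delayed expiry when the wall clock is adjusted on the node.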

7. Drift Detection

7.1 Signal Types

| Signal | Detection Method | Action |
|---|---|---|
| Process Drift | Terminal program differs from EntryTrace baseline | Alert |
| Library Drift | Loaded DSOs not in Usage SBOM | Alert, delta scan |
| Filesystem Drift | New executables with mtime after image creation | Alert |
| Network Drift | Unexpected listening ports | Alert (optional) |
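The first three detection methods reduce to simple comparisons against a baseline. The Python sketch below illustrates them with hypothetical function names and plain in-memory inputs; how the observer actually obtains the baseline, the Usage SBOM paths, or file modification times is outside what this advisory specifies.

```python
def process_drift(terminal_program, baseline_program):
    """True when the program actually running differs from the EntryTrace baseline."""
    return terminal_program != baseline_program

def library_drift(loaded_paths, usage_sbom_paths):
    """Loaded DSOs that the Usage SBOM does not list, in stable sorted order."""
    return sorted(set(loaded_paths) - set(usage_sbom_paths))

def filesystem_drift(executables, image_created_epoch):
    """Executables modified after the image was built.

    executables is an iterable of (path, mtime_epoch_seconds) pairs.
    """
    return sorted(path for path, mtime in executables if mtime > image_created_epoch)
```

For example, `library_drift(["/lib/libssl.so.3", "/tmp/evil.so"], ["/lib/libssl.so.3"])` yields `["/tmp/evil.so"]`, which would then feed the delta scan named in the Action column.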

7.2 Drift Event

{
  "kind": "DRIFT",
  "delta": {
    "baselineImageDigest": "sha256:abc...",
    "changedFiles": ["/opt/app/server.py"],
    "newBinaries": [
      {"path": "/usr/local/bin/helper", "sha256": "..."}
    ]
  },
  "evidence": [
    {"signal": "procfs.maps", "value": "/lib/.../libssl.so.3@0x7f..."},
    {"signal": "cri.task.inspect", "value": "pid=12345"}
  ]
}

8. Build-ID Workflow

8.1 Capture

  1. Observer extracts NT_GNU_BUILD_ID from /proc/<pid>/exe
  2. Normalize to lower-case hex
  3. Include in runtime event as process.buildId
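To make steps 1 and 2 concrete, the Python sketch below walks the raw bytes of an ELF note section or segment and returns the GNU build-ID as lower-case hex. It assumes the caller has already located the note bytes (for example the `.note.gnu.build-id` section); locating them via the ELF headers is omitted, and `build_id_from_notes` is a hypothetical helper rather than the observer's implementation.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type defined by the GNU toolchain

def build_id_from_notes(notes, little_endian=True):
    """Scan ELF note records and return the GNU build-ID as lower-case hex, or None."""
    header = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit words: name size, desc size, type.
        namesz, descsz, note_type = struct.unpack_from(header, notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3        # name is padded to a 4-byte boundary
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3        # so is the descriptor
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            if len(desc) != descsz:
                return None                # truncated note
            return desc.hex()              # bytes.hex() is already lower-case
    return None
```

Because `bytes.hex()` always emits lower-case digits, the normalization in step 2 comes for free when the ID is read from the binary; it only needs explicit handling when the ID arrives as text from another tool.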

8.2 Correlation

  1. Scanner.WebService persists observation
  2. Policy responses include buildIds list
  3. Debug files matched via .build-id/<aa>/<rest>.debug
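The path layout in step 3 splits the build-ID after its first two hex digits. A small Python sketch of that mapping follows; the helper name is hypothetical, and the default root of `/usr/lib/debug` is the conventional location on many Linux distributions, assumed here rather than taken from this advisory.

```python
def debug_file_path(build_id, root="/usr/lib/debug"):
    """Map a build-ID to <root>/.build-id/<aa>/<rest>.debug."""
    normalized = build_id.strip().lower()
    if len(normalized) < 4 or any(ch not in "0123456789abcdef" for ch in normalized):
        raise ValueError(f"not a usable hex build-ID: {build_id!r}")
    # First two hex digits form the directory, the remainder the file name.
    return f"{root}/.build-id/{normalized[:2]}/{normalized[2:]}.debug"
```

With the build-ID from the event envelope in section 4.2, this yields `/usr/lib/debug/.build-id/9f/3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1.debug`.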

8.3 Symbol Resolution

# Via CLI
stella runtime policy test --image sha256:abc123... | jq '.buildIds'

# Via debuginfod
debuginfod-find debuginfo 9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1

9. Implementation Strategy

9.1 Phase 1: Observer Core (Complete)

  • CRI socket integration
  • Container lifecycle tracking
  • Entrypoint tracing
  • Loaded library sampling
  • Event batching and compression

9.2 Phase 2: Admission Webhook (Complete)

  • ValidatingAdmissionWebhook
  • Image digest resolution
  • Policy integration
  • Response caching
  • Fail-open/closed modes

9.3 Phase 3: Drift Detection (In Progress)

  • Process drift detection
  • Library drift detection
  • Filesystem drift monitoring (ZASTAVA-DRIFT-50-001)
  • Network posture checks (ZASTAVA-NET-51-001)

9.4 Phase 4: Advanced Features (Planned)

  • eBPF syscall tracing (optional)
  • Windows container support
  • Live used-by-entrypoint synthesis
  • Admission dry-run dashboards

10. Configuration

zastava:
  mode:
    observer: true
    webhook: true

  backend:
    baseAddress: "https://scanner-web.internal"
    policyPath: "/api/v1/scanner/policy/runtime"
    requestTimeoutSeconds: 5

  runtime:
    authority:
      issuer: "https://authority.internal"
      clientId: "zastava-observer"
      audience: ["scanner", "zastava"]
      scopes: ["api:scanner.runtime.write"]
      requireDpop: true
      requireMutualTls: true

    tenant: "acme-corp"
    engine: "auto"  # containerd|cri-o|docker|auto
    procfs: "/host/proc"

    collect:
      entryTrace: true
      loadedLibs: true
      maxLibs: 256
      maxHashBytesPerContainer: 64000000

  admission:
    enforce: true
    failOpenNamespaces: ["dev", "test"]
    verify:
      imageSignature: true
      sbomReferrer: true
      scannerPolicyPass: true
    cacheTtlSeconds: 300

  limits:
    eventsPerSecond: 50
    burst: 200
    perNodeQueue: 10000
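The `limits` block reads naturally as a token bucket: `eventsPerSecond` is the refill rate and `burst` is the bucket capacity. Whether Zastava implements it this way is an assumption; the Python sketch below only illustrates how the two numbers would interact under that model.

```python
import time

class TokenBucket:
    """Allow `rate` events per second on average, with bursts up to `burst`."""

    def __init__(self, rate=50.0, burst=200, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full so an initial burst is permitted
        self._clock = clock
        self._last = clock()

    def allow(self):
        """Consume one token if available; return False when the event should be dropped or queued."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the values above, a node can emit 200 events at once after an idle period, then settles to a sustained 50 events per second; anything beyond that would land in the per-node queue or be counted as a drop.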

11. Security Posture

11.1 Privileges

| Capability | Purpose | Mode |
|---|---|---|
| CAP_SYS_PTRACE | nsenter trace | Optional |
| CAP_DAC_READ_SEARCH | Read /proc | Required |
| Host PID namespace | Container PIDs | Required |
| Read-only mounts | /proc, sockets | Required |

11.2 Least Privilege

  • No write mounts
  • No host networking
  • No privilege escalation
  • Read-only rootfs

11.3 Data Minimization

  • No env var exfiltration
  • No command argument logging (unless diagnostic mode)
  • Rate limits prevent abuse

12. Observability

12.1 Observer Metrics

  • zastava.runtime.events.total{kind}
  • zastava.runtime.backend.latency.ms{endpoint}
  • zastava.proc_maps.samples.total{result}
  • zastava.entrytrace.depth{p99}
  • zastava.hash.bytes.total
  • zastava.buffer.drops.total

12.2 Webhook Metrics

  • zastava.admission.decisions.total{decision}
  • zastava.admission.cache.hits.total
  • zastava.backend.failures.total

13. Performance Targets

| Operation | Target |
|---|---|
| /proc/<pid>/maps sampling | < 30ms (64 files) |
| Full library hash set | < 200ms (256 libs) |
| Admission with warm cache | < 8ms p95 |
| Admission with backend call | < 50ms p95 |
| Event throughput | 5k events/min/node |

14. Related Documentation

| Resource | Location |
|---|---|
| Zastava architecture | docs/modules/zastava/architecture.md |
| Runtime event schema | docs/modules/zastava/event-schema.md |
| Admission configuration | docs/modules/zastava/admission-config.md |
| Deployment guide | docs/modules/zastava/deployment.md |

15. Sprint Mapping

  • Primary Sprint: SPRINT_0144_0001_0001_zastava_runtime_signals.md
  • Related Sprints:
    • SPRINT_0140_0001_0001_runtime_signals.md
    • SPRINT_0143_0000_0001_signals.md

Key Task IDs:

  • ZASTAVA-OBS-40-001 - Observer core (DONE)
  • ZASTAVA-ADM-41-001 - Admission webhook (DONE)
  • ZASTAVA-DRIFT-50-001 - Filesystem drift (IN PROGRESS)
  • ZASTAVA-NET-51-001 - Network posture (TODO)
  • ZASTAVA-EBPF-60-001 - eBPF integration (FUTURE)

16. Success Metrics

| Metric | Target |
|---|---|
| Event capture rate | 99.9% of container starts |
| Admission latency | < 50ms p95 |
| Drift detection rate | 100% of runtime changes |
| False positive rate | < 1% of drift alerts |
| Node resource usage | < 2% CPU, < 100MB RAM |

Last updated: 2025-11-29