true the date
StellaOps Bot
2025-11-30 19:23:21 +02:00
parent 71e9a56cfd
commit 0bef705bcc
14 changed files with 0 additions and 0 deletions

# Runtime Posture and Observation with Zastava

**Version:** 1.0

**Date:** 2025-11-29

**Status:** Canonical

This advisory defines the product rationale, observation model, and implementation strategy for the Zastava module, covering runtime inspection, admission control, drift detection, and posture verification.

---
## 1. Executive Summary
Zastava is the **runtime inspector and enforcer** that provides ground truth from running environments. Key capabilities:
- **Runtime Observation** - Inventory containers, track entrypoints, monitor loaded DSOs
- **Admission Control** - Kubernetes ValidatingAdmissionWebhook for pre-flight gates
- **Drift Detection** - Identify unexpected processes, libraries, and file changes
- **Posture Verification** - Validate signatures, SBOM referrers, attestations
- **Build-ID Tracking** - Correlate binaries to debug symbols and source
---
## 2. Market Drivers
### 2.1 Target Segments
| Segment | Runtime Requirements | Use Case |
|---------|---------------------|----------|
| **Enterprise Security** | Runtime visibility | Post-deploy monitoring |
| **Platform Engineering** | Admission gates | Policy enforcement |
| **Compliance Teams** | Continuous verification | Runtime attestation |
| **DevSecOps** | Drift detection | Configuration management |
### 2.2 Competitive Positioning
Most vulnerability scanners focus on build-time analysis. Stella Ops differentiates with:
- **Runtime ground-truth** from actual container execution
- **DSO tracking** - which libraries are actually loaded
- **Entrypoint tracing** - what programs actually run
- **Native Kubernetes admission** with policy integration
- **Build-ID correlation** for symbol resolution
---
## 3. Architecture Overview
### 3.1 Component Topology
**Kubernetes Deployment:**
```
stellaops/zastava-observer # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
```
**Docker/VM Deployment:**
```
stellaops/zastava-agent # System service; watch Docker events; observer only
```
### 3.2 Dependencies
| Dependency | Purpose |
|------------|---------|
| Authority | OpToks (DPoP/mTLS) for API calls |
| Scanner.WebService | Event ingestion, policy decisions |
| OCI Registry | Referrer/signature checks |
| Container Runtime | containerd/CRI-O/Docker interfaces |
| Kubernetes API | Pod watching, admission webhook |
---
## 4. Runtime Event Model
### 4.1 Event Types
| Kind | Trigger | Payload |
|------|---------|---------|
| `CONTAINER_START` | Container lifecycle | Image, entrypoint, namespace |
| `CONTAINER_STOP` | Container termination | Exit code, duration |
| `DRIFT` | Unexpected change | Changed files, new binaries |
| `POLICY_VIOLATION` | Rule breach | Reason, severity |
| `ATTESTATION_STATUS` | Verification result | Signed, SBOM present |
### 4.2 Event Envelope
```json
{
  "eventId": "uuid",
  "when": "2025-11-29T12:00:00Z",
  "kind": "CONTAINER_START",
  "tenant": "acme-corp",
  "node": "worker-node-01",
  "runtime": {
    "engine": "containerd",
    "version": "1.7.19"
  },
  "workload": {
    "platform": "kubernetes",
    "namespace": "production",
    "pod": "api-7c9fbbd8b7-ktd84",
    "container": "api",
    "containerId": "containerd://abc123...",
    "imageRef": "ghcr.io/acme/api@sha256:def456...",
    "owner": {
      "kind": "Deployment",
      "name": "api"
    }
  },
  "process": {
    "pid": 12345,
    "entrypoint": ["/entrypoint.sh", "--serve"],
    "entryTrace": [
      {"file": "/entrypoint.sh", "line": 3, "op": "exec", "target": "/usr/bin/python3"},
      {"file": "<argv>", "op": "python", "target": "/opt/app/server.py"}
    ],
    "buildId": "9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1"
  },
  "loadedLibs": [
    {"path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "..."},
    {"path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "..."}
  ],
  "posture": {
    "imageSigned": true,
    "sbomReferrer": "present",
    "attestation": {
      "uuid": "rekor-uuid",
      "verified": true
    }
  }
}
```
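
A consumer of these events may want to sanity-check an envelope before acting on it. The following is a minimal sketch in Python, assuming only the field names shown in the envelope; the `REQUIRED_FIELDS` tuple and the `validate_event` helper are hypothetical and not part of Zastava.

```python
import json

# Event kinds listed in section 4.1.
KNOWN_KINDS = {
    "CONTAINER_START", "CONTAINER_STOP", "DRIFT",
    "POLICY_VIOLATION", "ATTESTATION_STATUS",
}

# Hypothetical choice of mandatory top-level fields for this sketch.
REQUIRED_FIELDS = ("eventId", "when", "kind", "tenant", "node")


def validate_event(raw: str) -> list:
    """Return a list of problems found in a JSON-encoded event (empty if none)."""
    event = json.loads(raw)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]

    kind = event.get("kind")
    if kind is not None and kind not in KNOWN_KINDS:
        problems.append(f"unknown kind: {kind}")

    # Assumption: a workload image reference should be pinned to a digest.
    image_ref = event.get("workload", {}).get("imageRef", "")
    if image_ref and "@sha256:" not in image_ref:
        problems.append("imageRef is not pinned to a digest")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller log every defect in a single pass.
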
---
## 5. Observer Capabilities
### 5.1 Container Lifecycle Tracking
- Watch container start/stop via CRI socket
- Resolve container to image digest
- Map mount points and rootfs paths
- Track container metadata (labels, annotations)
### 5.2 Entrypoint Tracing
- Attach a short-lived `nsenter` session to the container's PID 1
- Parse shell scripts for exec chain
- Record terminal program (actual binary)
- Bounded depth to prevent infinite loops
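
The bounded exec-chain walk can be illustrated with a small Python sketch. It is not the observer's implementation: it assumes script contents are already available in a dictionary, recognises only plain `exec <program>` lines, and uses a hypothetical `MAX_DEPTH` of 8 because the document does not state the real bound.

```python
import shlex
from typing import Optional

MAX_DEPTH = 8  # hypothetical bound; the real limit is not stated in this document


def find_exec_target(script_text: str) -> Optional[str]:
    """Return the program named by the first `exec ...` line of a shell script."""
    for line in script_text.splitlines():
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:
            continue  # unbalanced quotes; skip lines this sketch cannot tokenize
        if len(tokens) >= 2 and tokens[0] == "exec":
            return tokens[1]
    return None


def trace_entrypoint(entry: str, scripts: dict, max_depth: int = MAX_DEPTH) -> list:
    """Follow exec hops starting at `entry`.

    `scripts` maps paths to shell script text; a path absent from it is
    treated as a binary, i.e. the terminal program.
    """
    chain = [entry]
    current = entry
    for _ in range(max_depth):
        text = scripts.get(current)
        if text is None:
            break  # not a script: terminal program reached
        target = find_exec_target(text)
        if target is None:
            break  # the script never execs, so it is the terminal program
        chain.append(target)
        current = target
    return chain
```

The last element of the returned chain is the terminal program. The `max_depth` cap is what stops two scripts that exec each other from looping forever.
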
### 5.3 Loaded Library Sampling
- Read `/proc/<pid>/maps` for loaded DSOs
- Compute SHA-256 for each mapped file
- Track GNU build-IDs for symbol correlation
- Rate limits prevent resource exhaustion
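
As an illustration of the sampling step, the sketch below parses the text of a `/proc/<pid>/maps` file into a de-duplicated list of shared objects. It relies on the standard six-column maps format (address, permissions, offset, device, inode, pathname). The `parse_loaded_libs` name and the `.so` filename filter are assumptions of this example, and hashing is omitted for brevity.

```python
def parse_loaded_libs(maps_text: str, max_libs: int = 256) -> list:
    """Extract executable, file-backed shared objects from /proc/<pid>/maps text."""
    seen = {}
    for line in maps_text.splitlines():
        # maxsplit=5 keeps pathnames that contain spaces in one field.
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        perms, inode, path = fields[1], int(fields[4]), fields[5]
        if "x" not in perms or inode == 0 or not path.startswith("/"):
            continue  # skip non-executable segments and pseudo-paths like [vdso]
        if ".so" not in path.rsplit("/", 1)[-1]:
            continue  # assumption: only shared objects count as DSOs
        seen.setdefault(path, {"path": path, "inode": inode})
        if len(seen) >= max_libs:
            break  # cap mirrors the `maxLibs` setting
    return list(seen.values())
```

Keying on the path collapses the several segments that one library normally maps into a single entry.
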
### 5.4 Posture Verification
- Image signature presence (cosign policies)
- SBOM referrers check (registry HEAD)
- Rekor attestation lookup via Scanner.WebService
- Policy verdict from backend
---
## 6. Admission Control
### 6.1 Gate Criteria
| Criterion | Description | Configurable |
|-----------|-------------|--------------|
| Image Signature | Cosign-verifiable to configured keys | Yes |
| SBOM Availability | CycloneDX referrer or catalog entry | Yes |
| Policy Verdict | Backend PASS required | Yes |
| Registry Allowlist | Permitted registries | Yes |
| Tag Bans | Reject `:latest`, etc. | Yes |
| Base Image Allowlist | Permitted base digests | Yes |
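
Two of these gates, the registry allowlist and tag bans, can be evaluated from the image reference alone. The Python sketch below shows one way to do so. The `check_image_ref` helper is hypothetical, it assumes fully qualified references (registry host first), and it assumes tag bans are skipped for digest-pinned references.

```python
def check_image_ref(image_ref: str, allowed_registries: set, banned_tags: set) -> list:
    """Return denial reasons for an image reference (empty list means it passes)."""
    reasons = []
    name, _, digest = image_ref.partition("@")
    registry, _, remainder = name.partition("/")
    if registry not in allowed_registries:
        reasons.append(f"registry not allowed: {registry}")

    if not digest:
        # Only the last path component can carry a tag; a registry port
        # such as localhost:5000 was already split off above.
        last_component = remainder.rsplit("/", 1)[-1]
        _, has_tag, tag = last_component.partition(":")
        if not has_tag:
            tag = "latest"  # an untagged reference resolves to :latest
        if tag in banned_tags:
            reasons.append(f"banned tag: {tag}")
    return reasons
```

For example, `check_image_ref("ghcr.io/acme/api", {"ghcr.io"}, {"latest"})` reports a banned tag even though no tag was written, because the reference resolves to `:latest` implicitly.
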
### 6.2 Decision Flow
```mermaid
sequenceDiagram
participant K8s as API Server
participant WH as Zastava Webhook
participant SW as Scanner.WebService
K8s->>WH: AdmissionReview(Pod)
WH->>WH: Resolve images to digests
WH->>SW: POST /policy/runtime
SW-->>WH: {signed, hasSbom, verdict, reasons}
alt All pass
WH-->>K8s: Allow
else Any fail
WH-->>K8s: Deny (with reasons)
end
```
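
The allow/deny branch of this flow could be sketched as follows. This Python example assumes the backend returns one `{signed, hasSbom, verdict, reasons}` object per image digest and that a passing verdict is the string `pass` (compared case-insensitively). The `AdmissionDecision` type and `decide` function are illustrative names, not Zastava's API.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionDecision:
    allowed: bool
    reasons: list = field(default_factory=list)


def decide(results_by_digest: dict, verify: dict) -> AdmissionDecision:
    """Combine per-digest backend results into a single admission decision.

    `verify` carries the toggles `imageSignature`, `sbomReferrer` and
    `scannerPolicyPass`; a missing toggle is treated as enabled.
    """
    reasons = []
    for digest, result in results_by_digest.items():
        if verify.get("imageSignature", True) and not result.get("signed", False):
            reasons.append(f"{digest}: image is not signed")
        if verify.get("sbomReferrer", True) and not result.get("hasSbom", False):
            reasons.append(f"{digest}: no SBOM found")
        verdict = str(result.get("verdict", "")).lower()
        if verify.get("scannerPolicyPass", True) and verdict != "pass":
            backend_reasons = result.get("reasons") or ["policy verdict is not pass"]
            reasons.extend(f"{digest}: {reason}" for reason in backend_reasons)
    # Any single failure denies the whole Pod, matching the "Any fail" branch.
    return AdmissionDecision(allowed=not reasons, reasons=reasons)
```
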
### 6.3 Response Caching
- Per-digest results cached for TTL (default 300s)
- Fail-open or fail-closed per namespace
- Cache invalidation on policy updates
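
A per-digest TTL cache with the behaviours listed above might look like this Python sketch. The `DigestCache` class and `allow_on_backend_error` helper are hypothetical names, and the injectable clock exists only to keep the example testable.

```python
import time


class DigestCache:
    """Cache policy results per image digest for a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # digest -> (expires_at, value)

    def get(self, digest: str):
        entry = self._entries.get(digest)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[digest]  # expired: force a fresh backend call
            return None
        return value

    def put(self, digest: str, value) -> None:
        self._entries[digest] = (self._clock() + self._ttl, value)

    def invalidate_all(self) -> None:
        """Drop every entry, e.g. when a policy update is announced."""
        self._entries.clear()


def allow_on_backend_error(namespace: str, fail_open_namespaces: set) -> bool:
    """Fail open only for namespaces explicitly configured that way."""
    return namespace in fail_open_namespaces
```

A monotonic clock is used so that wall-clock adjustments on the node cannot extend or shorten entry lifetimes.
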
---
## 7. Drift Detection
### 7.1 Signal Types
| Signal | Detection Method | Action |
|--------|-----------------|--------|
| Process Drift | Terminal program differs from EntryTrace baseline | Alert |
| Library Drift | Loaded DSOs not in Usage SBOM | Alert, delta scan |
| Filesystem Drift | New executables with mtime after image creation | Alert |
| Network Drift | Unexpected listening ports | Alert (optional) |
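
The first two signals reduce to simple comparisons against a baseline. The Python sketch below is a hedged illustration: `detect_drift` and its parameter names are hypothetical, and the baseline terminal program and SBOM library paths are assumed to be supplied by the caller.

```python
def detect_drift(baseline_terminal: str, observed_terminal: str,
                 sbom_lib_paths: list, loaded_lib_paths: list) -> dict:
    """Return the drift signals found; an empty dict means no drift."""
    signals = {}

    # Process drift: the terminal program differs from the baseline.
    if observed_terminal != baseline_terminal:
        signals["process"] = {"expected": baseline_terminal, "observed": observed_terminal}

    # Library drift: loaded DSOs that the SBOM does not list.
    unexpected = sorted(set(loaded_lib_paths) - set(sbom_lib_paths))
    if unexpected:
        signals["library"] = unexpected
    return signals
```
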
### 7.2 Drift Event
```json
{
  "kind": "DRIFT",
  "delta": {
    "baselineImageDigest": "sha256:abc...",
    "changedFiles": ["/opt/app/server.py"],
    "newBinaries": [
      {"path": "/usr/local/bin/helper", "sha256": "..."}
    ]
  },
  "evidence": [
    {"signal": "procfs.maps", "value": "/lib/.../libssl.so.3@0x7f..."},
    {"signal": "cri.task.inspect", "value": "pid=12345"}
  ]
}
```
---
## 8. Build-ID Workflow
### 8.1 Capture
1. Observer extracts `NT_GNU_BUILD_ID` from `/proc/<pid>/exe`
2. Normalize to lower-case hex
3. Include in runtime event as `process.buildId`
### 8.2 Correlation
1. Scanner.WebService persists observation
2. Policy responses include `buildIds` list
3. Debug files matched via `.build-id/<aa>/<rest>.debug`
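
The normalization and path-matching steps can be shown with a short Python sketch. The function names are illustrative, and the default root `/usr/lib/debug` is an assumption based on the conventional location of separate debug files; the document does not name a root directory.

```python
import re
from pathlib import PurePosixPath

_HEX = re.compile(r"[0-9a-f]+")


def normalize_build_id(raw: str) -> str:
    """Lower-case a build-ID and verify it is an even-length hex string."""
    build_id = raw.strip().lower()
    if len(build_id) < 4 or len(build_id) % 2 or not _HEX.fullmatch(build_id):
        raise ValueError(f"not a valid build-ID: {raw!r}")
    return build_id


def debug_file_path(build_id: str, root: str = "/usr/lib/debug") -> PurePosixPath:
    """Map a build-ID to `<root>/.build-id/<aa>/<rest>.debug`."""
    bid = normalize_build_id(build_id)
    # The first byte (two hex characters) names the directory; the rest names the file.
    return PurePosixPath(root) / ".build-id" / bid[:2] / f"{bid[2:]}.debug"
```

Using the build-ID from the event envelope, `debug_file_path("9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1")` yields `/usr/lib/debug/.build-id/9f/3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1.debug`.
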
### 8.3 Symbol Resolution
```bash
# Via CLI
stella runtime policy test --image sha256:abc123... | jq '.buildIds'
# Via debuginfod
debuginfod-find debuginfo 9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1
```
---
## 9. Implementation Strategy
### 9.1 Phase 1: Observer Core (Complete)
- [x] CRI socket integration
- [x] Container lifecycle tracking
- [x] Entrypoint tracing
- [x] Loaded library sampling
- [x] Event batching and compression
### 9.2 Phase 2: Admission Webhook (Complete)
- [x] ValidatingAdmissionWebhook
- [x] Image digest resolution
- [x] Policy integration
- [x] Response caching
- [x] Fail-open/closed modes
### 9.3 Phase 3: Drift Detection (In Progress)
- [x] Process drift detection
- [x] Library drift detection
- [ ] Filesystem drift monitoring (ZASTAVA-DRIFT-50-001)
- [ ] Network posture checks (ZASTAVA-NET-51-001)
### 9.4 Phase 4: Advanced Features (Planned)
- [ ] eBPF syscall tracing (optional)
- [ ] Windows container support
- [ ] Live used-by-entrypoint synthesis
- [ ] Admission dry-run dashboards
---
## 10. Configuration
```yaml
zastava:
  mode:
    observer: true
    webhook: true
  backend:
    baseAddress: "https://scanner-web.internal"
    policyPath: "/api/v1/scanner/policy/runtime"
    requestTimeoutSeconds: 5
  runtime:
    authority:
      issuer: "https://authority.internal"
      clientId: "zastava-observer"
      audience: ["scanner", "zastava"]
      scopes: ["api:scanner.runtime.write"]
      requireDpop: true
      requireMutualTls: true
    tenant: "acme-corp"
    engine: "auto"  # containerd|cri-o|docker|auto
    procfs: "/host/proc"
    collect:
      entryTrace: true
      loadedLibs: true
      maxLibs: 256
      maxHashBytesPerContainer: 64000000
  admission:
    enforce: true
    failOpenNamespaces: ["dev", "test"]
    verify:
      imageSignature: true
      sbomReferrer: true
      scannerPolicyPass: true
    cacheTtlSeconds: 300
  limits:
    eventsPerSecond: 50
    burst: 200
    perNodeQueue: 10000
```
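
The `limits.eventsPerSecond` and `limits.burst` settings read naturally as token-bucket parameters. The Python sketch below shows that interpretation. It is an assumption: the document does not say which rate-limiting algorithm Zastava uses, and the `TokenBucket` class is hypothetical.

```python
import time


class TokenBucket:
    """Allow short bursts while holding a sustained per-second rate."""

    def __init__(self, rate_per_second: float = 50.0, burst: int = 200,
                 clock=time.monotonic):
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the event should wait or drop."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above the burst capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

With the defaults shown, a node could emit 200 events at once and then settle at 50 per second. Events rejected by `try_acquire` would presumably be queued up to `perNodeQueue` or counted as drops.
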
---
## 11. Security Posture
### 11.1 Privileges
| Capability | Purpose | Mode |
|------------|---------|------|
| `CAP_SYS_PTRACE` | nsenter trace | Optional |
| `CAP_DAC_READ_SEARCH` | Read /proc | Required |
| Host PID namespace | Container PIDs | Required |
| Read-only mounts | /proc, sockets | Required |
### 11.2 Least Privilege
- No write mounts
- No host networking
- No privilege escalation
- Read-only rootfs
### 11.3 Data Minimization
- No env var exfiltration
- No command argument logging (unless diagnostic mode)
- Rate limits prevent abuse
---
## 12. Observability
### 12.1 Observer Metrics
- `zastava.runtime.events.total{kind}`
- `zastava.runtime.backend.latency.ms{endpoint}`
- `zastava.proc_maps.samples.total{result}`
- `zastava.entrytrace.depth{p99}`
- `zastava.hash.bytes.total`
- `zastava.buffer.drops.total`
### 12.2 Webhook Metrics
- `zastava.admission.decisions.total{decision}`
- `zastava.admission.cache.hits.total`
- `zastava.backend.failures.total`
---
## 13. Performance Targets
| Operation | Target |
|-----------|--------|
| `/proc/<pid>/maps` sampling | < 30ms (64 files) |
| Full library hash set | < 200ms (256 libs) |
| Admission with warm cache | < 8ms p95 |
| Admission with backend call | < 50ms p95 |
| Event throughput | 5k events/min/node |
---
## 14. Related Documentation
| Resource | Location |
|----------|----------|
| Zastava architecture | `docs/modules/zastava/architecture.md` |
| Runtime event schema | `docs/modules/zastava/event-schema.md` |
| Admission configuration | `docs/modules/zastava/admission-config.md` |
| Deployment guide | `docs/modules/zastava/deployment.md` |
---
## 15. Sprint Mapping
- **Primary Sprint:** SPRINT_0144_0001_0001_zastava_runtime_signals.md
- **Related Sprints:**
  - SPRINT_0140_0001_0001_runtime_signals.md
  - SPRINT_0143_0000_0001_signals.md

**Key Task IDs:**
- `ZASTAVA-OBS-40-001` - Observer core (DONE)
- `ZASTAVA-ADM-41-001` - Admission webhook (DONE)
- `ZASTAVA-DRIFT-50-001` - Filesystem drift (IN PROGRESS)
- `ZASTAVA-NET-51-001` - Network posture (TODO)
- `ZASTAVA-EBPF-60-001` - eBPF integration (FUTURE)
---
## 16. Success Metrics
| Metric | Target |
|--------|--------|
| Event capture rate | 99.9% of container starts |
| Admission latency | < 50ms p95 |
| Drift detection rate | 100% of runtime changes |
| False positive rate | < 1% of drift alerts |
| Node resource usage | < 2% CPU, < 100MB RAM |
---
*Last updated: 2025-11-29*