# Runtime Posture and Observation with Zastava

**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical

This advisory defines the product rationale, observation model, and implementation strategy for the Zastava module, covering runtime inspection, admission control, drift detection, and posture verification.

---

## 1. Executive Summary

Zastava is the **runtime inspector and enforcer** that provides ground truth from running environments. Key capabilities:

- **Runtime Observation** - Inventory containers, track entrypoints, monitor loaded DSOs
- **Admission Control** - Kubernetes ValidatingAdmissionWebhook for pre-flight gates
- **Drift Detection** - Identify unexpected processes, libraries, and file changes
- **Posture Verification** - Validate signatures, SBOM referrers, attestations
- **Build-ID Tracking** - Correlate binaries to debug symbols and source

---

## 2. Market Drivers

### 2.1 Target Segments

| Segment | Runtime Requirements | Use Case |
|---------|---------------------|----------|
| **Enterprise Security** | Runtime visibility | Post-deploy monitoring |
| **Platform Engineering** | Admission gates | Policy enforcement |
| **Compliance Teams** | Continuous verification | Runtime attestation |
| **DevSecOps** | Drift detection | Configuration management |

### 2.2 Competitive Positioning

Most vulnerability scanners focus on build-time analysis. Stella Ops differentiates with:

- **Runtime ground truth** from actual container execution
- **DSO tracking** - which libraries are actually loaded
- **Entrypoint tracing** - what programs actually run
- **Native Kubernetes admission** with policy integration
- **Build-ID correlation** for symbol resolution

---

## 3. Architecture Overview

### 3.1 Component Topology

**Kubernetes Deployment:**

```
stellaops/zastava-observer    # DaemonSet on every node (read-only host mounts)
stellaops/zastava-webhook     # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
```

**Docker/VM Deployment:**

```
stellaops/zastava-agent       # System service; watches Docker events; observer only
```

### 3.2 Dependencies

| Dependency | Purpose |
|------------|---------|
| Authority | OpToks (DPoP/mTLS) for API calls |
| Scanner.WebService | Event ingestion, policy decisions |
| OCI Registry | Referrer/signature checks |
| Container Runtime | containerd/CRI-O/Docker interfaces |
| Kubernetes API | Pod watching, admission webhook |

---

## 4. Runtime Event Model

### 4.1 Event Types

| Kind | Trigger | Payload |
|------|---------|---------|
| `CONTAINER_START` | Container lifecycle | Image, entrypoint, namespace |
| `CONTAINER_STOP` | Container termination | Exit code, duration |
| `DRIFT` | Unexpected change | Changed files, new binaries |
| `POLICY_VIOLATION` | Rule breach | Reason, severity |
| `ATTESTATION_STATUS` | Verification result | Signed, SBOM present |

### 4.2 Event Envelope

```json
{
  "eventId": "uuid",
  "when": "2025-11-29T12:00:00Z",
  "kind": "CONTAINER_START",
  "tenant": "acme-corp",
  "node": "worker-node-01",
  "runtime": {
    "engine": "containerd",
    "version": "1.7.19"
  },
  "workload": {
    "platform": "kubernetes",
    "namespace": "production",
    "pod": "api-7c9fbbd8b7-ktd84",
    "container": "api",
    "containerId": "containerd://abc123...",
    "imageRef": "ghcr.io/acme/api@sha256:def456...",
    "owner": {
      "kind": "Deployment",
      "name": "api"
    }
  },
  "process": {
    "pid": 12345,
    "entrypoint": ["/entrypoint.sh", "--serve"],
    "entryTrace": [
      {"file": "/entrypoint.sh", "line": 3, "op": "exec", "target": "/usr/bin/python3"},
      {"file": "<argv>", "op": "python", "target": "/opt/app/server.py"}
    ],
    "buildId": "9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1"
  },
  "loadedLibs": [
    {"path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "..."},
    {"path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "..."}
  ],
  "posture": {
    "imageSigned": true,
    "sbomReferrer": "present",
    "attestation": {
      "uuid": "rekor-uuid",
      "verified": true
    }
  }
}
```
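
A consumer of these events would probably want a cheap sanity check before doing anything else with them. The sketch below validates an envelope against the event kinds from section 4.1. The choice of mandatory fields and the `validate_envelope` helper are assumptions made for illustration; the authoritative contract is the runtime event schema, not this snippet.

```python
from datetime import datetime

# Event kinds listed in section 4.1.
EVENT_KINDS = {
    "CONTAINER_START",
    "CONTAINER_STOP",
    "DRIFT",
    "POLICY_VIOLATION",
    "ATTESTATION_STATUS",
}

# Assumption: these top-level fields are treated as mandatory for illustration.
REQUIRED_FIELDS = ("eventId", "when", "kind", "tenant", "node")


def validate_envelope(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the envelope looks sane."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]

    kind = event.get("kind")
    if kind is not None and kind not in EVENT_KINDS:
        problems.append(f"unknown kind: {kind}")

    when = event.get("when")
    if isinstance(when, str):
        try:
            # Accept the trailing "Z" used in the example timestamp.
            datetime.fromisoformat(when.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"invalid timestamp: {when}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an ingestion path log everything that is wrong with a rejected event in a single pass.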

---

## 5. Observer Capabilities

### 5.1 Container Lifecycle Tracking

- Watch container start/stop via CRI socket
- Resolve container to image digest
- Map mount points and rootfs paths
- Track container metadata (labels, annotations)

### 5.2 Entrypoint Tracing

- Attach short-lived nsenter to container PID 1
- Parse shell scripts for exec chain
- Record terminal program (actual binary)
- Bounded depth to prevent infinite loops
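
The exec-chain idea can be illustrated with a small, self-contained sketch. It follows `exec` lines through shell scripts until it reaches a path that is not a known script, or until a depth bound is hit. The `trace_entrypoint` function, the `scripts` mapping (path to script text), and the default bound of 8 are hypothetical; a real tracer would read files from the container rootfs and handle far more shell syntax than a single regular expression can.

```python
import re

# Matches lines such as: exec /usr/bin/python3 /opt/app/server.py
EXEC_LINE = re.compile(r"^\s*exec\s+(\S+)")


def trace_entrypoint(entry: str, scripts: dict[str, str], max_depth: int = 8) -> list[dict]:
    """Follow `exec` hops from `entry` until a terminal program or the depth bound.

    `scripts` maps a path to shell source text; any path missing from the map
    is treated as a terminal binary.
    """
    trace: list[dict] = []
    current = entry
    for _ in range(max_depth):  # the bound stops scripts that exec each other forever
        source = scripts.get(current)
        if source is None:
            break  # terminal program reached
        hop = None
        for line_no, line in enumerate(source.splitlines(), start=1):
            match = EXEC_LINE.match(line)
            if match:
                hop = {"file": current, "line": line_no, "op": "exec", "target": match.group(1)}
                break
        if hop is None:
            break  # script without an exec line; nothing more to follow
        trace.append(hop)
        current = hop["target"]
    return trace
```

Each hop uses the same `file`/`line`/`op`/`target` keys as the `entryTrace` entries in the event envelope, so the last hop's `target` is the terminal program.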

### 5.3 Loaded Library Sampling

- Read `/proc/<pid>/maps` for loaded DSOs
- Compute SHA-256 for each mapped file
- Track GNU build-IDs for symbol correlation
- Rate limits prevent resource exhaustion
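
As a rough illustration of the sampling step, the sketch below extracts file-backed shared objects from the text of a `/proc/<pid>/maps` file. Each line of that file has the form `address perms offset dev inode pathname`. The `parse_maps` helper, the `.so` filename heuristic, and the `max_libs` cap (mirroring the `maxLibs` setting in section 10) are assumptions for illustration; hashing and build-ID extraction are left out.

```python
def parse_maps(maps_text: str, max_libs: int = 256) -> list[dict]:
    """Return unique shared objects found in /proc/<pid>/maps content."""
    seen: set[str] = set()
    libs: list[dict] = []
    for line in maps_text.splitlines():
        # Split into at most six fields so paths containing spaces stay intact.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        inode, path = fields[4], fields[5]
        if not path.startswith("/"):
            continue  # pseudo-entries such as [heap], [stack], [vdso]
        if ".so" not in path.rsplit("/", 1)[-1]:
            continue  # heuristic: keep only files that look like shared objects
        if path in seen:
            continue  # a library usually appears once per mapped segment
        seen.add(path)
        libs.append({"path": path, "inode": int(inode)})
        if len(libs) >= max_libs:
            break  # cap the work done per sample
    return libs
```

Taking the file content as a string keeps the parser testable without a live process; a caller would pass in the text read from the configured procfs root.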

### 5.4 Posture Verification

- Image signature presence (cosign policies)
- SBOM referrers check (registry HEAD)
- Rekor attestation lookup via Scanner.WebService
- Policy verdict from backend

---

## 6. Admission Control

### 6.1 Gate Criteria

| Criterion | Description | Configurable |
|-----------|-------------|--------------|
| Image Signature | Cosign-verifiable to configured keys | Yes |
| SBOM Availability | CycloneDX referrer or catalog entry | Yes |
| Policy Verdict | Backend PASS required | Yes |
| Registry Allowlist | Permitted registries | Yes |
| Tag Bans | Reject `:latest`, etc. | Yes |
| Base Image Allowlist | Permitted base digests | Yes |
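
Two of these gates, the registry allowlist and tag bans, depend only on the image reference string, so they can be sketched without any backend call. The helpers `split_image_ref` and `check_image` below are hypothetical. The parsing follows the common Docker convention that the first path component is a registry host when it contains a dot or a colon or equals `localhost`, and that a reference with neither tag nor digest implies `latest`; whether the webhook applies exactly these rules is not stated in this document.

```python
from typing import Optional


def split_image_ref(ref: str) -> tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (registry, tag, digest)."""
    name, _, digest = ref.partition("@")
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", name  # no registry host given
    last = path.rsplit("/", 1)[-1]
    tag = last.partition(":")[2] if ":" in last else None
    if tag is None and not digest:
        tag = "latest"  # implicit tag when neither tag nor digest is present
    return registry, tag, digest or None


def check_image(ref: str, allowed_registries: set[str], banned_tags: set[str]) -> list[str]:
    """Return denial reasons for the reference-only gates; empty means pass."""
    registry, tag, _ = split_image_ref(ref)
    reasons = []
    if registry not in allowed_registries:
        reasons.append(f"registry '{registry}' is not allowed")
    if tag is not None and tag in banned_tags:
        reasons.append(f"tag '{tag}' is banned")
    return reasons
```

A digest-pinned reference without a tag yields `tag=None`, so it never trips a tag ban.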

### 6.2 Decision Flow

```mermaid
sequenceDiagram
    participant K8s as API Server
    participant WH as Zastava Webhook
    participant SW as Scanner.WebService

    K8s->>WH: AdmissionReview(Pod)
    WH->>WH: Resolve images to digests
    WH->>SW: POST /policy/runtime
    SW-->>WH: {signed, hasSbom, verdict, reasons}
    alt All pass
        WH-->>K8s: Allow
    else Any fail
        WH-->>K8s: Deny (with reasons)
    end
```
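
The final branch of the diagram can be sketched as a pure function over the backend response. The field names `signed`, `hasSbom`, `verdict`, and `reasons` come from the diagram; the `decide` helper, the `"pass"` verdict value, and the wording of the denial reasons are assumptions for illustration.

```python
def decide(policy: dict) -> dict:
    """Map a backend policy response to an allow/deny decision with reasons."""
    reasons = list(policy.get("reasons", []))  # keep reasons supplied by the backend
    if not policy.get("signed", False):
        reasons.append("image signature could not be verified")
    if not policy.get("hasSbom", False):
        reasons.append("no SBOM available for image")
    # Assumption: the backend reports a passing verdict as the string "pass".
    if str(policy.get("verdict", "")).lower() != "pass":
        reasons.append("policy verdict is not pass")
    allowed = (
        bool(policy.get("signed"))
        and bool(policy.get("hasSbom"))
        and str(policy.get("verdict", "")).lower() == "pass"
    )
    return {"allowed": allowed, "reasons": reasons if not allowed else []}
```

Missing fields are treated as failures, so an incomplete response denies by default rather than allowing.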

### 6.3 Response Caching

- Per-digest results cached for TTL (default 300s)
- Fail-open or fail-closed per namespace
- Cache invalidation on policy updates
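
These three behaviours fit together roughly as follows. This is a minimal sketch, not the webhook's actual implementation: `VerdictCache`, `admit`, and the `fetch_verdict` callback are hypothetical names, and the injected clock exists only so that expiry can be exercised without sleeping.

```python
import time
from typing import Callable, Optional


class VerdictCache:
    """Per-digest admission verdicts that expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, bool]] = {}

    def get(self, digest: str) -> Optional[bool]:
        entry = self._entries.get(digest)
        if entry is None:
            return None
        stored_at, allowed = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[digest]  # expired entry
            return None
        return allowed

    def put(self, digest: str, allowed: bool) -> None:
        self._entries[digest] = (self._clock(), allowed)

    def invalidate_all(self) -> None:
        """Drop every entry, for example after a policy update."""
        self._entries.clear()


def admit(digest: str, namespace: str, cache: VerdictCache,
          fetch_verdict: Callable[[str], bool], fail_open_namespaces: set[str]) -> bool:
    """Return the admission decision, preferring a fresh cached verdict."""
    cached = cache.get(digest)
    if cached is not None:
        return cached
    try:
        allowed = fetch_verdict(digest)
    except Exception:
        # Backend unreachable: allow only in namespaces configured to fail open.
        return namespace in fail_open_namespaces
    cache.put(digest, allowed)
    return allowed
```

Failure-mode decisions are deliberately not cached, so a recovered backend is consulted again on the next request.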

---

## 7. Drift Detection

### 7.1 Signal Types

| Signal | Detection Method | Action |
|--------|-----------------|--------|
| Process Drift | Terminal program differs from EntryTrace baseline | Alert |
| Library Drift | Loaded DSOs not in Usage SBOM | Alert, delta scan |
| Filesystem Drift | New executables with mtime after image creation | Alert |
| Network Drift | Unexpected listening ports | Alert (optional) |
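
The first two signals reduce to simple comparisons against a baseline. The sketch below assumes libraries are matched by SHA-256 digest and that the baseline terminal program is available as a path; both the matching key and the helper names `process_drift` and `library_drift` are illustrative assumptions rather than the module's documented behaviour.

```python
def process_drift(terminal_program: str, baseline_terminal: str) -> bool:
    """True when the observed terminal program differs from the EntryTrace baseline."""
    return terminal_program != baseline_terminal


def library_drift(loaded_libs: list[dict], usage_sbom_digests: set[str]) -> list[dict]:
    """Return loaded libraries whose digest does not appear in the Usage SBOM."""
    return [lib for lib in loaded_libs if lib["sha256"] not in usage_sbom_digests]
```

Matching on digest rather than path means a library replaced in place at the same path is still reported.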

### 7.2 Drift Event

```json
{
  "kind": "DRIFT",
  "delta": {
    "baselineImageDigest": "sha256:abc...",
    "changedFiles": ["/opt/app/server.py"],
    "newBinaries": [
      {"path": "/usr/local/bin/helper", "sha256": "..."}
    ]
  },
  "evidence": [
    {"signal": "procfs.maps", "value": "/lib/.../libssl.so.3@0x7f..."},
    {"signal": "cri.task.inspect", "value": "pid=12345"}
  ]
}
```

---

## 8. Build-ID Workflow

### 8.1 Capture

1. Observer extracts `NT_GNU_BUILD_ID` from `/proc/<pid>/exe`
2. Normalize to lower-case hex
3. Include in runtime event as `process.buildId`

### 8.2 Correlation

1. Scanner.WebService persists observation
2. Policy responses include `buildIds` list
3. Debug files matched via `.build-id/<aa>/<rest>.debug`
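
The `.build-id/<aa>/<rest>.debug` layout splits the hex build-ID after its first two characters. A minimal sketch of that mapping, including the lower-case normalization from section 8.1, is shown below. The `debug_file_path` helper is hypothetical, and the default root of `/usr/lib/debug` is the conventional global debug directory on many Linux distributions, not something this document specifies.

```python
def debug_file_path(build_id: str, root: str = "/usr/lib/debug") -> str:
    """Map a GNU build-ID to its conventional separate debug file path."""
    normalized = build_id.strip().lower()
    if len(normalized) < 4 or any(c not in "0123456789abcdef" for c in normalized):
        raise ValueError(f"not a usable hex build-ID: {build_id!r}")
    # First two hex characters name the directory; the rest names the file.
    return f"{root}/.build-id/{normalized[:2]}/{normalized[2:]}.debug"
```

For the build-ID in the example event, the path ends in `.build-id/9f/3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1.debug`.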

### 8.3 Symbol Resolution

```bash
# Via CLI
stella runtime policy test --image sha256:abc123... | jq '.buildIds'

# Via debuginfod
debuginfod-find debuginfo 9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1
```

---

## 9. Implementation Strategy

### 9.1 Phase 1: Observer Core (Complete)

- [x] CRI socket integration
- [x] Container lifecycle tracking
- [x] Entrypoint tracing
- [x] Loaded library sampling
- [x] Event batching and compression

### 9.2 Phase 2: Admission Webhook (Complete)

- [x] ValidatingAdmissionWebhook
- [x] Image digest resolution
- [x] Policy integration
- [x] Response caching
- [x] Fail-open/closed modes

### 9.3 Phase 3: Drift Detection (In Progress)

- [x] Process drift detection
- [x] Library drift detection
- [ ] Filesystem drift monitoring (ZASTAVA-DRIFT-50-001)
- [ ] Network posture checks (ZASTAVA-NET-51-001)

### 9.4 Phase 4: Advanced Features (Planned)

- [ ] eBPF syscall tracing (optional)
- [ ] Windows container support
- [ ] Live used-by-entrypoint synthesis
- [ ] Admission dry-run dashboards

---

## 10. Configuration

```yaml
zastava:
  mode:
    observer: true
    webhook: true

  backend:
    baseAddress: "https://scanner-web.internal"
    policyPath: "/api/v1/scanner/policy/runtime"
    requestTimeoutSeconds: 5

  runtime:
    authority:
      issuer: "https://authority.internal"
      clientId: "zastava-observer"
      audience: ["scanner", "zastava"]
      scopes: ["api:scanner.runtime.write"]
      requireDpop: true
      requireMutualTls: true

    tenant: "acme-corp"
    engine: "auto"  # containerd|cri-o|docker|auto
    procfs: "/host/proc"

    collect:
      entryTrace: true
      loadedLibs: true
      maxLibs: 256
      maxHashBytesPerContainer: 64000000

  admission:
    enforce: true
    failOpenNamespaces: ["dev", "test"]
    verify:
      imageSignature: true
      sbomReferrer: true
      scannerPolicyPass: true
    cacheTtlSeconds: 300

  limits:
    eventsPerSecond: 50
    burst: 200
    perNodeQueue: 10000
```
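
The `eventsPerSecond` and `burst` settings read like the two parameters of a token bucket, although this document does not say which algorithm the observer uses. Under that assumption, a minimal sketch looks like this; the `TokenBucket` class and its injectable clock are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` events per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self._rate = rate
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the event should be dropped or queued."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

With the values above, `TokenBucket(rate=50, burst=200)` admits up to 200 events back to back and then settles at 50 per second.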

---

## 11. Security Posture

### 11.1 Privileges

| Capability | Purpose | Mode |
|------------|---------|------|
| `CAP_SYS_PTRACE` | nsenter trace | Optional |
| `CAP_DAC_READ_SEARCH` | Read /proc | Required |
| Host PID namespace | Container PIDs | Required |
| Read-only mounts | /proc, sockets | Required |

### 11.2 Least Privilege

- No write mounts
- No host networking
- No privilege escalation
- Read-only rootfs

### 11.3 Data Minimization

- No env var exfiltration
- No command argument logging (unless diagnostic mode)
- Rate limits prevent abuse

---

## 12. Observability

### 12.1 Observer Metrics

- `zastava.runtime.events.total{kind}`
- `zastava.runtime.backend.latency.ms{endpoint}`
- `zastava.proc_maps.samples.total{result}`
- `zastava.entrytrace.depth{p99}`
- `zastava.hash.bytes.total`
- `zastava.buffer.drops.total`

### 12.2 Webhook Metrics

- `zastava.admission.decisions.total{decision}`
- `zastava.admission.cache.hits.total`
- `zastava.backend.failures.total`

---

## 13. Performance Targets

| Operation | Target |
|-----------|--------|
| `/proc/<pid>/maps` sampling | < 30ms (64 files) |
| Full library hash set | < 200ms (256 libs) |
| Admission with warm cache | < 8ms p95 |
| Admission with backend call | < 50ms p95 |
| Event throughput | 5k events/min/node |

---

## 14. Related Documentation

| Resource | Location |
|----------|----------|
| Zastava architecture | `docs/modules/zastava/architecture.md` |
| Runtime event schema | `docs/modules/zastava/event-schema.md` |
| Admission configuration | `docs/modules/zastava/admission-config.md` |
| Deployment guide | `docs/modules/zastava/deployment.md` |

---

## 15. Sprint Mapping

- **Primary Sprint:** SPRINT_0144_0001_0001_zastava_runtime_signals.md
- **Related Sprints:**
  - SPRINT_0140_0001_0001_runtime_signals.md
  - SPRINT_0143_0000_0001_signals.md

**Key Task IDs:**

- `ZASTAVA-OBS-40-001` - Observer core (DONE)
- `ZASTAVA-ADM-41-001` - Admission webhook (DONE)
- `ZASTAVA-DRIFT-50-001` - Filesystem drift (IN PROGRESS)
- `ZASTAVA-NET-51-001` - Network posture (TODO)
- `ZASTAVA-EBPF-60-001` - eBPF integration (FUTURE)

---

## 16. Success Metrics

| Metric | Target |
|--------|--------|
| Event capture rate | 99.9% of container starts |
| Admission latency | < 50ms p95 |
| Drift detection rate | 100% of runtime changes |
| False positive rate | < 1% of drift alerts |
| Node resource usage | < 2% CPU, < 100MB RAM |

---

*Last updated: 2025-11-29*