Files

master 7b5bdcf4d3 feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules

- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.

2025-10-30 00:09:39 +02:00

4.7 KiB

Raw Blame History

Implementation plan — Telemetry

Delivery phases

Phase 1 – Collector & pipeline profiles
Publish OpenTelemetry collector configs (default, forensic, airgap), establish ingest gateways, TLS/mTLS, and attribute redaction policies.
Phase 2 – Storage backends & retention
Deploy Prometheus/Tempo/Loki (or equivalents) with retention tiers, bucket/object storage, deterministic manifest generation, and sealed-mode allowlists.
Phase 3 – Incident mode & forensic capture
Implement incident toggles (CLI/API), tail sampling adjustments, forensic bundle generation (OTLP archives, manifest/signature), and Notify hooks.
Phase 4 – Observability dashboards & automation
Deliver dashboards (service SLOs, queue depth, policy latency), alert rules, Grafana packages, and CLI automation for deployment and capture.
Phase 5 – Offline & compliance
Ship Offline Kit artefacts (collectors, configs, dashboards, replay tooling), signed bundles, and documentation for air-gapped review workflows.
Phase 6 – Hardening & SOC handoff
Complete RBAC integration, audit logging, incident response runbooks, performance tuning, and integration tests across services.

Work breakdown

Collector configs
- Maintain config templates per profile with processors (redaction, batching, resource detection) and exporters.
- CLI automation (stella telemetry deploy, stella telemetry profile diff), validation tests, and config signing.
Storage & retention
- Provision Prometheus/Tempo/Loki (or vendor equivalents) with retention tiers (default, forensic, airgap).
- Ensure determinism (chunk manifests, content hashing), remote-write allowlists, sealed/offline modes.
- Implement archivers for forensic bundles (metrics/traces/logs) with cosign signatures.
Incident mode
- API/CLI to toggle incident sampling, retention escalation, Notify signals, and auto bundle capture.
- Hook into Orchestrator to respond to incidents and revert after cooldown.
Dashboards & alerts
- Dashboard packages for core services (ingestion, policy, export, attestation).
- Alert rules for SLO burn, collector failure, exporter backlog, bundle generation errors.
- Self-observability metrics (collector_export_failures_total, telemetry_incident_mode{}).
Offline support
- Offline Kit assets: collector binaries/configs, import scripts, dashboards, replay instructions, compliance checklists.
- File-based exporters and manual transfer workflows with signed manifests.
Docs & runbooks
- Update observability overview, forensic capture guide, incident response checklist, sealed-mode instructions, RBAC matrix.
- SOC handoff package with control objectives and audit evidence.

Acceptance criteria

Collectors ingest metrics/logs/traces across deployments, applying redaction rules and tenant isolation; profiles validate via CI.
Storage backends retain data per default/forensic/airgap SLAs with deterministic chunk manifests and sealed-mode compliance.
Incident mode toggles sampling to 100 %, extends retention, triggers Notify, and captures forensic bundles signed with cosign.
Dashboards and alerts cover service SLOs, queue depth, policy latency, ingestion violations, and telemetry stack health.
CLI commands (stella telemetry deploy/capture/status) automate config rollout, forensic capture, and verification.
Offline bundles replay telemetry in sealed environments using provided scripts and manifests.

Risks & mitigations

PII leakage: strict redaction processors, policy-managed allowlists, audit tests.
Collector overload: horizontal scaling, batching, circuit breakers, incident mode throttling.
Storage cost: tiered retention, compression, pruning policies, offline archiving.
Air-gap drift: offline kit refresh schedule, deterministic manifest verification.
Alert fatigue: burn-rate alerts, deduping, SOC runbooks.

Test strategy

Config lint/tests: schema validation, unit tests for processors/exporters, golden configs.
Integration: simulate service traces/logs/metrics, verify pipelines, incident toggles, bundle generation.
Performance: load tests with peak ingestion, long retention windows, failover scenarios.
Security: redaction verification, RBAC/tenant scoping, sealed-mode tests, signed config verification.
Offline: capture bundles, transfer, replay, compliance attestation.

Definition of done

Collector profiles, storage backends, incident mode, dashboards, CLI, and offline kit delivered with telemetry and documentation.
Runbooks and SOC handoff packages published; compliance checklists appended.
./TASKS.md and ../../TASKS.md updated; imposed rule statements confirmed in documentation.

4.7 KiB Raw Blame History Unescape Escape

Implementation plan — Telemetry

Delivery phases

Work breakdown

Acceptance criteria

Risks & mitigations

Test strategy

Definition of done

4.7 KiB

Raw Blame History