feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes. - Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes. - Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables. - Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
This commit is contained in:
		
							
								
								
									
										41
									
								
								docs/modules/telemetry/architecture.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										41
									
								
								docs/modules/telemetry/architecture.md
									
									
									
									
									
										Normal file
									
								
							| @@ -0,0 +1,41 @@ | ||||
| # Telemetry architecture | ||||
|  | ||||
| > Derived from Epic 15 – Observability & Forensics; details collector topology, storage profiles, forensic pipelines, and offline packaging. | ||||
|  | ||||
| ## 1) Topology | ||||
|  | ||||
| - **Collector tier.** OpenTelemetry Collector instances deployed per environment (ingest TLS, GRPC/OTLP receivers, tail-based sampling). Config packages delivered via Offline Kit. | ||||
| - **Processing pipelines.** Pipelines for traces, metrics, logs with processors (batch, tail sampling, attributes redaction, resource detection). Profiles: `default`, `forensic` (high-retention), `airgap` (file-based exporters). | ||||
| - **Exporters.** OTLP to Prometheus/Tempo/Loki (online) or file/OTLP-HTTP to Offline Kit staging (air-gapped). Exporters are allow-listed to satisfy Sovereign readiness. | ||||
|  | ||||
| ## 2) Storage | ||||
|  | ||||
| - **Prometheus** for metrics with remote-write support and retention windows (default 30 days, forensic 180 days). | ||||
| - **Tempo** (or Jaeger all-in-one) for traces with block storage backend (S3-compatible or filesystem) and deterministic chunk manifests. | ||||
| - **Loki** for logs stored in immutable chunks; index shards hashed for reproducibility. | ||||
| - **Forensic archive** — periodic export of raw OTLP records into signed bundles (`otlp/metrics.pb`, `otlp/traces.pb`, `otlp/logs.pb`, `manifest.json`). | ||||
|  | ||||
| ## 3) Pipelines & Guardrails | ||||
|  | ||||
| - **Redaction.** Attribute processors strip PII/secrets based on policy-managed allowed keys. Redaction profiles mirrored in Offline Kit. | ||||
| - **Sampling.** Tail sampling by service/error; incident mode (triggered by Orchestrator) promotes services to 100 % sampling, extends retention, and toggles Notify alerts. | ||||
| - **Alerting.** Prometheus rules/Dashboards packaged with Export Center: service SLOs, queue depth, policy run latency, ingestion AOC violations. | ||||
|  | ||||
| ## 4) APIs & integration | ||||
|  | ||||
| - `GET /telemetry/config/profile/{name}` — download collector config bundle (YAML + signature). | ||||
| - `POST /telemetry/incidents/mode` — toggle incident sampling + forensic bundle generation. | ||||
| - `GET /telemetry/exports/forensic/{window}` — stream signed OTLP bundles for compliance. | ||||
| - CLI commands: `stella telemetry deploy --profile default`, `stella telemetry capture --window 24h --out bundle.tar.gz`. | ||||
|  | ||||
| ## 5) Offline support | ||||
|  | ||||
| - Offline Kit ships collector binaries/config, bootstrap scripts, dashboards, alert rules, and OTLP replay tooling. Bundles include `manifest.json` with digests, DSSE signatures, and instructions. | ||||
| - For offline environments, exporters write to local filesystem; operators transfer bundles to analysis workstation using signed manifests. | ||||
|  | ||||
| ## 6) Observability of telemetry stack | ||||
|  | ||||
| - Meta-metrics: `collector_export_failures_total`, `telemetry_bundle_generation_seconds`, `telemetry_incident_mode{state}`. | ||||
| - Health endpoints for collectors and storage clusters, plus dashboards for ingestion rate, retention, rule evaluations. | ||||
|  | ||||
| Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised. | ||||
		Reference in New Issue
	
	Block a user