feat: Implement Filesystem and MongoDB provenance writers for PackRun execution context
- Added `FilesystemPackRunProvenanceWriter` to write provenance manifests to the filesystem.
- Introduced `MongoPackRunArtifactReader` to read artifacts from MongoDB.
- Created `MongoPackRunProvenanceWriter` to store provenance manifests in MongoDB.
- Developed unit tests for filesystem and MongoDB provenance writers.
- Established `ITimelineEventStore` and `ITimelineIngestionService` interfaces for timeline event handling.
- Implemented `TimelineIngestionService` to validate and persist timeline events with hashing.
- Created PostgreSQL schema and migration scripts for timeline indexing.
- Added dependency injection support for timeline indexer services.
- Developed tests for timeline ingestion and schema validation.
@@ -15,6 +15,7 @@ Telemetry module captures deployment and operations guidance for the shared obse
- [Architecture](./architecture.md)
- [Implementation plan](./implementation_plan.md)
- [Task board](./TASKS.md)
- [Observability runbook](./operations/observability.md) (offline import friendly)

## How to get started
1. Open sprint file `/docs/implplan/SPRINT_*.md` and locate the stories referencing this module.

@@ -2,7 +2,12 @@
Telemetry module captures deployment and operations guidance for the shared observability stack (collectors, storage, dashboards).

## Latest updates (2025-11-30)
- Sprint tracker `docs/implplan/SPRINT_0330_0001_0001_docs_modules_telemetry.md` and module `TASKS.md` added to mirror status.
- Observability runbook stub + dashboard placeholder added under `operations/` (offline import).
- Storage/isolation posture references updated; align with platform docs.

## Responsibilities
- Deploy and operate OpenTelemetry collectors for StellaOps services.
- Provide storage configuration for Prometheus/Tempo/Loki stacks.
- Document smoke tests and offline bootstrapping steps.

@@ -22,6 +27,7 @@ Telemetry module captures deployment and operations guidance for the shared obse
- Smoke script references (../../ops/devops/telemetry).
- Bundle packaging instructions in ops/devops/telemetry.
- Sprint 23 console security sign-off (2025-10-27) added the `console-security.json` Grafana board and burn-rate alert pack—ensure environments import the updated dashboards/alerts referenced in `docs/updates/2025-10-27-console-security-signoff.md`.
- Observability assets for this sprint: `operations/observability.md` and `operations/dashboards/telemetry-observability.json` (offline import).

## Related resources
- ./operations/collector.md

9
docs/modules/telemetry/TASKS.md
Normal file
@@ -0,0 +1,9 @@
# Telemetry · TASKS (status mirror)

| Task ID | Status | Owner(s) | Notes / Evidence |
| --- | --- | --- | --- |
| TELEMETRY-DOCS-0001 | DONE (2025-11-30) | Docs Guild | README/architecture refreshed for storage/isolation posture; sprint links added. |
| TELEMETRY-OPS-0001 | DONE (2025-11-30) | Ops Guild | Observability runbook stub + Grafana placeholder added under `operations/`. |
| TELEMETRY-ENG-0001 | DONE (2025-11-30) | Module Team | TASKS board created; statuses mirrored with `docs/implplan/SPRINT_0330_0001_0001_docs_modules_telemetry.md`. |

> Keep this table in lockstep with the sprint Delivery Tracker (TODO/DOING/DONE/BLOCKED updates go to both files).

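The lockstep rule above could be checked mechanically. The following is a minimal sketch, assuming the sprint Delivery Tracker is also a Markdown table with the task ID in the first column and the status in the second; that layout, and the helper names `parse_task_statuses` and `find_drift`, are hypothetical and not taken from the repository.

```python
from typing import Dict, Optional, Tuple


def parse_task_statuses(markdown: str) -> Dict[str, str]:
    """Map task ID -> status keyword for every data row of a Markdown table.

    Assumes the first column is the task ID and the second is the status.
    Cells that contain a literal '|' are not handled by this simple splitter.
    """
    statuses: Dict[str, str] = {}
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        task_id, status = cells[0], cells[1]
        # Skip the header row and the '---' separator row.
        if task_id == "Task ID" or set(task_id) <= {"-", ":", " "}:
            continue
        # Keep only the keyword, so "DONE (2025-11-30)" compares equal to "DONE".
        statuses[task_id] = status.split()[0] if status else ""
    return statuses


def find_drift(tasks_md: str, sprint_md: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {task_id: (tasks_status, sprint_status)} for every mismatch."""
    tasks = parse_task_statuses(tasks_md)
    sprint = parse_task_statuses(sprint_md)
    return {
        task_id: (tasks.get(task_id), sprint.get(task_id))
        for task_id in sorted(set(tasks) | set(sprint))
        if tasks.get(task_id) != sprint.get(task_id)
    }
```

An empty result from `find_drift` would mean both files agree; a task missing from one side shows up with `None` in that position.
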
@@ -58,7 +58,12 @@
- **Security:** redaction verification, RBAC/tenant scoping, sealed-mode tests, signed config verification.
- **Offline:** capture bundles, transfer, replay, compliance attestation.

## Definition of done
- Collector profiles, storage backends, incident mode, dashboards, CLI, and offline kit delivered with telemetry and documentation.
- Runbooks and SOC handoff packages published; compliance checklists appended.
- ./TASKS.md and ../../TASKS.md updated; imposed rule statements confirmed in documentation.

## Sprint alignment (2025-11-30)
- Docs refresh tracked in `docs/implplan/SPRINT_0330_0001_0001_docs_modules_telemetry.md`; statuses mirrored in `docs/modules/telemetry/TASKS.md`.
- Observability evidence lives in `operations/observability.md` with Grafana JSON stub under `operations/dashboards/`.
- Keep future doc/ops updates mirrored across sprint, TASKS, and module front doors to avoid drift.

@@ -0,0 +1,6 @@
{
  "_note": "Placeholder Grafana dashboard stub for Telemetry. Replace panels when metrics endpoints are available; keep offline-import friendly.",
  "schemaVersion": 39,
  "title": "Telemetry Observability (stub)",
  "panels": []
}
38
docs/modules/telemetry/operations/observability.md
Normal file
@@ -0,0 +1,38 @@
# Telemetry observability runbook (stub · 2025-11-29 demo)

## Dashboards (offline import)
- Grafana JSON: `docs/modules/telemetry/operations/dashboards/telemetry-observability.json` (import locally; no external data sources assumed).
- Planned panels: collector uptime, scrape errors, ingestion/backlog per tenant, storage retention headroom, query latency p95/p99, and OTLP export errors.

## Key metrics
- `telemetry_collector_uptime_seconds` — per-collector uptime.
- `telemetry_scrape_failures_total{job}` — scrape failures per job.
- `telemetry_ingest_backlog` — queued spans/logs/metrics awaiting storage.
- `telemetry_storage_retention_percent_used` — storage utilization against retention budget.
- `telemetry_query_latency_seconds_bucket{route}` — API/query latency.
- `telemetry_otlp_export_failures_total{signal}` — OTLP export failures by signal.

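The `_bucket` suffix on `telemetry_query_latency_seconds_bucket{route}` suggests a cumulative histogram, so the planned p95/p99 panels would be derived from bucket counts rather than read directly. The sketch below shows one way to estimate a quantile from such buckets using linear interpolation, similar in spirit to Prometheus' `histogram_quantile`. The function name `estimate_quantile` and the bucket bounds and counts in the usage note are illustrative assumptions, not values from this module.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Assumes observations are spread evenly inside each bucket and that the
    lowest bucket starts at 0. Returns NaN when there are no observations.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(buckets)
    if not ordered or ordered[-1][1] == 0:
        return math.nan

    total = ordered[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                # Empty bucket (only reachable when rank is 0): nothing to interpolate.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with hypothetical buckets `[(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]`, calling `estimate_quantile(0.99, ...)` lands at the top of the 1.0 s bucket, which is the kind of value a p99 latency alert would compare against its threshold.
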
## Logs & traces
- Correlate by `trace_id` and `tenant`; include `collector_id`, `pipeline`, `exporter` fields.
- Traces disabled by default for air-gap; enable by setting OTLP endpoints to on-prem collectors.

## Health/diagnostics
- `/health/liveness` and `/health/readiness` (collector + storage gateway) check exporter reachability and disk headroom.
- `/status` exposes build version, commit, feature flags; verify against offline bundle manifest.
- Storage probe: `GET /api/storage/usage` (if available) to confirm retention headroom; otherwise rely on Prometheus metrics.

## Alert hints
- OTLP export failures > 0 over 5m.
- Ingest backlog above threshold (configurable per tenant/workload).
- Query latency p99 > 1s for `/api/query` routes.
- Storage utilization > 85% of retention budget.

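The four hints above can be read as simple threshold checks. The sketch below encodes them in Python against values that have already been queried from Prometheus. The `TelemetrySnapshot` container, its field names, the alert labels, and the assumption that storage utilization is reported on a 0 to 100 scale are all hypothetical; real alerting would more likely live in Prometheus or Grafana rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TelemetrySnapshot:
    """Hypothetical container for already-queried metric values."""
    otlp_export_failures_5m: float    # increase in export failures over the last 5 minutes
    ingest_backlog: float             # current queued spans/logs/metrics
    query_latency_p99_seconds: float  # p99 latency for /api/query routes
    storage_percent_used: float       # utilization of the retention budget, 0-100 (assumed scale)


def evaluate_alert_hints(snapshot: TelemetrySnapshot, backlog_threshold: float) -> List[str]:
    """Return the names of the alert hints whose condition is met."""
    fired: List[str] = []
    if snapshot.otlp_export_failures_5m > 0:
        fired.append("otlp_export_failures")
    if snapshot.ingest_backlog > backlog_threshold:
        # The threshold is passed in because the runbook leaves it per tenant/workload.
        fired.append("ingest_backlog")
    if snapshot.query_latency_p99_seconds > 1.0:
        fired.append("query_latency_p99")
    if snapshot.storage_percent_used > 85.0:
        fired.append("storage_utilization")
    return fired
```

Passing `backlog_threshold` as an argument keeps the per-tenant tuning outside the check itself, which matches the "configurable" wording of the hint.
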
## Offline verification steps
1) Import Grafana JSON locally; point to Prometheus scrape labeled `telemetry`.
2) Run collector smoke: push sample OTLP spans/logs/metrics to local collector and confirm metrics emit in Prometheus.
3) Fetch `/status` and compare commit/version to offline bundle manifest.

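Step 3 amounts to a field-by-field comparison of two documents. The following is a minimal sketch, assuming both the `/status` response and the offline bundle manifest are JSON objects exposing `version` and `commit` keys; those key names and the helper `compare_status_to_manifest` are assumptions, since neither format is specified here.

```python
from typing import Dict, List, Sequence


def compare_status_to_manifest(
    status: Dict[str, object],
    manifest: Dict[str, object],
    fields: Sequence[str] = ("version", "commit"),
) -> List[str]:
    """Return one human-readable line per field that is missing or differs."""
    problems: List[str] = []
    for field in fields:
        if field not in status:
            problems.append(f"{field}: missing from /status")
        elif field not in manifest:
            problems.append(f"{field}: missing from manifest")
        elif status[field] != manifest[field]:
            problems.append(
                f"{field}: /status has {status[field]!r}, manifest has {manifest[field]!r}"
            )
    return problems
```

In an air-gapped check, `status` would come from `json.loads` on the saved `/status` response and `manifest` from the bundle's manifest file; an empty list means the deployed build matches the bundle.
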
## Evidence locations
- Sprint tracker: `docs/implplan/SPRINT_0330_0001_0001_docs_modules_telemetry.md`.
- Module docs: `README.md`, `architecture.md`, `implementation_plan.md`.
- Dashboard stub: `operations/dashboards/telemetry-observability.json`.