Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
api-governance / spectral-lint (push) Has been cancelled

This commit is contained in:
master
2025-11-20 07:50:52 +02:00
parent 616ec73133
commit 10212d67c0
473 changed files with 316758 additions and 388 deletions

View File

@@ -0,0 +1,29 @@
# CLI incident toggle contract (CLI-OBS-12-001)
**Goal**: define a deterministic CLI flag and config surface to enter/exit incident mode, required by TELEMETRY-OBS-55-001/56-001.
## Flags and config
- CLI flag: `--incident-mode` (bool). Defaults to false.
- Config key: `Telemetry:Incident:Enabled` (bool) and `Telemetry:Incident:TTL` (TimeSpan).
- When both flag and config specified, flag wins (opt-in only; cannot disable if config enables and flag present).
## Effects when enabled
- Increase sampling rate ceiling to 100% for telemetry within the process.
- Add tag `incident=true` to logs/metrics/traces.
- Shorten exporter/reporting flush interval to 5s; disable external exporters when `Sealed=true`.
- Emit activation audit event `telemetry.incident.activated` with fields `{tenant, actor, source, expires_at}`.
## Persistence
- Incident flag runtime value stored in local state file `~/.stellaops/incident-mode.json` with fields `{enabled, set_at, expires_at, actor}` for offline continuity.
- File is tenant-scoped; permissions 0600.
## Expiry / TTL
- Default TTL: 30 minutes unless `Telemetry:Incident:TTL` provided.
- On expiry, emit `telemetry.incident.expired` audit event.
## Validation expectations
- CLI should refuse `--incident-mode` if `--sealed` is set and external exporters are configured (must drop exporters first).
- Unit tests to cover precedence (flag over config), TTL expiry, state file perms, and audit emissions.
## Provenance
- Authored 2025-11-20 to unblock PREP-CLI-OBS-12-001 and TELEMETRY-OBS-55-001.

View File

@@ -0,0 +1,48 @@
# Telemetry Core Bootstrap (v1 · 2025-11-19)
## Goal
Show minimal host wiring for `StellaOps.Telemetry.Core` with deterministic defaults and sealed-mode friendliness.
## Sample (web/worker host)
```csharp
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddStellaOpsTelemetry(
builder.Configuration,
serviceName: "StellaOps.SampleService",
serviceVersion: builder.Configuration["VERSION"],
configureOptions: options =>
{
// Disable collector in sealed mode / air-gap
options.Collector.Enabled = builder.Configuration.GetValue<bool>("Telemetry:Collector:Enabled", true);
options.Collector.Endpoint = builder.Configuration["Telemetry:Collector:Endpoint"];
options.Collector.Protocol = TelemetryCollectorProtocol.Grpc;
},
configureMetrics: m => m.AddAspNetCoreInstrumentation(),
configureTracing: t => t.AddHttpClientInstrumentation());
```
## Configuration (appsettings.json)
```json
{
"Telemetry": {
"Collector": {
"Enabled": true,
"Endpoint": "https://otel-collector.example:4317",
"Protocol": "Grpc",
"Component": "sample-service",
"Intent": "telemetry-export",
"DisableOnViolation": true
}
}
}
```
## Determinism & safety
- UTC timestamps only; no random IDs introduced by the helper.
- Exporter is skipped when endpoint missing or egress policy denies.
- `VSTEST_DISABLE_APPDOMAIN=1` recommended for tests with `tools/linksets-ci.sh` pattern.
## Next
- Propagation adapters (50-002) will build on this bootstrap.
- Scrub/analyzer policies live under upcoming 51-001/51-002 tasks.

View File

@@ -0,0 +1,43 @@
# Telemetry propagation contract (TELEMETRY-OBS-51-001)
**Goal**: standardise trace/metrics propagation across StellaOps services so golden-signal helpers remain deterministic, tenant-safe, and offline-friendly.
## Scope
- Applies to HTTP, gRPC, background jobs, and message handlers instrumented via `StellaOps.Telemetry.Core`.
- Complements bootstrap guide (`telemetry-bootstrap.md`) and precedes metrics helper implementation.
## Required context fields
- `trace_id` / `span_id`: W3C TraceContext headers only (no B3); generate if missing.
- `tenant`: lower-case string; required for all incoming requests; default to `unknown` only in sealed/offline diagnostics jobs.
- `actor`: optional user/service principal; redacted to hash in logs when `Scrub.Sealed=true`.
- `imposed_rule`: optional string conveying enforcement context (e.g., `merge=false`).
## HTTP middleware
- Accept `traceparent`/`tracestate`; reject/strip vendor-specific headers.
- Propagate `tenant`, `actor`, `imposed-rule` via `Stella-Tenant`, `Stella-Actor`, `Stella-Imposed-Rule` headers.
- Emit exemplars: when sampling is off, attach exemplar ids to request duration and active request metrics.
## gRPC interceptors
- Use binary TraceContext; carry metadata keys `stella-tenant`, `stella-actor`, `stella-imposed-rule`.
- Enforce presence of `tenant`; abort with `Unauthenticated` if missing in non-sealed mode.
## Jobs & message handlers
- Wrap background job execution with Activity + baggage items (`tenant`, `actor`, `imposed_rule`).
- When publishing bus events, stamp `trace_id` and `tenant` into headers; avoid embedding PII in payloads.
## Metrics helper expectations
- Golden signals: `http.server.duration`, `http.client.duration`, `messaging.operation.duration`, `job.execution.duration`, `runtime.gc.pause`, `db.call.duration`.
- Mandatory tags: `tenant`, `service`, `endpoint`/`operation`, `result` (`ok|error|cancelled|throttled`), `sealed` (`true|false`).
- Cardinality guard: drop/replace tag values exceeding 64 chars; cap path templates to first 3 segments.
## Determinism & offline posture
- All timestamps UTC RFC3339; sampling configs controlled via appsettings and mirrored in offline bundles.
- No external exporters when `Sealed=true`; use in-memory or file-based OTLP for air-gap.
## Tests to add with implementation
- Middleware unit tests asserting header/baggage mapping and tenant enforcement.
- Metrics helper tests ensuring required tags present and trimmed; exemplar id attached when enabled.
- Deterministic snapshot tests for serialized OTLP when sealed/offline.
## Provenance
- Authored 2025-11-20 to unblock TELEMETRY-OBS-51-001; to be refined as helpers are coded.

View File

@@ -0,0 +1,35 @@
# Telemetry scrubbing contract (TELEMETRY-OBS-51-002)
**Purpose**: define redaction/scrubbing rules for logs/traces/metrics before implementing helpers in `StellaOps.Telemetry.Core`.
## Redaction rules
- Strip or hash PII/credentials: emails, tokens, passwords, secrets, bearer/mTLS cert blobs.
- Default hash algorithm: SHA-256 hex; include `scrubbed=true` tag.
- Allowlist fields that remain: `tenant`, `trace_id`, `span_id`, `endpoint`, `result`, `sealed`.
## Configuration knobs
- `Telemetry:Scrub:Enabled` (bool, default true).
- `Telemetry:Scrub:Sealed` (bool, default false) — when true, force scrubbing and disable external exporters.
- `Telemetry:Scrub:HashSalt` (string, optional) — per-tenant salt; omit to keep deterministic hashes across deployments.
- `Telemetry:Scrub:MaxValueLength` (int, default 256) — truncate values beyond this length before hashing.
## Logger sink expectations
- Implement scrubber as `ILogPayloadFilter` injected before sink.
- Ensure message templates remain intact; only values scrubbed.
- Preserve structured shape so downstream parsing remains deterministic.
## Metrics & traces
- Never place raw user input into metric/tag values; pass through scrubber before export.
- Span events must omit payload bodies; include keyed references only.
## Auditing
- When scrubbing occurs, add tag `scrubbed=true` and `scrub_reason` (`pii|secret|length|pattern`).
- Provide counter `telemetry.scrub.events{tenant,reason}` for observability.
## Tests to add with implementation
- Unit tests for regex-based scrubbing of tokens, emails, URLs with creds.
- Config-driven tests toggling `Enabled`/`Sealed` modes to ensure exporters are suppressed when sealed.
- Determinism test: same input yields identical hashed output when salt unset.
## Provenance
- Authored 2025-11-20 to unblock TELEMETRY-OBS-51-002 and downstream 55/56 tasks.

View File

@@ -0,0 +1,33 @@
# Sealed-mode telemetry helpers (TELEMETRY-OBS-56-001 prep)
## Objective
Define behavior and configuration for telemetry when `Sealed=true`, ensuring no external egress while preserving deterministic local traces/metrics for audits.
## Requirements
- Disable external OTLP/exporters automatically when sealed; fallback to in-memory or file OTLP (`telemetry-sealed.otlp`) with bounded size (default 10 MB, ring buffer).
- Add tag `sealed=true` to all spans/metrics/logs; suppress exemplars.
- Force scrubbing: treat `Scrub.Sealed=true` regardless of default settings.
- Sampling: cap to 10% max in sealed mode unless CLI incident toggle raises it (see CLI-OBS-12-001 contract); ceiling 100% with explicit override `Telemetry:Sealed:MaxSamplingPercent`.
- Clock source: require monotonic clock for duration; emit warning if system clock skew detected >500ms.
## Configuration keys
- `Telemetry:Sealed:Enabled` (bool) — driven by host; when true activate sealed behavior.
- `Telemetry:Sealed:Exporter` (enum `memory|file`) — default `file`.
- `Telemetry:Sealed:FilePath` (string) — default `./logs/telemetry-sealed.otlp`.
- `Telemetry:Sealed:MaxBytes` (int) — default 10_485_760 (10 MB).
- `Telemetry:Sealed:MaxSamplingPercent` (int) — default 10.
- Derived flag `Telemetry:Sealed:EffectiveIncidentMode` (read-only) exposes if incident-mode override lifted sampling ceiling.
## File exporter format
- OTLP binary, append-only, deterministic ordering by enqueue time.
- Rotate when exceeding `MaxBytes` using suffix `.1`, `.2` capped to 3 files; oldest dropped.
- Permissions 0600 by default; fail-start if path is world-readable.
## Validation tests to implement with 56-001
- Unit: sealed mode forces exporter swap and tags `sealed=true`, `scrubbed=true`.
- Unit: sampling capped at max percent unless incident override set.
- Unit: file exporter rotates deterministically and enforces 0600 perms.
- Integration: sealed + incident mode together still block external exporters and honor scrub rules.
## Provenance
- Authored 2025-11-20 to satisfy PREP-TELEMETRY-OBS-56-001 and unblock implementation.