Refactor code structure for improved readability and maintainability; removed redundant code blocks and optimized function calls.
29
docs/observability/cli-incident-toggle-12-001.md
Normal file
@@ -0,0 +1,29 @@
# CLI incident toggle contract (CLI-OBS-12-001)

**Goal**: define a deterministic CLI flag and config surface to enter/exit incident mode, required by TELEMETRY-OBS-55-001/56-001.

## Flags and config
- CLI flag: `--incident-mode` (bool). Defaults to false.
- Config keys: `Telemetry:Incident:Enabled` (bool) and `Telemetry:Incident:TTL` (TimeSpan).
- When both the flag and the config key are specified, the flag wins. The flag is opt-in only: passing it cannot disable incident mode when the config enables it.
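
The precedence rule above can be sketched as a small resolver. This is a minimal illustration, not the CLI's actual code: the type name `IncidentModeResolver` and the `"flag"`/`"config"` source labels are assumptions, and it reads "opt-in only" as meaning that either input can enable incident mode while neither can disable it.

```csharp
public static class IncidentModeResolver
{
    // Hypothetical helper: the flag wins when present; otherwise the config decides.
    // Because the flag is opt-in only, this reduces to a logical OR.
    public static bool Resolve(bool flagPresent, bool configEnabled) =>
        flagPresent || configEnabled;

    // Assumed labels for the `source` field of the activation audit event.
    public static string? Source(bool flagPresent, bool configEnabled) =>
        flagPresent ? "flag" : configEnabled ? "config" : null;
}
```

With this reading, `Resolve(true, false)` and `Resolve(false, true)` both enable incident mode, and only `Resolve(false, false)` leaves it off.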

## Effects when enabled
- Increase sampling rate ceiling to 100% for telemetry within the process.
- Add tag `incident=true` to logs/metrics/traces.
- Shorten exporter/reporting flush interval to 5s; disable external exporters when `Sealed=true`.
- Emit activation audit event `telemetry.incident.activated` with fields `{tenant, actor, source, expires_at}`.

## Persistence
- The runtime value of the incident flag is stored in the local state file `~/.stellaops/incident-mode.json` with fields `{enabled, set_at, expires_at, actor}` for offline continuity.
- The file is tenant-scoped; permissions are 0600.
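
A minimal sketch of writing and reading that state file follows. The type names (`IncidentState`, `IncidentStateFile`) are hypothetical, and the sketch assumes .NET 7 or later for `File.SetUnixFileMode`. How tenant scoping maps onto the file is not specified above, so it is left out.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shape mirroring the documented fields {enabled, set_at, expires_at, actor}.
public sealed record IncidentState(
    [property: JsonPropertyName("enabled")] bool Enabled,
    [property: JsonPropertyName("set_at")] DateTimeOffset SetAt,
    [property: JsonPropertyName("expires_at")] DateTimeOffset ExpiresAt,
    [property: JsonPropertyName("actor")] string Actor);

public static class IncidentStateFile
{
    // Resolves ~/.stellaops/incident-mode.json for the current user.
    public static string DefaultPath() => Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
        ".stellaops", "incident-mode.json");

    public static void Save(string path, IncidentState state)
    {
        var directory = Path.GetDirectoryName(path);
        if (!string.IsNullOrEmpty(directory))
        {
            Directory.CreateDirectory(directory);
        }

        File.WriteAllText(path, JsonSerializer.Serialize(state));

        // 0600: owner read/write only. Unix file modes are not supported on Windows.
        if (!OperatingSystem.IsWindows())
        {
            File.SetUnixFileMode(path, UnixFileMode.UserRead | UnixFileMode.UserWrite);
        }
    }

    public static IncidentState? Load(string path) =>
        File.Exists(path)
            ? JsonSerializer.Deserialize<IncidentState>(File.ReadAllText(path))
            : null;
}
```

Note that this sketch tightens permissions after the write, so the file briefly carries the process default mode; a real implementation may prefer to create the file with the restricted mode from the start.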

## Expiry / TTL
- Default TTL: 30 minutes unless `Telemetry:Incident:TTL` is provided.
- On expiry, emit `telemetry.incident.expired` audit event.
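
The TTL rule can be sketched as two pure functions. The helper name `IncidentTtl` is an assumption, and the caller is assumed to pass UTC timestamps and to emit the `telemetry.incident.expired` event when `IsExpired` first returns true.

```csharp
using System;

public static class IncidentTtl
{
    // Documented default when Telemetry:Incident:TTL is not set.
    public static readonly TimeSpan Default = TimeSpan.FromMinutes(30);

    // expires_at = set_at + (configured TTL, or the 30-minute default).
    public static DateTimeOffset ExpiresAt(DateTimeOffset setAtUtc, TimeSpan? configuredTtl) =>
        setAtUtc + (configuredTtl ?? Default);

    // Treats the expiry instant itself as expired (an assumption; the contract does not say).
    public static bool IsExpired(DateTimeOffset expiresAt, DateTimeOffset nowUtc) =>
        nowUtc >= expiresAt;
}
```

Keeping the clock as a parameter rather than reading `DateTimeOffset.UtcNow` inside the helper makes the TTL expiry test deterministic.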

## Validation expectations
- The CLI should refuse `--incident-mode` if `--sealed` is set and external exporters are configured (the exporters must be dropped first).
- Unit tests should cover precedence (flag over config), TTL expiry, state file permissions, and audit emissions.

## Provenance
- Authored 2025-11-20 to unblock PREP-CLI-OBS-12-001 and TELEMETRY-OBS-55-001.
48
docs/observability/telemetry-bootstrap.md
Normal file
@@ -0,0 +1,48 @@
# Telemetry Core Bootstrap (v1 · 2025-11-19)

## Goal
Show minimal host wiring for `StellaOps.Telemetry.Core` with deterministic defaults and sealed-mode friendliness.

## Sample (web/worker host)
```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddStellaOpsTelemetry(
    builder.Configuration,
    serviceName: "StellaOps.SampleService",
    serviceVersion: builder.Configuration["VERSION"],
    configureOptions: options =>
    {
        // Disable collector in sealed mode / air-gap
        options.Collector.Enabled = builder.Configuration.GetValue<bool>("Telemetry:Collector:Enabled", true);
        options.Collector.Endpoint = builder.Configuration["Telemetry:Collector:Endpoint"];
        options.Collector.Protocol = TelemetryCollectorProtocol.Grpc;
    },
    configureMetrics: m => m.AddAspNetCoreInstrumentation(),
    configureTracing: t => t.AddHttpClientInstrumentation());
```

## Configuration (appsettings.json)
```json
{
  "Telemetry": {
    "Collector": {
      "Enabled": true,
      "Endpoint": "https://otel-collector.example:4317",
      "Protocol": "Grpc",
      "Component": "sample-service",
      "Intent": "telemetry-export",
      "DisableOnViolation": true
    }
  }
}
```

## Determinism & safety
- UTC timestamps only; no random IDs introduced by the helper.
- The exporter is skipped when the endpoint is missing or the egress policy denies it.
- `VSTEST_DISABLE_APPDOMAIN=1` is recommended for tests that follow the `tools/linksets-ci.sh` pattern.

## Next
- Propagation adapters (50-002) will build on this bootstrap.
- Scrub/analyzer policies live under upcoming 51-001/51-002 tasks.
43
docs/observability/telemetry-propagation-51-001.md
Normal file
@@ -0,0 +1,43 @@
# Telemetry propagation contract (TELEMETRY-OBS-51-001)

**Goal**: standardise trace/metrics propagation across StellaOps services so golden-signal helpers remain deterministic, tenant-safe, and offline-friendly.

## Scope
- Applies to HTTP, gRPC, background jobs, and message handlers instrumented via `StellaOps.Telemetry.Core`.
- Complements the bootstrap guide (`telemetry-bootstrap.md`) and precedes the metrics helper implementation.

## Required context fields
- `trace_id` / `span_id`: W3C TraceContext headers only (no B3); generate if missing.
- `tenant`: lower-case string; required for all incoming requests; default to `unknown` only in sealed/offline diagnostics jobs.
- `actor`: optional user/service principal; redacted to hash in logs when `Scrub.Sealed=true`.
- `imposed_rule`: optional string conveying enforcement context (e.g., `merge=false`).

## HTTP middleware
- Accept `traceparent`/`tracestate`; reject/strip vendor-specific headers.
- Propagate `tenant`, `actor`, `imposed-rule` via `Stella-Tenant`, `Stella-Actor`, `Stella-Imposed-Rule` headers.
- Emit exemplars: when sampling is off, attach exemplar ids to request duration and active request metrics.
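
The header mapping above can be sketched independently of any web framework. The names `PropagatedContext` and `PropagationHeaders` are hypothetical; the sketch assumes the caller supplies a case-insensitive header dictionary and decides how to reject a request when the tenant is missing. Trace parsing uses `ActivityContext.TryParse` from `System.Diagnostics`, which understands the W3C `traceparent` format.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical carrier for the propagated fields.
public sealed record PropagatedContext(
    ActivityContext Trace, string Tenant, string? Actor, string? ImposedRule);

public static class PropagationHeaders
{
    // Returns null when the required tenant header is missing or blank.
    public static PropagatedContext? Extract(IReadOnlyDictionary<string, string> headers)
    {
        headers.TryGetValue("traceparent", out var traceParent);
        headers.TryGetValue("tracestate", out var traceState);

        // W3C TraceContext only; generate a fresh context if absent or malformed.
        if (traceParent is null ||
            !ActivityContext.TryParse(traceParent, traceState, out ActivityContext trace))
        {
            trace = new ActivityContext(
                ActivityTraceId.CreateRandom(),
                ActivitySpanId.CreateRandom(),
                ActivityTraceFlags.None);
        }

        if (!headers.TryGetValue("Stella-Tenant", out var tenant) ||
            string.IsNullOrWhiteSpace(tenant))
        {
            return null;
        }

        headers.TryGetValue("Stella-Actor", out var actor);
        headers.TryGetValue("Stella-Imposed-Rule", out var imposedRule);

        // The contract requires a lower-case tenant value.
        return new PropagatedContext(trace, tenant.Trim().ToLowerInvariant(), actor, imposedRule);
    }
}
```

Vendor-specific trace headers such as B3 are simply never read here, which has the same effect as stripping them on the inbound side.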

## gRPC interceptors
- Use binary TraceContext; carry metadata keys `stella-tenant`, `stella-actor`, `stella-imposed-rule`.
- Enforce presence of `tenant`; abort with `Unauthenticated` if missing in non-sealed mode.

## Jobs & message handlers
- Wrap background job execution with an `Activity` plus baggage items (`tenant`, `actor`, `imposed_rule`).
- When publishing bus events, stamp `trace_id` and `tenant` into headers; avoid embedding PII in payloads.
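
As a rough illustration of the job wrapper, the sketch below starts an `Activity` and attaches the three baggage items. The `JobTelemetry` type and the source name `StellaOps.Sample.Jobs` are assumptions. Note that `ActivitySource.StartActivity` returns null when no listener is registered, so every use of the activity is null-conditional.

```csharp
using System;
using System.Diagnostics;

public static class JobTelemetry
{
    // Hypothetical source name; a real host would register a listener or exporter for it.
    private static readonly ActivitySource Source = new("StellaOps.Sample.Jobs");

    public static void Run(
        string jobName, string tenant, string? actor, string? imposedRule, Action job)
    {
        using Activity? activity = Source.StartActivity(jobName, ActivityKind.Internal);

        // Baggage keys follow the contract's field names.
        activity?.AddBaggage("tenant", tenant.ToLowerInvariant());
        if (actor is not null) activity?.AddBaggage("actor", actor);
        if (imposedRule is not null) activity?.AddBaggage("imposed_rule", imposedRule);

        job();
    }
}
```

A call such as `JobTelemetry.Run("nightly-export", "acme", null, "merge=false", () => { /* work */ })` runs the delegate inside the activity scope; the job name and values here are illustrative only.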

## Metrics helper expectations
- Golden signals: `http.server.duration`, `http.client.duration`, `messaging.operation.duration`, `job.execution.duration`, `runtime.gc.pause`, `db.call.duration`.
- Mandatory tags: `tenant`, `service`, `endpoint`/`operation`, `result` (`ok|error|cancelled|throttled`), `sealed` (`true|false`).
- Cardinality guard: drop/replace tag values exceeding 64 chars; cap path templates to first 3 segments.
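
The cardinality guard can be sketched with plain string handling. `TagGuard` is a hypothetical name, and the `"overflow"` placeholder is an assumption: the contract allows either dropping or replacing an oversized value but does not name a replacement.

```csharp
using System;

public static class TagGuard
{
    public const int MaxTagLength = 64;
    public const int MaxPathSegments = 3;

    // Replaces oversized values with a fixed placeholder (assumed; dropping the tag is also allowed).
    public static string GuardValue(string value) =>
        value.Length > MaxTagLength ? "overflow" : value;

    // Keeps only the first three segments of a route template.
    public static string CapPathTemplate(string template)
    {
        var segments = template.Split('/', StringSplitOptions.RemoveEmptyEntries);
        if (segments.Length <= MaxPathSegments)
        {
            return template;
        }

        return "/" + string.Join('/', segments, 0, MaxPathSegments);
    }
}
```

For example, `TagGuard.CapPathTemplate("/api/v1/tenants/{id}/jobs")` returns `/api/v1/tenants`, while a template with three or fewer segments is returned unchanged.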

## Determinism & offline posture
- All timestamps UTC RFC3339; sampling configs controlled via appsettings and mirrored in offline bundles.
- No external exporters when `Sealed=true`; use in-memory or file-based OTLP for air-gap.

## Tests to add with implementation
- Middleware unit tests asserting header/baggage mapping and tenant enforcement.
- Metrics helper tests ensuring required tags are present and trimmed, and that an exemplar id is attached when enabled.
- Deterministic snapshot tests for serialized OTLP when sealed/offline.

## Provenance
- Authored 2025-11-20 to unblock TELEMETRY-OBS-51-001; to be refined as helpers are coded.
35
docs/observability/telemetry-scrub-51-002.md
Normal file
@@ -0,0 +1,35 @@
# Telemetry scrubbing contract (TELEMETRY-OBS-51-002)

**Purpose**: define redaction/scrubbing rules for logs/traces/metrics before implementing helpers in `StellaOps.Telemetry.Core`.

## Redaction rules
- Strip or hash PII/credentials: emails, tokens, passwords, secrets, bearer/mTLS cert blobs.
- Default hash algorithm: SHA-256 hex; include `scrubbed=true` tag.
- Allowlist fields that remain: `tenant`, `trace_id`, `span_id`, `endpoint`, `result`, `sealed`.

## Configuration knobs
- `Telemetry:Scrub:Enabled` (bool, default true).
- `Telemetry:Scrub:Sealed` (bool, default false): when true, force scrubbing and disable external exporters.
- `Telemetry:Scrub:HashSalt` (string, optional): per-tenant salt; omit to keep deterministic hashes across deployments.
- `Telemetry:Scrub:MaxValueLength` (int, default 256): truncate values beyond this length before hashing.
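
The hashing knobs can be illustrated with a short helper. This is a sketch under stated assumptions: the `Scrubber` name is hypothetical, the salt is assumed to be prepended to the value, and the hex output is assumed to be lower-case. None of those details are fixed by the contract.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Scrubber
{
    // Truncates to maxValueLength, then returns the SHA-256 digest as 64 hex characters.
    public static string Hash(string value, string? salt = null, int maxValueLength = 256)
    {
        var truncated = value.Length > maxValueLength ? value[..maxValueLength] : value;

        // Assumed salting scheme: salt is prefixed to the truncated value.
        var bytes = Encoding.UTF8.GetBytes((salt ?? string.Empty) + truncated);

        return Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
    }
}
```

With no salt, the same input always yields the same digest, which is the property the determinism test relies on. Because truncation happens before hashing, two values that share their first `MaxValueLength` characters hash identically.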

## Logger sink expectations
- Implement the scrubber as an `ILogPayloadFilter` injected before the sink.
- Ensure message templates remain intact; only values are scrubbed.
- Preserve structured shape so downstream parsing remains deterministic.

## Metrics & traces
- Never place raw user input into metric/tag values; pass through scrubber before export.
- Span events must omit payload bodies; include keyed references only.

## Auditing
- When scrubbing occurs, add tag `scrubbed=true` and `scrub_reason` (`pii|secret|length|pattern`).
- Provide counter `telemetry.scrub.events{tenant,reason}` for observability.
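
One way to expose that counter is through `System.Diagnostics.Metrics`. The meter name `StellaOps.Telemetry.Scrub` and the `ScrubMetrics` type are assumptions made for this sketch; only the instrument name and its two tags come from the contract.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class ScrubMetrics
{
    // Hypothetical meter name; the instrument name matches the contract.
    private static readonly Meter ScrubMeter = new("StellaOps.Telemetry.Scrub");

    private static readonly Counter<long> Events =
        ScrubMeter.CreateCounter<long>("telemetry.scrub.events");

    // reason is expected to be one of: pii, secret, length, pattern.
    public static void Record(string tenant, string reason) =>
        Events.Add(
            1,
            new KeyValuePair<string, object?>("tenant", tenant),
            new KeyValuePair<string, object?>("reason", reason));
}
```

Calling `ScrubMetrics.Record("acme", "pii")` increments the counter for that tenant and reason; nothing is exported until a listener or exporter subscribes to the meter.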

## Tests to add with implementation
- Unit tests for regex-based scrubbing of tokens, emails, URLs with creds.
- Config-driven tests toggling `Enabled`/`Sealed` modes to ensure exporters are suppressed when sealed.
- Determinism test: same input yields identical hashed output when salt unset.

## Provenance
- Authored 2025-11-20 to unblock TELEMETRY-OBS-51-002 and downstream 55/56 tasks.
33
docs/observability/telemetry-sealed-56-001.md
Normal file
@@ -0,0 +1,33 @@
# Sealed-mode telemetry helpers (TELEMETRY-OBS-56-001 prep)

## Objective
Define behavior and configuration for telemetry when `Sealed=true`, ensuring no external egress while preserving deterministic local traces/metrics for audits.

## Requirements
- Disable external OTLP/exporters automatically when sealed; fall back to in-memory or file OTLP (`telemetry-sealed.otlp`) with bounded size (default 10 MB, ring buffer).
- Add tag `sealed=true` to all spans/metrics/logs; suppress exemplars.
- Force scrubbing: treat `Scrub.Sealed=true` regardless of default settings.
- Sampling: cap at 10% in sealed mode unless the CLI incident toggle raises it (see the CLI-OBS-12-001 contract); the cap can be raised up to a 100% ceiling with the explicit override `Telemetry:Sealed:MaxSamplingPercent`.
- Clock source: require a monotonic clock for durations; emit a warning if system clock skew greater than 500 ms is detected.
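
The sampling rule above can be sketched as a clamp. This reflects one reading of the requirement and is not a confirmed implementation: incident mode is assumed to lift the ceiling to 100%, and otherwise `Telemetry:Sealed:MaxSamplingPercent` (default 10) is the ceiling. The `SealedSampling` name is hypothetical.

```csharp
using System;

public static class SealedSampling
{
    // Returns the sampling percentage actually applied in sealed mode.
    public static int EffectivePercent(
        int requestedPercent, bool incidentMode, int maxSamplingPercent = 10)
    {
        // Assumed: the incident toggle overrides the sealed cap entirely.
        var ceiling = incidentMode ? 100 : Math.Clamp(maxSamplingPercent, 0, 100);
        return Math.Clamp(requestedPercent, 0, ceiling);
    }
}
```

Under this reading, a request for 50% yields 10% in plain sealed mode, 50% with incident mode active, and 25% when `MaxSamplingPercent` is explicitly set to 25.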

## Configuration keys
- `Telemetry:Sealed:Enabled` (bool): driven by the host; when true, activates sealed behavior.
- `Telemetry:Sealed:Exporter` (enum `memory|file`): default `file`.
- `Telemetry:Sealed:FilePath` (string): default `./logs/telemetry-sealed.otlp`.
- `Telemetry:Sealed:MaxBytes` (int): default 10_485_760 (10 MB).
- `Telemetry:Sealed:MaxSamplingPercent` (int): default 10.
- Derived flag `Telemetry:Sealed:EffectiveIncidentMode` (read-only) exposes whether the incident-mode override lifted the sampling ceiling.

## File exporter format
- OTLP binary, append-only, deterministic ordering by enqueue time.
- Rotate when the file exceeds `MaxBytes`, using suffixes `.1`, `.2`, capped at 3 files; the oldest is dropped.
- Permissions 0600 by default; fail at startup if the path is world-readable.
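
The rotation rule can be sketched as follows. It assumes that "capped to 3 files" means the active file plus `.1` and `.2`, and that `.1` is the most recent rotated file; the `SealedFileRotation` name is hypothetical. Permission enforcement is left out of this sketch.

```csharp
using System.IO;

public static class SealedFileRotation
{
    // Rotates path -> path.1 -> path.2 once the active file exceeds maxBytes.
    // Returns true when a rotation happened.
    public static bool RotateIfNeeded(string path, long maxBytes = 10_485_760)
    {
        var info = new FileInfo(path);
        if (!info.Exists || info.Length <= maxBytes)
        {
            return false;
        }

        var first = path + ".1";
        var second = path + ".2";

        // Overwriting .2 drops the oldest file, keeping at most three on disk.
        if (File.Exists(first))
        {
            File.Move(first, second, overwrite: true);
        }

        File.Move(path, first);
        return true;
    }
}
```

The exporter would call this before each append and then reopen the active path; because the order of moves is fixed, the rotation outcome depends only on file sizes, which keeps the rotation test deterministic.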

## Validation tests to implement with 56-001
- Unit: sealed mode forces exporter swap and tags `sealed=true`, `scrubbed=true`.
- Unit: sampling capped at max percent unless incident override set.
- Unit: file exporter rotates deterministically and enforces 0600 perms.
- Integration: sealed + incident mode together still block external exporters and honor scrub rules.

## Provenance
- Authored 2025-11-20 to satisfy PREP-TELEMETRY-OBS-56-001 and unblock implementation.