Refactor code structure for improved readability and maintainability

This commit is contained in:
StellaOps Bot
2025-12-06 10:23:40 +02:00
parent 6beb9d7c4e
commit 37304cf819
78 changed files with 5471 additions and 104 deletions

View File

@@ -27,11 +27,28 @@
| `ledger_attachments_encryption_failures_total` | Counter | `tenant`, `stage` (`encrypt`, `sign`, `upload`) | Ensures secure attachment pipeline stays healthy. |
| `ledger_db_connections_active` | Gauge | `role` (`writer`, `projector`) | Helps tune pool size. |
| `ledger_app_version_info` | Gauge | `version`, `git_sha` | Static metric for fleet observability. |
| `ledger_scoring_latency_seconds` | Histogram | `tenant`, `policy_version`, `result` | Latency of risk scoring operations per finding. P95 target <500 ms. |
| `ledger_scoring_operations_total` | Counter | `tenant`, `policy_version`, `result` | Total number of scoring operations by result (success, partial_success, error, etc.). |
| `ledger_scoring_provider_gaps_total` | Counter | `tenant`, `provider`, `reason` | Count of findings where scoring provider was unavailable or returned no data. |
| `ledger_severity_distribution_critical` | Gauge | `tenant`, `policy_version` | Current count of critical severity findings by tenant and policy. |
| `ledger_severity_distribution_high` | Gauge | `tenant`, `policy_version` | Current count of high severity findings by tenant and policy. |
| `ledger_severity_distribution_medium` | Gauge | `tenant`, `policy_version` | Current count of medium severity findings by tenant and policy. |
| `ledger_severity_distribution_low` | Gauge | `tenant`, `policy_version` | Current count of low severity findings by tenant and policy. |
| `ledger_severity_distribution_unknown` | Gauge | `tenant`, `policy_version` | Current count of unknown/unscored findings by tenant and policy. |
| `ledger_score_freshness_seconds` | Gauge | `tenant` | Time since last scoring operation completed by tenant. Alert when >3600 s. |
| `ledger_scored_findings_exports_total` | Counter | `tenant`, `record_count` | Count of scored findings export operations. |
| `ledger_scored_findings_export_duration_seconds` | Histogram | `tenant`, `record_count` | Duration of scored findings export operations. |
| `ledger_airgap_staleness_seconds` | Histogram | `domain` | Current staleness of air-gap imported data by domain. |
| `ledger_airgap_staleness_gauge_seconds` | Gauge | `domain` | Current staleness of air-gap data by domain (observable gauge). |
| `ledger_staleness_validation_failures_total` | Counter | `domain` | Count of staleness validation failures blocking exports. |
### Derived dashboards
- **Writer health:** `ledger_write_latency_seconds` (P50/P95/P99), backlog gauge, event throughput.
- **Projection health:** `ledger_projection_lag_seconds`, `ledger_projection_apply_seconds`, projection throughput, conflict counts (from logs).
- **Anchoring:** Anchor duration histogram, failure counter, root hash timeline.
- **Risk scoring:** `ledger_scoring_latency_seconds` (P50/P95/P99), severity distribution gauges, provider gap counter, score freshness.
- **Export operations:** `ledger_scored_findings_exports_total`, export duration histogram, record counts.
- **Air-gap health:** `ledger_airgap_staleness_gauge_seconds`, staleness validation failures, domain freshness trends.
## 3. Logs & traces
- **Log structure:** Serilog JSON with fields `tenant`, `chainId`, `sequence`, `eventId`, `eventType`, `actorId`, `policyVersion`, `hash`, `merkleRoot`.
@@ -50,6 +67,9 @@
| **ProjectionLag** | `ledger_projection_lag_seconds` > 30s | Trigger rebuild, verify change streams. |
| **AnchorFailure** | `ledger_merkle_anchor_failures_total` increase > 0 | Collect logs, rerun anchor, verify signing service. |
| **AttachmentSecurityError** | `ledger_attachments_encryption_failures_total` increase > 0 | Audit attachments pipeline; check key material and storage endpoints. |
| **ScoringFreshnessStale** | `ledger_score_freshness_seconds` > 3600 s for any tenant | Check scoring pipeline, verify provider connectivity, re-trigger scoring job. |
| **ScoringProviderGaps** | `ledger_scoring_provider_gaps_total` increase > 10 in 5 min | Investigate provider failures; check rate limits or connectivity. |
| **AirgapDataStale** | `ledger_airgap_staleness_gauge_seconds` > threshold for 15 min | Re-import air-gap bundle; verify export pipeline in source enclave. |
Alerts integrate with Notifier channel `ledger.alerts`. For air-gapped deployments emit to local syslog + CLI incident scripts.