feat: Add initial implementation of Vulnerability Resolver Jobs
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

- Created project for StellaOps.Scanner.Analyzers.Native.Tests with necessary dependencies.
- Documented roles and guidelines in AGENTS.md for Scheduler module.
- Implemented IResolverJobService interface and InMemoryResolverJobService for handling resolver jobs.
- Added ResolverBacklogNotifier and ResolverBacklogService for monitoring job metrics.
- Developed API endpoints for managing resolver jobs and retrieving metrics.
- Defined models for resolver job requests and responses.
- Integrated dependency injection for resolver job services.
- Implemented ImpactIndexSnapshot for persisting impact index data.
- Introduced SignalsScoringOptions for configurable scoring weights in reachability scoring.
- Added unit tests for ReachabilityScoringService and RuntimeFactsIngestionService.
- Created dotnet-filter.sh script to handle command-line arguments for dotnet.
- Established nuget-prime project for managing package downloads.
This commit is contained in:
master
2025-11-18 07:52:15 +02:00
parent e69b57d467
commit 8355e2ff75
299 changed files with 13293 additions and 2444 deletions

View File

@@ -17,6 +17,8 @@
| `ledger_ingest_backlog_events` | Gauge | `tenant` | Number of events buffered in the writer queue. Alert when >5000 for 5min. |
| `ledger_projection_lag_seconds` | Gauge | `tenant` | Wall-clock difference between latest ledger event and projection tail. Target <30s. |
| `ledger_projection_rebuild_seconds` | Histogram | `tenant` | Duration of replay/rebuild operations triggered by LEDGER-29-008 harness. |
| `ledger_projection_apply_seconds` | Histogram | `tenant`, `event_type`, `policy_version`, `evaluation_status` | Time to apply a single ledger event to projection. Target P95 <1s. |
| `ledger_projection_events_total` | Counter | `tenant`, `event_type`, `policy_version`, `evaluation_status` | Count of events applied to projections. |
| `ledger_merkle_anchor_duration_seconds` | Histogram | `tenant` | Time to batch + anchor events. Target <60s per 10k events. |
| `ledger_merkle_anchor_failures_total` | Counter | `tenant`, `reason` (`db`, `signing`, `network`) | Alerts at >0 within 15min. |
| `ledger_attachments_encryption_failures_total` | Counter | `tenant`, `stage` (`encrypt`, `sign`, `upload`) | Ensures secure attachment pipeline stays healthy. |
@@ -25,22 +27,23 @@
### Derived dashboards
- **Writer health:** `ledger_write_latency_seconds` (P50/P95/P99), backlog gauge, event throughput.
- **Projection health:** `ledger_projection_lag_seconds`, rebuild durations, conflict counts (from logs).
- **Projection health:** `ledger_projection_lag_seconds`, `ledger_projection_apply_seconds`, projection throughput, conflict counts (from logs).
- **Anchoring:** Anchor duration histogram, failure counter, root hash timeline.
## 3. Logs & traces
- **Log structure:** Serilog JSON with fields `tenant`, `chainId`, `sequence`, `eventId`, `eventType`, `actorId`, `policyVersion`, `hash`, `merkleRoot`.
- **Log levels:** `Information` for success summaries (sampled), `Warning` for retried operations, `Error` for failed writes/anchors.
- **Correlation:** Each API request includes `requestId` + `traceId` logged with events. Projector logs capture `replayId` and `rebuildReason`.
- **Timeline events:** `ledger.event.appended` and `ledger.projection.updated` are emitted as structured logs carrying `tenant`, `chainId`, `sequence`, `eventId`, `policyVersion`, `traceId`, and placeholder `evidence_ref` fields for downstream timeline consumers.
- **Secrets:** Ensure `event_body` is never logged; log only metadata/hashes.
## 4. Alerts
| Alert | Condition | Response |
| --- | --- | --- |
| **LedgerWriteSLA** | `ledger_write_latency_seconds` P95 > 0.12s for 3 intervals | Check DB contention, review queue backlog, scale writer. |
| **LedgerWriteSLA** | `ledger_write_latency_seconds` P95 > 1s for 3 intervals | Check DB contention, review queue backlog, scale writer. |
| **LedgerBacklogGrowing** | `ledger_ingest_backlog_events` > 5000 for 5min | Inspect upstream policy runs, ensure projector keeping up. |
| **ProjectionLag** | `ledger_projection_lag_seconds` > 60s | Trigger rebuild, verify change streams. |
| **ProjectionLag** | `ledger_projection_lag_seconds` > 30s | Trigger rebuild, verify change streams. |
| **AnchorFailure** | `ledger_merkle_anchor_failures_total` increase > 0 | Collect logs, rerun anchor, verify signing service. |
| **AttachmentSecurityError** | `ledger_attachments_encryption_failures_total` increase > 0 | Audit attachments pipeline; check key material and storage endpoints. |