feat(zastava): add evidence locker plan and schema examples

- Introduced README.md for Zastava Evidence Locker Plan detailing artifacts to sign and post-signing steps.
- Added example JSON schemas for observer events and webhook admissions.
- Updated implementor guidelines with checklist for CI linting, determinism, secrets management, and schema control.
- Created alert rules for Vuln Explorer to monitor API latency and projection errors.
- Developed analytics ingestion plan for Vuln Explorer, focusing on telemetry and PII guardrails.
- Implemented Grafana dashboard configuration for Vuln Explorer metrics visualization.
- Added expected projection SHA256 for vulnerability events.
- Created k6 load testing script for Vuln Explorer API.
- Added sample projection and replay event data for testing.
- Implemented ReplayInputsLock for deterministic replay inputs management.
- Developed tests for ReplayInputsLock to ensure stable hash computation.
- Created SurfaceManifestDeterminismVerifier to validate manifest determinism and integrity.
- Added unit tests for SurfaceManifestDeterminismVerifier to ensure correct functionality.
- Implemented Angular tests for VulnerabilityHttpClient and VulnerabilityDetailComponent to verify API interactions and UI rendering.
2025-12-02 09:27:31 +02:00
parent 885ce86af4
commit 2d08f52715
74 changed files with 1690 additions and 131 deletions
--- a/ops/devops/vuln/alerts.yaml
+++ b/ops/devops/vuln/alerts.yaml
@@ -0,0 +1,37 @@
+# Alert rules for Vuln Explorer (DEVOPS-VULN-29-002/003)
+apiVersion: 1
+groups:
+- name: vuln-explorer
+  rules:
+  - alert: vuln_api_latency_p95_gt_300ms
+    expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{service="vuln-explorer",path=~"/findings.*"}[5m])) > 0.3
+    for: 5m
+    labels:
+      severity: page
+    annotations:
+      summary: Vuln Explorer API p95 latency high
+      description: p95 latency for /findings exceeds 300ms for 5m.
+  - alert: vuln_projection_lag_gt_60s
+    expr: vuln_projection_lag_seconds > 60
+    for: 5m
+    labels:
+      severity: page
+    annotations:
+      summary: Vuln projection lag exceeds 60s
+      description: Ledger projector lag is above 60s.
+  - alert: vuln_projection_error_rate_gt_1pct
+    expr: rate(vuln_projection_errors_total[5m]) / rate(vuln_projection_runs_total[5m]) > 0.01
+    for: 5m
+    labels:
+      severity: page
+    annotations:
+      summary: Vuln projector error rate >1%
+      description: Projection errors exceed 1% over 5m.
+  - alert: vuln_query_budget_enforced_gt_50_per_min
+    expr: rate(vuln_query_budget_enforced_total[1m]) * 60 > 50
+    for: 5m
+    labels:
+      severity: warn
+    annotations:
+      summary: Query budget enforcement high
+      description: Budget enforcement is firing more than 50/min.
--- a/ops/devops/vuln/analytics-ingest-plan.md
+++ b/ops/devops/vuln/analytics-ingest-plan.md
@@ -0,0 +1,26 @@
+# Vuln Explorer analytics pipeline plan (DEVOPS-VULN-29-003)
+
+Goals: instrument analytics ingestion (query hashes, privacy/PII guardrails), update observability docs, and supply deployable configs.
+
+## Instrumentation tasks
+- Expose Prometheus counters/histograms in API:
+  - `vuln_query_hashes_total{tenant,query_hash}` increment on cached/served queries.
+  - `vuln_api_latency_seconds` histogram (already present; ensure labels avoid PII).
+  - `vuln_api_payload_bytes` histogram for request/response sizes.
+- Redact/avoid PII:
+  - Hash query bodies server-side (SHA256 with salt per deployment) before logging/metrics; store only hash+shape, not raw filters.
+  - Truncate any request field names/values in logs to 128 chars and drop known PII fields (email/userId).
+- Telemetry export:
+  - OTLP metrics/logs via existing collector profile; add `service="vuln-explorer"` resource attrs.
+
+## Pipelines/configs
+- Grafana dashboard will read from Prometheus metrics already defined in `ops/devops/vuln/dashboards/vuln-explorer.json`.
+- Alert rules already live in `ops/devops/vuln/alerts.yaml`; confirm that PII drops (logs-only) need no additional rules.
+
+## Docs
+- Update deploy docs (`deploy/README.md`) to mention PII-safe logging in Vuln Explorer and query-hash metrics.
+- Add runbook entry under `docs/modules/vuln-explorer/observability.md` (if absent, create) summarizing metrics and how to interpret query hashes.
+
+## CI checks
+- Unit test to assert logging middleware hashes queries and strips PII (to be implemented in API tests).
+- Add static check in pipeline ensuring `vuln_query_hashes_total` and payload histograms are scraped (Prometheus snapshot test).
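
The redaction rules in the plan above (salted SHA256 of the query body, store only hash plus shape, truncate to 128 characters, drop `email`/`userId`) could be sketched roughly as follows. This is a minimal Node.js illustration, not the API's actual middleware: the function names (`canonicalize`, `hashQuery`, `queryShape`, `redactForLog`), the reading of "shape" as the sorted top-level field names, and the key-sorted canonical form are all assumptions made for the example.

```javascript
// Hypothetical sketch: none of these names come from the Vuln Explorer codebase.
const crypto = require('node:crypto');

const PII_FIELDS = new Set(['email', 'userId']); // known PII fields named in the plan
const MAX_LEN = 128;                             // truncation limit named in the plan

// Serialize with sorted object keys so equivalent queries produce the same hash.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const parts = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`);
    return `{${parts.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Salted SHA256 over the canonical query; the salt is assumed to be per deployment.
function hashQuery(query, salt) {
  return crypto
    .createHash('sha256')
    .update(salt)
    .update(canonicalize(query))
    .digest('hex');
}

// Assumed meaning of "shape": sorted top-level field names, without any values.
function queryShape(query) {
  return Object.keys(query).sort();
}

function truncate(value) {
  const text = String(value);
  return text.length > MAX_LEN ? text.slice(0, MAX_LEN) : text;
}

// Drop known PII fields and truncate remaining names and values before logging.
function redactForLog(fields) {
  const out = {};
  for (const [key, value] of Object.entries(fields)) {
    if (PII_FIELDS.has(key)) continue;
    out[truncate(key)] = truncate(value);
  }
  return out;
}
```

With this shape, a metric label such as `query_hash` would carry only the output of `hashQuery`, and a log line would carry `queryShape(query)` plus `redactForLog(fields)`, never the raw filter values.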
--- a/ops/devops/vuln/dashboards/README.md
+++ b/ops/devops/vuln/dashboards/README.md
@@ -0,0 +1,4 @@
+# Vuln Explorer dashboards
+
+- `vuln-explorer.json`: p95 latency, projection lag, error rate, query budget enforcement.
+- Import into Grafana (folder `StellaOps / Vuln Explorer`). Data source: Prometheus scrape with `service="vuln-explorer"` labels.
--- a/ops/devops/vuln/dashboards/vuln-explorer.json
+++ b/ops/devops/vuln/dashboards/vuln-explorer.json
@@ -0,0 +1,30 @@
+{
+  "title": "Vuln Explorer",
+  "timezone": "utc",
+  "panels": [
+    {
+      "type": "timeseries",
+      "title": "API latency p95/p99",
+      "targets": [
+        { "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{service=\"vuln-explorer\",path=~\"/findings.*\"}[5m]))" },
+        { "expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{service=\"vuln-explorer\",path=~\"/findings.*\"}[5m]))" }
+      ]
+    },
+    {
+      "type": "timeseries",
+      "title": "Projection lag (s)",
+      "targets": [ { "expr": "vuln_projection_lag_seconds" } ]
+    },
+    {
+      "type": "stat",
+      "title": "Error rate",
+      "targets": [ { "expr": "sum(rate(http_requests_total{service=\"vuln-explorer\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{service=\"vuln-explorer\"}[5m]))" } ],
+      "options": { "reduceOptions": { "calcs": ["lastNotNull"] } }
+    },
+    {
+      "type": "timeseries",
+      "title": "Query budget enforcement hits",
+      "targets": [ { "expr": "rate(vuln_query_budget_enforced_total[5m])" } ]
+    }
+  ]
+}
--- a/ops/devops/vuln/expected_projection.sha256
+++ b/ops/devops/vuln/expected_projection.sha256
@@ -0,0 +1 @@
+d89271fddb12115b3610b8cd476c85318cd56c44f7e019793c947bf57c8f86ef  samples/vuln/events/projection.json
--- a/ops/devops/vuln/k6-vuln-explorer.js
+++ b/ops/devops/vuln/k6-vuln-explorer.js
@@ -0,0 +1,47 @@
+import http from 'k6/http';
+import { check, sleep } from 'k6';
+import { Trend, Rate } from 'k6/metrics';
+
+const latency = new Trend('vuln_api_latency');
+const errors = new Rate('vuln_api_errors');
+
+const BASE = __ENV.VULN_BASE || 'http://localhost:8449';
+const TENANT = __ENV.VULN_TENANT || 'alpha';
+const TOKEN = __ENV.VULN_TOKEN || '';
+const HEADERS = TOKEN ? { 'Authorization': `Bearer ${TOKEN}`, 'X-StellaOps-Tenant': TENANT } : { 'X-StellaOps-Tenant': TENANT };
+
+export const options = {
+  scenarios: {
+    ramp: {
+      executor: 'ramping-vus',
+      startVUs: 0,
+      stages: [
+        { duration: '5m', target: 200 },
+        { duration: '10m', target: 200 },
+        { duration: '2m', target: 0 },
+      ],
+      gracefulRampDown: '30s',
+    },
+  },
+  thresholds: {
+    vuln_api_latency: ['p(95)<250'],
+    vuln_api_errors: ['rate<0.005'],
+  },
+};
+
+function req(path, params = {}) {
+  const res = http.get(`${BASE}${path}`, { headers: HEADERS, tags: params.tags });
+  latency.add(res.timings.duration, params.tags);
+  errors.add(res.status === 0 || res.status >= 400, params.tags);
+  check(res, {
+    'status is 2xx': (r) => r.status >= 200 && r.status < 300,
+  });
+  return res;
+}
+
+export default function () {
+  req(`/findings?tenant=${TENANT}&page=1&pageSize=50`, { tags: { endpoint: 'list' } });
+  req(`/findings?tenant=${TENANT}&status=open&page=1&pageSize=50`, { tags: { endpoint: 'filter_open' } });
+  req(`/findings/stats?tenant=${TENANT}`, { tags: { endpoint: 'stats' } });
+  sleep(1);
+}
--- a/ops/devops/vuln/vuln-explorer-ci-plan.md
+++ b/ops/devops/vuln/vuln-explorer-ci-plan.md
@@ -20,18 +20,17 @@ Assumptions: Vuln Explorer API uses MongoDB + Redis; ledger projector performs r
 - Alert when last anchored root age > 15m or mismatch detected.

 ## Verification Automation
- Script `ops/devops/vuln/verify_projection.sh` (to be added) should:
-  - Run projector against fixture events and compute hash of materialized view snapshot (`sha256sum` over canonical JSON export).
-  - Compare with expected hash stored in `ops/devops/vuln/expected_projection.sha256`.
-  - Exit non-zero on mismatch.
+- Script `ops/devops/vuln/verify_projection.sh` runs a hash check:
+  - Compares the input projection export (`samples/vuln/events/projection.json` by default) against `ops/devops/vuln/expected_projection.sha256`.
+  - Exits non-zero on mismatch; use in CI after projector replay.

 ## Fixtures
 - Store deterministic replay fixture under `samples/vuln/events/replay.ndjson` (generated offline, includes mixed tenants, disputed findings, remediation states).
 - Export canonical projection snapshot to `samples/vuln/events/projection.json` and hash to `ops/devops/vuln/expected_projection.sha256`.

 ## Dashboards / Alerts (DEVOPS-VULN-29-002/003)
- Dashboard panels: projection lag, replay throughput, API latency (`/findings`, `/findings/{id}`), query budget enforcement hits, and Merkle anchoring status.
- Alerts: `vuln_projection_lag_gt_60s`, `vuln_projection_error_rate_gt_1pct`, `vuln_api_latency_p95_gt_300ms`, `merkle_anchor_stale_gt_15m`.
+- Dashboard JSON: `ops/devops/vuln/dashboards/vuln-explorer.json` (latency, projection lag, error rate, budget enforcement).
+- Alerts: `ops/devops/vuln/alerts.yaml` defining `vuln_api_latency_p95_gt_300ms`, `vuln_projection_lag_gt_60s`, `vuln_projection_error_rate_gt_1pct`, `vuln_query_budget_enforced_gt_50_per_min`.

 ## Offline posture
 - CI and verification use in-repo fixtures; no external downloads.
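
For reference, the hash check described under "Verification Automation" amounts to something like the Node.js sketch below. The contents of `verify_projection.sh` are not shown in this diff, so this is only an illustration of the comparison, assuming the expected file uses the `sha256sum` line format seen in `expected_projection.sha256`; `parseExpected`, `sha256File`, and `verifyProjection` are hypothetical names.

```javascript
// Hypothetical sketch of the hash check; the real verify_projection.sh may differ.
const crypto = require('node:crypto');
const fs = require('node:fs');

// Parse the first line of a sha256sum-style file: "<64 hex chars>  <path>".
function parseExpected(text) {
  const firstLine = text.split('\n')[0].trim();
  const match = /^([0-9a-fA-F]{64})\s+\*?(.+)$/.exec(firstLine);
  if (!match) {
    throw new Error('expected file is not in sha256sum format');
  }
  return { digest: match[1].toLowerCase(), file: match[2] };
}

// Hash the raw bytes of the export, as sha256sum would.
function sha256File(path) {
  return crypto.createHash('sha256').update(fs.readFileSync(path)).digest('hex');
}

// Compare the projection export against the recorded digest.
// When projectionPath is omitted, the path recorded in the expected file is used.
function verifyProjection(expectedPath, projectionPath) {
  const expected = parseExpected(fs.readFileSync(expectedPath, 'utf8'));
  const actual = sha256File(projectionPath ?? expected.file);
  return { ok: actual === expected.digest, expected: expected.digest, actual };
}
```

A CI step would call `verifyProjection('ops/devops/vuln/expected_projection.sha256')` from the repository root after the projector replay and exit non-zero when `ok` is false. Because the check hashes raw bytes, the export has to be byte-for-byte canonical for the comparison to be stable.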