CI/CD consolidation

StellaOps Bot
2025-12-26 17:32:23 +02:00
parent a866eb6277
commit c786faae84
638 changed files with 3821 additions and 181 deletions

devops/vuln/alerts.yaml

@@ -0,0 +1,37 @@
# Alert rules for Vuln Explorer (DEVOPS-VULN-29-002/003)
apiVersion: 1
groups:
  - name: vuln-explorer
    rules:
      - alert: vuln_api_latency_p95_gt_300ms
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{service="vuln-explorer",path=~"/findings.*"}[5m])) > 0.3
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Vuln Explorer API p95 latency high
          description: p95 latency for /findings exceeds 300ms for 5m.
      - alert: vuln_projection_lag_gt_60s
        expr: vuln_projection_lag_seconds > 60
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Vuln projection lag exceeds 60s
          description: Ledger projector lag is above 60s.
      - alert: vuln_projection_error_rate_gt_1pct
        expr: rate(vuln_projection_errors_total[5m]) / rate(vuln_projection_runs_total[5m]) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Vuln projector error rate >1%
          description: Projection errors exceed 1% over 5m.
      - alert: vuln_query_budget_enforced_gt_50_per_min
        # rate() is per-second; multiply by 60 to compare against a per-minute threshold.
        expr: rate(vuln_query_budget_enforced_total[1m]) * 60 > 50
        for: 5m
        labels:
          severity: warn
        annotations:
          summary: Query budget enforcement high
          description: Budget enforcement is firing more than 50/min.

@@ -0,0 +1,26 @@
# Vuln Explorer analytics pipeline plan (DEVOPS-VULN-29-003)
Goals: instrument analytics ingestion (query hashes, privacy/PII guardrails), update observability docs, and supply deployable configs.
## Instrumentation tasks
- Expose Prometheus counters/histograms in API:
  - `vuln_query_hashes_total{tenant,query_hash}` increment on cached/served queries.
  - `vuln_api_latency_seconds` histogram (already present; ensure labels avoid PII).
  - `vuln_api_payload_bytes` histogram for request/response sizes.
- Redact/avoid PII:
  - Hash query bodies server-side (SHA-256 with a per-deployment salt) before logging/metrics; store only hash+shape, not raw filters.
  - Truncate any request field names/values in logs to 128 chars and drop known PII fields (email/userId).
- Telemetry export:
  - OTLP metrics/logs via existing collector profile; add `service="vuln-explorer"` resource attrs.
## Pipelines/configs
- Grafana dashboard (already defined in `ops/devops/vuln/dashboards/vuln-explorer.json`) reads from Prometheus metrics.
- Alert rules are already in `ops/devops/vuln/alerts.yaml`; confirm that no additional rules are needed for PII drops, which are logs-only.
## Docs
- Update deploy docs (`deploy/README.md`) to mention PII-safe logging in Vuln Explorer and query-hash metrics.
- Add runbook entry under `docs/modules/vuln-explorer/observability.md` (if absent, create) summarizing metrics and how to interpret query hashes.
## CI checks
- Unit test to assert logging middleware hashes queries and strips PII (to be implemented in API tests).
- Add static check in pipeline ensuring `vuln_query_hashes_total` and payload histograms are scraped (Prometheus snapshot test).
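The Prometheus snapshot check above could be sketched as follows. This is a minimal illustration, not the project's actual test: the helper names `metricNames` and `missingMetrics` are hypothetical, and it assumes the scrape is available as Prometheus text exposition format, where a histogram named `vuln_api_payload_bytes` appears as `vuln_api_payload_bytes_bucket` series.

```javascript
// Hypothetical helper: collect the metric names found in a Prometheus
// text-exposition scrape, skipping blank lines and "# HELP"/"# TYPE" comments.
function metricNames(scrapeText) {
  const names = new Set();
  for (const line of scrapeText.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    // A sample line starts with the metric name, followed by labels or a value.
    const match = /^[a-zA-Z_:][a-zA-Z0-9_:]*/.exec(trimmed);
    if (match) names.add(match[0]);
  }
  return names;
}

// Hypothetical helper: return the required metric names absent from the scrape.
function missingMetrics(scrapeText, required) {
  const present = metricNames(scrapeText);
  return required.filter((name) => !present.has(name));
}

// Names taken from this plan; the `_bucket` suffix is how histogram series
// appear in the exposition format.
const REQUIRED_METRICS = [
  'vuln_query_hashes_total',
  'vuln_api_payload_bytes_bucket',
];
```

A CI step would fail the build when `missingMetrics(scrape, REQUIRED_METRICS)` is non-empty.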

@@ -0,0 +1,4 @@
# Vuln Explorer dashboards
- `vuln-explorer.json`: p95 latency, projection lag, error rate, query budget enforcement.
- Import into Grafana (folder `StellaOps / Vuln Explorer`). Data source: Prometheus scrape with `service="vuln-explorer"` labels.

@@ -0,0 +1,30 @@
{
"title": "Vuln Explorer",
"timezone": "utc",
"panels": [
{
"type": "timeseries",
"title": "API latency p95/p99",
"targets": [
{ "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{service=\"vuln-explorer\",path=~\"/findings.*\"}[5m]))" },
{ "expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{service=\"vuln-explorer\",path=~\"/findings.*\"}[5m]))" }
]
},
{
"type": "timeseries",
"title": "Projection lag (s)",
"targets": [ { "expr": "vuln_projection_lag_seconds" } ]
},
{
"type": "stat",
"title": "Error rate",
"targets": [ { "expr": "sum(rate(http_requests_total{service=\"vuln-explorer\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{service=\"vuln-explorer\"}[5m]))" } ],
"options": { "reduceOptions": { "calcs": ["lastNotNull"] } }
},
{
"type": "timeseries",
"title": "Query budget enforcement hits",
"targets": [ { "expr": "rate(vuln_query_budget_enforced_total[5m])" } ]
}
]
}

@@ -0,0 +1 @@
d89271fddb12115b3610b8cd476c85318cd56c44f7e019793c947bf57c8f86ef samples/vuln/events/projection.json

@@ -0,0 +1,47 @@
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Rate } from 'k6/metrics';
const latency = new Trend('vuln_api_latency');
const errors = new Rate('vuln_api_errors');
const BASE = __ENV.VULN_BASE || 'http://localhost:8449';
const TENANT = __ENV.VULN_TENANT || 'alpha';
const TOKEN = __ENV.VULN_TOKEN || '';
const HEADERS = TOKEN ? { 'Authorization': `Bearer ${TOKEN}`, 'X-StellaOps-Tenant': TENANT } : { 'X-StellaOps-Tenant': TENANT };
export const options = {
scenarios: {
ramp: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '5m', target: 200 },
{ duration: '10m', target: 200 },
{ duration: '2m', target: 0 },
],
gracefulRampDown: '30s',
},
},
thresholds: {
vuln_api_latency: ['p(95)<250'],
vuln_api_errors: ['rate<0.005'],
},
};
function req(path, params = {}) {
const res = http.get(`${BASE}${path}`, { headers: HEADERS, tags: params.tags });
latency.add(res.timings.duration, params.tags);
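// k6 reports status 0 when the request fails before a response arrives; count that as an error too.
errors.add(res.status === 0 || res.status >= 400, params.tags);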
check(res, {
'status is 2xx': (r) => r.status >= 200 && r.status < 300,
});
return res;
}
export default function () {
req(`/findings?tenant=${TENANT}&page=1&pageSize=50`, { tags: { endpoint: 'list' } });
req(`/findings?tenant=${TENANT}&status=open&page=1&pageSize=50`, { tags: { endpoint: 'filter_open' } });
req(`/findings/stats?tenant=${TENANT}`, { tags: { endpoint: 'stats' } });
sleep(1);
}

@@ -0,0 +1,22 @@
# Vuln Explorer query-hash metrics spec (DEVOPS-VULN-29-003)
## Metrics to emit
- `vuln_query_hashes_total{tenant,query_hash,route,cache="hit|miss"}`
- `vuln_api_payload_bytes_bucket{direction="request|response"}`
## Hashing rules
- Hash canonicalised query body (sorted keys, trimmed whitespace) with SHA-256.
- Salt: deployment-specific (e.g., `Telemetry:QueryHashSalt`), 32 bytes hex.
- Store only hash; never log raw filters.
- Truncate any string field >128 chars before hashing to control cardinality.
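The hashing rules above can be sketched in Node.js as follows. This is an illustration under stated assumptions, not the API's implementation: the function names are hypothetical, "trimmed whitespace" is read as trimming string values, and the salt is simply prepended to the canonical JSON before hashing (the spec does not say how the salt is combined).

```javascript
const crypto = require('node:crypto');

const MAX_STRING_LENGTH = 128; // truncation limit from the hashing rules

// Hypothetical helper: sort object keys recursively, trim string values,
// and truncate strings longer than 128 chars.
function canonicalise(value) {
  if (typeof value === 'string') {
    return value.trim().slice(0, MAX_STRING_LENGTH);
  }
  if (Array.isArray(value)) {
    return value.map(canonicalise);
  }
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const key of Object.keys(value).sort()) {
      out[key] = canonicalise(value[key]);
    }
    return out;
  }
  return value; // numbers, booleans, null pass through unchanged
}

// Hypothetical helper: SHA-256 over salt || canonical JSON, hex-encoded.
// saltHex is the deployment-specific salt (32 bytes, hex-encoded).
function hashQuery(queryBody, saltHex) {
  const salt = Buffer.from(saltHex, 'hex');
  if (salt.length !== 32) {
    throw new Error('salt must be 32 bytes, hex-encoded');
  }
  const canonical = JSON.stringify(canonicalise(queryBody));
  return crypto.createHash('sha256').update(salt).update(canonical).digest('hex');
}
```

Two bodies that differ only in key order or surrounding whitespace hash to the same value, and changing the salt changes every hash. HMAC-SHA-256 keyed with the salt would be a common alternative to plain concatenation.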
## Logging filter
- Drop fields named `email`, `userId`, `principalName`; replace with `[redacted]` before metrics/logging.
- Retain `tenant`, `route`, `status`, `durationMs`, `query_hash`.
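A minimal sketch of the logging filter, assuming log entries are plain JSON-like objects. The function name `redact` and the choice to recurse into nested objects and arrays are assumptions for illustration. It keeps each listed key and replaces its value with `[redacted]`, which is one reading of the rule above.

```javascript
// Field names from the logging filter rule above.
const REDACTED_FIELDS = new Set(['email', 'userId', 'principalName']);

// Hypothetical helper: return a copy of the entry with PII field values
// replaced by "[redacted]"; all other fields are retained unchanged.
function redact(value) {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = REDACTED_FIELDS.has(key) ? '[redacted]' : redact(inner);
    }
    return out;
  }
  return value;
}
```

The function returns a new object and leaves its input untouched, so it can sit in front of both the logger and the metrics exporter.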
## Prometheus exemplar tags (optional)
- Add `trace_id` as exemplar if traces enabled; do not add request bodies.
## Acceptance checks
- Unit test: hashed query string changes when salt changes; raw query not present in logs.
- Prometheus snapshot test: scrape and assert presence of `vuln_query_hashes_total` and payload histograms.

@@ -0,0 +1,25 @@
#!/usr/bin/env bash
# Deterministic projection verification for DEVOPS-VULN-29-001/002
# Usage: ./verify_projection.sh [projection-export.json] [expected-hash-file]
set -euo pipefail
PROJECTION=${1:-samples/vuln/events/projection.json}
EXPECTED_HASH_FILE=${2:-ops/devops/vuln/expected_projection.sha256}
if [[ ! -f "$PROJECTION" ]]; then
echo "projection file not found: $PROJECTION" >&2
exit 1
fi
if [[ ! -f "$EXPECTED_HASH_FILE" ]]; then
echo "expected hash file not found: $EXPECTED_HASH_FILE" >&2
exit 1
fi
calc_hash=$(sha256sum "$PROJECTION" | awk '{print $1}')
expected_hash=$(cut -d' ' -f1 "$EXPECTED_HASH_FILE")
if [[ "$calc_hash" != "$expected_hash" ]]; then
echo "mismatch: projection hash $calc_hash expected $expected_hash" >&2
exit 2
fi
echo "projection hash matches ($calc_hash)" >&2

@@ -0,0 +1,42 @@
# Vuln Explorer CI + Ops Plan (DEVOPS-VULN-29-001)
Scope: CI jobs, backup/DR, Merkle anchoring monitoring, and verification automation for the Vuln Explorer ledger projector and API.
Assumptions: Vuln Explorer API uses MongoDB + Redis; ledger projector performs replay into materialized views; Merkle tree anchoring to transparency log.
## CI Jobs
- `build-vuln`: dotnet restore/build for `src/VulnExplorer/StellaOps.VulnExplorer.Api` and projector; use `DOTNET_DISABLE_BUILTIN_GRAPH=1` and `local-nugets/`.
- `test-vuln`: focused tests with `dotnet test src/VulnExplorer/__Tests/...` and `--filter Category!=GraphHeavy`; publish TRX + coverage.
- `replay-smoke`: run projector against fixture event log (`samples/vuln/events/replay.ndjson`) and assert deterministic materialized view hash; fail on divergence.
- `sbom+attest`: reuse `ops/devops/docker/sbom_attest.sh` post-build.
## Backup & DR
- Mongo: enable point-in-time snapshots (if available) or nightly `mongodump` of `vuln_explorer` db; store in object storage with retention 30d.
- Redis (if used for cache): not authoritative; no backup required.
- Replay-first recovery: keep latest event log snapshot in `release artifacts`; replay task rehydrates materialized views.
## Merkle Anchoring Verification
- Monitor projector metrics: `ledger_projection_lag_seconds`, `ledger_projection_errors_total`.
- Add periodic job `verify-merkle`: fetch latest Merkle root from projector state, cross-check against transparency log (`rekor` or configured log) using `cosign verify-tree` or custom verifier.
- Alert when last anchored root age > 15m or mismatch detected.
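The alert condition above (anchor age over 15m, or root mismatch) can be sketched as a pure function. All names here are hypothetical: how the projector root and the transparency-log root are fetched is out of scope, and the alert identifiers are placeholders rather than rules defined in `alerts.yaml`.

```javascript
const MAX_ANCHOR_AGE_MS = 15 * 60 * 1000; // 15 minutes, from the rule above

// Hypothetical check: compare the projector's latest Merkle root with the
// root recorded in the transparency log and verify the anchor is recent.
// Returns a list of alert identifiers; an empty list means healthy.
function anchorAlerts({ projectorRoot, logRoot, lastAnchoredAt }, now = new Date()) {
  const alerts = [];
  if (projectorRoot !== logRoot) {
    alerts.push('merkle_root_mismatch');
  }
  if (now.getTime() - lastAnchoredAt.getTime() > MAX_ANCHOR_AGE_MS) {
    alerts.push('merkle_anchor_stale');
  }
  return alerts;
}
```

A periodic `verify-merkle` job could call this with the fetched roots and page when the returned list is non-empty.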
## Verification Automation
- Script `ops/devops/vuln/verify_projection.sh` runs hash check:
  - Input projection export (`samples/vuln/events/projection.json` default) compared to `ops/devops/vuln/expected_projection.sha256`.
  - Exits non-zero on mismatch; use in CI after projector replay.
## Fixtures
- Store deterministic replay fixture under `samples/vuln/events/replay.ndjson` (generated offline, includes mixed tenants, disputed findings, remediation states).
- Export canonical projection snapshot to `samples/vuln/events/projection.json` and hash to `ops/devops/vuln/expected_projection.sha256`.
## Dashboards / Alerts (DEVOPS-VULN-29-002/003)
- Dashboard JSON: `ops/devops/vuln/dashboards/vuln-explorer.json` (latency, projection lag, error rate, budget enforcement).
- Alerts: `ops/devops/vuln/alerts.yaml` defining `vuln_api_latency_p95_gt_300ms`, `vuln_projection_lag_gt_60s`, `vuln_projection_error_rate_gt_1pct`, `vuln_query_budget_enforced_gt_50_per_min`.
## Offline posture
- CI and verification use in-repo fixtures; no external downloads.
- Use mirrored images and `local-nugets/` for all builds/tests.
## Local run
```
DOTNET_DISABLE_BUILTIN_GRAPH=1 dotnet test src/VulnExplorer/__Tests/StellaOps.VulnExplorer.Api.Tests/StellaOps.VulnExplorer.Api.Tests.csproj --filter Category!=GraphHeavy
```