Add call graph fixtures for various languages and scenarios
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Export Center CI / export-ci (push) Has been cancelled
Findings Ledger CI / build-test (push) Has been cancelled
Findings Ledger CI / migration-validation (push) Has been cancelled
Findings Ledger CI / generate-manifest (push) Has been cancelled
Lighthouse CI / Lighthouse Audit (push) Has been cancelled
Lighthouse CI / Axe Accessibility Audit (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Reachability Corpus Validation / validate-corpus (push) Has been cancelled
Reachability Corpus Validation / validate-ground-truths (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
Reachability Corpus Validation / determinism-check (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
- Introduced `all-edge-reasons.json` to test edge resolution reasons in .NET.
- Added `all-visibility-levels.json` to validate method visibility levels in .NET.
- Created `dotnet-aspnetcore-minimal.json` for a minimal ASP.NET Core application.
- Included `go-gin-api.json` for a Go Gin API application structure.
- Added `java-spring-boot.json` for the Spring PetClinic application in Java.
- Introduced `legacy-no-schema.json` for a legacy application structure without a schema.
- Created `node-express-api.json` for an Express.js API application structure.
76
docs/observability/dashboards/offline-kit-operations.json
Normal file
@@ -0,0 +1,76 @@
{
  "schemaVersion": 39,
  "title": "Offline Kit Operations",
  "panels": [
    {
      "type": "timeseries",
      "title": "Offline Kit imports by status (rate)",
      "datasource": "Prometheus",
      "fieldConfig": { "defaults": { "unit": "ops", "decimals": 3 } },
      "targets": [
        { "expr": "sum(rate(offlinekit_import_total[5m])) by (status)", "legendFormat": "{{status}}" }
      ]
    },
    {
      "type": "stat",
      "title": "Offline Kit import success rate (%)",
      "datasource": "Prometheus",
      "fieldConfig": { "defaults": { "unit": "percent", "decimals": 2 } },
      "targets": [
        {
          "expr": "100 * sum(rate(offlinekit_import_total{status=\"success\"}[5m])) / clamp_min(sum(rate(offlinekit_import_total[5m])), 1)"
        }
      ]
    },
    {
      "type": "timeseries",
      "title": "Attestation verify latency p50/p95 (success)",
      "datasource": "Prometheus",
      "fieldConfig": { "defaults": { "unit": "s", "decimals": 3 } },
      "targets": [
        {
          "expr": "histogram_quantile(0.50, sum(rate(offlinekit_attestation_verify_latency_seconds_bucket{success=\"true\"}[5m])) by (le, attestation_type))",
          "legendFormat": "p50 {{attestation_type}}"
        },
        {
          "expr": "histogram_quantile(0.95, sum(rate(offlinekit_attestation_verify_latency_seconds_bucket{success=\"true\"}[5m])) by (le, attestation_type))",
          "legendFormat": "p95 {{attestation_type}}"
        }
      ]
    },
    {
      "type": "timeseries",
      "title": "Rekor inclusion latency p50/p95 (by success)",
      "datasource": "Prometheus",
      "fieldConfig": { "defaults": { "unit": "s", "decimals": 3 } },
      "targets": [
        {
          "expr": "histogram_quantile(0.50, sum(rate(rekor_inclusion_latency_bucket[5m])) by (le, success))",
          "legendFormat": "p50 success={{success}}"
        },
        {
          "expr": "histogram_quantile(0.95, sum(rate(rekor_inclusion_latency_bucket[5m])) by (le, success))",
          "legendFormat": "p95 success={{success}}"
        }
      ]
    },
    {
      "type": "timeseries",
      "title": "Rekor verification successes (rate)",
      "datasource": "Prometheus",
      "fieldConfig": { "defaults": { "unit": "ops", "decimals": 3 } },
      "targets": [
        { "expr": "sum(rate(attestor_rekor_success_total[5m])) by (mode)", "legendFormat": "{{mode}}" }
      ]
    },
    {
      "type": "timeseries",
      "title": "Rekor verification retries (rate)",
      "datasource": "Prometheus",
      "fieldConfig": { "defaults": { "unit": "ops", "decimals": 3 } },
      "targets": [
        { "expr": "sum(rate(attestor_rekor_retry_total[5m])) by (reason)", "legendFormat": "{{reason}}" }
      ]
    }
  ]
}
@@ -1,6 +1,6 @@
# Logging Standards (DOCS-OBS-50-003)

Last updated: 2025-11-25 (Docs Tasks Md.VI)
Last updated: 2025-12-15

## Goals
- Deterministic, structured logs for all services.
@@ -20,6 +20,14 @@ Required fields:
Optional but recommended:
- `resource` (subject id/purl/path when safe), `http.method`, `http.status_code`, `duration_ms`, `host`, `pid`, `thread`.

## Offline Kit / air-gap import fields
When emitting logs for Offline Kit import/activation flows, keep field names stable:
- Required scope key: `tenant_id`
- Common keys: `bundle_type`, `bundle_digest`, `bundle_path`, `manifest_version`, `manifest_created_at`
- Force activation keys: `force_activate`, `force_activate_reason`
- Outcome keys: `result`, `reason_code`, `reason_message`
- Quarantine keys: `quarantine_id`, `quarantine_path`
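
One way to keep these field names stable is to build import log lines through a single helper that rejects unknown keys. The sketch below is illustrative only: `build_import_log`, `ALLOWED_KEYS`, and all sample values are hypothetical and not taken from the codebase; only the key names come from the list above.

```python
import json

# Key names copied from the field list above; the set name is hypothetical.
ALLOWED_KEYS = {
    "tenant_id",
    "bundle_type", "bundle_digest", "bundle_path",
    "manifest_version", "manifest_created_at",
    "force_activate", "force_activate_reason",
    "result", "reason_code", "reason_message",
    "quarantine_id", "quarantine_path",
}


def build_import_log(tenant_id: str, **fields) -> str:
    """Return one JSON log line, rejecting keys outside the stable set."""
    if not tenant_id:
        raise ValueError("tenant_id is the required scope key")
    unknown = set(fields) - ALLOWED_KEYS
    if unknown:
        raise KeyError(f"unexpected log keys: {sorted(unknown)}")
    record = {"tenant_id": tenant_id, **fields}
    # sort_keys keeps the serialized output deterministic across runs.
    return json.dumps(record, sort_keys=True)


# Placeholder values, for illustration only.
print(build_import_log(
    "tenant-a",
    bundle_type="example-bundle",
    bundle_digest="sha256:0000",
    result="failed",
    reason_code="EXAMPLE_CODE",
    quarantine_id="q-123",
))
```

Funnelling every Offline Kit log line through one function means a renamed or misspelled key fails fast instead of silently producing a new field.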

## Redaction rules
- Never log Authorization headers, tokens, passwords, private keys, full request/response bodies.
- Redact to `"[redacted]"` and add `redaction.reason` (`secret|pii|policy`).
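
A minimal sketch of the redaction rule, assuming log records are flat dictionaries. The `SENSITIVE_KEYS` set and the `redact` helper are hypothetical names, and the sketch only covers the `secret` reason; `pii` and `policy` would need their own classification logic.

```python
# Hypothetical key names that should never reach the log sink.
SENSITIVE_KEYS = {"authorization", "token", "password", "private_key"}


def redact(record: dict) -> dict:
    """Return a copy of record with sensitive values replaced by "[redacted]"."""
    out = {}
    redacted = False
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[redacted]"
            redacted = True
        else:
            out[key] = value
    if redacted:
        # Only the "secret" reason is handled in this sketch.
        out["redaction.reason"] = "secret"
    return out


print(redact({"Authorization": "Bearer abc", "route": "/import"}))
```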
@@ -1,6 +1,6 @@
# Metrics & SLOs (DOCS-OBS-51-001)

Last updated: 2025-11-25 (Docs Tasks Md.VI)
Last updated: 2025-12-15

## Core metrics (platform-wide)
- **Requests**: `http_requests_total{tenant,workload,route,status}` (counter); latency histogram `http_request_duration_seconds`.
@@ -24,6 +24,77 @@ Last updated: 2025-11-25 (Docs Tasks Md.VI)
- Queue backlog: `queue_depth > 1000` for 5m.
- Job failures: `rate(worker_jobs_total{status="failed"}[10m]) > 0.01`.

## UX KPIs (triage TTFS)
- Targets:
  - TTFS first evidence p95: <= 1.5s
  - TTFS skeleton p95: <= 0.2s
  - Clicks-to-closure median: <= 6
  - Evidence completeness avg: >= 90% (>= 3.6/4)

```promql
# TTFS first evidence p50/p95
histogram_quantile(0.50, sum(rate(stellaops_ttfs_first_evidence_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(stellaops_ttfs_first_evidence_seconds_bucket[5m])) by (le))

# Clicks-to-closure median
histogram_quantile(0.50, sum(rate(stellaops_clicks_to_closure_bucket[5m])) by (le))

# Evidence completeness average percent (0-4 mapped to 0-100)
100 * (sum(rate(stellaops_evidence_completeness_score_sum[5m])) / clamp_min(sum(rate(stellaops_evidence_completeness_score_count[5m])), 1)) / 4

# Budget violations by phase
sum(rate(stellaops_performance_budget_violations_total[5m])) by (phase)
```

- Dashboard: `ops/devops/observability/grafana/triage-ttfs.json`
- Alerts: `ops/devops/observability/triage-alerts.yaml`
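
The evidence completeness query above maps a 0-4 score onto 0-100. A small sketch of that arithmetic, assuming the two inputs are the per-second rates of the `_sum` and `_count` series; the function name and sample rates are hypothetical.

```python
def completeness_percent(score_sum_rate: float, score_count_rate: float) -> float:
    """Mirror the PromQL: 100 * (sum / clamp_min(count, 1)) / 4."""
    denominator = max(score_count_rate, 1.0)  # clamp_min(..., 1)
    average_score = score_sum_rate / denominator
    return 100.0 * average_score / 4.0


# Two scores per second averaging 3.6 out of 4 land on the 90% target.
print(round(completeness_percent(7.2, 2.0), 2))
```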

## TTFS Metrics (time-to-first-signal)
- Core metrics:
  - `ttfs_latency_seconds{surface,cache_hit,signal_source,kind,phase,tenant_id}` (histogram)
  - `ttfs_signal_total{surface,cache_hit,signal_source,kind,phase,tenant_id}` (counter)
  - `ttfs_cache_hit_total{surface,cache_hit,signal_source,kind,phase,tenant_id}` (counter)
  - `ttfs_cache_miss_total{surface,cache_hit,signal_source,kind,phase,tenant_id}` (counter)
  - `ttfs_slo_breach_total{surface,cache_hit,signal_source,kind,phase,tenant_id}` (counter)
  - `ttfs_error_total{surface,cache_hit,signal_source,kind,phase,tenant_id,error_type,error_code}` (counter)

- SLO targets:
  - P50 < 2s, P95 < 5s (all surfaces)
  - Warm path P50 < 700ms, P95 < 2.5s
  - Cold path P95 < 4s

```promql
# TTFS latency p50/p95
histogram_quantile(0.50, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le))

# SLO breach rate (per minute)
60 * sum(rate(ttfs_slo_breach_total[5m]))
```
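
To read the p50/p95 values against the SLO targets, it helps to see how a quantile is estimated from cumulative `le` buckets. The sketch below is a simplified version of the linear interpolation Prometheus applies to classic histograms; it ignores native histograms and several edge cases, and the bucket boundaries and counts are made-up sample data.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile falls in +Inf: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


sample = [(0.5, 10), (1.0, 30), (2.0, 90), (5.0, 100), (math.inf, 100)]
print(histogram_quantile(0.50, sample), histogram_quantile(0.95, sample))
```

Because the estimate is interpolated within a bucket, its precision depends on where the bucket boundaries sit relative to the SLO thresholds.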

## Offline Kit (air-gap) metrics
- `offlinekit_import_total{status,tenant_id}` (counter)
- `offlinekit_attestation_verify_latency_seconds{attestation_type,success}` (histogram)
- `attestor_rekor_success_total{mode}` (counter)
- `attestor_rekor_retry_total{reason}` (counter)
- `rekor_inclusion_latency{success}` (histogram)

```promql
# Import rate by status
sum(rate(offlinekit_import_total[5m])) by (status)

# Import success rate
sum(rate(offlinekit_import_total{status="success"}[5m])) / clamp_min(sum(rate(offlinekit_import_total[5m])), 1)

# Attestation verify p95 by type (success only)
histogram_quantile(0.95, sum(rate(offlinekit_attestation_verify_latency_seconds_bucket{success="true"}[5m])) by (le, attestation_type))

# Rekor inclusion latency p95 (by success)
histogram_quantile(0.95, sum(rate(rekor_inclusion_latency_bucket[5m])) by (le, success))
```

Dashboard: `docs/observability/dashboards/offline-kit-operations.json`
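
The import success rate query guards against division by zero with `clamp_min(..., 1)`. Because `rate()` yields per-second values, the sketch below works through what that floor does at different traffic levels. The function name, the `floor` parameter, and the sample rates are hypothetical.

```python
def import_success_ratio(success_rate: float, total_rate: float, floor: float = 1.0) -> float:
    """Mirror the PromQL: success / clamp_min(total, floor)."""
    return success_rate / max(total_rate, floor)


# Above one import per second the floor is inactive and the true ratio comes out.
print(round(import_success_ratio(4.5, 5.0), 3))

# Below one import per second the denominator is held at 1, so the value
# equals the raw success rate rather than success / total.
print(round(import_success_ratio(0.09, 0.1), 3))

# With no traffic at all the expression yields 0 instead of dividing by zero.
print(import_success_ratio(0.0, 0.0))
```

If imports are expected to be infrequent, a smaller floor may be worth considering; that is a suggestion from this sketch, not something the source states.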

## Observability hygiene
- Tag everything with `tenant`, `workload`, `env`, `region`, `version`.
- Keep metric names stable; prefer adding labels over renaming.