Add tests for SBOM generation determinism across multiple formats

- Created `StellaOps.TestKit.Tests` project for unit tests related to determinism.
- Implemented `DeterminismManifestTests` to validate deterministic output for canonical bytes and strings, file read/write operations, and error handling for invalid schema versions.
- Added `SbomDeterminismTests` to ensure identical inputs produce consistent SBOMs across SPDX 3.0.1 and CycloneDX 1.6/1.7 formats, including parallel execution tests.
- Updated project references in `StellaOps.Integration.Determinism` to include the new determinism testing library.
2025-12-23 18:56:12 +02:00
parent 7ac70ece71
commit bc4318ef97
88 changed files with 6974 additions and 1230 deletions
--- a/docs2/README.md
+++ b/docs2/README.md
@@ -36,6 +36,7 @@ How to navigate
 - orchestrator/api.md - Orchestrator API surface
 - orchestrator/cli.md - Orchestrator CLI commands
 - orchestrator/console.md - Orchestrator console views
+- orchestrator/runbook.md - Orchestrator operations runbook
 - operations/quickstart.md - First scan workflow
 - operations/install-deploy.md - Install and deployment guidance
 - operations/deployment-versioning.md - Versioning and promotion model
@@ -47,6 +48,12 @@ How to navigate
 - operations/runtime-readiness.md - Runtime readiness checks
 - operations/slo.md - Service SLO overview
 - operations/runbooks.md - Operational runbooks and incident response
+- operations/key-rotation.md - Signing key rotation runbook
+- operations/proof-verification.md - Proof verification runbook
+- operations/score-proofs.md - Score proofs and replay operations
+- operations/reachability.md - Reachability operations
+- operations/trust-lattice.md - Trust lattice operations
+- operations/unknowns-queue.md - Unknowns queue operations
 - operations/notifications.md - Notifications Studio operations
 - notifications/overview.md - Notifications overview
 - notifications/rules.md - Notification rules and routing
@@ -54,8 +61,11 @@ How to navigate
 - notifications/templates.md - Notification templates
 - notifications/digests.md - Notification digests
 - notifications/pack-approvals.md - Pack approval notifications
+- notifications/runbook.md - Notifications operations runbook
 - operations/router-rate-limiting.md - Gateway rate limiting
 - release/release-engineering.md - Release and CI/CD overview
+- release/promotion-attestations.md - Promotion-time attestation predicate
+- release/release-notes.md - Release notes index and templates
 - api/overview.md - API surface and conventions
 - api/auth-and-tokens.md - Authority, OpTok, DPoP and mTLS, PoE
 - policy/policy-system.md - Policy DSL, lifecycle, and governance
@@ -99,12 +109,16 @@ How to navigate
 - ui/branding.md - Tenant branding model
 - data-and-schemas.md - Storage, schemas, and determinism rules
 - data/persistence.md - Database model and migration notes
+- data/postgresql-operations.md - PostgreSQL operations guide
+- data/postgresql-patterns.md - RLS and partitioning patterns
 - data/events.md - Event envelopes and validation
 - sbom/overview.md - SBOM formats, mapping, and heuristics
 - governance/approvals.md - Approval routing and audit
 - governance/exceptions.md - Exception lifecycle and controls
 - security-and-governance.md - Security policy, hardening, governance, compliance
 - security/identity-tenancy-and-scopes.md - Authority scopes and tenancy rules
+- security/multi-tenancy.md - Tenant lifecycle and isolation model
+- security/row-level-security.md - Database RLS enforcement
 - security/crypto-and-trust.md - Crypto profiles and trust roots
 - security/crypto-compliance.md - Regional crypto profiles and licensing notes
 - security/quota-and-licensing.md - Offline quota and JWT licensing
@@ -114,8 +128,19 @@ How to navigate
 - security/audit-events.md - Authority audit event schema
 - security/revocation-bundles.md - Revocation bundle format and verification
 - security/risk-model.md - Risk scoring model and explainability
+- risk/overview.md - Risk scoring overview
+- risk/factors.md - Risk factor catalog
+- risk/formulas.md - Risk scoring formulas
+- risk/profiles.md - Risk profile schema and lifecycle
+- risk/explainability.md - Risk explainability payloads
+- risk/api.md - Risk API endpoints
 - security/forensics-and-evidence-locker.md - Evidence locker and forensic storage
+- security/evidence-locker-publishing.md - Evidence locker publishing process
+- security/timeline.md - Timeline event ledger and exports
 - provenance/inline-provenance.md - DSSE metadata and transparency links
+- provenance/attestation-workflow.md - Attestation workflow and verification
+- provenance/rekor-policy.md - Rekor submission budget policy
+- provenance/backfill.md - Provenance backfill procedure
 - signals/unknowns.md - Unknowns registry and signals model
 - signals/unknowns-ranking.md - Unknowns scoring and triage bands
 - signals/uncertainty.md - Uncertainty states and tiers
@@ -129,7 +154,18 @@ How to navigate
 - migration/overview.md - Migration paths and parity guidance
 - vex/consensus.md - VEX consensus overview
 - testing-and-quality.md - Test strategy and quality gates
+- testing/router-chaos.md - Router chaos testing scenarios
 - observability.md - Metrics, logs, tracing, telemetry stack
+- observability-standards.md - Telemetry envelope, scrubbing, sampling
+- observability-logging.md - Logging fields and redaction
+- observability-tracing.md - Trace propagation and span conventions
+- observability-metrics-slos.md - Core metrics and SLO guidance
+- observability-telemetry-controls.md - Propagation, sealed mode, incident mode
+- observability-aoc.md - AOC ingestion observability
+- observability-aggregation.md - Aggregation pipeline observability
+- observability-policy.md - Policy Engine observability
+- observability-ui-telemetry.md - Console telemetry metrics and alerts
+- observability-vuln-telemetry.md - Vulnerability explorer telemetry
 - developer/onboarding.md - Local dev setup and workflows
 - developer/plugin-sdk.md - Plugin SDK summary
 - developer/devportal.md - Developer portal publishing
--- a/docs2/data/events.md
+++ b/docs2/data/events.md
@@ -7,6 +7,11 @@ Envelope types
 - Orchestrator events: versioned envelopes with idempotency keys and trace context.
 - Legacy Redis envelopes: transitional schemas used for older consumers.

+Event catalog (examples)
+- scanner.event.report.ready@1 and scanner.event.scan.completed@1 (orchestrator envelopes).
+- scanner.report.ready@1 and scanner.scan.completed@1 (legacy Redis envelopes).
+- scheduler.rescan.delta@1, scheduler.graph.job.completed@1, attestor.logged@1.
+
 Orchestrator envelope fields (v1)
 - eventId, kind, version, tenant
 - occurredAt, recordedAt
@@ -26,6 +31,8 @@ Versioning rules
 Validation
 - Schemas and samples live under docs/events/ and docs/events/samples/.
 - Offline validation uses ajv-cli; keep schema checks deterministic.
+- Validate schemas with ajv compile and validate samples against matching schemas.
+- Add new samples for each new schema version.

 Related references
 - docs/events/README.md
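The event catalog added to docs2/data/events.md lists kinds such as `scanner.event.report.ready@1`. A minimal sketch of splitting such a kind into name and version follows, assuming the `@N` suffix is the schema version and that names are dotted lowercase segments. The regular expression and the `parse_event_kind` helper are illustrative assumptions, not part of the documented tooling.

```python
import re
from typing import NamedTuple

# Assumed shape: at least two dotted lowercase segments, "@", positive integer version.
_EVENT_KIND = re.compile(
    r"^(?P<name>[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+)@(?P<version>[1-9][0-9]*)$"
)


class EventKind(NamedTuple):
    name: str
    version: int


def parse_event_kind(kind: str) -> EventKind:
    """Split 'scanner.event.report.ready@1' into its name and integer version."""
    match = _EVENT_KIND.match(kind)
    if match is None:
        raise ValueError(f"not a versioned event kind: {kind!r}")
    return EventKind(match.group("name"), int(match.group("version")))
```

A parser like this could be used to pick the matching schema file for a sample before running validation, so a sample is never checked against the wrong schema version.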
--- a/docs2/data/persistence.md
+++ b/docs2/data/persistence.md
@@ -32,3 +32,5 @@ Migration notes
 Related references
 - ADR: docs/adr/0001-postgresql-for-control-plane.md
 - Module architecture: docs/modules/*/architecture.md
+- data/postgresql-operations.md
+- data/postgresql-patterns.md
--- a/docs2/data/postgresql-operations.md
+++ b/docs2/data/postgresql-operations.md
@@ -0,0 +1,36 @@
+# PostgreSQL operations
+
+Purpose
+- Operate the canonical PostgreSQL control plane with deterministic behavior.
+
+Schema topology
+- Per-module schemas: authority, vuln, vex, scheduler, notify, policy, concelier, audit.
+- Tenant isolation enforced via tenant_id and RLS policies.
+
+Performance setup
+- Enable pg_stat_statements for query analysis.
+- Tune shared_buffers, effective_cache_size, work_mem, and WAL sizes per host.
+- Use PgBouncer in transaction pooling mode for high concurrency.
+
+Session defaults
+- SET app.tenant_id per connection.
+- SET timezone to UTC.
+- Enforce statement_timeout for long-running queries.
+
+Query analysis
+- Use pg_stat_statements to find high total and high mean latency queries.
+- Use EXPLAIN ANALYZE with BUFFERS to detect missing indexes.
+
+Backups and restore
+- Use scheduled logical or physical backups with tested restore paths.
+- Keep PITR capability where required by retention policies.
+- Validate backups with deterministic restore tests.
+
+Monitoring
+- Track connection count, replication lag, and slow query rates.
+- Alert on pool saturation and replication delays.
+
+Related references
+- data/postgresql-patterns.md
+- data/persistence.md
+- docs/operations/postgresql-guide.md
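The session defaults in docs2/data/postgresql-operations.md (tenant context, UTC, statement timeout) can be sketched as parameterized statements built on PostgreSQL's `set_config(name, value, is_local)` function. The helper name, the `%s` placeholder style (as used by psycopg-like drivers), and the `30s` timeout are assumptions for illustration only.

```python
from typing import List, Tuple


def session_setup_statements(
    tenant_id: str,
    statement_timeout: str = "30s",  # hypothetical default, tune per workload
    local: bool = True,
) -> List[Tuple[str, tuple]]:
    """Return (sql, params) pairs that apply the documented session defaults.

    With local=True the settings last only for the current transaction.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required")
    sql = "SELECT set_config(%s, %s, %s)"
    return [
        (sql, ("app.tenant_id", tenant_id, local)),
        (sql, ("TimeZone", "UTC", local)),
        (sql, ("statement_timeout", statement_timeout, local)),
    ]
```

Transaction-scoped settings (`is_local` true) may be the safer choice behind PgBouncer in transaction pooling mode, because session-level settings are not guaranteed to follow a client across pooled server connections.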
--- a/docs2/data/postgresql-patterns.md
+++ b/docs2/data/postgresql-patterns.md
@@ -0,0 +1,33 @@
+# PostgreSQL patterns
+
+Row-level security (RLS)
+- Require tenant context via app.tenant_id session setting.
+- Policies filter by tenant_id on all tenant-scoped tables.
+- Admin operations use explicit bypass roles and audited access.
+
+Validating RLS
+- Run staging tests that attempt cross-tenant reads and writes.
+- Use deterministic replay tests for RLS regressions.
+
+Bitemporal unknowns
+- Store current and historical states with valid_from and valid_to.
+- Support point-in-time queries and deterministic ordering.
+
+Time-based partitioning
+- Partition high-volume tables by time.
+- Pre-create future partitions and archive old partitions.
+- Use deterministic maintenance checklists for partition health.
+
+Generated columns
+- Use generated columns for derived flags and query optimization.
+- Add columns via migrations and backfill deterministically.
+
+Troubleshooting
+- RLS failures: verify tenant context and policy attachment.
+- Partition issues: check missing partitions and default tables.
+- Bitemporal queries: confirm valid time windows and index usage.
+
+Related references
+- data/postgresql-operations.md
+- security/multi-tenancy.md
+- docs/operations/postgresql-patterns-runbook.md
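The point-in-time lookup described under "Bitemporal unknowns" in docs2/data/postgresql-patterns.md can be illustrated in memory. This is a sketch under stated assumptions: the `UnknownState` record is hypothetical, validity windows are treated as half-open (`valid_from` inclusive, `valid_to` exclusive), and `valid_to = None` marks the current row.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class UnknownState:
    unknown_id: str
    state: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the row is still current


def as_of(rows: Iterable[UnknownState], at: datetime) -> List[UnknownState]:
    """Return the rows valid at `at`, in a stable, deterministic order."""
    hits = [
        row
        for row in rows
        if row.valid_from <= at and (row.valid_to is None or at < row.valid_to)
    ]
    # Sort on explicit keys so results do not depend on input order.
    return sorted(hits, key=lambda row: (row.unknown_id, row.valid_from))
```

The half-open window means a row that ends at time T and its successor that starts at T never both match a query for T.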
--- a/docs2/notifications/overview.md
+++ b/docs2/notifications/overview.md
@@ -22,3 +22,4 @@ Related references
 - docs/notifications/overview.md
 - docs/notifications/architecture.md
 - docs2/operations/notifications.md
+- notifications/runbook.md
--- a/docs2/notifications/runbook.md
+++ b/docs2/notifications/runbook.md
@@ -0,0 +1,40 @@
+# Notifications runbook
+
+Purpose
+- Deploy and operate the Notifications WebService and Worker.
+
+Pre-flight
+- Secrets stored in Authority (SMTP, Slack, webhook HMAC).
+- Outbound allowlist configured for channels.
+- PostgreSQL and Valkey reachable; health checks pass.
+- Offline kit loaded with templates and rule seeds.
+
+Deploy
+- Deploy images with digests pinned.
+- Set Notify Postgres, Redis, Authority, and allowlist settings.
+- Warm caches via /api/v1/notify/admin/warm when needed.
+
+Monitor
+- notify_delivery_attempts_total by status and channel.
+- notify_escalation_stage_total and notify_rule_eval_seconds.
+- Logs include tenant, ruleId, deliveryId, channel, status.
+
+Common operations
+- List failed deliveries and replay.
+- Pause a tenant without dropping audit events.
+- Rotate channel secrets via refresh endpoints.
+
+Failure recovery
+- Validate templates and Redis connectivity for worker crashes.
+- Replay deliveries after database recovery.
+- Disable channels during upstream outages.
+
+Determinism safeguards
+- Rule snapshots versioned per tenant.
+- Template rendering uses deterministic helpers.
+- UTC time sources for quiet hours.
+
+Related references
+- notifications/overview.md
+- notifications/rules.md
+- docs/operations/notifier-runbook.md
--- a/docs2/observability-aggregation.md
+++ b/docs2/observability-aggregation.md
@@ -0,0 +1,34 @@
+# Aggregation observability
+
+Purpose
+- Track Link-Not-Merge aggregation and overlay pipelines.
+
+Metrics
+- aggregation_ingest_latency_seconds{tenant,source,status}
+- aggregation_conflict_total{tenant,advisory,product,reason}
+- aggregation_overlay_cache_hits_total, aggregation_overlay_cache_misses_total
+- aggregation_vex_gate_total{tenant,status}
+- aggregation_queue_depth{tenant}
+
+Traces
+- Span: aggregation.process
+- Attributes: tenant, advisory, product, vex_status, source_kind, overlay_version, cache_hit
+
+Logs
+- tenant, advisory, product, vex_status
+- decision (merged, suppressed, dropped)
+- reason, duration_ms, trace_id
+
+SLOs
+- Ingest latency p95 < 500ms per statement.
+- Overlay cache hit rate > 80%.
+- Error rate < 0.1% over 10 minutes.
+
+Alerts
+- HighConflictRate: aggregation_conflict_total delta > 100 per minute.
+- QueueBacklog: aggregation_queue_depth > 10k for 5 minutes.
+- LowCacheHit: cache hit rate < 60% for 10 minutes.
+
+Offline posture
+- Export metrics to local Prometheus scrape.
+- Deterministic ordering preserved; cache warmers seeded from bundled fixtures.
--- a/docs2/observability-aoc.md
+++ b/docs2/observability-aoc.md
@@ -0,0 +1,49 @@
+# AOC observability
+
+Purpose
+- Monitor Aggregation-Only ingestion for Concelier and Excititor.
+- Provide deterministic metrics, traces, and logs for AOC guardrails.
+
+Core metrics
+- ingestion_write_total{source,tenant,result}
+- ingestion_latency_seconds{source,tenant,phase}
+- aoc_violation_total{source,tenant,code}
+- ingestion_signature_verified_total{source,tenant,result}
+- advisory_revision_count{source,tenant}
+- verify_runs_total{tenant,initiator}
+- verify_duration_seconds{tenant,initiator}
+
+Alert guidance
+- Violation spike: increase(aoc_violation_total[15m]) > 0 for critical sources.
+- Stale ingestion: no growth in ingestion_write_total for > 60 minutes.
+- Signature drop: rising ingestion_signature_verified_total{result="fail"}.
+
+Health snapshot endpoint
+- GET /obs/excititor/health returns ingest, link, signature, conflict status.
+- Settings control warning and critical thresholds for lag, coverage, and conflict ratio.
+
+Trace taxonomy
+- ingest.fetch, ingest.transform, ingest.write
+- aoc.guard for violations
+- verify.run for verification jobs
+
+Log fields
+- traceId, tenant, source.vendor, upstream.upstreamId
+- contentHash, violation.code, verification.window
+- Correlation headers: X-Stella-TraceId, X-Stella-CorrelationId
+
+Advisory AI chunk metrics
+- advisory_ai_chunk_requests_total
+- advisory_ai_chunk_latency_milliseconds
+- advisory_ai_chunk_segments
+- advisory_ai_chunk_sources
+- advisory_ai_guardrail_blocks_total
+
+Dashboards
+- AOC ingestion health: sources overview, violations, signature rate, supersedes depth.
+- Offline mode dashboard from offline snapshots.
+
+Offline posture
+- Metrics exporters write to local Prometheus snapshots in offline kits.
+- CLI verification reports are hashed and archived.
+- Dashboards support offline data sources.
--- a/docs2/observability-logging.md
+++ b/docs2/observability-logging.md
@@ -0,0 +1,39 @@
+# Logging standards
+
+Goals
+- Deterministic, structured logs for all services.
+- Safe for tenant isolation and offline review.
+
+Required fields
+- timestamp (UTC ISO-8601)
+- tenant, workload, env, region, version
+- level (debug, info, warn, error, fatal)
+- category and operation
+- trace_id, span_id, correlation_id when present
+- message (concise, no secrets)
+- status (ok, error, fault, throttle)
+- error.code, error.message (redacted), retryable when status is not ok
+
+Optional fields
+- resource, http.method, http.status_code, duration_ms
+- host, pid, thread
+
+Offline kit import fields
+- tenant_id, bundle_type, bundle_digest, bundle_path
+- manifest_version, manifest_created_at
+- force_activate, force_activate_reason
+- result, reason_code, reason_message
+- quarantine_id, quarantine_path
+
+Redaction rules
+- Never log auth headers, tokens, passwords, private keys, or full bodies.
+- Redact to "[redacted]" and add redaction.reason.
+- Hash low-cardinality identifiers and mark hashed=true.
+
+Determinism and offline posture
+- NDJSON with LF endings; UTC timestamps only.
+- No external enrichment; rely on bundled metadata.
+
+Sampling and rate limits
+- Info logs rate-limited per component; warn and error never sampled.
+- Audit logs are never sampled and include actor, action, target, result.
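The redaction and determinism rules in docs2/observability-logging.md can be sketched as a small NDJSON formatter. The denylist contents, the single `redaction.reason` value, and the function name are assumptions for illustration. The documented rules are only the "[redacted]" replacement, the reason field, UTC timestamps, and LF-terminated NDJSON.

```python
import json
from datetime import datetime, timezone

# Hypothetical denylist; a real one would come from configuration.
SENSITIVE_KEYS = {"authorization", "token", "password", "private_key"}


def to_ndjson_line(record: dict, now: datetime) -> str:
    """Render one log record as a single LF-terminated JSON line.

    `now` must be timezone-aware; it is converted to UTC before formatting.
    """
    if now.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    out = {}
    redacted = False
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[redacted]"
            redacted = True
        else:
            out[key] = value
    if redacted:
        out["redaction.reason"] = "secret"
    out["timestamp"] = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    # Sorted keys and fixed separators keep the byte output stable across runs.
    return json.dumps(out, sort_keys=True, separators=(",", ":")) + "\n"
```

Sorting keys is one way to make identical records serialize to identical bytes, which helps when log bundles are hashed for offline review.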
--- a/docs2/observability-metrics-slos.md
+++ b/docs2/observability-metrics-slos.md
@@ -0,0 +1,57 @@
+# Metrics and SLOs
+
+Core metrics (platform-wide)
+- http_requests_total{tenant,workload,route,status}
+- http_request_duration_seconds (histogram)
+- worker_jobs_total{tenant,queue,status}
+- worker_job_duration_seconds (histogram)
+- db_query_duration_seconds{db,operation}
+- db_pool_in_use, db_pool_available
+- cache_requests_total{result=hit|miss}
+- cache_latency_seconds (histogram)
+- queue_depth{tenant,queue}
+- errors_total{tenant,workload,code}
+
+SLO targets (suggested)
+- API availability: 99.9% monthly per public service.
+- P95 latency: <300ms reads, <1s writes.
+- Worker job success: >99% over 30d.
+- Queue backlog: alert when queue_depth > 1000 for 5 minutes.
+
+Alert examples
+- Error rate: rate(errors_total[5m]) / rate(http_requests_total[5m]) > 0.02
+- Latency regression: p95 http_request_duration_seconds > 0.3s
+- Queue backlog: queue_depth > 1000 for 5 minutes
+- Job failures: rate(worker_jobs_total{status="failed"}[10m]) > 0.01
+
+UX KPIs (triage TTFS)
+- P95 first evidence <= 1.5s; skeleton <= 0.2s.
+- Clicks-to-closure median <= 6.
+- Evidence completeness >= 90% (>= 3.6/4).
+
+TTFS metrics
+- ttfs_latency_seconds{surface,cache_hit,signal_source,kind,phase,tenant_id}
+- ttfs_signal_total{surface,cache_hit,signal_source,kind,phase,tenant_id}
+- ttfs_cache_hit_total, ttfs_cache_miss_total
+- ttfs_slo_breach_total{surface,cache_hit,signal_source,kind,phase,tenant_id}
+- ttfs_error_total{surface,cache_hit,signal_source,kind,phase,tenant_id,error_type,error_code}
+
+Offline kit metrics
+- offlinekit_import_total{status,tenant_id}
+- offlinekit_attestation_verify_latency_seconds{attestation_type,success}
+- attestor_rekor_success_total{mode}
+- attestor_rekor_retry_total{reason}
+- rekor_inclusion_latency{success}
+
+Scanner FN-Drift metrics
+- scanner.fn_drift.percent (30-day rolling percentage)
+- scanner.fn_drift.transitions_30d and scanner.fn_drift.evaluated_30d
+- scanner.fn_drift.cause.feed_delta, rule_delta, lattice_delta, reachability_delta, engine
+- scanner.classification_changes_total{cause}
+- scanner.fn_transitions_total{cause}
+- SLO targets: warning above 1.0%, critical above 2.5%, engine drift > 0%
+
+Hygiene
+- Tag metrics with tenant, workload, env, region, version.
+- Keep metric names stable and namespace custom metrics per module.
+- Use deterministic bucket boundaries and consistent units.
--- a/docs2/observability-policy.md
+++ b/docs2/observability-policy.md
@@ -0,0 +1,48 @@
+# Policy observability
+
+Purpose
+- Capture Policy Engine metrics, logs, traces, and incident workflows.
+
+Metrics
+- policy_run_seconds{tenant,policy,mode}
+- policy_run_queue_depth{tenant}
+- policy_run_failures_total{tenant,policy,reason}
+- policy_run_retries_total{tenant,policy}
+- policy_run_inputs_pending_bytes{tenant}
+- policy_rules_fired_total{tenant,policy,rule}
+- policy_vex_overrides_total{tenant,policy,vendor,justification}
+- policy_suppressions_total{tenant,policy,action}
+- policy_selection_batch_duration_seconds{tenant,policy}
+- policy_materialization_conflicts_total{tenant,policy}
+- policy_api_requests_total{endpoint,method,status}
+- policy_api_latency_seconds{endpoint,method}
+- policy_api_rate_limited_total{endpoint}
+- policy_queue_leases_active{tenant}
+- policy_queue_lease_expirations_total{tenant}
+- policy_delta_backlog_age_seconds{tenant,source}
+
+Logs
+- Structured JSON with policyId, policyVersion, tenant, runId, rule, traceId, env.sealed.
+- Categories: policy.run, policy.evaluate, policy.materialize, policy.simulate, policy.lifecycle.
+- Rule-hit logs sample at 1% by default; incident mode raises to 100%.
+
+Traces
+- policy.api, policy.select, policy.evaluate, policy.materialize, policy.simulate.
+- Trace context propagated to CLI and UI.
+
+Alerts
+- PolicyRunSlaBreach: p95 policy_run_seconds too high.
+- PolicyQueueStuck: policy_delta_backlog_age_seconds > 600.
+- DeterminismMismatch: ERR_POL_004 or replay diff.
+- SimulationDrift: simulation exit 20 over threshold.
+- VexOverrideSpike and SuppressionSurge.
+
+Incident mode
+- POST /api/policy/incidents/activate toggles sampling to 100%.
+- Retention extends to 30 days during incident.
+- policy.incident.activated event emitted.
+
+Integration points
+- Authority metrics for scope_denied events.
+- Concelier and Excititor trace propagation via gRPC metadata.
+- Offline kits export metrics and logs snapshots.
--- a/docs2/observability-standards.md
+++ b/docs2/observability-standards.md
@@ -0,0 +1,29 @@
+# Observability standards
+
+Common envelope fields
+- Trace context: trace_id, span_id, trace_flags; propagate W3C traceparent and baggage.
+- Tenant and workload: tenant, workload (service), region, env, version.
+- Subject: component, operation, resource (purl or uri when safe).
+- Timing: UTC ISO-8601 timestamp; durations in milliseconds.
+- Outcome: status (ok, error, fault, throttle), error.code, redacted error.message, retryable.
+
+Scrubbing policy
+- Denylist PII and secrets: emails, tokens, auth headers, private keys, passwords.
+- Redact to "[redacted]" and add redaction.reason (secret, pii, tenant_policy).
+- Hash low-cardinality identifiers with sha256 and mark hashed=true.
+- Never log full request or response bodies; store hashes and lengths only.
+
+Sampling defaults
+- Traces: 10% non-prod, 5% prod; always sample error or audit spans.
+- Logs: info logs rate-limited; warn and error never sampled.
+- Metrics: never sampled; stable histogram buckets per component.
+
+Redaction override
+- Overrides require a ticket id and are time-bound.
+- Config: telemetry.redaction.overrides and telemetry.redaction.override_ttl (default 24h).
+- Emit telemetry.redaction.audit with actor, fields, and TTL.
+
+Determinism and offline
+- No external enrichers; use bundled service maps and tenant metadata only.
+- Export ordering: timestamp, workload, operation.
+- Always use UTC; NDJSON for log exports.
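Two rules from docs2/observability-standards.md lend themselves to a short sketch: hashing identifiers with sha256 while marking `hashed=true`, and exporting in timestamp, workload, operation order. The helper names and the dictionary shape are assumptions. The ordering relies on timestamps being uniformly formatted UTC ISO-8601 strings, which sort chronologically as plain text.

```python
import hashlib
from typing import Dict, Iterable, List


def hash_identifier(value: str) -> Dict[str, object]:
    """Replace an identifier with its SHA-256 hex digest and flag it as hashed."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return {"value": digest, "hashed": True}


def export_order(records: Iterable[dict]) -> List[dict]:
    """Sort records by (timestamp, workload, operation) for a stable export."""
    return sorted(
        records,
        key=lambda r: (r["timestamp"], r["workload"], r["operation"]),
    )
```

Because `sorted` is stable and the key is explicit, two exports of the same record set yield the same sequence regardless of arrival order, except for records that tie on all three fields.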
--- a/docs2/observability-telemetry-controls.md
+++ b/docs2/observability-telemetry-controls.md
@@ -0,0 +1,61 @@
+# Telemetry controls and propagation
+
+Bootstrap wiring
+- AddStellaOpsTelemetry wires metrics and tracing with deterministic defaults.
+- Disable exporters when sealed or when egress is not allowed.
+
+Minimal host wiring (example)
+```csharp
+builder.Services.AddStellaOpsTelemetry(
+    builder.Configuration,
+    serviceName: "StellaOps.SampleService",
+    serviceVersion: builder.Configuration["VERSION"],
+    configureOptions: options =>
+    {
+        options.Collector.Enabled = builder.Configuration.GetValue<bool>("Telemetry:Collector:Enabled", true);
+        options.Collector.Endpoint = builder.Configuration["Telemetry:Collector:Endpoint"];
+        options.Collector.Protocol = TelemetryCollectorProtocol.Grpc;
+    },
+    configureMetrics: m => m.AddAspNetCoreInstrumentation(),
+    configureTracing: t => t.AddHttpClientInstrumentation());
+```
+
+Propagation rules
+- HTTP headers: traceparent, tracestate, x-stella-tenant, x-stella-actor, x-stella-imposed-rule.
+- gRPC metadata: stella-tenant, stella-actor, stella-imposed-rule.
+- Tenant is required for all requests except sealed diagnostics jobs.
+
+Metrics helper expectations
+- Golden signals: http.server.duration, http.client.duration, messaging.operation.duration,
+  job.execution.duration, runtime.gc.pause, db.call.duration.
+- Mandatory tags: tenant, service, endpoint or operation, result (ok|error|cancelled|throttled), sealed.
+- Cardinality guard trims tag values to 64 chars and caps distinct values per key.
+
+Scrubbing configuration
+- Telemetry:Scrub:Enabled (default true)
+- Telemetry:Scrub:Sealed (forces scrubbing when sealed)
+- Telemetry:Scrub:HashSalt (optional)
+- Telemetry:Scrub:MaxValueLength (default 256)
+
+Sealed mode behavior
+- Disable external exporters; use memory or file OTLP.
+- Tag sealed=true and scrubbed=true on all records.
+- Sampling capped by Telemetry:Sealed:MaxSamplingPercent.
+- File exporter rotates deterministically and enforces 0600 permissions.
+
+Sealed mode config keys
+- Telemetry:Sealed:Enabled
+- Telemetry:Sealed:Exporter (memory|file)
+- Telemetry:Sealed:FilePath
+- Telemetry:Sealed:MaxBytes
+- Telemetry:Sealed:MaxSamplingPercent
+
+Incident mode (CLI)
+- Flag: --incident-mode
+- Config: Telemetry:Incident:Enabled and Telemetry:Incident:TTL
+- State file: ~/.stellaops/incident-mode.json (0600 permissions)
+- Emits telemetry.incident.activated and telemetry.incident.expired audit events.
+
+Determinism
+- UTC timestamps and stable ordering for OTLP exports.
+- No external enrichment in sealed mode.
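The cardinality guard mentioned in docs2/observability-telemetry-controls.md (trim tag values to 64 chars, cap distinct values per key) might look like the following. This is a sketch only: the class name, the cap of 100 distinct values, and the `other` overflow bucket are assumptions, since the document gives only the 64-character limit.

```python
from typing import Dict, Set


class CardinalityGuard:
    """Trim tag values and cap how many distinct values each tag key may take."""

    def __init__(self, max_len: int = 64, max_distinct: int = 100, overflow: str = "other"):
        self.max_len = max_len
        self.max_distinct = max_distinct  # hypothetical cap
        self.overflow = overflow          # hypothetical bucket for excess values
        self._seen: Dict[str, Set[str]] = {}

    def apply(self, key: str, value: str) -> str:
        trimmed = value[: self.max_len]
        seen = self._seen.setdefault(key, set())
        if trimmed in seen:
            return trimmed
        if len(seen) >= self.max_distinct:
            # Cap reached: fold new values into one bucket instead of growing the series count.
            return self.overflow
        seen.add(trimmed)
        return trimmed
```

Folding excess values into a single bucket keeps the number of metric series bounded, at the cost of losing detail for late-arriving values.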
--- a/docs2/observability-tracing.md
+++ b/docs2/observability-tracing.md
@@ -0,0 +1,27 @@
+# Tracing standards
+
+Goals
+- Consistent distributed tracing across services, workers, and CLI.
+- Safe for offline and air-gapped deployments.
+
+Context propagation
+- Use W3C traceparent and baggage only.
+- Preserve incoming trace_id and create child spans per operation.
+- For async work, attach stored trace context as links rather than a new parent.
+
+Span conventions
+- Names use <component>.<operation> (example: policy.evaluate).
+- Required attributes: tenant, workload, env, region, version, operation, status.
+- HTTP spans: http.method, http.route, http.status_code, net.peer.name, net.peer.port.
+- DB spans: db.system, db.name, db.operation, db.statement (no literals).
+- Message spans: messaging.system, messaging.destination, messaging.operation, messaging.message_id.
+- Errors: status=error with error.code, redacted error.message, retryable.
+
+Sampling
+- Default head sampling: 10% non-prod, 5% prod.
+- Always sample error or audit spans.
+- Override via Tracing__SampleRate per service.
+
+Offline posture
+- No external exporters; emit OTLP to local collector or file.
+- UTC timestamps only.
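The context propagation rule in docs2/observability-tracing.md (preserve the incoming trace_id, create a child span per operation) can be sketched against the W3C `traceparent` header format, `version-traceid-parentid-flags`. The function name is an assumption, and the sketch accepts only version `00`.

```python
import re
import secrets

# W3C Trace Context, version 00: 2 hex, 32 hex, 16 hex, 2 hex, all lowercase.
_TRACEPARENT = re.compile(
    r"^00-(?P<trace_id>[0-9a-f]{32})-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def child_traceparent(incoming: str) -> str:
    """Build a traceparent for a child span, keeping trace_id and flags."""
    match = _TRACEPARENT.match(incoming.strip())
    if match is None:
        raise ValueError("unsupported or malformed traceparent")
    trace_id = match.group("trace_id")
    if trace_id == "0" * 32 or match.group("parent_id") == "0" * 16:
        raise ValueError("all-zero trace_id or parent_id is invalid")
    span_id = secrets.token_hex(8)  # 8 random bytes, 16 hex characters
    while span_id == "0" * 16:      # an all-zero span id is not allowed
        span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{match.group('flags')}"
```

For async work, the same parsed trace context would be attached as a span link rather than used as the parent, per the rule above. That part is not shown here.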
--- a/docs2/observability-ui-telemetry.md
+++ b/docs2/observability-ui-telemetry.md
@@ -0,0 +1,45 @@
+# Console telemetry
+
+Purpose
+- Capture console performance, security signals, and offline behavior.
+
+Metrics
+- ui_route_render_seconds{route,tenant,device}
+- ui_request_duration_seconds{service,method,status,tenant}
+- ui_filter_apply_total{route,filter,tenant}
+- ui_tenant_switch_total{fromTenant,toTenant,trigger}
+- ui_offline_banner_seconds{reason,tenant}
+- ui_dpop_failure_total{endpoint,reason}
+- ui_fresh_auth_prompt_total{action,tenant}
+- ui_fresh_auth_failure_total{action,reason}
+- ui_download_manifest_refresh_seconds{tenant,channel}
+- ui_download_export_queue_depth{tenant,artifactType}
+- ui_download_command_copied_total{tenant,artifactType}
+- ui_telemetry_batch_failures_total{transport,reason}
+- ui_telemetry_queue_depth{priority,tenant}
+
+Logs
+- Categories: ui.action, ui.tenant.switch, ui.download.commandCopied, ui.security.anomaly, ui.telemetry.failure.
+- Core fields: timestamp, level, action, route, tenant, subject, correlationId, offlineMode.
+- PII is scrubbed; user identifiers are hashed.
+
+Traces
+- ui.route.transition, ui.api.fetch, ui.sse.stream, ui.telemetry.batch, ui.policy.action.
+- W3C traceparent propagated through the gateway for cross-service stitching.
+
+Feature flags and config
+- CONSOLE_METRICS_ENABLED, CONSOLE_METRICS_VERBOSE, CONSOLE_LOG_LEVEL.
+- OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS.
+- CONSOLE_TELEMETRY_SSE_ENABLED to expose /console/telemetry.
+
+Offline workflow
+- Metrics scraped locally and stored with offline bundles.
+- OTLP batches queue locally and expose ui_telemetry_queue_depth.
+- Retain telemetry bundles for audit; export Grafana JSON with bundles.
+
+Alerting hints
+- ConsoleLatencyHigh when ui_route_render_seconds p95 exceeds target.
+- BackendLatencyHigh when ui_request_duration_seconds spikes.
+- TenantSwitchFailures when ui_dpop_failure_total increases.
+- DownloadsBacklog when ui_download_export_queue_depth grows.
+- TelemetryExportErrors when ui_telemetry_batch_failures_total > 0.
--- a/docs2/observability-vuln-telemetry.md
+++ b/docs2/observability-vuln-telemetry.md
@@ -0,0 +1,22 @@
+# Vuln explorer telemetry
+
+Purpose
+- Define metrics, logs, traces, and dashboards for vulnerability triage.
+
+Planned metrics (pending final identifiers)
+- findings_open_total
+- mttr_seconds
+- triage_actions_total
+- report_generation_seconds
+
+Planned logs
+- Fields: findingId, artifactId, advisoryId, policyVersion, actor, actionType.
+- Deterministic JSON with correlation IDs.
+
+Planned traces
+- Spans for triage actions and report generation.
+- Sampling follows global tracing defaults; errors always sampled.
+
+Assets and hashes
+- Capture metrics, logs, traces, and dashboard exports with SHA256SUMS.
+- Store assets under docs/assets/vuln-explorer/ once available.
--- a/docs2/observability.md
+++ b/docs2/observability.md
@@ -1,14 +1,23 @@
 # Observability

-## Telemetry signals
- Metrics for scan latency, cache hit rate, policy evaluation time, queue depth.
- Logs are structured and include correlation IDs.
- Traces connect Scanner, Policy, Scheduler, and Notify workflows.
+Overview
+- Deterministic metrics, logs, and traces with tenant isolation.
+- Offline-friendly exports for audits and air-gap review.

-## Audit trails
- Signing and policy actions are recorded for compliance.
- Tenant and actor metadata is included in audit records.
+Core references
+- observability-standards.md
+- observability-logging.md
+- observability-tracing.md
+- observability-metrics-slos.md
+- observability-telemetry-controls.md

-## Telemetry stack
- Telemetry module provides collectors, dashboards, and alert rules.
- Offline bundles include telemetry assets for air-gapped installs.
+Service and workflow observability
+- observability-aoc.md
+- observability-aggregation.md
+- observability-policy.md
+- observability-ui-telemetry.md
+- observability-vuln-telemetry.md
+
+Audit alignment
+- security/forensics-and-evidence-locker.md
+- security/timeline.md
--- a/docs2/operations/airgap-runbooks.md
+++ b/docs2/operations/airgap-runbooks.md
@@ -6,6 +6,30 @@ Core runbooks
 - Quarantine: isolate bundles with hash or signature mismatches.
 - Sealed startup diagnostics: confirm egress block and time anchor validity.

+Offline kit management
+- Generate full or delta kits in connected environments.
+- Verify kit hash and signature before transfer.
+- Import and install kit, then confirm component freshness.
+
+Feed updates
+- Use delta kits for smaller updates.
+- Roll back to previous snapshot when feeds introduce regressions.
+- Track feed age and kit expiry thresholds.
+
+Scanning in air-gap mode
+- Scan local images or SBOMs without registry pull.
+- Generate SBOMs locally and scan from file.
+- Force offline feeds when required by policy.
+
+Verification in air-gap mode
+- Verify proof bundles offline with local trust roots.
+- Export and import trust bundles for signer and CA rotation.
+- Run score replay with frozen timestamps if needed.
+
+Health checks
+- Monitor kit age, feed freshness, trust store validity, disk usage.
+- Use deterministic health checks and keep results for audit.
+
 Import and verify
 - Validate bundle hash, manifest entries, and schema checks.
 - Record import receipt with operator, time anchor, and manifest hash.
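The "verify kit hash" step in docs2/operations/airgap-runbooks.md can be sketched as a streaming SHA-256 check. The function names and the choice of SHA-256 are assumptions, since the runbook does not name the digest algorithm. Signature verification is a separate step and is not covered here.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large kits do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def kit_hash_matches(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with the expected one in constant time."""
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())
```

A mismatch would be the trigger for the quarantine runbook listed above rather than for a retry.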
--- a/docs2/operations/key-rotation.md
+++ b/docs2/operations/key-rotation.md
@@ -0,0 +1,49 @@
+# Key rotation
+
+Purpose
+- Rotate signing keys without invalidating historical DSSE proofs.
+
+Principles
+- Do not mutate old DSSE envelopes.
+- Keep key history; revoke instead of delete.
+- Publish key material to trust anchors and mirrors.
+- Audit all key lifecycle events.
+
+Key profiles (examples)
+- default: SHA256-ED25519
+- fips: SHA256-ECDSA-P256
+- gost: GOST-R-34.10-2012
+- sm2: SM2-P256
+- pqc: ML-DSA-65
+
+Rotation workflow
+1. Generate a new key in the configured keystore.
+2. Add the key to the trust anchor without removing old keys.
+3. Run a transition period where both keys verify.
+4. Revoke the old key with an effective date.
+5. Publish updated key material to attestation feeds or mirrors.
+
+Trust anchors
+- Scoped by PURL pattern and allowed predicate types.
+- Store allowedKeyIds, revokedKeys, and keyHistory with timestamps.
+
+Verification with key history
+- Verify signatures using the key valid at the time of signing.
+- Revoked keys remain valid for pre-revocation attestations.
+- Example (hypothetical dates): a key revoked effective 2025-06-01 still verifies an attestation signed 2025-03-15, but not one signed 2025-07-01.
+
+Emergency revocation
+- Revoke compromised keys immediately and publish updated anchors.
+- Re-issue trust bundles and notify downstream verifiers.
+
+Metrics and alerts
+- signer_key_age_days
+- signer_keys_active_total
+- signer_keys_revoked_total
+- signer_rotation_events_total
+- signer_verification_key_lookups_total
+- Alerts when keys near or exceed maximum age.
+
+Related references
+- security/crypto-and-trust.md
+- provenance/attestation-workflow.md
+- docs/operations/key-rotation-runbook.md
--- a/docs2/operations/proof-verification.md
+++ b/docs2/operations/proof-verification.md
@@ -0,0 +1,37 @@
+# Proof verification
+
+Purpose
+- Verify DSSE bundles and transparency proofs for scan and score evidence.
+
+Components
+- DSSE envelope and signature bundle.
+- Certificate chain and trust roots.
+- Rekor inclusion proof and checkpoint when online.
+
+Basic verification
+- Verify DSSE signature against trusted roots.
+- Confirm subject digest matches expected artifact.
+- Validate Merkle inclusion proof when available.
+
+Offline verification
+- Use embedded proofs and local trust bundles.
+- Skip online Rekor queries in sealed mode.
+- Record verification results in timeline events.
+
+Transparency log integration
+- Check Rekor entry status and inclusion proof.
+- When Rekor is unavailable, rely on cached checkpoint and proofs.
+
+Troubleshooting cues
+- DSSE signature invalid: check key rotation or trust anchors.
+- Merkle root mismatch: verify checkpoint and bundle integrity.
+- Certificate chain failure: refresh trust roots.
+
+Monitoring
+- Track verification latency and failure counts.
+- Alert on certificate expiry or rising verification failures.
+
+Related references
+- provenance/attestation-workflow.md
+- release/promotion-attestations.md
+- docs/operations/proof-verification-runbook.md
--- a/docs2/operations/reachability.md
+++ b/docs2/operations/reachability.md
@@ -0,0 +1,36 @@
+# Reachability operations
+
+Purpose
+- Operate call graph ingestion, reachability computation, and explain queries.
+
+Reachability statuses
+- unreachable, possibly_reachable, reachable_static, reachable_proven, unknown.
+
+Call graph operations
+- Upload call graphs and validate schema.
+- Inspect entrypoints and merge graphs when required.
+- Enforce size limits and deterministic ordering.
+
+Computation
+- Trigger reachability computation per scan or batch.
+- Monitor jobs for timeouts and memory caps.
+- Persist results with graph_cache_epoch for replay.
+
+Explain queries
+- Explain a single finding or batch.
+- Provide alternate paths and reasons for unreachable results.
+
+Drift handling
+- Track changes due to graph updates or reachability algorithm changes.
+- Use drift reports to compare runs and highlight path changes.
+
+Monitoring
+- Track computation latency, queue depth, and explain request rates.
+- Alert on repeated timeouts or inconsistent results.
+
+Related references
+- architecture/reachability-lattice.md
+- architecture/reachability-evidence.md
+- operations/score-proofs.md
+- docs/operations/reachability-runbook.md
+- docs/operations/reachability-drift-guide.md
--- a/docs2/operations/runbooks.md
+++ b/docs2/operations/runbooks.md
@@ -12,6 +12,12 @@ Runbook set (current)
 - docs/runbooks/replay_ops.md
 - docs/runbooks/vex-ops.md
 - docs/runbooks/vuln-ops.md
+- operations/score-proofs.md
+- operations/proof-verification.md
+- operations/reachability.md
+- operations/trust-lattice.md
+- operations/unknowns-queue.md
+- operations/key-rotation.md

 Common expectations
 - Hash and store any inbound artifacts with SHA256SUMS.
--- a/docs2/operations/score-proofs.md
+++ b/docs2/operations/score-proofs.md
@@ -0,0 +1,46 @@
+# Score proofs and replay
+
+Purpose
+- Provide deterministic score proofs with replayable inputs and attestations.
+
+When to replay
+- Determinism audits and compliance checks.
+- Dispute resolution or vendor verification.
+- Regression investigation after feed or policy changes.
+
+Replay operations
+- Trigger replay via CLI or API with scan or job id.
+- Support batch replay with concurrency limits.
+- Nightly replay jobs validate determinism at scale.
+
+Verification
+- Online verification uses DSSE and Rekor proofs.
+- Offline verification uses embedded proofs and local trust bundles.
+- Verification checks include bundle hash, signature, and input digests.
+
+Bundle contents
+- Manifest with inputs and hashes.
+- SBOM, advisories, VEX snapshots.
+- Deterministic scoring outputs and explain traces.
+- DSSE bundle and transparency proof.
+
+Retention and export
+- Retain bundles per policy; export for audit with manifests.
+- Store in Evidence Locker and Offline Kits.
+
+Monitoring metrics
+- score_replay_duration_seconds
+- proof_verification_success_rate
+- proof_bundle_size_bytes
+- replay_queue_depth
+- proof_generation_failures
+
+Alerting cues
+- Replay latency p95 > 30s.
+- Verification failures or queue backlog spikes.
+
+Related references
+- operations/proof-verification.md
+- operations/replay-and-determinism.md
+- docs/operations/score-proofs-runbook.md
+- docs/operations/score-replay-runbook.md
--- a/docs2/operations/trust-lattice.md
+++ b/docs2/operations/trust-lattice.md
@@ -0,0 +1,33 @@
+# Trust lattice operations
+
+Purpose
+- Monitor and operate trust lattice gates for VEX and policy decisions.
+
+Core components
+- Trust vectors and gate configuration.
+- Verdict replay for deterministic validation.
+
+Monitoring
+- Track gate failure rate, verdict replay failures, and trust vector drift.
+- Use dashboards for gate health and override usage.
+
+Common operations
+- View current trust vectors and gate configuration.
+- Inspect a verdict and its trust inputs.
+- Trigger manual calibration when required.
+
+Emergency procedures
+- High gate failure rate: pause dependent workflows and investigate sources.
+- Verdict replay failures: verify inputs, cache epochs, and policy versions.
+- Trust vector drift: run replay with frozen inputs and compare hashes.
+
+Maintenance
+- Daily checks: gate failure rate and queue depth.
+- Weekly checks: trust vector calibration and drift review.
+- Monthly checks: update trust bundles and audit logs.
+
+Related references
+- architecture/reachability-vex.md
+- vex/consensus.md
+- docs/operations/trust-lattice-runbook.md
+- docs/operations/trust-lattice-troubleshooting.md
--- a/docs2/operations/unknowns-queue.md
+++ b/docs2/operations/unknowns-queue.md
@@ -0,0 +1,32 @@
+# Unknowns queue operations
+
+Purpose
+- Manage unknown components with deterministic triage and SLA tracking.
+
+Queue model
+- Bands: HOT, WARM, COLD based on score and SLA.
+- Reasons include reachability gaps, provenance gaps, VEX conflicts, and ingestion gaps.
+
+Core workflows
+- List and triage unknowns by band and reason.
+- Escalate or resolve with documented justification.
+- Suppress with expiry and audit trail when approved.
+
+Budgets and SLAs
+- Per-environment budgets cap unknowns by reason.
+- SLA timers trigger alerts when breached.
+
+Monitoring
+- unknowns_total, unknowns_hot_count, unknowns_sla_breached
+- unknowns_escalation_failures, unknowns_avg_age_hours
+- KEV-specific unknown counts and age
+
+Alerting cues
+- HOT band spikes or SLA breaches.
+- KEV unknowns older than 24 hours.
+- Rising queue growth rate.
+
+Related references
+- signals/unknowns.md
+- signals/unknowns-ranking.md
+- docs/operations/unknowns-queue-runbook.md
--- a/docs2/orchestrator/overview.md
+++ b/docs2/orchestrator/overview.md
@@ -39,3 +39,4 @@ Related references
 - orchestrator/cli.md
 - orchestrator/console.md
 - orchestrator/run-ledger.md
+- orchestrator/runbook.md
--- a/docs2/orchestrator/runbook.md
+++ b/docs2/orchestrator/runbook.md
@@ -0,0 +1,36 @@
+# Orchestrator runbook
+
+Pre-flight
+- Verify database and queue backends are healthy.
+- Confirm tenant allowlist and orchestrator scopes in Authority.
+- Ensure plugin bundles are present and signatures verified.
+
+Common operations
+- Start a run via API or CLI.
+- Cancel runs with idempotent requests.
+- Stream status via WebSocket or CLI.
+- Export run ledger as NDJSON for audit.
+
+Incident response
+- Queue backlog: scale workers and drain oldest first.
+- Repeated failures: inspect error codes and inputsHash; roll back DAG version.
+- Plugin auth errors: rotate secrets and warm caches.
+
+Health checks
+- /admin/health for liveness and queue depth.
+- Metrics: orchestrator_runs_total, orchestrator_queue_depth,
+  orchestrator_step_retries_total, orchestrator_run_duration_seconds.
+- Logs include tenant, dagId, runId, and status, with redaction applied.
+
+Determinism and immutability
+- Runs are append-only; never mutate ledger entries.
+- Use runToken for idempotent retries.
+
+Offline posture
+- Keep DAG specs and plugins in sealed storage.
+- Export logs, metrics, and traces as NDJSON.
+
+Related references
+- orchestrator/overview.md
+- orchestrator/architecture.md
+- docs/operations/orchestrator-runbook.md
--- a/docs2/provenance/attestation-workflow.md
+++ b/docs2/provenance/attestation-workflow.md
@@ -0,0 +1,46 @@
+# Attestation workflow
+
+Purpose
+- Ensure all exported evidence includes DSSE signatures and transparency proofs.
+- Provide deterministic verification for online and air-gapped environments.
+
+Workflow overview
+- Producer emits a payload and requests signing.
+- Signer validates policy and signs with tenant or keyless credentials.
+- Attestor wraps the payload in DSSE, records transparency data, and publishes bundles.
+- Export Center and Evidence Locker embed bundles in export artifacts.
+- Verifiers (CLI, services, auditors) validate signatures and proofs.
+
+Payload types
+- StellaOps.BuildProvenance@1
+- StellaOps.SBOMAttestation@1
+- StellaOps.ScanResults@1
+- StellaOps.PolicyEvaluation@1
+- StellaOps.VEXAttestation@1
+- StellaOps.RiskProfileEvidence@1
+- StellaOps.PromotionAttestation@1
+
+Signing and storage controls
+- Default is short-lived keyless signing; tenant KMS keys are supported.
+- Ed25519 and ECDSA P-256 are supported.
+- Payloads must exclude PII and secrets; redaction is required before signing.
+- Evidence Locker stores immutable copies with retention and legal hold.
+
+Verification steps
+- Verify DSSE signature against trusted roots.
+- Confirm subject digest matches expected artifact.
+- Verify transparency proof when available.
+- Enforce freshness using attestation.max_age_days policy.
+- Record verification results in timeline events.
+
+Offline posture
+- Bundles include DSSE, transparency proofs, and certificate chains.
+- Offline verification uses embedded proofs and cached trust roots.
+- Pending transparency entries are replayed when connectivity returns.
+
+Related references
+- provenance/inline-provenance.md
+- security/forensics-and-evidence-locker.md
+- docs/modules/attestor/architecture.md
+- docs/modules/signer/architecture.md
+- docs/modules/export-center/architecture.md
--- a/docs2/provenance/backfill.md
+++ b/docs2/provenance/backfill.md
@@ -0,0 +1,24 @@
+# Provenance backfill
+
+Purpose
+- Backfill missing provenance records with deterministic ordering.
+
+Inputs
+- Attestation inventory (NDJSON) with subject and digest data.
+- Subject-to-Rekor map for resolving transparency entries.
+
+Procedure
+1. Validate inventory records (UUID or ULID and digest formats).
+2. Resolve each subject to a Rekor entry; record gaps and skip if missing.
+3. Emit backfilled provenance events using a backfill mode that preserves ordering.
+4. Log every backfilled subject and Rekor digest pair as NDJSON.
+5. Repeat until gaps are zero and record completion in audit logs.
+
+Determinism
+- Sort by subject then Rekor entry before processing.
+- Use canonical JSON writers and UTC timestamps.
+
+Related references
+- provenance/inline-provenance.md
+- provenance/attestation-workflow.md
+- docs/provenance/prov-backfill-plan.md
--- a/docs2/provenance/rekor-policy.md
+++ b/docs2/provenance/rekor-policy.md
@@ -0,0 +1,34 @@
+# Rekor submission policy
+
+Purpose
+- Balance transparency log usage with budget limits and offline safety.
+
+Submission tiers
+- Tier 1: graph-level attestations per scan (default).
+- Tier 2: edge bundle attestations for escalations.
+
+Budgets
+- Hourly limits for graph submissions.
+- Daily limits for edge bundle submissions.
+- Burst windows for Tier 1 only.
+
+Enforcement
+- Queue excess submissions with backpressure.
+- Retry failed submissions with backoff.
+- Store overflow locally for later submission.
+
+Offline behavior
+- Queue submissions in attestor.rekor_offline_queue.
+- Bundle pending submissions in offline kits.
+- Drain queue when connectivity returns.
+
+Monitoring
+- attestor_rekor_submissions_total
+- attestor_rekor_submission_latency_seconds
+- attestor_rekor_queue_depth
+- attestor_rekor_budget_remaining
+
+Related references
+- provenance/attestation-workflow.md
+- security/crypto-and-trust.md
+- docs/operations/rekor-policy.md
--- a/docs2/release/promotion-attestations.md
+++ b/docs2/release/promotion-attestations.md
@@ -0,0 +1,41 @@
+# Promotion attestations
+
+Purpose
+- Capture promotion-time evidence in a DSSE predicate for offline audit.
+
+Predicate: stella.ops/promotion@v1
+- subject: image name and digest.
+- materials: SBOM and VEX digests with format and OCI URI.
+- promotion: from, to, actor, timestamp, pipeline, ticket, notes.
+- rekor: uuid, logIndex, inclusionProof, checkpoint.
+- attestation: bundle_sha256 and optional witness.
+
+Producer workflow
+1. Resolve and freeze image digest.
+2. Hash SBOM and VEX artifacts and publish to OCI if needed.
+3. Obtain Rekor inclusion proof and checkpoint.
+4. Build promotion predicate JSON.
+5. Sign with Signer to produce DSSE bundle.
+6. Store bundle in Evidence Locker and Export Center.
+
+Verification flow
+- Verify DSSE signature using trusted roots.
+- Verify Merkle inclusion using the embedded proof and checkpoint.
+- Hash SBOM and VEX artifacts and compare to materials digests.
+- Confirm promotion metadata and ticket evidence.
+
+Storage and APIs
+- Signer: /api/v1/signer/sign/dsse
+- Attestor: /api/v1/rekor/entries
+- Export Center: serves promotion bundles for offline kits
+- Evidence Locker: long-term retention of DSSE and proofs
+
+Security considerations
+- Promotion metadata is tenant scoped.
+- Rekor proofs must be embedded for air-gap verification.
+- Key rotation follows Signer and Authority policies.
+
+Related references
+- release/release-engineering.md
+- provenance/attestation-workflow.md
+- security/forensics-and-evidence-locker.md
--- a/docs2/release/release-engineering.md
+++ b/docs2/release/release-engineering.md
@@ -23,6 +23,7 @@ Artifact signing
 - Cosign for containers and bundles
 - DSSE envelopes for attestations
 - Optional Rekor anchoring when available
+- Promotion attestations capture release evidence for offline audit

 Offline update kit (OUK)
 - Monthly bundle of feeds and tooling
@@ -41,3 +42,5 @@ Related references
 - docs/ci/*
 - docs/devops/*
 - docs/release/* and docs/releases/*
+- release/promotion-attestations.md
+- release/release-notes.md
--- a/docs2/release/release-notes.md
+++ b/docs2/release/release-notes.md
@@ -0,0 +1,22 @@
+# Release notes and templates
+
+Release notes
+- Historical release notes live under docs/releases/.
+- Use release notes for time-specific changes; refer to docs2 for current behavior.
+
+Determinism snippet template
+- Use a determinism score summary in release notes when publishing scans.
+
+Template
+```
+- Determinism score: {{overall_score}} (threshold {{overall_min}})
+  - {{image_digest}} score {{score}} ({{identical}}/{{runs}} identical)
+- Inputs: policy {{policy_sha}}, feeds {{feeds_sha}}, scanner {{scanner_sha}}, platform {{platform}}
+- Evidence: determinism.json and artifact hashes (DSSE signed, offline ready)
+- Actions: rerun stella detscore run --bundle determinism.json if score < threshold
+```
+
+Related references
+- release/release-engineering.md
+- operations/replay-and-determinism.md
+- docs/release/templates/determinism-score.md
--- a/docs2/risk/api.md
+++ b/docs2/risk/api.md
@@ -0,0 +1,36 @@
+# Risk API
+
+Purpose
+- Expose risk jobs, profiles, simulations, explainability, and exports.
+
+Endpoints (v1)
+- POST /api/v1/risk/jobs: submit scoring job.
+- GET /api/v1/risk/jobs/{job_id}: job status and results.
+- GET /api/v1/risk/explain/{job_id}: explainability payload.
+- GET /api/v1/risk/profiles: list profiles with hashes and versions.
+- POST /api/v1/risk/profiles: create or update profiles with DSSE metadata.
+- POST /api/v1/risk/simulations: dry-run scoring with fixtures.
+- GET /api/v1/risk/export/{job_id}: export bundle for audit.
+
+Auth and tenancy
+- Headers: X-Stella-Tenant, Authorization Bearer token.
+- Optional X-Stella-Scope for imposed rule reminders.
+
+Error model
+- Envelope: code, message, correlation_id, severity, remediation.
+- Rate-limit headers: Retry-After, X-RateLimit-Remaining.
+- ETag headers for profile and explain responses.
+
+Feature flags
+- risk.jobs, risk.explain, risk.simulations, risk.export.
+
+Determinism and offline
+- Samples in docs/risk/samples/api/ with SHA256SUMS.
+- Stable field ordering and UTC timestamps.
+
+Related references
+- risk/overview.md
+- risk/profiles.md
+- risk/factors.md
+- risk/formulas.md
+- risk/explainability.md
--- a/docs2/risk/explainability.md
+++ b/docs2/risk/explainability.md
@@ -0,0 +1,28 @@
+# Risk explainability
+
+Purpose
+- Provide per-factor contributions with provenance and gating rationale.
+
+Explainability envelope
+- job_id, tenant_id, context_id
+- profile_id, profile_version, profile_hash
+- finding_id, raw_score, normalized_score, severity
+- signal_values and signal_contributions
+- override_applied, override_reason, gates_triggered
+- scored_at and provenance hashes
+
+UI and CLI expectations
+- Deterministic ordering by factor type, source, then timestamp.
+- Highlight top contributors and gates.
+- Export Center bundles include explain payload and manifest hashes.
+
+Determinism and offline
+- Fixtures under docs/risk/samples/explain/ with SHA256SUMS.
+- No live calls in examples or captures.
+
+Related references
+- risk/overview.md
+- risk/factors.md
+- risk/formulas.md
+- risk/profiles.md
+- risk/api.md
--- a/docs2/risk/factors.md
+++ b/docs2/risk/factors.md
@@ -0,0 +1,29 @@
+# Risk factors
+
+Purpose
+- Define factor catalog and normalization rules for risk scoring.
+
+Factor catalog (examples)
+- CVSS or exploit likelihood: numeric 0-10 normalized to 0-1.
+- KEV flag: boolean boost with provenance.
+- Reachability: numeric with entrypoint and path provenance.
+- Runtime facts: categorical or numeric with trace references.
+- Fix availability: vendor status and mitigation context.
+- Asset criticality: tenant or service criticality signals.
+- Provenance trust: categorical trust tier with attestation hash.
+- Custom overrides: scoped, expiring, and auditable.
+
+Normalization rules
+- Validate against profile signal types and transforms.
+- Clamp numeric values to 0-1 after normalization and record original values in provenance.
+- Apply TTL or decay deterministically; drop expired signals.
+- Precedence: signed over unsigned, runtime over static, newer over older.
+
+Determinism and ordering
+- Sort factors by factor type, source, then timestamp.
+- Hash fixtures and record SHA256 in docs/risk/samples/factors/.
+
+Related references
+- risk/overview.md
+- risk/formulas.md
+- risk/profiles.md
--- a/docs2/risk/formulas.md
+++ b/docs2/risk/formulas.md
@@ -0,0 +1,28 @@
+# Risk formulas
+
+Purpose
+- Define how normalized factors combine into a risk score and severity.
+
+Formula building blocks
+- Weighted sum with per-factor caps and family caps.
+- Normalize raw score to 0-1 and apply gates.
+- VEX gate: not_affected can short-circuit to 0.0.
+- CVSS + KEV boost: clamp01((cvss/10) + kev_bonus).
+- Trust gates: fail or down-weight low-trust provenance.
+- Decay: apply time-based decay to stale signals.
+- Overrides: tenant or asset overrides with expiry and audit.
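+
+Worked example for the CVSS + KEV boost (kev_bonus = 0.20 is an assumed value for illustration, not a documented default)
+- cvss = 7.5, KEV listed: clamp01((7.5/10) + 0.20) = clamp01(0.95) = 0.95
+- cvss = 9.8, KEV listed: clamp01((9.8/10) + 0.20) = clamp01(1.18) = 1.00
+- cvss = 7.5, not KEV listed (assuming no bonus applies): clamp01(0.75) = 0.75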
+
+Severity mapping
+- Map normalized_score to critical, high, medium, low, informational.
+- Store band rationale in explainability output.
+
+Determinism
+- Stable factor ordering before aggregation.
+- Fixed precision (example: 4 decimals) before severity mapping.
+- Hash fixtures and record SHA256 in docs/risk/samples/formulas/.
+
+Related references
+- risk/overview.md
+- risk/factors.md
+- risk/profiles.md
+- risk/explainability.md
--- a/docs2/risk/overview.md
+++ b/docs2/risk/overview.md
@@ -0,0 +1,36 @@
+# Risk overview
+
+Purpose
+- Explain risk scoring concepts, lifecycle, and artifacts.
+- Preserve deterministic, provenance-backed outputs.
+
+Core concepts
+- Signals become evidence after validation and normalization.
+- Profiles define weights, thresholds, overrides, and severity mapping.
+- Formulas aggregate normalized factors into a 0-1 score.
+- Provenance carries source hashes and attestation references.
+
+Lifecycle
+1. Submit a risk job with tenant, context, profile, and findings.
+2. Ingest evidence from scanners, reachability, VEX, runtime signals, and KEV.
+3. Normalize and dedupe by provenance hash.
+4. Evaluate profile rules, gates, and overrides.
+5. Assign severity band and emit explainability output.
+6. Export bundles with profile hash and evidence references.
+
+Artifacts
+- Profile schema: id, version, signals, weights, overrides, metadata, provenance.
+- Job and result fields: job_id, profile_hash, normalized_score, severity.
+- Explainability envelope: signal_values, signal_contributions, gates_triggered.
+
+Determinism and offline posture
+- Stable ordering for factors and contributions.
+- Fixed precision math with UTC timestamps only.
+- Fixtures and hashes live under docs/risk/samples/.
+
+Related references
+- risk/factors.md
+- risk/formulas.md
+- risk/profiles.md
+- risk/explainability.md
+- risk/api.md
--- a/docs2/risk/profiles.md
+++ b/docs2/risk/profiles.md
@@ -0,0 +1,37 @@
+# Risk profiles
+
+Purpose
+- Define profile schema, lifecycle, and governance for risk scoring.
+
+Schema essentials
+- id, version, description, signals[], weights, metadata.
+- signals[] fields: name, source, type (numeric, boolean, categorical), path, transform, unit.
+- overrides: severity rules and decision rules.
+- Optional: extends, rollout flags, valid_from, valid_until.
+
+Severity levels
+- critical, high, medium, low, informational.
+
+Lifecycle
+1. Author profiles in Policy Studio.
+2. Simulate against deterministic fixtures.
+3. Review and approve with DSSE signatures.
+4. Promote and activate in Policy Engine.
+5. Roll back by activating a previous version.
+
+Governance and determinism
+- Profiles are immutable after promotion.
+- Each version carries a profile_hash and signed manifest entry.
+- Simulation and production share the same evaluation codepath.
+- Offline bundles include profiles and fixtures with hashes.
+
+Explainability and observability
+- Emit per-factor contributions with stable ordering.
+- Track evaluation latency, factor coverage, profile hit rate, and override usage.
+
+Related references
+- risk/overview.md
+- risk/factors.md
+- risk/formulas.md
+- risk/explainability.md
+- risk/api.md
--- a/docs2/security/crypto-and-trust.md
+++ b/docs2/security/crypto-and-trust.md
@@ -32,3 +32,6 @@ Related references
 - docs/security/crypto-simulation-services.md
 - docs/security/crypto-compliance.md
 - docs/airgap/staleness-and-time.md
+- operations/key-rotation.md
+- provenance/rekor-policy.md
+- release/promotion-attestations.md
--- a/docs2/security/evidence-locker-publishing.md
+++ b/docs2/security/evidence-locker-publishing.md
@@ -0,0 +1,30 @@
+# Evidence locker publishing
+
+Purpose
+- Publish deterministic evidence bundles to the Evidence Locker.
+
+Required inputs
+- Evidence locker base URL (no trailing slash).
+- Bearer token with write scopes for required prefixes.
+- Signing key for final bundle signing (Cosign key or key file).
+
+Publishing flow
+- Build deterministic tar bundles for each producer (signals, runtime, evidence packs).
+- Verify bundle hashes and inner SHA256 lists before upload.
+- Upload bundles to the Evidence Locker using the configured token.
+- Re-sign bundles with production keys when required.
+
+Deterministic packaging rules
+- tar --sort=name
+- fixed mtime (UTC 1970-01-01)
+- owner and group set to 0
+- numeric-owner enabled
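+
+Example invocation (a sketch assuming GNU tar 1.28 or later; bundle.tar and ./bundle are hypothetical names)
+```
+# Sorted entries, epoch mtime, and zeroed numeric ownership keep the archive bytes stable across runs.
+tar --sort=name --mtime='1970-01-01 00:00:00Z' --owner=0 --group=0 --numeric-owner \
+  -cf bundle.tar -C ./bundle .
+```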
+
+Offline posture
+- Transparency log upload may be disabled in sealed mode.
+- Trust derives from local keys and recorded hashes.
+- Upload scripts must fail on hash mismatch.
+
+Related references
+- security/forensics-and-evidence-locker.md
+- provenance/attestation-workflow.md
--- a/docs2/security/forensics-and-evidence-locker.md
+++ b/docs2/security/forensics-and-evidence-locker.md
@@ -28,7 +28,8 @@ Minimum bundle layout
 - signatures/ for DSSE or sigstore bundles

 Related references
+- provenance/attestation-workflow.md
+- security/timeline.md
+- security/evidence-locker-publishing.md
 - docs/forensics/evidence-locker.md
- docs/forensics/provenance-attestation.md
- docs/forensics/timeline.md
 - docs/evidence-locker/evidence-pack-schema.md
--- a/docs2/security/multi-tenancy.md
+++ b/docs2/security/multi-tenancy.md
@@ -0,0 +1,27 @@
+# Multi-tenancy
+
+Purpose
+- Ensure strict tenant isolation across APIs, storage, and observability.
+
+Tenant lifecycle
+- Create tenants with scoped roles and default policies.
+- Suspend or retire tenants with audit records.
+- Migrations and data retention follow governance policy.
+
+Isolation model
+- Tokens carry tenant identifiers and scopes.
+- APIs require tenant headers; cross-tenant actions are explicit.
+- Datastores enforce tenant_id and RLS where supported.
+
+Observability
+- Metrics, logs, and traces always include tenant.
+- Cross-tenant access attempts emit audit events.
+
+Offline posture
+- Offline bundles are tenant scoped.
+- Tenant list in offline mode is limited to snapshot contents.
+
+Related references
+- security/identity-tenancy-and-scopes.md
+- security/row-level-security.md
+- docs/operations/multi-tenancy.md
--- a/docs2/security/risk-model.md
+++ b/docs2/security/risk-model.md
@@ -40,3 +40,9 @@ Related references
 - docs/risk/profiles.md
 - docs/risk/api.md
 - docs/guides/epss-integration.md
+- risk/overview.md
+- risk/factors.md
+- risk/formulas.md
+- risk/profiles.md
+- risk/explainability.md
+- risk/api.md
--- a/docs2/security/row-level-security.md
+++ b/docs2/security/row-level-security.md
@@ -0,0 +1,21 @@
+# Row-level security
+
+Purpose
+- Enforce tenant isolation at the database level with RLS policies.
+
+Strategy
+- Apply RLS to tenant-scoped tables and views.
+- Require app.tenant_id session setting on every connection.
+- Deny access when tenant context is missing.
+
+Policy evaluation
+- Policies filter rows by tenant_id and optional scope.
+- Admin bypass uses explicit roles with audited access.
+
+Validation
+- Run cross-tenant read and write tests in staging.
+- Include RLS checks in deterministic replay suites.
+
+Related references
+- data/postgresql-patterns.md
+- docs/operations/rls-and-data-isolation.md
--- a/docs2/security/timeline.md
+++ b/docs2/security/timeline.md
@@ -0,0 +1,47 @@
+# Timeline forensics
+
+Purpose
+- Provide an append-only event ledger for audit, replay, and incident analysis.
+- Support deterministic exports for offline review.
+
+Event model
+- event_id (ULID)
+- tenant
+- timestamp (UTC ISO-8601)
+- category (scanner, policy, runtime, evidence, notify)
+- details (JSON payload)
+- trace_id for correlation
+
+Event kinds
+- scan.completed
+- policy.verdict
+- attestation.verified
+- evidence.ingested
+- notify.sent
+- runtime.alert
+- redaction_notice (compensating event)
+
+APIs
+- GET /api/v1/timeline/events with filters for tenant, category, time window, trace_id.
+- GET /api/v1/timeline/events/{id} for a single event.
+- GET /api/v1/timeline/export for NDJSON exports.
+- Headers: X-Stella-Tenant, optional X-Stella-TraceId, If-None-Match.
+
+Query guidance
+- Use category plus trace_id to trace the scan-to-policy-to-notify flow.
+- Use tenant and timestamp ranges for SLA audits.
+- CLI parity: stella timeline list mirrors the API.
+
+Retention and redaction
+- Append-only storage; no deletes.
+- Redactions use redaction_notice events that reference the superseded event.
+- Retention is tenant-configurable; events are exported weekly to cold storage.
+
+Offline posture
+- Offline kits include timeline exports for compliance review.
+- Exports include stable ordering and manifest hashes.
+
+Related references
+- security/forensics-and-evidence-locker.md
+- observability.md
+- docs/forensics/timeline.md
--- a/docs2/signals/uncertainty.md
+++ b/docs2/signals/uncertainty.md
@@ -10,15 +10,37 @@ Core states (examples)
 - U4: Unknown (no analysis yet)

 Tiers and scoring
- Tiers group states by entropy ranges.
- The aggregate tier is the maximum severity present.
- Risk score adds an entropy-based modifier.
+- Tiers group states by entropy ranges (T1 high to T4 negligible).
+- Aggregate tier is the highest-severity tier present across states (T1 is highest).
+- Risk score adds tier and entropy modifiers.
+
+Tier ranges (example)
+- T1: 0.7 to 1.0, blocks not_affected.
+- T2: 0.4 to 0.69, warns on not_affected.
+- T3: 0.1 to 0.39, allow with caveat.
+- T4: 0.0 to 0.09, no special handling.
+
+Risk score formula (simplified)
+- meanEntropy = avg(states[].entropy)
+- entropyBoost = clamp(meanEntropy * k, 0..boostCeiling)
+- tierModifier = {T1:0.50, T2:0.25, T3:0.10, T4:0.00}[aggregateTier]
+- riskScore = clamp(baseScore * (1 + tierModifier + entropyBoost), 0..1)
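+
+Worked example (illustrative values; k and boostCeiling are assumed, not documented defaults)
+- Inputs: baseScore = 0.50; two states with entropy 0.80 (T1) and 0.40 (T2); assume k = 0.5 and boostCeiling = 0.20.
+- meanEntropy = (0.80 + 0.40) / 2 = 0.60
+- entropyBoost = clamp(0.60 * 0.5, 0..0.20) = clamp(0.30, 0..0.20) = 0.20
+- aggregateTier = T1, so tierModifier = 0.50
+- riskScore = clamp(0.50 * (1 + 0.50 + 0.20), 0..1) = clamp(0.85, 0..1) = 0.85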

 Policy guidance
 - High uncertainty blocks not_affected claims.
 - Lower tiers allow decisions with caveats.
 - Remediation hints are attached to findings.

+Remediation examples
+- U1: upload symbols or resolve unknowns registry.
+- U2: generate lockfile and resolve package coordinates.
+- U3: cross-reference trusted advisories.
+- U4: run initial analysis to remove unknown state.
+
+Payload fields
+- states[] include code, name, entropy, tier, timestamp, evidence.
+- aggregateTier and riskScore recorded with computedAt timestamp.
+
 Determinism rules
 - Stable ordering of uncertainty states.
 - UTC timestamps and fixed precision for entropy values.
--- a/docs2/testing-and-quality.md
+++ b/docs2/testing-and-quality.md
@@ -17,3 +17,6 @@
 - Interop checks against external tooling formats.
 - Offline E2E runs as a release gate.
 - Policy and schema validation in CI.
+
+Related references
+- testing/router-chaos.md
--- a/docs2/testing/router-chaos.md
+++ b/docs2/testing/router-chaos.md
@@ -0,0 +1,34 @@
+# Router chaos testing
+
+Purpose
+- Validate backpressure, recovery, and cache failure behavior for the router.
+
+Test categories
+- Load testing with spike scenarios (baseline, 10x, 50x, recovery).
+- Backpressure verification for 429 and 503 with Retry-After.
+- Recovery tests to ensure queues drain within the 30-second recovery target.
+- Valkey failure injection with graceful fallback.
+
+Expected behavior
+- Normal load returns 200 OK.
+- High load returns 429 with Retry-After.
+- Critical load returns 503 with Retry-After.
+- Recovery within 30 seconds, zero data loss.
+
+Metrics
+- http_requests_total{status}
+- router_request_queue_depth
+- request_recovery_seconds
+
+Alert cues
+- Throttle rate above 10% for 5 minutes.
+- P95 recovery time above 30 seconds.
+- Missing Retry-After headers.
+
+CI integration
+- Runs on PRs touching router code and nightly staging runs.
+- Stores results as artifacts for audits.
+
+Related references
+- operations/router-rate-limiting.md
+- docs/operations/router-chaos-testing-runbook.md
--- a/docs2/topic-map.md
+++ b/docs2/topic-map.md
@@ -18,6 +18,10 @@ Architecture and system model
  docs/modules/platform/architecture-overview.md, docs/modules/*/architecture.md
 - Docs2: architecture/overview.md, architecture/workflows.md, modules/index.md

+Advisory alignment
+- Sources: docs/architecture/advisory-alignment-report.md
+- Docs2: architecture/advisory-alignment.md
+
 Component map
 - Sources: docs/technical/architecture/component-map.md
 - Docs2: architecture/component-map.md
@@ -77,7 +81,7 @@ Advisory AI
 Orchestrator detail
 - Sources: docs/orchestrator/*
 - Docs2: orchestrator/overview.md, orchestrator/architecture.md, orchestrator/api.md,
-  orchestrator/cli.md, orchestrator/console.md
+  orchestrator/cli.md, orchestrator/console.md, orchestrator/runbook.md

 Orchestrator run ledger
 - Sources: docs/orchestrator/run-ledger.md
@@ -118,7 +122,10 @@ Replay and determinism

 Runbooks and incident response
 - Sources: docs/runbooks/*, docs/operations/*
-- Docs2: operations/runbooks.md
+- Docs2: operations/runbooks.md, operations/key-rotation.md,
+  operations/proof-verification.md, operations/score-proofs.md,
+  operations/reachability.md, operations/trust-lattice.md,
+  operations/unknowns-queue.md

 Notifications
 - Sources: docs/notifications/*, docs/modules/notify/*
@@ -129,7 +136,8 @@ Notifications details
  docs/notifications/channels.md, docs/notifications/templates.md,
  docs/notifications/digests.md, docs/notifications/pack-approvals-integration.md
 - Docs2: notifications/overview.md, notifications/rules.md, notifications/channels.md,
-  notifications/templates.md, notifications/digests.md, notifications/pack-approvals.md
+  notifications/templates.md, notifications/digests.md, notifications/pack-approvals.md,
+  notifications/runbook.md

 Router rate limiting
 - Sources: docs/router/*
@@ -138,7 +146,8 @@ Router rate limiting
 Release engineering and CI/DevOps
 - Sources: docs/13_RELEASE_ENGINEERING_PLAYBOOK.md, docs/ci/*, docs/devops/*,
  docs/release/*, docs/releases/*
-- Docs2: release/release-engineering.md
+- Docs2: release/release-engineering.md, release/promotion-attestations.md,
+  release/release-notes.md

 API and contracts
 - Sources: docs/09_API_CLI_REFERENCE.md, docs/api/*, docs/schemas/*,
@@ -177,7 +186,8 @@ Regulator threat and evidence model
 Identity, tenancy, and scopes
 - Sources: docs/security/authority-scopes.md, docs/security/scopes-and-roles.md,
  docs/architecture/console-admin-rbac.md
-- Docs2: security/identity-tenancy-and-scopes.md
+- Docs2: security/identity-tenancy-and-scopes.md, security/multi-tenancy.md,
+  security/row-level-security.md

 Console admin RBAC
 - Sources: docs/architecture/console-admin-rbac.md
@@ -213,20 +223,26 @@ Quota and licensing

 Risk model and scoring
 - Sources: docs/risk/*, docs/contracts/risk-scoring.md
-- Docs2: security/risk-model.md
+- Docs2: security/risk-model.md, risk/overview.md, risk/factors.md, risk/formulas.md,
+  risk/profiles.md, risk/explainability.md, risk/api.md

 Forensics and evidence locker
-- Sources: docs/forensics/*, docs/evidence-locker/*
-- Docs2: security/forensics-and-evidence-locker.md
+- Sources: docs/forensics/*, docs/evidence-locker/*, docs/ops/evidence-locker-handoff.md
+- Docs2: security/forensics-and-evidence-locker.md, security/evidence-locker-publishing.md
+
+Timeline forensics
+- Sources: docs/forensics/timeline.md
+- Docs2: security/timeline.md

 Provenance and transparency
 - Sources: docs/provenance/*, docs/security/trust-and-signing.md,
  docs/modules/attestor/*, docs/modules/signer/*
-- Docs2: provenance/inline-provenance.md
+- Docs2: provenance/inline-provenance.md, provenance/attestation-workflow.md,
+  provenance/rekor-policy.md, provenance/backfill.md

 Database and persistence
 - Sources: docs/db/*, docs/adr/0001-postgresql-for-control-plane.md
-- Docs2: data/persistence.md
+- Docs2: data/persistence.md, data/postgresql-operations.md, data/postgresql-patterns.md

 Events and messaging
 - Sources: docs/events/*, docs/samples/*
@@ -334,19 +350,22 @@ Vuln Explorer overview

 Testing and quality
 - Sources: docs/19_TEST_SUITE_OVERVIEW.md, docs/testing/*
-- Docs2: testing-and-quality.md
+- Docs2: testing-and-quality.md, testing/router-chaos.md

 Observability and telemetry
 - Sources: docs/metrics/*, docs/observability/*, docs/modules/telemetry/*,
  docs/technical/observability/*
-- Docs2: observability.md
+- Docs2: observability.md, observability-standards.md, observability-logging.md,
+  observability-tracing.md, observability-metrics-slos.md, observability-telemetry-controls.md,
+  observability-aoc.md, observability-aggregation.md, observability-policy.md,
+  observability-ui-telemetry.md, observability-vuln-telemetry.md

 Benchmarks and performance
 - Sources: docs/benchmarks/*, docs/12_PERFORMANCE_WORKBOOK.md
 - Docs2: benchmarks.md

 Guides and workflows
-- Sources: docs/guides/*, docs/ci/sarif-integration.md
+- Sources: docs/guides/*, docs/ci/sarif-integration.md, docs/architecture/epss-versioning-clarification.md
 - Docs2: guides/compare-workflow.md, guides/epss-integration.md

 Examples and fixtures