Add tests for SBOM generation determinism across multiple formats
- Created `StellaOps.TestKit.Tests` project for unit tests related to determinism.
- Implemented `DeterminismManifestTests` to validate deterministic output for canonical bytes and strings, file read/write operations, and error handling for invalid schema versions.
- Added `SbomDeterminismTests` to ensure identical inputs produce consistent SBOMs across SPDX 3.0.1 and CycloneDX 1.6/1.7 formats, including parallel execution tests.
- Updated project references in `StellaOps.Integration.Determinism` to include the new determinism testing library.
@@ -36,6 +36,7 @@ How to navigate
|
||||
- orchestrator/api.md - Orchestrator API surface
|
||||
- orchestrator/cli.md - Orchestrator CLI commands
|
||||
- orchestrator/console.md - Orchestrator console views
|
||||
- orchestrator/runbook.md - Orchestrator operations runbook
|
||||
- operations/quickstart.md - First scan workflow
|
||||
- operations/install-deploy.md - Install and deployment guidance
|
||||
- operations/deployment-versioning.md - Versioning and promotion model
|
||||
@@ -47,6 +48,12 @@ How to navigate
|
||||
- operations/runtime-readiness.md - Runtime readiness checks
|
||||
- operations/slo.md - Service SLO overview
|
||||
- operations/runbooks.md - Operational runbooks and incident response
|
||||
- operations/key-rotation.md - Signing key rotation runbook
|
||||
- operations/proof-verification.md - Proof verification runbook
|
||||
- operations/score-proofs.md - Score proofs and replay operations
|
||||
- operations/reachability.md - Reachability operations
|
||||
- operations/trust-lattice.md - Trust lattice operations
|
||||
- operations/unknowns-queue.md - Unknowns queue operations
|
||||
- operations/notifications.md - Notifications Studio operations
|
||||
- notifications/overview.md - Notifications overview
|
||||
- notifications/rules.md - Notification rules and routing
|
||||
@@ -54,8 +61,11 @@ How to navigate
|
||||
- notifications/templates.md - Notification templates
|
||||
- notifications/digests.md - Notification digests
|
||||
- notifications/pack-approvals.md - Pack approval notifications
|
||||
- notifications/runbook.md - Notifications operations runbook
|
||||
- operations/router-rate-limiting.md - Gateway rate limiting
|
||||
- release/release-engineering.md - Release and CI/CD overview
|
||||
- release/promotion-attestations.md - Promotion-time attestation predicate
|
||||
- release/release-notes.md - Release notes index and templates
|
||||
- api/overview.md - API surface and conventions
|
||||
- api/auth-and-tokens.md - Authority, OpTok, DPoP and mTLS, PoE
|
||||
- policy/policy-system.md - Policy DSL, lifecycle, and governance
|
||||
@@ -99,12 +109,16 @@ How to navigate
|
||||
- ui/branding.md - Tenant branding model
|
||||
- data-and-schemas.md - Storage, schemas, and determinism rules
|
||||
- data/persistence.md - Database model and migration notes
|
||||
- data/postgresql-operations.md - PostgreSQL operations guide
|
||||
- data/postgresql-patterns.md - RLS and partitioning patterns
|
||||
- data/events.md - Event envelopes and validation
|
||||
- sbom/overview.md - SBOM formats, mapping, and heuristics
|
||||
- governance/approvals.md - Approval routing and audit
|
||||
- governance/exceptions.md - Exception lifecycle and controls
|
||||
- security-and-governance.md - Security policy, hardening, governance, compliance
|
||||
- security/identity-tenancy-and-scopes.md - Authority scopes and tenancy rules
|
||||
- security/multi-tenancy.md - Tenant lifecycle and isolation model
|
||||
- security/row-level-security.md - Database RLS enforcement
|
||||
- security/crypto-and-trust.md - Crypto profiles and trust roots
|
||||
- security/crypto-compliance.md - Regional crypto profiles and licensing notes
|
||||
- security/quota-and-licensing.md - Offline quota and JWT licensing
|
||||
@@ -114,8 +128,19 @@ How to navigate
|
||||
- security/audit-events.md - Authority audit event schema
|
||||
- security/revocation-bundles.md - Revocation bundle format and verification
|
||||
- security/risk-model.md - Risk scoring model and explainability
|
||||
- risk/overview.md - Risk scoring overview
|
||||
- risk/factors.md - Risk factor catalog
|
||||
- risk/formulas.md - Risk scoring formulas
|
||||
- risk/profiles.md - Risk profile schema and lifecycle
|
||||
- risk/explainability.md - Risk explainability payloads
|
||||
- risk/api.md - Risk API endpoints
|
||||
- security/forensics-and-evidence-locker.md - Evidence locker and forensic storage
|
||||
- security/evidence-locker-publishing.md - Evidence locker publishing process
|
||||
- security/timeline.md - Timeline event ledger and exports
|
||||
- provenance/inline-provenance.md - DSSE metadata and transparency links
|
||||
- provenance/attestation-workflow.md - Attestation workflow and verification
|
||||
- provenance/rekor-policy.md - Rekor submission budget policy
|
||||
- provenance/backfill.md - Provenance backfill procedure
|
||||
- signals/unknowns.md - Unknowns registry and signals model
|
||||
- signals/unknowns-ranking.md - Unknowns scoring and triage bands
|
||||
- signals/uncertainty.md - Uncertainty states and tiers
|
||||
@@ -129,7 +154,18 @@ How to navigate
|
||||
- migration/overview.md - Migration paths and parity guidance
|
||||
- vex/consensus.md - VEX consensus overview
|
||||
- testing-and-quality.md - Test strategy and quality gates
|
||||
- testing/router-chaos.md - Router chaos testing scenarios
|
||||
- observability.md - Metrics, logs, tracing, telemetry stack
|
||||
- observability-standards.md - Telemetry envelope, scrubbing, sampling
|
||||
- observability-logging.md - Logging fields and redaction
|
||||
- observability-tracing.md - Trace propagation and span conventions
|
||||
- observability-metrics-slos.md - Core metrics and SLO guidance
|
||||
- observability-telemetry-controls.md - Propagation, sealed mode, incident mode
|
||||
- observability-aoc.md - AOC ingestion observability
|
||||
- observability-aggregation.md - Aggregation pipeline observability
|
||||
- observability-policy.md - Policy Engine observability
|
||||
- observability-ui-telemetry.md - Console telemetry metrics and alerts
|
||||
- observability-vuln-telemetry.md - Vulnerability explorer telemetry
|
||||
- developer/onboarding.md - Local dev setup and workflows
|
||||
- developer/plugin-sdk.md - Plugin SDK summary
|
||||
- developer/devportal.md - Developer portal publishing
|
||||
|
||||
@@ -7,6 +7,11 @@ Envelope types
|
||||
- Orchestrator events: versioned envelopes with idempotency keys and trace context.
|
||||
- Legacy Redis envelopes: transitional schemas used for older consumers.
|
||||
|
||||
Event catalog (examples)
|
||||
- scanner.event.report.ready@1 and scanner.event.scan.completed@1 (orchestrator envelopes).
|
||||
- scanner.report.ready@1 and scanner.scan.completed@1 (legacy Redis envelopes).
|
||||
- scheduler.rescan.delta@1, scheduler.graph.job.completed@1, attestor.logged@1.
|
||||
|
||||
Orchestrator envelope fields (v1)
|
||||
- eventId, kind, version, tenant
|
||||
- occurredAt, recordedAt
|
||||
@@ -26,6 +31,8 @@ Versioning rules
|
||||
Validation
|
||||
- Schemas and samples live under docs/events/ and docs/events/samples/.
|
||||
- Offline validation uses ajv-cli; keep schema checks deterministic.
|
||||
- Validate schemas with ajv compile and validate samples against matching schemas.
|
||||
- Add new samples for each new schema version.
|
||||
|
||||
Related references
|
||||
- docs/events/README.md
|
||||
|
||||
@@ -32,3 +32,5 @@ Migration notes
|
||||
Related references
|
||||
- ADR: docs/adr/0001-postgresql-for-control-plane.md
|
||||
- Module architecture: docs/modules/*/architecture.md
|
||||
- data/postgresql-operations.md
|
||||
- data/postgresql-patterns.md
|
||||
|
||||
36
docs2/data/postgresql-operations.md
Normal file
@@ -0,0 +1,36 @@
|
||||
# PostgreSQL operations
|
||||
|
||||
Purpose
|
||||
- Operate the canonical PostgreSQL control plane with deterministic behavior.
|
||||
|
||||
Schema topology
|
||||
- Per-module schemas: authority, vuln, vex, scheduler, notify, policy, concelier, audit.
|
||||
- Tenant isolation enforced via tenant_id and RLS policies.
|
||||
|
||||
Performance setup
|
||||
- Enable pg_stat_statements for query analysis.
|
||||
- Tune shared_buffers, effective_cache_size, work_mem, and WAL sizes per host.
|
||||
- Use PgBouncer in transaction pooling mode for high concurrency.
|
||||
|
||||
Session defaults
|
||||
- SET app.tenant_id per connection.
|
||||
- SET timezone to UTC.
|
||||
- Enforce statement_timeout for long-running queries.
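The session defaults above could be issued as a fixed statement list each time a connection is opened. This is a minimal sketch, not StellaOps code: the helper name, the 30-second timeout, and the psycopg-style `%s` placeholders are assumptions made for illustration.

```python
from typing import List, Tuple


def session_default_statements(
    tenant_id: str, timeout_ms: int = 30_000
) -> List[Tuple[str, tuple]]:
    """Return (sql, params) pairs to run once per connection.

    Hypothetical helper: the name and the 30 s default are assumptions.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required for RLS-scoped sessions")
    return [
        # set_config(name, value, is_local=false) keeps the value for the session.
        ("SELECT set_config('app.tenant_id', %s, false)", (tenant_id,)),
        ("SET TIME ZONE 'UTC'", ()),
        # A bare integer for statement_timeout is interpreted as milliseconds.
        ("SELECT set_config('statement_timeout', %s, false)", (str(timeout_ms),)),
    ]
```

Under PgBouncer transaction pooling, session-level settings may not carry over between transactions, so a deployment might instead pass `true` as the third `set_config` argument inside each transaction.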
|
||||
|
||||
Query analysis
|
||||
- Use pg_stat_statements to find high total and high mean latency queries.
|
||||
- Use EXPLAIN ANALYZE with BUFFERS to detect missing indexes.
|
||||
|
||||
Backups and restore
|
||||
- Use scheduled logical or physical backups with tested restore paths.
|
||||
- Keep PITR capability where required by retention policies.
|
||||
- Validate backups with deterministic restore tests.
|
||||
|
||||
Monitoring
|
||||
- Track connection count, replication lag, and slow query rates.
|
||||
- Alert on pool saturation and replication delays.
|
||||
|
||||
Related references
|
||||
- data/postgresql-patterns.md
|
||||
- data/persistence.md
|
||||
- docs/operations/postgresql-guide.md
|
||||
33
docs2/data/postgresql-patterns.md
Normal file
@@ -0,0 +1,33 @@
|
||||
# PostgreSQL patterns
|
||||
|
||||
Row-level security (RLS)
|
||||
- Require tenant context via app.tenant_id session setting.
|
||||
- Policies filter by tenant_id on all tenant-scoped tables.
|
||||
- Admin operations use explicit bypass roles and audited access.
|
||||
|
||||
Validating RLS
|
||||
- Run staging tests that attempt cross-tenant reads and writes.
|
||||
- Use deterministic replay tests for RLS regressions.
|
||||
|
||||
Bitemporal unknowns
|
||||
- Store current and historical states with valid_from and valid_to.
|
||||
- Support point-in-time queries and deterministic ordering.
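A point-in-time lookup over `valid_from` and `valid_to` can be sketched in memory as follows. The record shape, the half-open `[valid_from, valid_to)` window, and the sort key are assumptions chosen for illustration; the actual table layout is not shown in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class UnknownVersion:
    unknown_id: str
    state: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the version is still current


def as_of(rows: List[UnknownVersion], at: datetime) -> List[UnknownVersion]:
    """Return the versions valid at `at`, using a half-open window."""
    hits = [
        r
        for r in rows
        if r.valid_from <= at and (r.valid_to is None or at < r.valid_to)
    ]
    # Sort on stable keys so repeated queries return identical ordering.
    return sorted(hits, key=lambda r: (r.unknown_id, r.valid_from))
```

The half-open window means a version that ends at instant T and its successor that starts at T never both match the same query.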
|
||||
|
||||
Time-based partitioning
|
||||
- Partition high-volume tables by time.
|
||||
- Pre-create future partitions and archive old partitions.
|
||||
- Use deterministic maintenance checklists for partition health.
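Pre-creating future partitions amounts to generating one `CREATE TABLE ... PARTITION OF` statement per period. The sketch below assumes monthly range partitions and a hypothetical `<parent>_YYYY_MM` naming scheme; neither is confirmed by this document.

```python
from datetime import date
from typing import List


def month_partitions_ddl(parent: str, start: date, months: int) -> List[str]:
    """Build DDL for `months` consecutive monthly partitions starting at `start`.

    `parent` is interpolated directly, so it must be a trusted identifier.
    """
    ddl = []
    year, month = start.year, start.month
    for _ in range(months):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        lower = date(year, month, 1).isoformat()
        upper = date(next_year, next_month, 1).isoformat()
        ddl.append(
            f"CREATE TABLE IF NOT EXISTS {parent}_{year:04d}_{month:02d} "
            f"PARTITION OF {parent} FOR VALUES FROM ('{lower}') TO ('{upper}');"
        )
        year, month = next_year, next_month
    return ddl
```

Because the output depends only on the arguments, running the generator twice for the same window yields the same statements, which keeps the maintenance step repeatable.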
|
||||
|
||||
Generated columns
|
||||
- Use generated columns for derived flags and query optimization.
|
||||
- Add columns via migrations and backfill deterministically.
|
||||
|
||||
Troubleshooting
|
||||
- RLS failures: verify tenant context and policy attachment.
|
||||
- Partition issues: check missing partitions and default tables.
|
||||
- Bitemporal queries: confirm valid time windows and index usage.
|
||||
|
||||
Related references
|
||||
- data/postgresql-operations.md
|
||||
- security/multi-tenancy.md
|
||||
- docs/operations/postgresql-patterns-runbook.md
|
||||
@@ -22,3 +22,4 @@ Related references
|
||||
- docs/notifications/overview.md
|
||||
- docs/notifications/architecture.md
|
||||
- docs2/operations/notifications.md
|
||||
- notifications/runbook.md
|
||||
|
||||
40
docs2/notifications/runbook.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# Notifications runbook
|
||||
|
||||
Purpose
|
||||
- Deploy and operate the Notifications WebService and Worker.
|
||||
|
||||
Pre-flight
|
||||
- Secrets stored in Authority (SMTP, Slack, webhook HMAC).
|
||||
- Outbound allowlist configured for channels.
|
||||
- PostgreSQL and Valkey reachable; health checks pass.
|
||||
- Offline kit loaded with templates and rule seeds.
|
||||
|
||||
Deploy
|
||||
- Deploy images with digests pinned.
|
||||
- Set Notify Postgres, Redis, Authority, and allowlist settings.
|
||||
- Warm caches via /api/v1/notify/admin/warm when needed.
|
||||
|
||||
Monitor
|
||||
- notify_delivery_attempts_total by status and channel.
|
||||
- notify_escalation_stage_total and notify_rule_eval_seconds.
|
||||
- Logs include tenant, ruleId, deliveryId, channel, status.
|
||||
|
||||
Common operations
|
||||
- List failed deliveries and replay.
|
||||
- Pause a tenant without dropping audit events.
|
||||
- Rotate channel secrets via refresh endpoints.
|
||||
|
||||
Failure recovery
|
||||
- For worker crashes, validate templates and Redis connectivity.
|
||||
- Replay deliveries after database recovery.
|
||||
- Disable channels during upstream outages.
|
||||
|
||||
Determinism safeguards
|
||||
- Rule snapshots versioned per tenant.
|
||||
- Template rendering uses deterministic helpers.
|
||||
- UTC time sources for quiet hours.
|
||||
|
||||
Related references
|
||||
- notifications/overview.md
|
||||
- notifications/rules.md
|
||||
- docs/operations/notifier-runbook.md
|
||||
34
docs2/observability-aggregation.md
Normal file
@@ -0,0 +1,34 @@
|
||||
# Aggregation observability
|
||||
|
||||
Purpose
|
||||
- Track Link-Not-Merge aggregation and overlay pipelines.
|
||||
|
||||
Metrics
|
||||
- aggregation_ingest_latency_seconds{tenant,source,status}
|
||||
- aggregation_conflict_total{tenant,advisory,product,reason}
|
||||
- aggregation_overlay_cache_hits_total, aggregation_overlay_cache_misses_total
|
||||
- aggregation_vex_gate_total{tenant,status}
|
||||
- aggregation_queue_depth{tenant}
|
||||
|
||||
Traces
|
||||
- Span: aggregation.process
|
||||
- Attributes: tenant, advisory, product, vex_status, source_kind, overlay_version, cache_hit
|
||||
|
||||
Logs
|
||||
- tenant, advisory, product, vex_status
|
||||
- decision (merged, suppressed, dropped)
|
||||
- reason, duration_ms, trace_id
|
||||
|
||||
SLOs
|
||||
- Ingest latency p95 < 500ms per statement.
|
||||
- Overlay cache hit rate > 80%.
|
||||
- Error rate < 0.1% over 10 minutes.
|
||||
|
||||
Alerts
|
||||
- HighConflictRate: aggregation_conflict_total delta > 100 per minute.
|
||||
- QueueBacklog: aggregation_queue_depth > 10k for 5 minutes.
|
||||
- LowCacheHit: cache hit rate < 60% for 10 minutes.
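The cache thresholds above (SLO above 80%, LowCacheHit below 60%) can be checked against the two overlay cache counters. This is a rough sketch: the function names and the `no_data` and `below_slo` labels are assumptions, and the 10-minute alert window is not modelled.

```python
from typing import Optional

SLO_TARGET = 0.80       # "Overlay cache hit rate > 80%"
ALERT_THRESHOLD = 0.60  # LowCacheHit fires below 60%


def cache_hit_rate(hits: int, misses: int) -> Optional[float]:
    """Hit ratio from the hit and miss counters; None when there was no traffic."""
    total = hits + misses
    return None if total == 0 else hits / total


def classify(hits: int, misses: int) -> str:
    rate = cache_hit_rate(hits, misses)
    if rate is None:
        return "no_data"
    if rate < ALERT_THRESHOLD:
        return "alert"
    if rate <= SLO_TARGET:
        # The SLO is a strict "> 80%", so exactly 80% still misses it.
        return "below_slo"
    return "ok"
```

Treating zero traffic as `no_data` rather than 0% avoids a false LowCacheHit during idle periods.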
|
||||
|
||||
Offline posture
|
||||
- Export metrics to local Prometheus scrape.
|
||||
- Deterministic ordering preserved; cache warmers seeded from bundled fixtures.
|
||||
49
docs2/observability-aoc.md
Normal file
@@ -0,0 +1,49 @@
|
||||
# AOC observability
|
||||
|
||||
Purpose
|
||||
- Monitor Aggregation-Only ingestion for Concelier and Excititor.
|
||||
- Provide deterministic metrics, traces, and logs for AOC guardrails.
|
||||
|
||||
Core metrics
|
||||
- ingestion_write_total{source,tenant,result}
|
||||
- ingestion_latency_seconds{source,tenant,phase}
|
||||
- aoc_violation_total{source,tenant,code}
|
||||
- ingestion_signature_verified_total{source,tenant,result}
|
||||
- advisory_revision_count{source,tenant}
|
||||
- verify_runs_total{tenant,initiator}
|
||||
- verify_duration_seconds{tenant,initiator}
|
||||
|
||||
Alert guidance
|
||||
- Violation spike: increase(aoc_violation_total[15m]) > 0 for critical sources.
|
||||
- Stale ingestion: no growth in ingestion_write_total for > 60 minutes.
|
||||
- Signature drop: rising ingestion_signature_verified_total{result="fail"}.
|
||||
|
||||
Health snapshot endpoint
|
||||
- GET /obs/excititor/health returns ingest, link, signature, conflict status.
|
||||
- Settings control warning and critical thresholds for lag, coverage, and conflict ratio.
|
||||
|
||||
Trace taxonomy
|
||||
- ingest.fetch, ingest.transform, ingest.write
|
||||
- aoc.guard for violations
|
||||
- verify.run for verification jobs
|
||||
|
||||
Log fields
|
||||
- traceId, tenant, source.vendor, upstream.upstreamId
|
||||
- contentHash, violation.code, verification.window
|
||||
- Correlation headers: X-Stella-TraceId, X-Stella-CorrelationId
|
||||
|
||||
Advisory AI chunk metrics
|
||||
- advisory_ai_chunk_requests_total
|
||||
- advisory_ai_chunk_latency_milliseconds
|
||||
- advisory_ai_chunk_segments
|
||||
- advisory_ai_chunk_sources
|
||||
- advisory_ai_guardrail_blocks_total
|
||||
|
||||
Dashboards
|
||||
- AOC ingestion health: sources overview, violations, signature rate, supersedes depth.
|
||||
- Offline mode dashboard from offline snapshots.
|
||||
|
||||
Offline posture
|
||||
- Metrics exporters write to local Prometheus snapshots in offline kits.
|
||||
- CLI verification reports are hashed and archived.
|
||||
- Dashboards support offline data sources.
|
||||
39
docs2/observability-logging.md
Normal file
@@ -0,0 +1,39 @@
|
||||
# Logging standards
|
||||
|
||||
Goals
|
||||
- Deterministic, structured logs for all services.
|
||||
- Safe for tenant isolation and offline review.
|
||||
|
||||
Required fields
|
||||
- timestamp (UTC ISO-8601)
|
||||
- tenant, workload, env, region, version
|
||||
- level (debug, info, warn, error, fatal)
|
||||
- category and operation
|
||||
- trace_id, span_id, correlation_id when present
|
||||
- message (concise, no secrets)
|
||||
- status (ok, error, fault, throttle)
|
||||
- When status is not ok: error.code, error.message (redacted), and retryable
|
||||
|
||||
Optional fields
|
||||
- resource, http.method, http.status_code, duration_ms
|
||||
- host, pid, thread
|
||||
|
||||
Offline kit import fields
|
||||
- tenant_id, bundle_type, bundle_digest, bundle_path
|
||||
- manifest_version, manifest_created_at
|
||||
- force_activate, force_activate_reason
|
||||
- result, reason_code, reason_message
|
||||
- quarantine_id, quarantine_path
|
||||
|
||||
Redaction rules
|
||||
- Never log auth headers, tokens, passwords, private keys, or full bodies.
|
||||
- Redact to "[redacted]" and add redaction.reason.
|
||||
- Hash low-cardinality identifiers and mark hashed=true.
|
||||
|
||||
Determinism and offline posture
|
||||
- NDJSON with LF endings; UTC timestamps only.
|
||||
- No external enrichment; rely on bundled metadata.
|
||||
|
||||
Sampling and rate limits
|
||||
- Info logs rate-limited per component; warn and error never sampled.
|
||||
- Audit logs are never sampled and include actor, action, target, result.
|
||||
57
docs2/observability-metrics-slos.md
Normal file
@@ -0,0 +1,57 @@
|
||||
# Metrics and SLOs
|
||||
|
||||
Core metrics (platform-wide)
|
||||
- http_requests_total{tenant,workload,route,status}
|
||||
- http_request_duration_seconds (histogram)
|
||||
- worker_jobs_total{tenant,queue,status}
|
||||
- worker_job_duration_seconds (histogram)
|
||||
- db_query_duration_seconds{db,operation}
|
||||
- db_pool_in_use, db_pool_available
|
||||
- cache_requests_total{result=hit|miss}
|
||||
- cache_latency_seconds (histogram)
|
||||
- queue_depth{tenant,queue}
|
||||
- errors_total{tenant,workload,code}
|
||||
|
||||
SLO targets (suggested)
|
||||
- API availability: 99.9% monthly per public service.
|
||||
- P95 latency: <300ms reads, <1s writes.
|
||||
- Worker job success: >99% over 30d.
|
||||
- Queue backlog: alert when queue_depth > 1000 for 5 minutes.
|
||||
|
||||
Alert examples
|
||||
- Error rate: rate(errors_total[5m]) / rate(http_requests_total[5m]) > 0.02
|
||||
- Latency regression: p95 http_request_duration_seconds > 0.3s
|
||||
- Queue backlog: queue_depth > 1000 for 5 minutes
|
||||
- Job failures: rate(worker_jobs_total{status="failed"}[10m]) > 0.01
|
||||
|
||||
UX KPIs (triage TTFS)
|
||||
- P95 first evidence <= 1.5s; skeleton <= 0.2s.
|
||||
- Clicks-to-closure median <= 6.
|
||||
- Evidence completeness >= 90% (>= 3.6/4).
|
||||
|
||||
TTFS metrics
|
||||
- ttfs_latency_seconds{surface,cache_hit,signal_source,kind,phase,tenant_id}
|
||||
- ttfs_signal_total{surface,cache_hit,signal_source,kind,phase,tenant_id}
|
||||
- ttfs_cache_hit_total, ttfs_cache_miss_total
|
||||
- ttfs_slo_breach_total{surface,cache_hit,signal_source,kind,phase,tenant_id}
|
||||
- ttfs_error_total{surface,cache_hit,signal_source,kind,phase,tenant_id,error_type,error_code}
|
||||
|
||||
Offline kit metrics
|
||||
- offlinekit_import_total{status,tenant_id}
|
||||
- offlinekit_attestation_verify_latency_seconds{attestation_type,success}
|
||||
- attestor_rekor_success_total{mode}
|
||||
- attestor_rekor_retry_total{reason}
|
||||
- rekor_inclusion_latency{success}
|
||||
|
||||
Scanner FN-Drift metrics
|
||||
- scanner.fn_drift.percent (30-day rolling percentage)
|
||||
- scanner.fn_drift.transitions_30d and scanner.fn_drift.evaluated_30d
|
||||
- scanner.fn_drift.cause.feed_delta, rule_delta, lattice_delta, reachability_delta, engine
|
||||
- scanner.classification_changes_total{cause}
|
||||
- scanner.fn_transitions_total{cause}
|
||||
- SLO targets: warning above 1.0%, critical above 2.5%, engine drift > 0%
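The FN-Drift thresholds can be read as a simple classification over the 30-day counters. The sketch assumes the percentage is `transitions_30d / evaluated_30d * 100` and that any engine-caused drift counts as critical; both are interpretations of the list above, not confirmed definitions.

```python
WARNING_PERCENT = 1.0
CRITICAL_PERCENT = 2.5


def fn_drift_percent(transitions_30d: int, evaluated_30d: int) -> float:
    """Assumed formula: share of evaluated findings that transitioned, in percent."""
    if evaluated_30d <= 0:
        return 0.0
    return 100.0 * transitions_30d / evaluated_30d


def fn_drift_severity(
    transitions_30d: int, evaluated_30d: int, engine_transitions: int = 0
) -> str:
    # Assumption: "engine drift > 0%" means any engine-caused transition breaches.
    if engine_transitions > 0:
        return "critical"
    percent = fn_drift_percent(transitions_30d, evaluated_30d)
    if percent > CRITICAL_PERCENT:
        return "critical"
    if percent > WARNING_PERCENT:
        return "warning"
    return "ok"
```

For example, 15 transitions out of 1000 evaluated findings is 1.5%, which lands in the warning band.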
|
||||
|
||||
Hygiene
|
||||
- Tag metrics with tenant, workload, env, region, version.
|
||||
- Keep metric names stable and namespace custom metrics per module.
|
||||
- Use deterministic bucket boundaries and consistent units.
|
||||
48
docs2/observability-policy.md
Normal file
@@ -0,0 +1,48 @@
|
||||
# Policy observability
|
||||
|
||||
Purpose
|
||||
- Capture Policy Engine metrics, logs, traces, and incident workflows.
|
||||
|
||||
Metrics
|
||||
- policy_run_seconds{tenant,policy,mode}
|
||||
- policy_run_queue_depth{tenant}
|
||||
- policy_run_failures_total{tenant,policy,reason}
|
||||
- policy_run_retries_total{tenant,policy}
|
||||
- policy_run_inputs_pending_bytes{tenant}
|
||||
- policy_rules_fired_total{tenant,policy,rule}
|
||||
- policy_vex_overrides_total{tenant,policy,vendor,justification}
|
||||
- policy_suppressions_total{tenant,policy,action}
|
||||
- policy_selection_batch_duration_seconds{tenant,policy}
|
||||
- policy_materialization_conflicts_total{tenant,policy}
|
||||
- policy_api_requests_total{endpoint,method,status}
|
||||
- policy_api_latency_seconds{endpoint,method}
|
||||
- policy_api_rate_limited_total{endpoint}
|
||||
- policy_queue_leases_active{tenant}
|
||||
- policy_queue_lease_expirations_total{tenant}
|
||||
- policy_delta_backlog_age_seconds{tenant,source}
|
||||
|
||||
Logs
|
||||
- Structured JSON with policyId, policyVersion, tenant, runId, rule, traceId, env.sealed.
|
||||
- Categories: policy.run, policy.evaluate, policy.materialize, policy.simulate, policy.lifecycle.
|
||||
- Rule-hit logs sample at 1% by default; incident mode raises to 100%.
|
||||
|
||||
Traces
|
||||
- policy.api, policy.select, policy.evaluate, policy.materialize, policy.simulate.
|
||||
- Trace context propagated to CLI and UI.
|
||||
|
||||
Alerts
|
||||
- PolicyRunSlaBreach: p95 policy_run_seconds too high.
|
||||
- PolicyQueueStuck: policy_delta_backlog_age_seconds > 600.
|
||||
- DeterminismMismatch: ERR_POL_004 or replay diff.
|
||||
- SimulationDrift: simulation exit 20 over threshold.
|
||||
- VexOverrideSpike and SuppressionSurge.
|
||||
|
||||
Incident mode
|
||||
- POST /api/policy/incidents/activate toggles sampling to 100%.
|
||||
- Retention extends to 30 days during incident.
|
||||
- policy.incident.activated event emitted.
|
||||
|
||||
Integration points
|
||||
- Authority metrics for scope_denied events.
|
||||
- Concelier and Excititor trace propagation via gRPC metadata.
|
||||
- Offline kits export metrics and logs snapshots.
|
||||
29
docs2/observability-standards.md
Normal file
@@ -0,0 +1,29 @@
|
||||
# Observability standards
|
||||
|
||||
Common envelope fields
|
||||
- Trace context: trace_id, span_id, trace_flags; propagate W3C traceparent and baggage.
|
||||
- Tenant and workload: tenant, workload (service), region, env, version.
|
||||
- Subject: component, operation, resource (purl or uri when safe).
|
||||
- Timing: UTC ISO-8601 timestamp; durations in milliseconds.
|
||||
- Outcome: status (ok, error, fault, throttle), error.code, redacted error.message, retryable.
|
||||
|
||||
Scrubbing policy
|
||||
- Denylist PII and secrets: emails, tokens, auth headers, private keys, passwords.
|
||||
- Redact to "[redacted]" and add redaction.reason (secret, pii, tenant_policy).
|
||||
- Hash low-cardinality identifiers with sha256 and mark hashed=true.
|
||||
- Never log full request or response bodies; store hashes and lengths only.
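The scrubbing policy can be illustrated with a small record filter. The key sets, the `user_id` field, and the shape of `redaction.reason` (a sorted list of distinct reasons) are assumptions for this sketch; only the `[redacted]` marker, the reason values, sha256, and `hashed=true` come from the rules above.

```python
import hashlib
from typing import Any, Dict

SECRET_KEYS = {"authorization", "token", "password", "private_key"}  # assumed denylist
PII_KEYS = {"email"}                                                 # assumed denylist
HASHED_KEYS = {"user_id"}                                            # hypothetical field


def scrub(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with secrets and PII redacted and identifiers hashed."""
    out: Dict[str, Any] = {}
    reasons = set()
    for key, value in record.items():
        lowered = key.lower()
        if lowered in SECRET_KEYS:
            out[key] = "[redacted]"
            reasons.add("secret")
        elif lowered in PII_KEYS:
            out[key] = "[redacted]"
            reasons.add("pii")
        elif lowered in HASHED_KEYS:
            out[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            out["hashed"] = True
        else:
            out[key] = value
    if reasons:
        out["redaction.reason"] = sorted(reasons)
    return out
```

A denylist like this only catches known field names, so it complements rather than replaces the rule against logging full request or response bodies.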
|
||||
|
||||
Sampling defaults
|
||||
- Traces: 10% non-prod, 5% prod; always sample error or audit spans.
|
||||
- Logs: info logs rate-limited; warn and error never sampled.
|
||||
- Metrics: never sampled; stable histogram buckets per component.
|
||||
|
||||
Redaction override
|
||||
- Overrides require a ticket id and are time-bound.
|
||||
- Config: telemetry.redaction.overrides and telemetry.redaction.override_ttl (default 24h).
|
||||
- Emit telemetry.redaction.audit with actor, fields, and TTL.
|
||||
|
||||
Determinism and offline
|
||||
- No external enrichers; use bundled service maps and tenant metadata only.
|
||||
- Export ordering: timestamp, workload, operation.
|
||||
- Always use UTC; NDJSON for log exports.
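The export ordering and NDJSON rules combine into a short serializer. This sketch assumes timestamps are ISO-8601 UTC strings in one uniform format (so they sort lexicographically) and adds sorted keys and compact separators for byte-stable output; those last two choices are assumptions, not stated requirements.

```python
import json
from typing import Dict, Iterable


def to_ndjson(records: Iterable[Dict]) -> bytes:
    """Serialize records as NDJSON ordered by timestamp, workload, operation."""
    ordered = sorted(
        records, key=lambda r: (r["timestamp"], r["workload"], r["operation"])
    )
    lines = [
        json.dumps(r, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for r in ordered
    ]
    # One JSON object per line, LF endings only, trailing newline after the last record.
    return ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
```

With this ordering, two exports of the same records produce identical bytes regardless of the order in which the records were collected.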
|
||||
61
docs2/observability-telemetry-controls.md
Normal file
@@ -0,0 +1,61 @@
|
||||
# Telemetry controls and propagation
|
||||
|
||||
Bootstrap wiring
|
||||
- AddStellaOpsTelemetry wires metrics and tracing with deterministic defaults.
|
||||
- Disable exporters when sealed or when egress is not allowed.
|
||||
|
||||
Minimal host wiring (example)
|
||||
```csharp
builder.Services.AddStellaOpsTelemetry(
    builder.Configuration,
    serviceName: "StellaOps.SampleService",
    serviceVersion: builder.Configuration["VERSION"],
    configureOptions: options =>
    {
        options.Collector.Enabled = builder.Configuration.GetValue<bool>("Telemetry:Collector:Enabled", true);
        options.Collector.Endpoint = builder.Configuration["Telemetry:Collector:Endpoint"];
        options.Collector.Protocol = TelemetryCollectorProtocol.Grpc;
    },
    configureMetrics: m => m.AddAspNetCoreInstrumentation(),
    configureTracing: t => t.AddHttpClientInstrumentation());
```
|
||||
|
||||
Propagation rules
|
||||
- HTTP headers: traceparent, tracestate, x-stella-tenant, x-stella-actor, x-stella-imposed-rule.
|
||||
- gRPC metadata: stella-tenant, stella-actor, stella-imposed-rule.
|
||||
- Tenant is required for all requests except sealed diagnostics jobs.
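The header and metadata names above imply a one-to-one mapping when an HTTP request fans out to a gRPC call. The following is an illustrative sketch only: the function name and the `sealed_diagnostics` flag are assumptions, and trace context forwarding is left out because the rules above do not say how it travels over gRPC.

```python
from typing import Dict, Mapping

# HTTP header name -> gRPC metadata key, as listed in the propagation rules.
HTTP_TO_GRPC = {
    "x-stella-tenant": "stella-tenant",
    "x-stella-actor": "stella-actor",
    "x-stella-imposed-rule": "stella-imposed-rule",
}


def http_headers_to_grpc_metadata(
    headers: Mapping[str, str], sealed_diagnostics: bool = False
) -> Dict[str, str]:
    """Map incoming Stella HTTP headers to outgoing gRPC metadata keys."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    metadata = {
        grpc_key: lowered[http_key]
        for http_key, grpc_key in HTTP_TO_GRPC.items()
        if http_key in lowered
    }
    if "stella-tenant" not in metadata and not sealed_diagnostics:
        raise ValueError("tenant is required outside sealed diagnostics jobs")
    return metadata
```

Failing fast on a missing tenant keeps untagged requests from reaching downstream services.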
|
||||
|
||||
Metrics helper expectations
|
||||
- Golden signals: http.server.duration, http.client.duration, messaging.operation.duration,
|
||||
job.execution.duration, runtime.gc.pause, db.call.duration.
|
||||
- Mandatory tags: tenant, service, endpoint or operation, result (ok|error|cancelled|throttled), sealed.
|
||||
- Cardinality guard trims tag values to 64 chars and caps distinct values per key.
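A cardinality guard of the kind described can be sketched as a small stateful filter. The 64-character trim comes from the text above; the cap of 100 distinct values and the `other` overflow bucket are assumptions, since the actual limit and overflow behaviour are not specified here.

```python
from collections import defaultdict
from typing import Dict, Set


class CardinalityGuard:
    """Trim tag values and cap the number of distinct values seen per key."""

    def __init__(self, max_len: int = 64, max_distinct: int = 100, overflow: str = "other"):
        self.max_len = max_len
        self.max_distinct = max_distinct  # assumed cap
        self.overflow = overflow          # assumed overflow bucket
        self._seen: Dict[str, Set[str]] = defaultdict(set)

    def admit(self, key: str, value: str) -> str:
        trimmed = value[: self.max_len]
        seen = self._seen[key]
        if trimmed in seen:
            return trimmed
        if len(seen) >= self.max_distinct:
            # Cap reached for this key: collapse new values into one bucket.
            return self.overflow
        seen.add(trimmed)
        return trimmed
```

Values admitted before the cap was reached keep passing through unchanged, so existing time series stay stable while new ones are folded together.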
|
||||
|
||||
Scrubbing configuration
|
||||
- Telemetry:Scrub:Enabled (default true)
|
||||
- Telemetry:Scrub:Sealed (forces scrubbing when sealed)
|
||||
- Telemetry:Scrub:HashSalt (optional)
|
||||
- Telemetry:Scrub:MaxValueLength (default 256)
|
||||
|
||||
Sealed mode behavior
|
||||
- Disable external exporters; use memory or file OTLP.
|
||||
- Tag sealed=true and scrubbed=true on all records.
|
||||
- Sampling capped by Telemetry:Sealed:MaxSamplingPercent.
|
||||
- File exporter rotates deterministically and enforces 0600 permissions.
|
||||
|
||||
Sealed mode config keys
|
||||
- Telemetry:Sealed:Enabled
|
||||
- Telemetry:Sealed:Exporter (memory|file)
|
||||
- Telemetry:Sealed:FilePath
|
||||
- Telemetry:Sealed:MaxBytes
|
||||
- Telemetry:Sealed:MaxSamplingPercent
|
||||
|
||||
Incident mode (CLI)
|
||||
- Flag: --incident-mode
|
||||
- Config: Telemetry:Incident:Enabled and Telemetry:Incident:TTL
|
||||
- State file: ~/.stellaops/incident-mode.json (0600 permissions)
|
||||
- Emits telemetry.incident.activated and telemetry.incident.expired audit events.
|
||||
|
||||
Determinism
|
||||
- UTC timestamps and stable ordering for OTLP exports.
|
||||
- No external enrichment in sealed mode.
|
||||
27
docs2/observability-tracing.md
Normal file
@@ -0,0 +1,27 @@
|
||||
# Tracing standards
|
||||
|
||||
Goals
|
||||
- Consistent distributed tracing across services, workers, and CLI.
|
||||
- Safe for offline and air-gapped deployments.
|
||||
|
||||
Context propagation
|
||||
- Use W3C traceparent and baggage only.
|
||||
- Preserve incoming trace_id and create child spans per operation.
|
||||
- For async work, attach stored trace context as links rather than a new parent.
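Preserving the incoming trace_id while creating a child span can be sketched with a minimal `traceparent` parser. This handles only the version `00` layout of the W3C header; the type and function names are assumptions, and a real service would normally rely on its tracing library instead.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version 00: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    trace_flags: str


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero trace or span ids are invalid.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceContext(trace_id, span_id, flags)


def child_traceparent(parent: TraceContext) -> str:
    """Keep the incoming trace_id and flags, and mint a fresh span id."""
    span_id = secrets.token_hex(8)
    while set(span_id) == {"0"}:  # all-zero span ids are invalid
        span_id = secrets.token_hex(8)
    return f"00-{parent.trace_id}-{span_id}-{parent.trace_flags}"
```

For async work, the parsed context would be stored with the job and attached to the new span as a link rather than passed to `child_traceparent`.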
|
||||
|
||||
Span conventions
|
||||
- Names use <component>.<operation> (example: policy.evaluate).
|
||||
- Required attributes: tenant, workload, env, region, version, operation, status.
|
||||
- HTTP spans: http.method, http.route, http.status_code, net.peer.name, net.peer.port.
|
||||
- DB spans: db.system, db.name, db.operation, db.statement (no literals).
|
||||
- Message spans: messaging.system, messaging.destination, messaging.operation, messaging.message_id.
|
||||
- Errors: status=error with error.code, redacted error.message, retryable.
|
||||
|
||||
Sampling
|
||||
- Default head sampling: 10% non-prod, 5% prod.
|
||||
- Always sample error or audit spans.
|
||||
- Override via Tracing__SampleRate per service.
|
||||
|
||||
Offline posture
|
||||
- No external exporters; emit OTLP to local collector or file.
|
||||
- UTC timestamps only.
|
||||
45
docs2/observability-ui-telemetry.md
Normal file
@@ -0,0 +1,45 @@
|
||||
# Console telemetry
|
||||
|
||||
Purpose
|
||||
- Capture console performance, security signals, and offline behavior.
|
||||
|
||||
Metrics
|
||||
- ui_route_render_seconds{route,tenant,device}
|
||||
- ui_request_duration_seconds{service,method,status,tenant}
|
||||
- ui_filter_apply_total{route,filter,tenant}
|
||||
- ui_tenant_switch_total{fromTenant,toTenant,trigger}
|
||||
- ui_offline_banner_seconds{reason,tenant}
|
||||
- ui_dpop_failure_total{endpoint,reason}
|
||||
- ui_fresh_auth_prompt_total{action,tenant}
|
||||
- ui_fresh_auth_failure_total{action,reason}
|
||||
- ui_download_manifest_refresh_seconds{tenant,channel}
|
||||
- ui_download_export_queue_depth{tenant,artifactType}
|
||||
- ui_download_command_copied_total{tenant,artifactType}
|
||||
- ui_telemetry_batch_failures_total{transport,reason}
|
||||
- ui_telemetry_queue_depth{priority,tenant}
|
||||
|
||||
Logs
|
||||
- Categories: ui.action, ui.tenant.switch, ui.download.commandCopied, ui.security.anomaly, ui.telemetry.failure.
|
||||
- Core fields: timestamp, level, action, route, tenant, subject, correlationId, offlineMode.
|
||||
- PII is scrubbed; user identifiers are hashed.
|
||||
|
||||
Traces
|
||||
- ui.route.transition, ui.api.fetch, ui.sse.stream, ui.telemetry.batch, ui.policy.action.
|
||||
- W3C traceparent propagated through the gateway for cross-service stitching.
|
||||
|
||||
Feature flags and config
|
||||
- CONSOLE_METRICS_ENABLED, CONSOLE_METRICS_VERBOSE, CONSOLE_LOG_LEVEL.
|
||||
- OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS.
|
||||
- CONSOLE_TELEMETRY_SSE_ENABLED to expose /console/telemetry.
|
||||
|
||||
Offline workflow
|
||||
- Metrics scraped locally and stored with offline bundles.
|
||||
- OTLP batches queue locally and expose ui_telemetry_queue_depth.
|
||||
- Retain telemetry bundles for audit; export Grafana JSON with bundles.
|
||||
|
||||
Alerting hints
|
||||
- ConsoleLatencyHigh when ui_route_render_seconds p95 exceeds target.
|
||||
- BackendLatencyHigh when ui_request_duration_seconds spikes.
|
||||
- TenantSwitchFailures when ui_dpop_failure_total increases.
|
||||
- DownloadsBacklog when ui_download_export_queue_depth grows.
|
||||
- TelemetryExportErrors when ui_telemetry_batch_failures_total > 0.
|
||||
22
docs2/observability-vuln-telemetry.md
Normal file
@@ -0,0 +1,22 @@
|
||||
# Vuln explorer telemetry
|
||||
|
||||
Purpose
|
||||
- Define metrics, logs, traces, and dashboards for vulnerability triage.
|
||||
|
||||
Planned metrics (pending final identifiers)
|
||||
- findings_open_total
|
||||
- mttr_seconds
|
||||
- triage_actions_total
|
||||
- report_generation_seconds
|
||||
|
||||
Planned logs
|
||||
- Fields: findingId, artifactId, advisoryId, policyVersion, actor, actionType.
|
||||
- Deterministic JSON with correlation IDs.
|
||||
|
||||
Planned traces
|
||||
- Spans for triage actions and report generation.
|
||||
- Sampling follows global tracing defaults; errors always sampled.
|
||||
|
||||
Assets and hashes
|
||||
- Capture metrics, logs, traces, and dashboard exports with SHA256SUMS.
|
||||
- Store assets under docs/assets/vuln-explorer/ once available.
|
||||
@@ -1,14 +1,23 @@
# Observability

## Telemetry signals
- Metrics for scan latency, cache hit rate, policy evaluation time, queue depth.
- Logs are structured and include correlation IDs.
- Traces connect Scanner, Policy, Scheduler, and Notify workflows.
Overview
- Deterministic metrics, logs, and traces with tenant isolation.
- Offline-friendly exports for audits and air-gap review.

## Audit trails
- Signing and policy actions are recorded for compliance.
- Tenant and actor metadata is included in audit records.
Core references
- observability-standards.md
- observability-logging.md
- observability-tracing.md
- observability-metrics-slos.md
- observability-telemetry-controls.md

## Telemetry stack
- Telemetry module provides collectors, dashboards, and alert rules.
- Offline bundles include telemetry assets for air-gapped installs.
Service and workflow observability
- observability-aoc.md
- observability-aggregation.md
- observability-policy.md
- observability-ui-telemetry.md
- observability-vuln-telemetry.md

Audit alignment
- security/forensics-and-evidence-locker.md
- security/timeline.md

@@ -6,6 +6,30 @@ Core runbooks
- Quarantine: isolate bundles with hash or signature mismatches.
- Sealed startup diagnostics: confirm egress block and time anchor validity.

Offline kit management
- Generate full or delta kits in connected environments.
- Verify kit hash and signature before transfer.
- Import and install kit, then confirm component freshness.

Feed updates
- Use delta kits for smaller updates.
- Roll back to previous snapshot when feeds introduce regressions.
- Track feed age and kit expiry thresholds.

Scanning in air-gap mode
- Scan local images or SBOMs without registry pull.
- Generate SBOMs locally and scan from file.
- Force offline feeds when required by policy.

Verification in air-gap mode
- Verify proof bundles offline with local trust roots.
- Export and import trust bundles for signer and CA rotation.
- Run score replay with frozen timestamps if needed.

Health checks
- Monitor kit age, feed freshness, trust store validity, disk usage.
- Use deterministic health checks and keep results for audit.

Import and verify
- Validate bundle hash, manifest entries, and schema checks.
- Record import receipt with operator, time anchor, and manifest hash.

49
docs2/operations/key-rotation.md
Normal file
@@ -0,0 +1,49 @@
# Key rotation

Purpose
- Rotate signing keys without invalidating historical DSSE proofs.

Principles
- Do not mutate old DSSE envelopes.
- Keep key history; revoke instead of delete.
- Publish key material to trust anchors and mirrors.
- Audit all key lifecycle events.

Key profiles (examples)
- default: SHA256-ED25519
- fips: SHA256-ECDSA-P256
- gost: GOST-R-34.10-2012
- sm2: SM2-P256
- pqc: ML-DSA-65

Rotation workflow
1. Generate a new key in the configured keystore.
2. Add the key to the trust anchor without removing old keys.
3. Run a transition period where both keys verify.
4. Revoke the old key with an effective date.
5. Publish updated key material to attestation feeds or mirrors.

Trust anchors
- Scoped by PURL pattern and allowed predicate types.
- Store allowedKeyIds, revokedKeys, and keyHistory with timestamps.

Verification with key history
- Verify signatures using the key valid at the time of signing.
- Revoked keys remain valid for pre-revocation attestations.
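
The two rules above can be sketched in Python as a lookup over key history. This is a minimal illustration, not the actual trust anchor schema: `KeyRecord`, its field names, and `select_key` are hypothetical, and the sketch assumes the signing time comes from a trusted source rather than from the signer's own claim.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical key history entry; field names are assumptions."""
    key_id: str
    added_at: datetime
    revoked_at: Optional[datetime] = None  # None means the key is still active


def key_valid_at(record: KeyRecord, signed_at: datetime) -> bool:
    """True if the key had been added and was not yet revoked at signing time."""
    if signed_at < record.added_at:
        return False
    return record.revoked_at is None or signed_at < record.revoked_at


def select_key(history: Iterable[KeyRecord], key_id: str,
               signed_at: datetime) -> Optional[KeyRecord]:
    """Return the history entry usable for verification, or None if there is none."""
    for record in history:
        if record.key_id == key_id and key_valid_at(record, signed_at):
            return record
    return None
```

With this shape, an attestation signed before the revocation date still resolves to its key, while anything signed on or after that date does not.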

Emergency revocation
- Revoke compromised keys immediately and publish updated anchors.
- Re-issue trust bundles and notify downstream verifiers.

Metrics and alerts
- signer_key_age_days
- signer_keys_active_total
- signer_keys_revoked_total
- signer_rotation_events_total
- signer_verification_key_lookups_total
- Alerts when keys near or exceed maximum age.

Related references
- security/crypto-and-trust.md
- provenance/attestation-workflow.md
- docs/operations/key-rotation-runbook.md
37
docs2/operations/proof-verification.md
Normal file
@@ -0,0 +1,37 @@
# Proof verification

Purpose
- Verify DSSE bundles and transparency proofs for scan and score evidence.

Components
- DSSE envelope and signature bundle.
- Certificate chain and trust roots.
- Rekor inclusion proof and checkpoint when online.

Basic verification
- Verify DSSE signature against trusted roots.
- Confirm subject digest matches expected artifact.
- Validate Merkle inclusion proof when available.
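
To make the Merkle inclusion check concrete, here is a Python sketch of the inclusion proof verification algorithm described in RFC 9162 (the same tree hashing as RFC 6962). It assumes the transparency log uses that SHA-256 tree construction; the function names are illustrative and are not taken from the StellaOps codebase.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    """Leaf hash: SHA-256 over a 0x00 prefix plus the entry bytes."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Interior node hash: SHA-256 over a 0x01 prefix plus both children."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: List[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf hash and its audit path, then compare."""
    if index < 0 or index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the tree allows
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            if not fn & 1:
                # Skip levels where this node has no right sibling.
                while fn and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

In offline mode the `root` would come from the cached checkpoint rather than a live log query, so the same function covers both cases.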

Offline verification
- Use embedded proofs and local trust bundles.
- Skip online Rekor queries in sealed mode.
- Record verification results in timeline events.

Transparency log integration
- Check Rekor entry status and inclusion proof.
- When Rekor is unavailable, rely on cached checkpoint and proofs.

Troubleshooting cues
- DSSE signature invalid: check key rotation or trust anchors.
- Merkle root mismatch: verify checkpoint and bundle integrity.
- Certificate chain failure: refresh trust roots.

Monitoring
- Track verification latency and failure counts.
- Alert on certificate expiry or rising verification failures.

Related references
- provenance/attestation-workflow.md
- release/promotion-attestations.md
- docs/operations/proof-verification-runbook.md
36
docs2/operations/reachability.md
Normal file
@@ -0,0 +1,36 @@
# Reachability operations

Purpose
- Operate call graph ingestion, reachability computation, and explain queries.

Reachability statuses
- unreachable, possibly_reachable, reachable_static, reachable_proven, unknown.

Call graph operations
- Upload call graphs and validate schema.
- Inspect entrypoints and merge graphs when required.
- Enforce size limits and deterministic ordering.

Computation
- Trigger reachability computation per scan or batch.
- Monitor jobs for timeouts and memory caps.
- Persist results with graph_cache_epoch for replay.

Explain queries
- Explain a single finding or batch.
- Provide alternate paths and reasons for unreachable results.

Drift handling
- Track changes due to graph updates or reachability algorithm changes.
- Use drift reports to compare runs and highlight path changes.

Monitoring
- Track computation latency, queue depth, and explain request rates.
- Alert on repeated timeouts or inconsistent results.

Related references
- architecture/reachability-lattice.md
- architecture/reachability-evidence.md
- operations/score-proofs.md
- docs/operations/reachability-runbook.md
- docs/operations/reachability-drift-guide.md
@@ -12,6 +12,12 @@ Runbook set (current)
- docs/runbooks/replay_ops.md
- docs/runbooks/vex-ops.md
- docs/runbooks/vuln-ops.md
- operations/score-proofs.md
- operations/proof-verification.md
- operations/reachability.md
- operations/trust-lattice.md
- operations/unknowns-queue.md
- operations/key-rotation.md

Common expectations
- Hash and store any inbound artifacts with SHA256SUMS.

46
docs2/operations/score-proofs.md
Normal file
@@ -0,0 +1,46 @@
# Score proofs and replay

Purpose
- Provide deterministic score proofs with replayable inputs and attestations.

When to replay
- Determinism audits and compliance checks.
- Dispute resolution or vendor verification.
- Regression investigation after feed or policy changes.

Replay operations
- Trigger replay via CLI or API with scan or job id.
- Support batch replay with concurrency limits.
- Nightly replay jobs validate determinism at scale.

Verification
- Online verification uses DSSE and Rekor proofs.
- Offline verification uses embedded proofs and local trust bundles.
- Verification checks include bundle hash, signature, and input digests.
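
The input-digest part of these checks might look like the Python sketch below. The manifest layout (an `inputs` map from name to lowercase SHA-256 hex) is an assumption made for illustration; the real bundle manifest may be structured differently.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Lowercase hex SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def check_input_digests(manifest: Dict, inputs: Dict[str, bytes]) -> List[str]:
    """Return the sorted names of inputs that are missing or do not match.

    `manifest["inputs"]` is assumed to map input name -> expected SHA-256 hex.
    An empty result means every recorded input digest was reproduced.
    """
    failures = []
    for name, expected in sorted(manifest["inputs"].items()):
        data = inputs.get(name)
        if data is None or sha256_hex(data) != expected:
            failures.append(name)
    return failures
```

Returning the failing names in sorted order keeps the verification report itself deterministic, which fits the replay goals described in this document.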

Bundle contents
- Manifest with inputs and hashes.
- SBOM, advisories, VEX snapshots.
- Deterministic scoring outputs and explain traces.
- DSSE bundle and transparency proof.

Retention and export
- Retain bundles per policy; export for audit with manifests.
- Store in Evidence Locker and Offline Kits.

Monitoring metrics
- score_replay_duration_seconds
- proof_verification_success_rate
- proof_bundle_size_bytes
- replay_queue_depth
- proof_generation_failures

Alerting cues
- Replay latency p95 > 30s.
- Verification failures or queue backlog spikes.

Related references
- operations/proof-verification.md
- operations/replay-and-determinism.md
- docs/operations/score-proofs-runbook.md
- docs/operations/score-replay-runbook.md
33
docs2/operations/trust-lattice.md
Normal file
@@ -0,0 +1,33 @@
# Trust lattice operations

Purpose
- Monitor and operate trust lattice gates for VEX and policy decisions.

Core components
- Trust vectors and gate configuration.
- Verdict replay for deterministic validation.

Monitoring
- Track gate failure rate, verdict replay failures, and trust vector drift.
- Use dashboards for gate health and override usage.

Common operations
- View current trust vectors and gate configuration.
- Inspect a verdict and its trust inputs.
- Trigger manual calibration when required.

Emergency procedures
- High gate failure rate: pause dependent workflows and investigate sources.
- Verdict replay failures: verify inputs, cache epochs, and policy versions.
- Trust vector drift: run replay with frozen inputs and compare hashes.

Maintenance
- Daily checks: gate failure rate and queue depth.
- Weekly checks: trust vector calibration and drift review.
- Monthly checks: update trust bundles and audit logs.

Related references
- architecture/reachability-vex.md
- vex/consensus.md
- docs/operations/trust-lattice-runbook.md
- docs/operations/trust-lattice-troubleshooting.md
32
docs2/operations/unknowns-queue.md
Normal file
@@ -0,0 +1,32 @@
# Unknowns queue operations

Purpose
- Manage unknown components with deterministic triage and SLA tracking.

Queue model
- Bands: HOT, WARM, COLD based on score and SLA.
- Reasons include reachability gaps, provenance gaps, VEX conflicts, and ingestion gaps.

Core workflows
- List and triage unknowns by band and reason.
- Escalate or resolve with documented justification.
- Suppress with expiry and audit trail when approved.

Budgets and SLAs
- Per-environment budgets cap unknowns by reason.
- SLA timers trigger alerts when breached.

Monitoring
- unknowns_total, unknowns_hot_count, unknowns_sla_breached
- unknowns_escalation_failures, unknowns_avg_age_hours
- KEV-specific unknown counts and age

Alerting cues
- HOT band spikes or SLA breaches.
- KEV unknowns older than 24 hours.
- Rising queue growth rate.

Related references
- signals/unknowns.md
- signals/unknowns-ranking.md
- docs/operations/unknowns-queue-runbook.md
@@ -39,3 +39,4 @@ Related references
- orchestrator/cli.md
- orchestrator/console.md
- orchestrator/run-ledger.md
- orchestrator/runbook.md

36
docs2/orchestrator/runbook.md
Normal file
@@ -0,0 +1,36 @@
# Orchestrator runbook

Pre-flight
- Verify database and queue backends are healthy.
- Confirm tenant allowlist and orchestrator scopes in Authority.
- Ensure plugin bundles are present and signatures verified.

Common operations
- Start a run via API or CLI.
- Cancel runs with idempotent requests.
- Stream status via WebSocket or CLI.
- Export run ledger as NDJSON for audit.

Incident response
- Queue backlog: scale workers and drain oldest first.
- Repeated failures: inspect error codes and inputsHash; roll back DAG version.
- Plugin auth errors: rotate secrets and warm caches.

Health checks
- /admin/health for liveness and queue depth.
- Metrics: orchestrator_runs_total, orchestrator_queue_depth,
  orchestrator_step_retries_total, orchestrator_run_duration_seconds.
- Logs include tenant, dagId, runId, status with redaction.

Determinism and immutability
- Runs are append-only; never mutate ledger entries.
- Use runToken for idempotent retries.
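
A toy Python model of these two rules is shown below. It is only a sketch under stated assumptions: `RunLedger`, its method names, and the generated run id format are hypothetical, and the real orchestrator persists its ledger in a database rather than in memory. The `dagId`, `runId`, and `status` field names are taken from the log fields listed above.

```python
import json
from typing import Dict, List


class RunLedger:
    """In-memory illustration of an append-only ledger keyed by runToken."""

    def __init__(self) -> None:
        self._entries: List[Dict] = []       # append-only; entries are never edited
        self._by_token: Dict[str, str] = {}  # runToken -> runId

    def start_run(self, run_token: str, dag_id: str) -> str:
        """Start a run, or return the existing runId when the token is replayed."""
        if run_token in self._by_token:
            return self._by_token[run_token]  # idempotent retry, no new entry
        run_id = f"run-{len(self._by_token) + 1}"
        self._by_token[run_token] = run_id
        self._entries.append({"dagId": dag_id, "runId": run_id, "status": "started"})
        return run_id

    def record_status(self, run_id: str, dag_id: str, status: str) -> None:
        """Status changes are new entries, not updates to earlier ones."""
        self._entries.append({"dagId": dag_id, "runId": run_id, "status": status})

    def export_ndjson(self) -> str:
        """One JSON object per line, with sorted keys for stable output."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self._entries)
```

Retrying `start_run` with the same token after a timeout returns the original run id and leaves the ledger unchanged, which is the property a client relies on when it cannot tell whether its first request was received.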

Offline posture
- Keep DAG specs and plugins in sealed storage.
- Export logs, metrics, and traces as NDJSON.

Related references
- orchestrator/overview.md
- orchestrator/architecture.md
- docs/operations/orchestrator-runbook.md
46
docs2/provenance/attestation-workflow.md
Normal file
@@ -0,0 +1,46 @@
# Attestation workflow

Purpose
- Ensure all exported evidence includes DSSE signatures and transparency proofs.
- Provide deterministic verification for online and air-gapped environments.

Workflow overview
- Producer emits a payload and requests signing.
- Signer validates policy and signs with tenant or keyless credentials.
- Attestor wraps the payload in DSSE, records transparency data, and publishes bundles.
- Export Center and Evidence Locker embed bundles in export artifacts.
- Verifiers (CLI, services, auditors) validate signatures and proofs.
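
For readers unfamiliar with DSSE, signatures are computed over a pre-authentication encoding (PAE) of the payload type and payload, not over the raw payload. The Python sketch below follows the PAE definition in the DSSE specification; the `unsigned_envelope` helper and its use of a StellaOps payload type string as the DSSE `payloadType` are assumptions for illustration.

```python
import base64
from typing import Dict


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes a signer signs."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),  # byte length of the type, in decimal
        type_bytes,
        str(len(payload)).encode("ascii"),     # byte length of the payload, in decimal
        payload,
    ])


def unsigned_envelope(payload_type: str, payload: bytes) -> Dict:
    """Envelope skeleton; a real signer would append entries to `signatures`."""
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],
    }
```

Because the lengths are part of the signed bytes, a verifier must rebuild the PAE from the envelope's `payloadType` and decoded `payload` before checking any signature.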

Payload types
- StellaOps.BuildProvenance@1
- StellaOps.SBOMAttestation@1
- StellaOps.ScanResults@1
- StellaOps.PolicyEvaluation@1
- StellaOps.VEXAttestation@1
- StellaOps.RiskProfileEvidence@1
- StellaOps.PromotionAttestation@1

Signing and storage controls
- Default is short-lived keyless signing; tenant KMS keys are supported.
- Ed25519 and ECDSA P-256 are supported.
- Payloads must exclude PII and secrets; redaction is required before signing.
- Evidence Locker stores immutable copies with retention and legal hold.

Verification steps
- Verify DSSE signature against trusted roots.
- Confirm subject digest matches expected artifact.
- Verify transparency proof when available.
- Enforce freshness using attestation.max_age_days policy.
- Record verification results in timeline events.
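
Two of the steps above, the subject digest check and the freshness check, are simple enough to sketch in Python. The subject shape (`{"name": ..., "digest": {"sha256": ...}}`) follows the in-toto statement convention and is an assumption here, as are the function names; only `attestation.max_age_days` comes from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def subject_matches(subjects: List[Dict], expected_sha256: str) -> bool:
    """True if any subject carries the expected SHA-256 digest."""
    return any(s.get("digest", {}).get("sha256") == expected_sha256 for s in subjects)


def is_fresh(signed_at: datetime, now: datetime, max_age_days: int) -> bool:
    """True if the attestation is no older than the configured maximum age.

    `now` is passed in explicitly so that replays with a frozen clock
    produce the same verdict.
    """
    return now - signed_at <= timedelta(days=max_age_days)
```

Passing `now` as a parameter instead of reading the system clock is a deliberate choice in this sketch: it keeps the check reproducible in sealed or replayed environments.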

Offline posture
- Bundles include DSSE, transparency proofs, and certificate chains.
- Offline verification uses embedded proofs and cached trust roots.
- Pending transparency entries are replayed when connectivity returns.

Related references
- provenance/inline-provenance.md
- security/forensics-and-evidence-locker.md
- docs/modules/attestor/architecture.md
- docs/modules/signer/architecture.md
- docs/modules/export-center/architecture.md
24
docs2/provenance/backfill.md
Normal file
@@ -0,0 +1,24 @@
# Provenance backfill

Purpose
- Backfill missing provenance records with deterministic ordering.

Inputs
- Attestation inventory (NDJSON) with subject and digest data.
- Subject-to-Rekor map for resolving transparency entries.

Procedure
1. Validate inventory records (UUID or ULID and digest formats).
2. Resolve each subject to a Rekor entry; record gaps and skip if missing.
3. Emit backfilled provenance events using a backfill mode that preserves ordering.
4. Log every backfilled subject and Rekor digest pair as NDJSON.
5. Repeat until gaps are zero and record completion in audit logs.

Determinism
- Sort by subject then Rekor entry before processing.
- Use canonical JSON writers and UTC timestamps.
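
The ordering and serialization rules can be illustrated with a short Python sketch. The record field names `subject` and `rekorEntry` are hypothetical, and sorted keys with compact separators is a simple stand-in for a canonical JSON writer, not a full RFC 8785 implementation.

```python
import json
from typing import Dict, Iterable


def canonical_line(record: Dict) -> str:
    """Serialize one record with sorted keys and no insignificant whitespace."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def backfill_ndjson(records: Iterable[Dict]) -> str:
    """Order records by subject, then Rekor entry, and emit NDJSON."""
    ordered = sorted(records, key=lambda r: (r["subject"], r["rekorEntry"]))
    return "\n".join(canonical_line(r) for r in ordered)
```

Because both the record order and the key order are fixed, two runs over the same inventory produce byte-identical logs, which can then be hashed and compared.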

Related references
- provenance/inline-provenance.md
- provenance/attestation-workflow.md
- docs/provenance/prov-backfill-plan.md
34
docs2/provenance/rekor-policy.md
Normal file
@@ -0,0 +1,34 @@
# Rekor submission policy

Purpose
- Balance transparency log usage with budget limits and offline safety.

Submission tiers
- Tier 1: graph-level attestations per scan (default).
- Tier 2: edge bundle attestations for escalations.

Budgets
- Hourly limits for graph submissions.
- Daily limits for edge bundle submissions.
- Burst windows for Tier 1 only.

Enforcement
- Queue excess submissions with backpressure.
- Retry failed submissions with backoff.
- Store overflow locally for later submission.
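
One way to picture the budget and overflow behavior is the Python sketch below. It is a simplified model under stated assumptions: `SubmissionBudget` and its methods are hypothetical, the window is advanced manually instead of by a clock, and retry with backoff and burst windows are left out.

```python
from collections import deque
from typing import Callable, Deque


class SubmissionBudget:
    """Fixed-window budget with a local overflow queue (illustrative only)."""

    def __init__(self, limit_per_window: int) -> None:
        self.limit = limit_per_window
        self.used = 0
        self.overflow: Deque[str] = deque()  # submissions held for a later window

    def submit(self, item: str, send: Callable[[str], None]) -> bool:
        """Send if budget remains; otherwise queue locally and report False."""
        if self.used < self.limit:
            send(item)
            self.used += 1
            return True
        self.overflow.append(item)
        return False

    def start_new_window(self, send: Callable[[str], None]) -> None:
        """Reset the budget and drain queued items oldest first."""
        self.used = 0
        while self.overflow and self.used < self.limit:
            send(self.overflow.popleft())
            self.used += 1
```

Draining oldest first preserves submission order across windows, and the remaining budget (`limit - used`) is the kind of value a gauge such as a budget-remaining metric could report.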

Offline behavior
- Queue submissions in attestor.rekor_offline_queue.
- Bundle pending submissions in offline kits.
- Drain queue when connectivity returns.

Monitoring
- attestor_rekor_submissions_total
- attestor_rekor_submission_latency_seconds
- attestor_rekor_queue_depth
- attestor_rekor_budget_remaining

Related references
- provenance/attestation-workflow.md
- security/crypto-and-trust.md
- docs/operations/rekor-policy.md
41
docs2/release/promotion-attestations.md
Normal file
@@ -0,0 +1,41 @@
# Promotion attestations

Purpose
- Capture promotion-time evidence in a DSSE predicate for offline audit.

Predicate: stella.ops/promotion@v1
- subject: image name and digest.
- materials: SBOM and VEX digests with format and OCI uri.
- promotion: from, to, actor, timestamp, pipeline, ticket, notes.
- rekor: uuid, logIndex, inclusionProof, checkpoint.
- attestation: bundle_sha256 and optional witness.
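
As a rough picture of how these fields fit together, the Python sketch below assembles a predicate dictionary. The top-level keys come from the list above; the nesting, the helper names, and the choice to omit `witness` when absent are assumptions, so treat the published schema as authoritative.

```python
import json
from typing import Dict, List, Optional


def build_promotion_predicate(image: str, digest: str, materials: List[Dict],
                              promotion: Dict, rekor: Dict, bundle_sha256: str,
                              witness: Optional[str] = None) -> Dict:
    """Assemble a promotion predicate; nesting is illustrative, not normative."""
    predicate = {
        "subject": {"name": image, "digest": digest},
        "materials": materials,    # SBOM and VEX digests with format and OCI uri
        "promotion": promotion,    # from, to, actor, timestamp, pipeline, ticket, notes
        "rekor": rekor,            # uuid, logIndex, inclusionProof, checkpoint
        "attestation": {"bundle_sha256": bundle_sha256},
    }
    if witness is not None:
        predicate["attestation"]["witness"] = witness  # optional per the list above
    return predicate


def to_stable_json(predicate: Dict) -> str:
    """Sorted keys and compact separators so repeated builds hash identically."""
    return json.dumps(predicate, sort_keys=True, separators=(",", ":"))
```

Serializing with a stable key order matters here because the predicate bytes are what end up signed and hashed into `bundle_sha256`.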

Producer workflow
1. Resolve and freeze image digest.
2. Hash SBOM and VEX artifacts and publish to OCI if needed.
3. Obtain Rekor inclusion proof and checkpoint.
4. Build promotion predicate JSON.
5. Sign with Signer to produce DSSE bundle.
6. Store bundle in Evidence Locker and Export Center.

Verification flow
- Verify DSSE signature using trusted roots.
- Verify Merkle inclusion using the embedded proof and checkpoint.
- Hash SBOM and VEX artifacts and compare to materials digests.
- Confirm promotion metadata and ticket evidence.

Storage and APIs
- Signer: /api/v1/signer/sign/dsse
- Attestor: /api/v1/rekor/entries
- Export Center: serves promotion bundles for offline kits
- Evidence Locker: long-term retention of DSSE and proofs

Security considerations
- Promotion metadata is tenant scoped.
- Rekor proofs must be embedded for air-gap verification.
- Key rotation follows Signer and Authority policies.

Related references
- release/release-engineering.md
- provenance/attestation-workflow.md
- security/forensics-and-evidence-locker.md
@@ -23,6 +23,7 @@ Artifact signing
- Cosign for containers and bundles
- DSSE envelopes for attestations
- Optional Rekor anchoring when available
- Promotion attestations capture release evidence for offline audit

Offline update kit (OUK)
- Monthly bundle of feeds and tooling
@@ -41,3 +42,5 @@ Related references
- docs/ci/*
- docs/devops/*
- docs/release/* and docs/releases/*
- release/promotion-attestations.md
- release/release-notes.md

22
docs2/release/release-notes.md
Normal file
@@ -0,0 +1,22 @@
# Release notes and templates

Release notes
- Historical release notes live under docs/releases/.
- Use release notes for time-specific changes; refer to docs2 for current behavior.

Determinism snippet template
- Use a deterministic score summary in release notes when publishing scans.

Template
```
- Determinism score: {{overall_score}} (threshold {{overall_min}})
- {{image_digest}} score {{score}} ({{identical}}/{{runs}} identical)
- Inputs: policy {{policy_sha}}, feeds {{feeds_sha}}, scanner {{scanner_sha}}, platform {{platform}}
- Evidence: determinism.json and artifact hashes (DSSE signed, offline ready)
- Actions: rerun stella detscore run --bundle determinism.json if score < threshold
```

Related references
- release/release-engineering.md
- operations/replay-and-determinism.md
- docs/release/templates/determinism-score.md
36
docs2/risk/api.md
Normal file
@@ -0,0 +1,36 @@
# Risk API

Purpose
- Expose risk jobs, profiles, simulations, explainability, and exports.

Endpoints (v1)
- POST /api/v1/risk/jobs: submit scoring job.
- GET /api/v1/risk/jobs/{job_id}: job status and results.
- GET /api/v1/risk/explain/{job_id}: explainability payload.
- GET /api/v1/risk/profiles: list profiles with hashes and versions.
- POST /api/v1/risk/profiles: create or update profiles with DSSE metadata.
- POST /api/v1/risk/simulations: dry-run scoring with fixtures.
- GET /api/v1/risk/export/{job_id}: export bundle for audit.

Auth and tenancy
- Headers: X-Stella-Tenant, Authorization Bearer token.
- Optional X-Stella-Scope for imposed rule reminders.
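
A job submission against the endpoint and headers listed above could be prepared as in this Python sketch. It only builds the request object and does not send it. The base URL, the job body fields, and the `build_job_request` helper are hypothetical; the path and header names come from this document.

```python
import json
import urllib.request
from typing import Dict


def build_job_request(base_url: str, tenant: str, token: str,
                      job: Dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the risk jobs endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/risk/jobs",
        data=json.dumps(job, sort_keys=True).encode("utf-8"),
        method="POST",
        headers={
            "X-Stella-Tenant": tenant,
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical body; the actual job schema is defined by the Risk API.
example = build_job_request(
    "https://stella.example.internal", "tenant-a", "TOKEN",
    {"profile_id": "default", "findings": []},
)
```

Sending would then be a matter of passing the request to `urllib.request.urlopen` and handling the error envelope and rate-limit headers described in the error model.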

Error model
- Envelope: code, message, correlation_id, severity, remediation.
- Rate-limit headers: Retry-After, X-RateLimit-Remaining.
- ETag headers for profile and explain responses.

Feature flags
- risk.jobs, risk.explain, risk.simulations, risk.export.

Determinism and offline
- Samples in docs/risk/samples/api/ with SHA256SUMS.
- Stable field ordering and UTC timestamps.

Related references
- risk/overview.md
- risk/profiles.md
- risk/factors.md
- risk/formulas.md
- risk/explainability.md
28
docs2/risk/explainability.md
Normal file
@@ -0,0 +1,28 @@
# Risk explainability

Purpose
- Provide per-factor contributions with provenance and gating rationale.

Explainability envelope
- job_id, tenant_id, context_id
- profile_id, profile_version, profile_hash
- finding_id, raw_score, normalized_score, severity
- signal_values and signal_contributions
- override_applied, override_reason, gates_triggered
- scored_at and provenance hashes

UI and CLI expectations
- Deterministic ordering by factor type, source, then timestamp.
- Highlight top contributors and gates.
- Export Center bundles include explain payload and manifest hashes.

Determinism and offline
- Fixtures under docs/risk/samples/explain/ with SHA256SUMS.
- No live calls in examples or captures.

Related references
- risk/overview.md
- risk/factors.md
- risk/formulas.md
- risk/profiles.md
- risk/api.md
29
docs2/risk/factors.md
Normal file
@@ -0,0 +1,29 @@
# Risk factors

Purpose
- Define factor catalog and normalization rules for risk scoring.

Factor catalog (examples)
- CVSS or exploit likelihood: numeric 0-10 normalized to 0-1.
- KEV flag: boolean boost with provenance.
- Reachability: numeric with entrypoint and path provenance.
- Runtime facts: categorical or numeric with trace references.
- Fix availability: vendor status and mitigation context.
- Asset criticality: tenant or service criticality signals.
- Provenance trust: categorical trust tier with attestation hash.
- Custom overrides: scoped, expiring, and auditable.

Normalization rules
- Validate against profile signal types and transforms.
- Clamp numeric inputs to 0-1 and record original values in provenance.
- Apply TTL or decay deterministically; drop expired signals.
- Precedence: signed over unsigned, runtime over static, newer over older.
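
The clamping and precedence rules can be sketched in Python as follows. This is one possible reading, not the engine's implementation: treating the precedence rule as a strict lexicographic order (signed first, then runtime, then newest), and the field names `signed`, `origin`, and `timestamp`, are assumptions.

```python
from typing import Dict, List


def clamp01(value: float) -> float:
    """Clamp a number into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def normalize(raw: float, scale: float) -> Dict[str, float]:
    """Scale and clamp, keeping the original value for provenance."""
    return {"value": clamp01(raw / scale), "original": raw}


def pick_preferred(candidates: List[Dict]) -> Dict:
    """Choose one signal: signed over unsigned, runtime over static, newer over older.

    `timestamp` is assumed to be an integer number of seconds since the epoch (UTC).
    """
    return max(
        candidates,
        key=lambda s: (bool(s["signed"]), s["origin"] == "runtime", s["timestamp"]),
    )
```

For example, a CVSS base score of 7.5 normalizes to 0.75 with `normalize(7.5, 10.0)`, and an out-of-range input such as 12.0 is clamped to 1.0 while the original 12.0 stays available for the provenance record.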

Determinism and ordering
- Sort factors by factor type, source, then timestamp.
- Hash fixtures and record SHA256 in docs/risk/samples/factors/.

Related references
- risk/overview.md
- risk/formulas.md
- risk/profiles.md
28
docs2/risk/formulas.md
Normal file
@@ -0,0 +1,28 @@
# Risk formulas

Purpose
- Define how normalized factors combine into a risk score and severity.

Formula building blocks
- Weighted sum with per-factor caps and family caps.
- Normalize raw score to 0-1 and apply gates.
- VEX gate: not_affected can short-circuit to 0.0.
- CVSS + KEV boost: clamp01((cvss/10) + kev_bonus).
- Trust gates: fail or down-weight low-trust provenance.
- Decay: apply time-based decay to stale signals.
- Overrides: tenant or asset overrides with expiry and audit.
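
A few of these building blocks can be combined into a small Python sketch. It covers only the weighted sum with per-factor caps, the VEX gate, and the CVSS plus KEV boost; family caps, trust gates, decay, and overrides are omitted. The default `kev_bonus` of 0.2, the default cap of 1.0, and the function names are illustrative assumptions, not values taken from any shipped profile.

```python
from typing import Dict, Optional


def clamp01(value: float) -> float:
    """Clamp a number into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def cvss_kev(cvss: float, kev: bool, kev_bonus: float = 0.2) -> float:
    """clamp01((cvss/10) + kev_bonus); the bonus applies only when KEV is set."""
    return clamp01(cvss / 10.0 + (kev_bonus if kev else 0.0))


def score(factors: Dict[str, float], weights: Dict[str, float],
          caps: Dict[str, float], vex_status: Optional[str] = None) -> float:
    """Weighted sum with per-factor caps, a VEX gate, and fixed precision."""
    if vex_status == "not_affected":
        return 0.0  # VEX gate short-circuits the score
    raw = 0.0
    for name in sorted(factors):  # stable ordering before aggregation
        contribution = weights[name] * factors[name]
        raw += min(contribution, caps.get(name, 1.0))
    return round(clamp01(raw), 4)  # 4 decimals, per the example precision
```

With a CVSS of 9.0 and the KEV flag set, the boost would exceed 1.0 and is clamped back to 1.0 under the assumed bonus, which shows why the clamp sits outside the addition.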

Severity mapping
- Map normalized_score to critical, high, medium, low, informational.
- Store band rationale in explainability output.

Determinism
- Stable factor ordering before aggregation.
- Fixed precision (example: 4 decimals) before severity mapping.
- Hash fixtures and record SHA256 in docs/risk/samples/formulas/.

Related references
- risk/overview.md
- risk/factors.md
- risk/profiles.md
- risk/explainability.md
36
docs2/risk/overview.md
Normal file
@@ -0,0 +1,36 @@
# Risk overview

Purpose
- Explain risk scoring concepts, lifecycle, and artifacts.
- Preserve deterministic, provenance-backed outputs.

Core concepts
- Signals become evidence after validation and normalization.
- Profiles define weights, thresholds, overrides, and severity mapping.
- Formulas aggregate normalized factors into a 0-1 score.
- Provenance carries source hashes and attestation references.

Lifecycle
1. Submit a risk job with tenant, context, profile, and findings.
2. Ingest evidence from scanners, reachability, VEX, runtime signals, and KEV.
3. Normalize and dedupe by provenance hash.
4. Evaluate profile rules, gates, and overrides.
5. Assign severity band and emit explainability output.
6. Export bundles with profile hash and evidence references.

Artifacts
- Profile schema: id, version, signals, weights, overrides, metadata, provenance.
- Job and result fields: job_id, profile_hash, normalized_score, severity.
- Explainability envelope: signal_values, signal_contributions, gates_triggered.

Determinism and offline posture
- Stable ordering for factors and contributions.
- Fixed precision math with UTC timestamps only.
- Fixtures and hashes live under docs/risk/samples/.

Related references
- risk/factors.md
- risk/formulas.md
- risk/profiles.md
- risk/explainability.md
- risk/api.md
37
docs2/risk/profiles.md
Normal file
@@ -0,0 +1,37 @@
# Risk profiles

Purpose
- Define profile schema, lifecycle, and governance for risk scoring.

Schema essentials
- id, version, description, signals[], weights, metadata.
- signals[] fields: name, source, type (numeric, boolean, categorical), path, transform, unit.
- overrides: severity rules and decision rules.
- Optional: extends, rollout flags, valid_from, valid_until.

Severity levels
- critical, high, medium, low, informational.

Lifecycle
1. Author profiles in Policy Studio.
2. Simulate against deterministic fixtures.
3. Review and approve with DSSE signatures.
4. Promote and activate in Policy Engine.
5. Roll back by activating a previous version.

Governance and determinism
- Profiles are immutable after promotion.
- Each version carries a profile_hash and signed manifest entry.
- Simulation and production share the same evaluation codepath.
- Offline bundles include profiles and fixtures with hashes.

Explainability and observability
- Emit per-factor contributions with stable ordering.
- Track evaluation latency, factor coverage, profile hit rate, and override usage.

Related references
- risk/overview.md
- risk/factors.md
- risk/formulas.md
- risk/explainability.md
- risk/api.md
@@ -32,3 +32,6 @@ Related references
- docs/security/crypto-simulation-services.md
- docs/security/crypto-compliance.md
- docs/airgap/staleness-and-time.md
- operations/key-rotation.md
- provenance/rekor-policy.md
- release/promotion-attestations.md
30
docs2/security/evidence-locker-publishing.md
Normal file
@@ -0,0 +1,30 @@
# Evidence locker publishing

Purpose
- Publish deterministic evidence bundles to the Evidence Locker.

Required inputs
- Evidence locker base URL (no trailing slash).
- Bearer token with write scopes for required prefixes.
- Signing key for final bundle signing (Cosign key or key file).

Publishing flow
- Build deterministic tar bundles for each producer (signals, runtime, evidence packs).
- Verify bundle hashes and inner SHA256 lists before upload.
- Upload bundles to the Evidence Locker using the configured token.
- Re-sign bundles with production keys when required.

Deterministic packaging rules
- tar --sort=name
- fixed mtime (UTC 1970-01-01)
- owner and group set to 0
- numeric-owner enabled
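
The packaging rules above can be approximated outside the tar CLI. The following is a sketch only, using Python's standard tarfile module: entries are added in sorted name order, mtime is fixed at 0 (1970-01-01 UTC), uid and gid are 0, and user and group names are left empty so only numeric ids are recorded. The function name, the fixed file mode, and the USTAR format choice are assumptions, not part of the source.

```python
import io
import tarfile
from typing import Mapping


def build_deterministic_tar(files: Mapping[str, bytes]) -> bytes:
    """Pack in-memory files into an uncompressed tar with fixed metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):            # analogous to --sort=name
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                    # fixed mtime: 1970-01-01 UTC
            info.uid = 0                      # owner set to 0
            info.gid = 0                      # group set to 0
            info.uname = ""                   # numeric ids only, no names
            info.gname = ""
            info.mode = 0o644                 # assumption: fixed mode
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Input order does not affect the output bytes.
first = build_deterministic_tar({"b.txt": b"2", "a.txt": b"1"})
second = build_deterministic_tar({"a.txt": b"1", "b.txt": b"2"})
assert first == second
```

Because every metadata field is pinned, two builds from the same inputs produce byte-identical archives, so their hashes can be compared directly.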

Offline posture
- Transparency log upload may be disabled in sealed mode.
- Trust derives from local keys and recorded hashes.
- Upload scripts must fail on hash mismatch.
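
A fail-closed hash check of the kind described above might look like this sketch. The exception and function names are hypothetical; the only behavior taken from the source is that a mismatch must abort the upload.

```python
import hashlib


class HashMismatchError(Exception):
    """Raised when a bundle does not match its recorded SHA-256."""


def verify_bundle(data: bytes, expected_sha256: str) -> None:
    """Raise HashMismatchError unless data hashes to expected_sha256."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.strip().lower():
        raise HashMismatchError(f"expected {expected_sha256}, got {actual}")


bundle = b"example bundle bytes"
recorded = hashlib.sha256(bundle).hexdigest()

verify_bundle(bundle, recorded)          # passes silently

try:
    verify_bundle(bundle + b"x", recorded)
except HashMismatchError:
    tampered_rejected = True
else:
    tampered_rejected = False
assert tampered_rejected
```

Raising an exception rather than returning a boolean means a caller that forgets to check the result still stops before uploading.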

Related references
- security/forensics-and-evidence-locker.md
- provenance/attestation-workflow.md
@@ -28,7 +28,8 @@ Minimum bundle layout
- signatures/ for DSSE or sigstore bundles

Related references
- provenance/attestation-workflow.md
- security/timeline.md
- security/evidence-locker-publishing.md
- docs/forensics/evidence-locker.md
- docs/forensics/provenance-attestation.md
- docs/forensics/timeline.md
- docs/evidence-locker/evidence-pack-schema.md
27
docs2/security/multi-tenancy.md
Normal file
@@ -0,0 +1,27 @@
# Multi-tenancy

Purpose
- Ensure strict tenant isolation across APIs, storage, and observability.

Tenant lifecycle
- Create tenants with scoped roles and default policies.
- Suspend or retire tenants with audit records.
- Migrations and data retention follow governance policy.

Isolation model
- Tokens carry tenant identifiers and scopes.
- APIs require tenant headers; cross-tenant actions are explicit.
- Datastores enforce tenant_id and RLS where supported.

Observability
- Metrics, logs, and traces always include tenant.
- Cross-tenant access attempts emit audit events.

Offline posture
- Offline bundles are tenant scoped.
- Tenant list in offline mode is limited to snapshot contents.

Related references
- security/identity-tenancy-and-scopes.md
- security/row-level-security.md
- docs/operations/multi-tenancy.md
@@ -40,3 +40,9 @@ Related references
- docs/risk/profiles.md
- docs/risk/api.md
- docs/guides/epss-integration.md
- risk/overview.md
- risk/factors.md
- risk/formulas.md
- risk/profiles.md
- risk/explainability.md
- risk/api.md
21
docs2/security/row-level-security.md
Normal file
@@ -0,0 +1,21 @@
# Row-level security

Purpose
- Enforce tenant isolation at the database level with RLS policies.

Strategy
- Apply RLS to tenant-scoped tables and views.
- Require app.tenant_id session setting on every connection.
- Deny access when tenant context is missing.

Policy evaluation
- Policies filter rows by tenant_id and optional scope.
- Admin bypass uses explicit roles with audited access.

Validation
- Run cross-tenant read and write tests in staging.
- Include RLS checks in deterministic replay suites.

Related references
- data/postgresql-patterns.md
- docs/operations/rls-and-data-isolation.md
47
docs2/security/timeline.md
Normal file
@@ -0,0 +1,47 @@
# Timeline forensics

Purpose
- Provide an append-only event ledger for audit, replay, and incident analysis.
- Support deterministic exports for offline review.

Event model
- event_id (ULID)
- tenant
- timestamp (UTC ISO-8601)
- category (scanner, policy, runtime, evidence, notify)
- details (JSON payload)
- trace_id for correlation

Event kinds
- scan.completed
- policy.verdict
- attestation.verified
- evidence.ingested
- notify.sent
- runtime.alert
- redaction_notice (compensating event)

APIs
- GET /api/v1/timeline/events with filters for tenant, category, time window, trace_id.
- GET /api/v1/timeline/events/{id} for a single event.
- GET /api/v1/timeline/export for NDJSON exports.
- Headers: X-Stella-Tenant, optional X-Stella-TraceId, If-None-Match.
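
As an illustration of how a client might assemble a list request from the path and headers above, here is a sketch that only builds the URL and header map; it sends nothing. The query parameter names (category, trace_id), the helper name, and the base URL are assumptions, since the source lists the available filters but not their parameter spelling.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def build_timeline_request(
    base_url: str,
    tenant: str,
    category: Optional[str] = None,
    trace_id: Optional[str] = None,
    etag: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for GET /api/v1/timeline/events."""
    # Assumed query parameter names; only filters that are set are included.
    params = [(k, v) for k, v in (("category", category), ("trace_id", trace_id)) if v]
    url = f"{base_url}/api/v1/timeline/events"
    if params:
        url += "?" + urlencode(params)

    headers = {"X-Stella-Tenant": tenant}      # required tenant header
    if trace_id:
        headers["X-Stella-TraceId"] = trace_id  # optional correlation header
    if etag:
        headers["If-None-Match"] = etag         # optional conditional request
    return url, headers


url, headers = build_timeline_request(
    "https://stella.example.internal",  # hypothetical base URL
    tenant="tenant-a",
    category="policy",
    trace_id="abc123",
)
print(url)
print(headers)
```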

Query guidance
- Use category plus trace_id to track the scan-to-policy-to-notify flow.
- Use tenant and timestamp ranges for SLA audits.
- CLI parity: stella timeline list mirrors the API.

Retention and redaction
- Append-only storage; no deletes.
- Redactions use redaction_notice events that reference the superseded event.
- Retention is tenant-configurable and exported weekly to cold storage.

Offline posture
- Offline kits include timeline exports for compliance review.
- Exports include stable ordering and manifest hashes.
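
A deterministic NDJSON export with a manifest hash could be sketched as follows. The sort key (timestamp, then event_id), the canonical JSON settings, and the use of SHA-256 are assumptions; the source only requires stable ordering and a manifest hash. The sample events are invented for illustration.

```python
import hashlib
import json
from typing import Iterable, Tuple


def export_ndjson(events: Iterable[dict]) -> Tuple[bytes, str]:
    """Return (ndjson_bytes, sha256_hex) for the given events.

    Assumes timestamps are uniformly formatted UTC ISO-8601 strings, so a
    plain string sort matches chronological order.
    """
    ordered = sorted(events, key=lambda e: (e["timestamp"], e["event_id"]))
    lines = [json.dumps(e, sort_keys=True, separators=(",", ":")) for e in ordered]
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()


# Hypothetical events using the event model fields.
events = [
    {"event_id": "01B", "tenant": "t1", "timestamp": "2025-01-01T00:00:02Z",
     "category": "policy", "details": {}, "trace_id": "x"},
    {"event_id": "01A", "tenant": "t1", "timestamp": "2025-01-01T00:00:01Z",
     "category": "scanner", "details": {}, "trace_id": "x"},
]

payload, digest = export_ndjson(events)
payload_again, digest_again = export_ndjson(list(reversed(events)))
assert payload == payload_again and digest == digest_again
```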

Related references
- security/forensics-and-evidence-locker.md
- observability.md
- docs/forensics/timeline.md
@@ -10,15 +10,37 @@ Core states (examples)
- U4: Unknown (no analysis yet)

Tiers and scoring
- Tiers group states by entropy ranges.
- The aggregate tier is the maximum severity present.
- Risk score adds an entropy-based modifier.
- Tiers group states by entropy ranges (T1 high to T4 negligible).
- Aggregate tier is the maximum tier across states.
- Risk score adds tier and entropy modifiers.

Tier ranges (example)
- T1: 0.7 to 1.0, blocks not_affected.
- T2: 0.4 to 0.69, warns on not_affected.
- T3: 0.1 to 0.39, allow with caveat.
- T4: 0.0 to 0.09, no special handling.
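
The example ranges above can be read as thresholds on an entropy value rounded to two decimals. The sketch below follows that reading; the rounding step, the function names, and the interpretation of "maximum tier" as the most severe tier (T1 highest) are assumptions.

```python
from typing import Iterable

# Least severe first, so a larger index means a more severe tier.
TIER_ORDER = ["T4", "T3", "T2", "T1"]


def tier_for_entropy(entropy: float) -> str:
    """Map an entropy value in [0, 1] to a tier using the example ranges."""
    value = round(entropy, 2)  # assumption: two-decimal fixed precision
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"entropy out of range: {entropy}")
    if value >= 0.7:
        return "T1"
    if value >= 0.4:
        return "T2"
    if value >= 0.1:
        return "T3"
    return "T4"


def aggregate_tier(entropies: Iterable[float]) -> str:
    """Return the most severe tier across all states."""
    return max((tier_for_entropy(e) for e in entropies), key=TIER_ORDER.index)


assert tier_for_entropy(0.7) == "T1"
assert tier_for_entropy(0.69) == "T2"
assert aggregate_tier([0.05, 0.45]) == "T2"
```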

Risk score formula (simplified)
- meanEntropy = avg(states[].entropy)
- entropyBoost = clamp(meanEntropy * k, 0..boostCeiling)
- tierModifier = {T1:0.50, T2:0.25, T3:0.10, T4:0.00}[aggregateTier]
- riskScore = clamp(baseScore * (1 + tierModifier + entropyBoost), 0..1)
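
The simplified formula translates directly into code. In this sketch the tier modifiers come from the formula above, while the default values for k and boost_ceiling are placeholders chosen for illustration; the source does not give them.

```python
from typing import Sequence

TIER_MODIFIER = {"T1": 0.50, "T2": 0.25, "T3": 0.10, "T4": 0.00}


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def risk_score(
    base_score: float,
    entropies: Sequence[float],
    aggregate_tier: str,
    k: float = 0.5,              # placeholder, not from the source
    boost_ceiling: float = 0.2,  # placeholder, not from the source
) -> float:
    """Apply tier and entropy modifiers to a base score, clamped to [0, 1]."""
    mean_entropy = sum(entropies) / len(entropies)
    entropy_boost = clamp(mean_entropy * k, 0.0, boost_ceiling)
    tier_modifier = TIER_MODIFIER[aggregate_tier]
    return clamp(base_score * (1 + tier_modifier + entropy_boost), 0.0, 1.0)


score = risk_score(0.5, [0.8, 0.4], "T1")
print(round(score, 4))
```

With these placeholder values, meanEntropy is 0.6, the raw boost of 0.3 is capped at 0.2, and the score works out to 0.5 × (1 + 0.50 + 0.20) = 0.85. A base score of 0.9 under the same inputs would exceed 1 and be clamped to 1.0.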

Policy guidance
- High uncertainty blocks not_affected claims.
- Lower tiers allow decisions with caveats.
- Remediation hints are attached to findings.

Remediation examples
- U1: upload symbols or resolve unknowns registry.
- U2: generate lockfile and resolve package coordinates.
- U3: cross-reference trusted advisories.
- U4: run initial analysis to remove unknown state.

Payload fields
- states[] include code, name, entropy, tier, timestamp, evidence.
- aggregateTier and riskScore recorded with computedAt timestamp.

Determinism rules
- Stable ordering of uncertainty states.
- UTC timestamps and fixed precision for entropy values.
@@ -17,3 +17,6 @@
- Interop checks against external tooling formats.
- Offline E2E runs as a release gate.
- Policy and schema validation in CI.

Related references
- testing/router-chaos.md
34
docs2/testing/router-chaos.md
Normal file
@@ -0,0 +1,34 @@
# Router chaos testing

Purpose
- Validate backpressure, recovery, and cache failure behavior for the router.

Test categories
- Load testing with spike scenarios (baseline, 10x, 50x, recovery).
- Backpressure verification for 429 and 503 with Retry-After.
- Recovery tests to ensure queues drain quickly.
- Valkey failure injection with graceful fallback.

Expected behavior
- Normal load returns 200 OK.
- High load returns 429 with Retry-After.
- Critical load returns 503 with Retry-After.
- Recovery within 30 seconds, zero data loss.
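
A chaos test could assert the expected behavior above on each observed response. The sketch below checks only what the list states: statuses are limited to 200, 429, and 503, and throttled responses carry Retry-After. The function name and the shape of its inputs are hypothetical.

```python
from typing import List, Mapping


def check_backpressure_response(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in one router response (empty if none)."""
    problems = []
    header_names = {name.lower() for name in headers}  # header names are case-insensitive

    if status not in (200, 429, 503):
        problems.append(f"unexpected status {status}")
    if status in (429, 503) and "retry-after" not in header_names:
        problems.append(f"{status} response is missing Retry-After")
    return problems


assert check_backpressure_response(200, {}) == []
assert check_backpressure_response(429, {"Retry-After": "5"}) == []
assert check_backpressure_response(503, {}) == ["503 response is missing Retry-After"]
```

Returning a list of problems instead of raising on the first one lets a load run collect every violation before failing.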

Metrics
- http_requests_total{status}
- router_request_queue_depth
- request_recovery_seconds

Alert cues
- Throttle rate above 10% for 5 minutes.
- P95 recovery time above 30 seconds.
- Missing Retry-After headers.

CI integration
- Runs on PRs touching router code and in nightly staging runs.
- Stores results as artifacts for audits.

Related references
- operations/router-rate-limiting.md
- docs/operations/router-chaos-testing-runbook.md
@@ -18,6 +18,10 @@ Architecture and system model
docs/modules/platform/architecture-overview.md, docs/modules/*/architecture.md
- Docs2: architecture/overview.md, architecture/workflows.md, modules/index.md

Advisory alignment
- Sources: docs/architecture/advisory-alignment-report.md
- Docs2: architecture/advisory-alignment.md

Component map
- Sources: docs/technical/architecture/component-map.md
- Docs2: architecture/component-map.md
@@ -77,7 +81,7 @@ Advisory AI
Orchestrator detail
- Sources: docs/orchestrator/*
- Docs2: orchestrator/overview.md, orchestrator/architecture.md, orchestrator/api.md,
orchestrator/cli.md, orchestrator/console.md
orchestrator/cli.md, orchestrator/console.md, orchestrator/runbook.md

Orchestrator run ledger
- Sources: docs/orchestrator/run-ledger.md
@@ -118,7 +122,10 @@ Replay and determinism

Runbooks and incident response
- Sources: docs/runbooks/*, docs/operations/*
- Docs2: operations/runbooks.md
- Docs2: operations/runbooks.md, operations/key-rotation.md,
operations/proof-verification.md, operations/score-proofs.md,
operations/reachability.md, operations/trust-lattice.md,
operations/unknowns-queue.md

Notifications
- Sources: docs/notifications/*, docs/modules/notify/*
@@ -129,7 +136,8 @@ Notifications details
docs/notifications/channels.md, docs/notifications/templates.md,
docs/notifications/digests.md, docs/notifications/pack-approvals-integration.md
- Docs2: notifications/overview.md, notifications/rules.md, notifications/channels.md,
notifications/templates.md, notifications/digests.md, notifications/pack-approvals.md
notifications/templates.md, notifications/digests.md, notifications/pack-approvals.md,
notifications/runbook.md

Router rate limiting
- Sources: docs/router/*
@@ -138,7 +146,8 @@ Router rate limiting
Release engineering and CI/DevOps
- Sources: docs/13_RELEASE_ENGINEERING_PLAYBOOK.md, docs/ci/*, docs/devops/*,
docs/release/*, docs/releases/*
- Docs2: release/release-engineering.md
- Docs2: release/release-engineering.md, release/promotion-attestations.md,
release/release-notes.md

API and contracts
- Sources: docs/09_API_CLI_REFERENCE.md, docs/api/*, docs/schemas/*,
@@ -177,7 +186,8 @@ Regulator threat and evidence model
Identity, tenancy, and scopes
- Sources: docs/security/authority-scopes.md, docs/security/scopes-and-roles.md,
docs/architecture/console-admin-rbac.md
- Docs2: security/identity-tenancy-and-scopes.md
- Docs2: security/identity-tenancy-and-scopes.md, security/multi-tenancy.md,
security/row-level-security.md

Console admin RBAC
- Sources: docs/architecture/console-admin-rbac.md
@@ -213,20 +223,26 @@ Quota and licensing

Risk model and scoring
- Sources: docs/risk/*, docs/contracts/risk-scoring.md
- Docs2: security/risk-model.md
- Docs2: security/risk-model.md, risk/overview.md, risk/factors.md, risk/formulas.md,
risk/profiles.md, risk/explainability.md, risk/api.md

Forensics and evidence locker
- Sources: docs/forensics/*, docs/evidence-locker/*
- Docs2: security/forensics-and-evidence-locker.md
- Sources: docs/forensics/*, docs/evidence-locker/*, docs/ops/evidence-locker-handoff.md
- Docs2: security/forensics-and-evidence-locker.md, security/evidence-locker-publishing.md

Timeline forensics
- Sources: docs/forensics/timeline.md
- Docs2: security/timeline.md

Provenance and transparency
- Sources: docs/provenance/*, docs/security/trust-and-signing.md,
docs/modules/attestor/*, docs/modules/signer/*
- Docs2: provenance/inline-provenance.md
- Docs2: provenance/inline-provenance.md, provenance/attestation-workflow.md,
provenance/rekor-policy.md, provenance/backfill.md

Database and persistence
- Sources: docs/db/*, docs/adr/0001-postgresql-for-control-plane.md
- Docs2: data/persistence.md
- Docs2: data/persistence.md, data/postgresql-operations.md, data/postgresql-patterns.md

Events and messaging
- Sources: docs/events/*, docs/samples/*
@@ -334,19 +350,22 @@ Vuln Explorer overview

Testing and quality
- Sources: docs/19_TEST_SUITE_OVERVIEW.md, docs/testing/*
- Docs2: testing-and-quality.md
- Docs2: testing-and-quality.md, testing/router-chaos.md

Observability and telemetry
- Sources: docs/metrics/*, docs/observability/*, docs/modules/telemetry/*,
docs/technical/observability/*
- Docs2: observability.md
- Docs2: observability.md, observability-standards.md, observability-logging.md,
observability-tracing.md, observability-metrics-slos.md, observability-telemetry-controls.md,
observability-aoc.md, observability-aggregation.md, observability-policy.md,
observability-ui-telemetry.md, observability-vuln-telemetry.md

Benchmarks and performance
- Sources: docs/benchmarks/*, docs/12_PERFORMANCE_WORKBOOK.md
- Docs2: benchmarks.md

Guides and workflows
- Sources: docs/guides/*, docs/ci/sarif-integration.md
- Sources: docs/guides/*, docs/ci/sarif-integration.md, docs/architecture/epss-versioning-clarification.md
- Docs2: guides/compare-workflow.md, guides/epss-integration.md

Examples and fixtures