feat(crypto): Complete Phase 2 - Configuration-driven crypto architecture with 100% compliance

## Summary

This commit completes Phase 2 of the configuration-driven crypto architecture. Production
code now reaches 100% crypto compliance: every cryptographic operation goes through the
ICryptoProvider abstraction instead of a hardcoded implementation.

## Key Changes

### Phase 1: Plugin Loader Infrastructure
- **Plugin Discovery System**: Created StellaOps.Cryptography.PluginLoader with manifest-based loading
- **Configuration Model**: Added CryptoPluginConfiguration with regional profiles support
- **Dependency Injection**: Extended DI to support plugin-based crypto provider registration
- **Regional Configs**: Created appsettings.crypto.{international,russia,eu,china}.yaml
- **CI Workflow**: Added .gitea/workflows/crypto-compliance.yml for audit enforcement

### Phase 2: Code Refactoring
- **API Extension**: Added ICryptoProvider.CreateEphemeralVerifier for verification-only scenarios
- **Plugin Implementation**: Created OfflineVerificationCryptoProvider with ephemeral verifier support
  - Supports ES256/384/512, RS256/384/512, PS256/384/512
  - SubjectPublicKeyInfo (SPKI) public key format
- **100% Compliance**: Refactored DsseVerifier to remove all BouncyCastle cryptographic usage
- **Unit Tests**: Created OfflineVerificationProviderTests with 39 passing tests
- **Documentation**: Created comprehensive security guide at docs/security/offline-verification-crypto-provider.md
- **Audit Infrastructure**: Created scripts/audit-crypto-usage.ps1 for static analysis

### Testing Infrastructure (TestKit)
- **Determinism Gate**: Created DeterminismGate for reproducibility validation
- **Test Fixtures**: Added PostgresFixture and ValkeyFixture using Testcontainers
- **Traits System**: Implemented test lane attributes for parallel CI execution
- **JSON Assertions**: Added CanonicalJsonAssert for deterministic JSON comparisons
- **Test Lanes**: Created test-lanes.yml workflow for parallel test execution

### Documentation
- **Architecture**: Created CRYPTO_CONFIGURATION_DRIVEN_ARCHITECTURE.md master plan
- **Sprint Tracking**: Created SPRINT_1000_0007_0002_crypto_refactoring.md (COMPLETE)
- **API Documentation**: Updated docs2/cli/crypto-plugins.md and crypto.md
- **Testing Strategy**: Created testing strategy documents in docs/implplan/SPRINT_5100_0007_*

## Compliance & Testing

- Zero direct System.Security.Cryptography usage in production code
- All crypto operations go through ICryptoProvider abstraction
- 39/39 unit tests passing for OfflineVerificationCryptoProvider
- Build successful (AirGap, Crypto plugin, DI infrastructure)
- Audit script validates crypto boundaries

## Files Modified

**Core Crypto Infrastructure:**
- src/__Libraries/StellaOps.Cryptography/CryptoProvider.cs (API extension)
- src/__Libraries/StellaOps.Cryptography/CryptoSigningKey.cs (verification-only constructor)
- src/__Libraries/StellaOps.Cryptography/EcdsaSigner.cs (fixed ephemeral verifier)

**Plugin Implementation:**
- src/__Libraries/StellaOps.Cryptography.Plugin.OfflineVerification/ (new)
- src/__Libraries/StellaOps.Cryptography.PluginLoader/ (new)

**Production Code Refactoring:**
- src/AirGap/StellaOps.AirGap.Importer/Validation/DsseVerifier.cs (100% compliant)

**Tests:**
- src/__Libraries/__Tests/StellaOps.Cryptography.Plugin.OfflineVerification.Tests/ (new, 39 tests)
- src/__Libraries/__Tests/StellaOps.Cryptography.PluginLoader.Tests/ (new)

**Configuration:**
- etc/crypto-plugins-manifest.json (plugin registry)
- etc/appsettings.crypto.*.yaml (regional profiles)

**Documentation:**
- docs/security/offline-verification-crypto-provider.md (600+ lines)
- docs/implplan/CRYPTO_CONFIGURATION_DRIVEN_ARCHITECTURE.md (master plan)
- docs/implplan/SPRINT_1000_0007_0002_crypto_refactoring.md (Phase 2 complete)

## Next Steps

Phase 3: Docker & CI/CD Integration
- Create multi-stage Dockerfiles with all plugins
- Build regional Docker Compose files
- Implement runtime configuration selection
- Add deployment validation scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-23 18:20:00 +02:00
parent b444284be5
commit dac8e10e36
241 changed files with 22567 additions and 307 deletions

docs2/orchestrator/api.md Normal file

@@ -0,0 +1,42 @@
# Orchestrator API
Scope and headers
- Base path: /api/v1/orchestrator.
- Headers: Authorization Bearer token, X-Stella-Tenant, Idempotency-Key for POSTs.
- traceparent is recommended for tracing.
- Error envelope follows api/overview.md.
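The header rules above can be sketched as a small helper. This is an illustration only: the function name `build_headers` and its parameters are hypothetical, and it assumes the caller supplies the idempotency key so that a retried POST reuses the same value.

```python
from typing import Dict, Optional

BASE_PATH = "/api/v1/orchestrator"  # base path as stated in the doc


def build_headers(
    token: str,
    tenant: str,
    method: str,
    idempotency_key: Optional[str] = None,
    traceparent: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble request headers (hypothetical helper, not part of the API)."""
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Stella-Tenant": tenant,
    }
    if method.upper() == "POST":
        # Assumption: the key is chosen by the caller and reused on retries.
        if not idempotency_key:
            raise ValueError("POST requests need an Idempotency-Key")
        headers["Idempotency-Key"] = idempotency_key
    if traceparent:
        # traceparent is recommended, not required.
        headers["traceparent"] = traceparent
    return headers
```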
DAG management
- POST /dags: create or publish a DAG version with steps, edges, metadata, signature.
- GET /dags: list DAGs sorted by dagId then version desc; filter by dagId or active.
- GET /dags/{dagId}/{version}: fetch DAG definition.
- POST /dags/{dagId}/{version}:disable: disable a version (admin scope).
Runs
- POST /runs: start a run with dagId, optional version, inputs, and runToken.
- GET /runs: list runs with filters for dagId, status, from, to; sorted by startedUtc desc.
- GET /runs/{runId}: run details with step hashes and status.
- POST /runs/{runId}:cancel: request cancellation (best-effort).
Steps and artifacts
- GET /runs/{runId}/steps: list step executions.
- GET /runs/{runId}/steps/{stepId}: step details with attempts and outputs hash.
- GET /artifacts/{hash}: retrieve content-addressed artifacts owned by the tenant.
WebSocket stream
- GET /runs/stream?dagId=&status=: NDJSON events for run and step updates.
- Event types: run.started, run.updated, step.updated, run.completed, run.failed, run.cancelled.
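A client consuming the stream might decode events as in the sketch below. The event type names come from the list above; the `type` field name and the payload shape are assumptions, since the doc does not define the event schema.

```python
import json
from typing import Dict, Iterable, Iterator

EVENT_TYPES = frozenset({
    "run.started", "run.updated", "step.updated",
    "run.completed", "run.failed", "run.cancelled",
})
# Event types after which no further updates are expected for a run.
TERMINAL_TYPES = frozenset({"run.completed", "run.failed", "run.cancelled"})


def parse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Decode NDJSON lines into events, rejecting unknown types.

    Assumption: each event carries its type in a "type" field.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        event = json.loads(line)
        if event.get("type") not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event.get('type')!r}")
        yield event


def is_terminal(event: Dict) -> bool:
    """True when the event ends the run."""
    return event["type"] in TERMINAL_TYPES
```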
Admin and ops
- POST /admin/warm: warm caches for DAGs and plugins.
- GET /admin/health: readiness with queue depth by tenant.
- GET /admin/metrics: Prometheus scrape endpoint.
Determinism and offline
- List endpoints return deterministic ordering; pagination uses page_token and page_size.
- Hashes are lower-case hex; timestamps UTC ISO-8601.
- No remote fetches; DAGs and plugins are preloaded in offline bundles.
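Walking a paginated list endpoint could look like the following sketch. Only `page_token` and `page_size` are named in the doc; the response fields `items` and `next_page_token` and the injected `fetch_page` callable are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_pages(
    fetch_page: Callable[[Optional[str], int], Dict],
    page_size: int = 50,
) -> Iterator[Dict]:
    """Yield items across pages in the order the server returns them.

    fetch_page(page_token, page_size) is a hypothetical callable that
    performs the HTTP request and returns the decoded response body.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token, page_size)
        # Server ordering is deterministic, so items are never re-sorted here.
        yield from page["items"]
        token = page.get("next_page_token")
        if not token:
            break
```

Because the client never re-sorts, two walks over unchanged data produce the same sequence.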
Security
- Scopes: orchestrator:read, orchestrator:write, orchestrator:admin.
- Tenant isolation enforced on every endpoint.


@@ -0,0 +1,43 @@
# Orchestrator architecture
Runtime components
- WebService: REST and WebSocket API for DAG definitions, runs, and admin actions.
- Scheduler: cron and timer triggers that enqueue run intents.
- Worker: executes DAG steps, enforces resource limits, and reports telemetry.
- Plugin host: loads task plugins from signed offline bundles.
Data model
- DAG: directed acyclic graph with deterministic topological ordering.
- Run: immutable record with runId, dagVersion, tenant, inputsHash, status, traceId, startedUtc, endedUtc.
- Step execution: stepId, inputsHash, outputsHash, status, attempt, durationMs, logsRef, metricsRef.
Execution flow
- Run creation is idempotent on runToken, dagId, and inputsHash.
- Scheduler enqueues run intent to a tenant queue.
- Worker reconstructs DAG order, executes steps, applies retries and backoff.
- WebSocket streams run and step status updates.
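The idempotency rule in the first step can be illustrated with an in-memory sketch. The class `RunStore`, the run ID format, and the choice of SHA-256 over key-sorted JSON for `inputsHash` are all assumptions; the doc only states that creation is idempotent on runToken, dagId, and inputsHash.

```python
import hashlib
import json
from typing import Dict, Tuple


def inputs_hash(inputs: Dict) -> str:
    """Hash inputs over a key-sorted JSON encoding (assumed canonical form)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RunStore:
    """Hypothetical in-memory stand-in for the run table."""

    def __init__(self) -> None:
        self._runs: Dict[Tuple[str, str, str], str] = {}

    def create_run(self, run_token: str, dag_id: str, inputs: Dict) -> str:
        key = (run_token, dag_id, inputs_hash(inputs))
        existing = self._runs.get(key)
        if existing is not None:
            return existing  # replayed request: return the original run
        run_id = f"run-{len(self._runs) + 1}"
        self._runs[key] = run_id
        return run_id
```

Submitting the same request twice returns the same run ID, while changing any of the three key parts creates a new run.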
Storage and queues
- PostgreSQL stores DAG specs, versions, and run history.
- Queues are per-tenant FIFO in PostgreSQL or Valkey-backed lists.
- Artifacts are content-addressed and stored in object storage or large objects.
Security and AOC alignment
- Tenant header required on every request; cross-tenant DAGs are forbidden.
- Scopes: orchestrator:read, orchestrator:write, orchestrator:admin.
- AOC alignment: orchestrator schedules and records only; no policy decisions.
- Step sandboxing enforces CPU and memory limits; network egress is denied by default.
Determinism
- Step ordering uses topological order with lexical tie-breaks.
- Retries preserve traceId and reuse the same runToken.
- Timestamps UTC; hashes lower-case hex.
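One way to get a topological order with lexical tie-breaks is Kahn's algorithm with a min-heap of ready steps. This is a sketch of that general technique, not necessarily what the Worker does; the function name and the step/edge representation are assumptions, and "lexical" is taken here to mean Python's code-point string ordering.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def topo_order(steps: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return steps in dependency order, smallest step ID first among ties.

    Each edge (src, dst) means src must run before dst.
    """
    indegree: Dict[str, int] = {s: 0 for s in steps}
    successors: Dict[str, List[str]] = {s: [] for s in indegree}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # The heap always pops the lexically smallest step that is ready.
    ready = [s for s, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        step = heapq.heappop(ready)
        order.append(step)
        for nxt in successors[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order
```

The result depends only on the step IDs and edges, not on the order in which they were declared.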
Offline posture
- DAG specs and plugins are loaded from offline bundles with signatures.
- Exports of runs, steps, and logs are available as NDJSON.
Observability
- Traces: orchestrator.run and orchestrator.step with tenant, dagId, runId, stepId.
- Metrics: orchestrator_runs_total, orchestrator_run_duration_seconds, orchestrator_queue_depth.
- Logs: structured JSON with trace_id, tenant, dagId, runId, stepId.

docs2/orchestrator/cli.md Normal file

@@ -0,0 +1,26 @@
# Orchestrator CLI
Commands
- stella orch dag list: list DAGs sorted by dagId then version desc.
- stella orch dag publish --file dag.yaml --signature sig.dsse: publish a DAG version.
- stella orch dag disable --dag-id <id> --version <ver>: disable a DAG version.
- stella orch run start --dag-id <id> --inputs inputs.json --run-token <uuid>: start a run.
- stella orch run list: list runs with filters for dagId, status, from, to.
- stella orch run cancel --run-id <id>: request cancellation.
- stella orch run logs --run-id <id> --step-id <step>: fetch logs or artifacts.
- stella orch run stream --dag-id <id>: stream NDJSON run events.
Global flags
- --tenant, --api-url, --token, --traceparent, --output json|table.
- --page-size and --page-token for list pagination.
Determinism and offline
- CLI preserves API ordering and fixed table columns.
- Timestamps print UTC; hashes lower-case hex.
- Works against local WebService without external downloads.
Exit codes
- 0 success.
- 1 validation or HTTP error.
- 2 auth or tenant missing.
- 3 cancellation rejected.
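A wrapper script could map these exit codes to readable names as in the sketch below. The enum and function names are hypothetical; only the numeric values and their meanings come from the list above.

```python
from enum import IntEnum


class OrchExit(IntEnum):
    """Exit codes listed above; the member names are illustrative."""

    SUCCESS = 0
    VALIDATION_OR_HTTP_ERROR = 1
    AUTH_OR_TENANT_MISSING = 2
    CANCELLATION_REJECTED = 3


def describe_exit(code: int) -> str:
    """Turn an exit code into a short description."""
    try:
        return OrchExit(code).name.lower().replace("_", " ")
    except ValueError:
        return f"unknown exit code {code}"
```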


@@ -0,0 +1,27 @@
# Orchestrator console
Views
- Run list sorted by startedUtc desc then runId.
- Run detail with step graph, attempts, duration, logs links, and outputs hash.
- DAG catalog with signatures and enabled or disabled state.
- Queue health with per-tenant depth, age, and worker availability.
Actions
- Start run with DAG version, inputs JSON, and optional run token.
- Cancel run with rationale.
- Download artifacts and logs.
- Stream live updates via WebSocket.
Accessibility and UX
- Shortcuts: f for filter, r for refresh, s for start run.
- Timestamps are UTC; durations show raw ms in tooltips.
- Status badges include icons and text; empty states show retry guidance.
Determinism and offline
- Client sorting mirrors API order; pagination uses stable page tokens.
- Works against local WebService with bundled fonts and assets.
- Exports for runs and steps are available as NDJSON.
Safety
- Tenant scope enforced; cross-tenant DAGs hidden.
- Logs are redacted server-side; secrets never rendered in the UI.


@@ -0,0 +1,41 @@
# Orchestrator overview
Mission
- Coordinate deterministic job execution across modules.
- Provide reproducible DAG runs with tenant isolation and auditability.
Runtime shape
- WebService for REST and WebSocket APIs and UI status.
- Scheduler creates runs from schedules and enqueues intents.
- Worker executes DAG steps from per-tenant queues.
- Plugin host loads signed task plugins from offline bundles.
Determinism
- Stable DAG evaluation order with lexical tie-breaks.
- Idempotency keys per run and step hash.
- UTC timestamps and ordered NDJSON exports.
AOC alignment
- Orchestrator runs declared steps and records outcomes.
- It does not derive policy verdicts or merge advisory data.
State and storage
- Run metadata stored in PostgreSQL with tenant scoping.
- Queues stored in PostgreSQL or Valkey-backed FIFO per tenant.
- Artifacts referenced by content hash in object storage or large objects.
- Optional Valkey locks for throttles and backpressure.
Offline posture
- DAG specs and plugins are loaded from offline bundles.
- Network egress is denied by default unless a task declares an allowlist.
Observability
- Metrics for runs, durations, and queue depth.
- Structured logs with tenant, dagId, runId, and status.
Related references
- orchestrator/architecture.md
- orchestrator/api.md
- orchestrator/cli.md
- orchestrator/console.md
- orchestrator/run-ledger.md


@@ -0,0 +1,26 @@
# Orchestrator run ledger
Purpose
- Immutable record of DAG runs and step executions for audit and replay.
Core fields
- tenant, runId, dagId, dagVersion, runToken, traceId.
- status and timestamps (startedUtc, endedUtc, durationMs).
- inputsHash and outputsHash at run and step levels.
Step records
- stepId, attempt, status, timing, errorCode, retryable.
- logsRef and metricsRef point to content-addressed artifacts.
Storage and exports
- Tenant-scoped PostgreSQL tables with indexes on tenant, status, and time.
- Append-only updates; status transitions are monotonic.
- NDJSON exports are sorted by startedUtc then runId.
- Artifacts are content-addressed; hashes point to object storage or large objects.
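The export ordering can be sketched as below. It assumes every `startedUtc` value uses the same fixed-width UTC ISO-8601 format, so that string comparison matches chronological order; the function name and the key-sorted JSON encoding are assumptions.

```python
import json
from typing import Dict, Iterable


def export_ndjson(runs: Iterable[Dict]) -> str:
    """Serialize run records as NDJSON, sorted by startedUtc then runId."""
    ordered = sorted(runs, key=lambda r: (r["startedUtc"], r["runId"]))
    # Key-sorted, compact JSON keeps each line byte-stable across exports.
    return "".join(
        json.dumps(r, sort_keys=True, separators=(",", ":")) + "\n"
        for r in ordered
    )
```

Sorting on the pair breaks ties between runs that started at the same instant, so repeated exports of the same ledger produce identical output.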
Governance
- Runs are never deleted; cancellation is recorded as an event.
- Admin queries require orchestrator:admin scope.
Related references
- orchestrator/overview.md