From 8442fcb8079195637ed7926be3d54192c577a8b1 Mon Sep 17 00:00:00 2001
From: master <>
Date: Wed, 8 Apr 2026 18:44:04 +0300
Subject: [PATCH] docs(audit): sprint plan for endpoint filters + per-service table deprecation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Map 532 state-changing endpoints across 9 services for AuditActionFilter
- Plan 5-batch migration: convention helper → complex services → dual-write → read migration → drop local tables
- Reclassify Authority auth-protocol and Policy gate-bypass audit as domain evidence
- 24 days active work + 120-day verification pipeline

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ..._005_Audit_endpoint_filters_deprecation.md | 677 ++++++++++++++++++
 1 file changed, 677 insertions(+)
 create mode 100644 docs/implplan/SPRINT_20260408_005_Audit_endpoint_filters_deprecation.md

diff --git a/docs/implplan/SPRINT_20260408_005_Audit_endpoint_filters_deprecation.md b/docs/implplan/SPRINT_20260408_005_Audit_endpoint_filters_deprecation.md
new file mode 100644
index 000000000..ad0367454
--- /dev/null
+++ b/docs/implplan/SPRINT_20260408_005_Audit_endpoint_filters_deprecation.md
@@ -0,0 +1,677 @@
+# Sprint 20260408-005 -- AuditActionFilter Endpoint Wiring & Per-Service Audit Table Deprecation
+
+## Topic & Scope
+
+- **Wire `AuditActionFilter` across all 9 services** that already call `AddAuditEmission()` in their `Program.cs`, annotating every state-changing endpoint with `AuditActionAttribute` so that every POST/PUT/PATCH/DELETE emits a structured audit event to the Timeline unified sink.
+- **Deprecate per-service audit tables** in Authority, Policy, Notify, Scheduler, Attestor, and JobEngine/ReleaseOrchestrator through a phased dual-write -> read-migration -> drop pipeline.
+- This sprint implements AUDIT-002 and AUDIT-005 from `SPRINT_20260408_004_Timeline_unified_audit_sink.md`.
+- Working directory: `src/__Libraries/StellaOps.Audit.Emission/`, cross-module endpoint files, per-service persistence directories.
+- Expected evidence: all state-changing endpoints decorated, audit events visible in Timeline `/api/v1/audit/events`, dual-write verified, deprecation headers on legacy endpoints, zero data loss.
+
+## Dependencies & Concurrency
+
+- **Upstream**: AUDIT-001 (DONE) -- PostgreSQL persistence for Timeline audit ingest is complete. `PostgresUnifiedAuditEventStore` with SHA-256 hash chain is operational.
+- **Upstream**: `AddAuditEmission()` is already called in 9 services: Authority, Policy, Release-Orchestrator, EvidenceLocker, Notify, Scanner, Scheduler, Integrations, Platform. No DI wiring needed.
+- Batches 1-2 (filter annotation) can run in parallel across services.
+- Batch 3 (dual-write) can begin for a given service once its Batch 1-2 annotations are verified.
+- Batches 4-5 (read migration, table drop) are sequential and must wait for verification periods.
+
+## Documentation Prerequisites
+
+- `src/__Libraries/StellaOps.Audit.Emission/AuditActionFilter.cs` -- filter behavior, no-op when attribute missing.
+- `src/__Libraries/StellaOps.Audit.Emission/AuditActionAttribute.cs` -- module/action/resourceType parameters.
+- `docs/implplan/SPRINT_20260408_004_Timeline_unified_audit_sink.md` -- parent sprint context.
+
+---
+
+## Part 1: Endpoint Filter Annotation Plan
+
+### Convention Mode Assessment
+
+**The `AuditActionFilter` supports a passive convention mode.** Reading the filter source:
+- If `AuditActionAttribute` metadata is NOT present on the endpoint, the filter is a **no-op passthrough** (line 48: returns `result` unchanged).
+- The filter can be added at the **RouteGroup level** (ASP.NET Core supports `group.AddEndpointFilter<AuditActionFilter>()`), which applies it to all endpoints in the group.
+- Only endpoints explicitly annotated with `.WithMetadata(new AuditActionAttribute("module", "action"))` will emit events.
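The passthrough rule above can be modelled without ASP.NET Core. The sketch assumes the real filter runs the endpoint handler before it decides whether to emit. The tuple list stands in for `AuditActionAttribute` metadata, the `emitted` list stands in for the Timeline sink, and `InvokeWithAuditFilter` is a hypothetical name rather than the library's API.

```csharp
using System;
using System.Collections.Generic;

// Events the simulated filter would send to Timeline, recorded as "module.action".
var emitted = new List<string>();

// Models the passthrough rule. `auditMetadata` stands in for the
// AuditActionAttribute entries in the endpoint's metadata collection.
object? InvokeWithAuditFilter(IReadOnlyList<(string Module, string Action)> auditMetadata, Func<object?> next)
{
    var result = next(); // the endpoint handler always runs
    if (auditMetadata.Count == 0)
    {
        return result; // no attribute: no event, result returned unchanged
    }

    // EndpointMetadataCollection.GetMetadata<T>() returns the last-added entry,
    // so this sketch also takes the last one.
    var (module, action) = auditMetadata[auditMetadata.Count - 1];
    emitted.Add($"{module}.{action}");
    return result;
}

// GET without metadata: passes through silently.
var getResult = InvokeWithAuditFilter(Array.Empty<(string Module, string Action)>(), () => "list");

// POST annotated with ("integrations", "create"): emits exactly one event.
var postResult = InvokeWithAuditFilter(new[] { ("integrations", "create") }, () => "created");

Console.WriteLine($"{getResult} {postResult} [{string.Join(", ", emitted)}]");
```

Registering the filter on every endpoint costs nothing for unannotated GETs in this model. Only the metadata decides whether an event is emitted.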
+
+**Recommended approach: hybrid group + per-endpoint annotation.**
+1. Add `group.AddEndpointFilter<AuditActionFilter>()` once at each service's main API route group.
+2. Add `.WithMetadata(new AuditActionAttribute("module", "action"))` only on state-changing endpoints.
+3. GET endpoints remain unannotated and the filter passes through silently.
+
+This minimizes per-endpoint boilerplate (no `.AddEndpointFilter<AuditActionFilter>()` on each endpoint) while keeping explicit control over which actions are audited.
+
+### Per-Service Endpoint Inventory
+
+#### 1. Scanner (module: "scanner") -- 30 endpoint files, ~65 state-changing endpoints
+
+| Endpoint Group | Count | Action(s) |
+|---|---|---|
+| Sources CRUD | 8 | create, update, delete, test, pause, resume, activate, trigger_scan |
+| Scan submission | 2 | submit, attach_entropy |
+| SBOM submission/upload | 2 | submit_sbom, upload |
+| Scan policy CRUD | 3 | create, update, delete |
+| Approvals | 2 | create, revoke |
+| Triage (status, VEX, batch, proof) | 5 | update_status, submit_vex, batch_action, generate_proof, bulk_query |
+| Webhooks (generic + provider-specific) | 5 | receive_webhook |
+| Reports | 1 | create |
+| Reachability (compute, analyze, VEX) | 3 | compute, analyze, generate_vex |
+| Secret detection settings | 5 | create, update, delete (settings + exceptions) |
+| SmartDiff/VEX candidates review | 2 | review |
+| Score replay/verify | 4 | replay, verify |
+| Validation/fidelity | 3 | validate, analyze, upgrade |
+| Offline kit | 2 | import, validate |
+| Call graph | 1 | submit |
+| Witness verify | 1 | verify |
+| Runtime events | 2 | events, reconcile |
+| Other (delta compare, EPSS batch, counterfactual, slice, replay attach, GitHub SARIF, policy diagnostics/preview/runtime/overlay/linksets, composition verify) | ~14 | various |
+
+#### 2. Integrations (module: "integrations") -- 1 endpoint file, 6 state-changing endpoints
+
+| Endpoint | Action |
+|---|---|
+| `POST /` | create |
+| `PUT /{id}` | update |
+| `DELETE /{id}` | delete |
+| `POST /{id}/test` | test |
+| `POST /{id}/discover` | discover |
+| `POST /ai-code-guard/run` | run_code_guard |
+
+#### 3. Platform (module: "platform") -- 23 endpoint files, ~107 state-changing endpoints
+
+| Endpoint Group | Count | Action(s) |
+|---|---|---|
+| Setup wizard sessions/steps | 14 | create_session, resume, execute_step, skip_step, run_checks, prerequisites, update_config, finalize |
+| Trust signing (keys, issuers, certs, transparency log) | 10 | create_key, rotate_key, revoke_key, create_issuer, block_issuer, unblock_issuer, create_cert, revoke_cert, update_transparency_log |
+| Identity providers | 7 | create, update, delete, enable, disable, test, apply |
+| Environment settings admin | 2 | update, delete |
+| Scripts CRUD + validate + compatibility | 5 | create, update, delete, validate, check_compatibility |
+| Release control (bundles, versions, materialize) | 3 | create_bundle, create_version, materialize |
+| Release orchestrator environments (env CRUD, targets, freeze windows) | 12 | create, update, delete (env/target/freeze_window), update_settings, health_check |
+| Function maps | 3 | create, delete, verify |
+| Localization | 2 | update_bundles, delete_string |
+| Crypto provider admin | 2 | update_preferences, delete_preferences |
+| Context | 1 | update_preferences |
+| Assistant (user state, tips, tours, glossary) | 5 | update_user_state, create_tip, delete_tip, create_tour, create_glossary |
+| Federation telemetry | 3 | grant_consent, revoke_consent, trigger |
+| Notify compatibility | 13 | create/delete (schedules, quiet_hours, throttle, escalation, localizations), simulate, ack_incident |
+| Signals compatibility | 5 | create_trigger, update_trigger, delete_trigger, toggle_trigger, retry |
+| Evidence threads | 3 | export, transcript, collect |
+| Score | 2 | evaluate, verify |
+| Policy interop | 4 | export, import, validate, evaluate |
+| Quota/AoC compatibility, onboarding, profiles, seed | ~10 | various |
+| Migration admin | 1 | run |
+
+#### 4. Authority (module: "authority") -- 10 endpoint files, ~49 state-changing endpoints
+
+| Endpoint Group | Count | Action(s) |
+|---|---|---|
+| Tenant CRUD + suspend/resume | 4 | create, update, suspend, resume |
+| User CRUD + enable/disable | 4 | create, update, disable, enable |
+| Role CRUD + preview impact | 3 | create, update, preview_impact |
+| Client CRUD + rotate | 3 | create, update, rotate |
+| Token revoke | 1 | revoke |
+| Branding update/preview | 2 | update, preview |
+| Airgap audit record | 1 | record |
+| Bootstrap users/clients/invites/service-accounts/signing/notifications/plugins | 8 | bootstrap_create, rotate, reload |
+| OpenIddict (token, introspect, revoke) | 3 | issue_token, introspect, revoke_token |
+| Authorize | 1 | authorize |
+| IssuerDirectory (issuer CRUD, key CRUD, trust CRUD) | 8 | create, update, delete (issuer/key/trust) |
+| Notify ack-tokens + vuln workflow tokens + attachment tokens | 6 | rotate, issue, verify |
+| Vulnerability tickets + advisory AI logs | 2 | create_ticket, log_inference |
+| Console token introspect + vuln ticket | 2 | introspect, create_ticket |
+
+#### 5. Policy (module: "policy") -- 57 endpoint files in Engine + 11 in Gateway, ~162 (Engine) + ~56 (Gateway) state-changing endpoints (many duplicated between Engine and Gateway)
+
+**Note**: Policy Engine and Policy Gateway share nearly identical endpoint files (Gateway proxies to Engine). Annotation should target the Engine endpoints; Gateway endpoints should mirror the same attributes.
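Because Gateway endpoints have to mirror the Engine's attributes by hand, the two sets can drift apart. The following is a minimal sketch of a parity check. It assumes both services' annotations can be collected into route-to-action maps, for example by a test that enumerates the registered endpoints. The routes and the `FindParityGaps` helper are hypothetical. Engine-only routes are skipped because the Gateway proxies only a subset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical annotation inventories: route -> audit action (illustrative routes).
// A null Gateway value means the route exists but carries no AuditActionAttribute.
var engine = new Dictionary<string, string>
{
    ["POST /exceptions/{id}/approve"] = "approve_exception",
    ["POST /gates/{id}/force-pass"] = "force_pass_gate",
    ["POST /sealed-mode/enable"] = "enable_sealed",
    ["POST /simulations"] = "create", // Engine-only route, not proxied by the Gateway
};
var gateway = new Dictionary<string, string?>
{
    ["POST /exceptions/{id}/approve"] = "approve_exception",
    ["POST /gates/{id}/force-pass"] = "force_pass",
    ["POST /sealed-mode/enable"] = null,
};

// Compares only routes present in both maps; Engine-only and Gateway-only
// routes are skipped rather than reported.
List<string> FindParityGaps(IReadOnlyDictionary<string, string> engineActions, IReadOnlyDictionary<string, string?> gatewayActions)
{
    var found = new List<string>();
    foreach (var (route, gatewayAction) in gatewayActions.OrderBy(kv => kv.Key, StringComparer.Ordinal))
    {
        if (!engineActions.TryGetValue(route, out var engineAction))
            continue; // nothing on the Engine side to mirror
        if (gatewayAction is null)
            found.Add($"unannotated: {route} (engine={engineAction})");
        else if (gatewayAction != engineAction)
            found.Add($"mismatch: {route} (engine={engineAction}, gateway={gatewayAction})");
    }
    return found;
}

var gaps = FindParityGaps(engine, gateway);
gaps.ForEach(Console.WriteLine);
```

Running a check like this in CI would flag the `force_pass` vs `force_pass_gate` drift before it reaches Timeline as two differently named actions.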
+
+| Endpoint Group (Engine) | Count | Action(s) |
+|---|---|---|
+| Governance CRUD (policies, rules, thresholds) | 9 | create, update, delete, enable, disable, reorder, import, export, clone |
+| Policy simulation (create, cancel, retry, preview, compare, what-if, etc.) | 20 | create, cancel, retry, preview, compare, simulate |
+| Exception management (create, approve, reject, revoke, extend) | 6 | create, approve, reject, revoke, extend, batch |
+| Exception approvals (approve, reject, escalate, delegate) | 4 | approve, reject, escalate, delegate |
+| Gate operations (evaluate, force-pass) | 2 | evaluate, force_pass |
+| Gates CRUD | 2 | create, delete |
+| Score gate (evaluate, verify) | 2 | evaluate, verify |
+| Risk profile CRUD + air-gap sync | 9 | create, update, delete, sync_airgap, import, export |
+| Risk budget management | 3 | create, update, delete |
+| Risk simulation (run, preview, batch, sensitivity, compare, rebase, budget) | 7 | run, preview, batch, compare |
+| Policy pack CRUD | 5 | create, update, delete, activate, deactivate |
+| Policy pack bundles | 1 | create |
+| Override CRUD | 5 | create, update, delete, expire, batch |
+| Verification policy CRUD + editor | 6 | create, update, delete, compile, validate |
+| Scope attachment | 4 | attach, detach, reorder, bulk |
+| Snapshots (create, restore) | 2 | create, restore |
+| Violations (acknowledge, dismiss, reopen) | 5 | acknowledge, dismiss, reopen |
+| Staleness (configure, reset) | 2 | configure, reset |
+| Sealed mode (enable, disable, emergency) | 3 | enable, disable, emergency |
+| Profile events (create, ack) | 2 | create, acknowledge |
+| Conflict resolution | 3 | resolve, merge, override |
+| Policy decision, batch evaluation, policy compilation, lint | 4 | evaluate, batch, compile, lint |
+| Registry webhooks | 3 | register, update, delete |
+| Deltas | 2 | compute, compare |
+| Attestation reports + console | 6 | create, export, verify |
+| CVSS receipts | 2 | submit, verify |
+| Budget endpoints | 1 | allocate |
+| Determinization config | 2 | update, audit |
+| Other (tool lattice, advisory AI knobs, trust weighting, overlay sim, path scope sim, evidence summary, delta-if-present, air-gap notifications, profile export, console export, ledger export, orchestrator job, policy worker, console simulation, batch context, verify determinism, unknown tracking) | ~20 | various |
+
+#### 6. Release-Orchestrator (module: "release-orchestrator") -- 9 endpoint files, ~40 state-changing endpoints (excluding legacy stubs)
+
+| Endpoint Group | Count | Action(s) |
+|---|---|---|
+| Release CRUD | 4 | create, update, delete, clone |
+| Release lifecycle (ready, promote, deploy, rollback) | 4 | mark_ready, promote, deploy, rollback |
+| Release components CRUD | 3 | add, update, remove |
+| Approvals (approve, reject, batch) | 4 | approve, reject, batch_approve, batch_reject |
+| Release dashboard (approve/reject promotion) | 2 | approve_promotion, reject_promotion |
+| Deployment operations (create, pause, resume, cancel, rollback, retry target) | 6 | create, pause, resume, cancel, rollback, retry |
+| Release control v2 (approval decision, rollback) | 2 | approval_decision, rollback |
+| Scripts CRUD + validate + compatibility | 5 | create, update, delete, validate, check_compatibility |
+| Policy gate profiles CRUD + simulate | 9 | create, update, delete, set_default, validate, simulate, bundle_simulate, feed_freshness |
+| Evidence verify | 1 | verify |
+
+**Note**: `JobEngineLegacyEndpoints` contains catch-all `{**rest}` stubs that return 501; these do NOT need audit annotation.
+
+#### 7. EvidenceLocker (module: "evidence") -- 2 endpoint files + Program.cs, ~7 state-changing endpoints
+
+| Endpoint | Action |
+|---|---|
+| `POST /evidence` | store |
+| `POST /evidence/snapshot` | snapshot |
+| `POST /evidence/verify` | verify |
+| `POST /evidence/hold/{caseId}` | hold |
+| `POST /verdicts/` | store_verdict |
+| `POST /verdicts/{id}/verify` | verify_verdict |
+| `POST /exports/{bundleId}/export` | export |
+
+#### 8. Notify (module: "notify") -- 15 endpoint files, ~65 state-changing endpoints
+
+| Endpoint Group | Count | Action(s) |
+|---|---|---|
+| Rules CRUD (notify API + standalone) | 6 | create, update, delete |
+| Templates CRUD + preview + validate (notify API + standalone) | 8 | create, update, delete, preview, validate |
+| Incidents (ack, resolve) | 4 | acknowledge, resolve |
+| Escalation policies CRUD + schedules CRUD + overrides | 10 | create, update, delete |
+| Escalation operations (start, escalate, stop, ack, webhook) | 5 | start, escalate, stop, ack |
+| Quiet hours (calendars CRUD + evaluate) | 4 | create, update, delete, evaluate |
+| Throttle (config update/delete, evaluate) | 3 | update, delete, evaluate |
+| Storm breaker (summary, clear) | 2 | summary, clear |
+| Fallback chains + deliveries | 3 | update_chain, test, delete_delivery |
+| Localization (format string, update bundles, delete bundle, validate) | 4 | format, update_bundles, delete_bundle, validate |
+| Observability (dead letters retry/discard/purge, chaos, retention policies) | 8 | retry, discard, purge, start_experiment, stop_experiment, create_policy, update_policy, delete_policy |
+| Security (tokens, keys, webhooks, HTML, tenants, grants) | 12 | sign, verify, rotate, register_webhook, validate, sanitize, strip, validate_tenant, fuzz_test, grant, revoke |
+| Operator overrides (create, revoke, check) | 3 | create, revoke, check |
+| Simulation (simulate, validate rule) | 2 | simulate, validate |
+
+#### 9. Scheduler (module: "scheduler") -- 9 endpoint files, ~31 state-changing endpoints
+
+| Endpoint Group | Count | Action(s) |
+|---|---|---|
+| Schedules CRUD + pause/resume | 5 | create, update, delete, pause, resume |
+| Runs (create, cancel, retry, preview) | 4 | create, cancel, retry, preview |
+| Workflow trigger | 1 | trigger |
+| Graph jobs (build, overlay, complete hook) | 3 | build, overlay, complete |
+| Event webhooks (Concelier export, Excititor export) | 2 | export |
+| Policy runs | 1 | create |
+| Policy simulations (create, preview, cancel, retry) | 4 | create, preview, cancel, retry |
+| Resolver jobs | 1 | create |
+| PacksRegistry (upload, signature, attestation, lifecycle, parity, offline seed, mirrors, mirror sync) | 9 | upload, rotate_signature, upload_attestation, transition_lifecycle, check_parity, seed_export, create_mirror, sync_mirror |
+
+### Total Endpoint Count Summary
+
+| Service | State-Changing Endpoints | Complexity |
+|---|---|---|
+| Scanner | ~65 | High (30 files) |
+| Integrations | 6 | Low (1 file) |
+| Platform | ~107 | High (23 files) |
+| Authority | ~49 | Medium (10 files, multiple sub-services) |
+| Policy | ~162 (Engine) | Very High (57 files, duplicated in Gateway) |
+| Release-Orchestrator | ~40 | Medium (9 files) |
+| EvidenceLocker | 7 | Low (3 files) |
+| Notify | ~65 | High (15 files) |
+| Scheduler | ~31 | Medium (9 files) |
+| **TOTAL** | **~532** | |
+
+---
+
+## Part 2: Per-Service Audit Table Deprecation Plan
+
+### 2.1 Authority -- `authority.audit`, `authority.airgap_audit`, `authority.offline_kit_audit`
+
+**Writes:**
+- `AuthorityAuditSink` (implements `IAuthEventSink`) writes login/auth events via `IAuthorityLoginAttemptStore.InsertAsync()` -- this is a specialized auth event pipeline, NOT a generic endpoint audit filter.
+- `AirgapAuditEndpointExtensions` has `POST /authority/audit/airgap` that records airgap-specific audit entries.
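Phase 2 of this section's migration adds Timeline emission inside `AuthorityAuditSink.WriteAsync()`, which turns the sink's write path into a dual write. The sketch below uses in-memory lists to stand in for `IAuthorityLoginAttemptStore` and `IAuditEventEmitter`. It keeps the local insert authoritative and swallows a failed Timeline emission. That failure policy is an assumption of this sketch, not a decision recorded in the plan.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// In-memory stand-ins: `localRows` plays authority.audit (written today via
// IAuthorityLoginAttemptStore.InsertAsync); `timelineEvents` plays the Timeline
// sink reached through IAuditEventEmitter.EmitAsync. Both are hypothetical here.
var localRows = new List<string>();
var timelineEvents = new List<string>();
var timelineAvailable = true; // set to false to simulate a Timeline outage

Task InsertLocalAsync(string authEvent, CancellationToken ct)
{
    localRows.Add(authEvent);
    return Task.CompletedTask;
}

Task EmitToTimelineAsync(string authEvent, CancellationToken ct)
{
    if (!timelineAvailable)
        throw new InvalidOperationException("Timeline unavailable");
    timelineEvents.Add(authEvent);
    return Task.CompletedTask;
}

// Dual write: the local insert stays authoritative during the transition, so a
// failed Timeline emission is reported (false) but never fails the auth event.
async Task<bool> WriteAsync(string authEvent, CancellationToken ct)
{
    await InsertLocalAsync(authEvent, ct);
    try
    {
        await EmitToTimelineAsync(authEvent, ct);
        return true;
    }
    catch (InvalidOperationException)
    {
        return false; // a real sink would log this and count it for the pre-drop check
    }
}

var firstMirrored = await WriteAsync("login.success user=alice", CancellationToken.None);
timelineAvailable = false;
var secondMirrored = await WriteAsync("login.failure user=bob", CancellationToken.None);

Console.WriteLine($"first={firstMirrored} second={secondMirrored} local={localRows.Count} timeline={timelineEvents.Count}");
```

Ordering the local insert first means a Timeline outage can never drop an auth event. Its cost is that Timeline may lag, which is exactly what the verification period before any table drop is meant to detect.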
+
+**Reads:**
+- `GET /console/admin/audit` -- `ConsoleAdminEndpointExtensions.ListAuditEvents()` reads from the `authority.audit` table.
+- `GET /authority/audit/airgap` -- reads airgap audit entries.
+- `GET /authority/incident-audit` -- reads incident audit entries.
+- UI: Audit tab in the Authority admin console.
+
+**What breaks if dropped:** The console admin audit log loses historical auth event data, and the specialized `ClassifiedString` PII classification is lost.
+
+**Dual-write path:** The `AuthorityAuditSink` pipeline is fundamentally different from `AuditActionFilter`: it captures auth protocol events (login success/failure, token issuance), not HTTP endpoint calls. **Both are needed**:
+- `AuditActionFilter` for admin mutations (user CRUD, role CRUD, client CRUD, tenant management).
+- `AuthorityAuditSink` for auth protocol events (login attempts, token grants, lockouts) -- should also emit to Timeline via `IAuditEventEmitter` directly.
+
+**Migration:** Phase 1: Add `AuditActionFilter` to admin endpoints. Phase 2: Add an `IAuditEventEmitter.EmitAsync()` call inside `AuthorityAuditSink.WriteAsync()` to dual-write auth events. Phase 3: Redirect admin audit reads to Timeline. Phase 4: Drop local tables after 90-day verification.
+
+### 2.2 Policy -- `policy.audit` + `policy.gate_bypass_audit`
+
+**Writes:**
+- `PolicyAuditRepository.CreateAsync()` writes generic policy audit entries.
+- `PostgresGateBypassAuditRepository.AddAsync()` writes gate bypass decisions (specialized: actor, decision override, justification, image digest, policy ID, attestation digest).
+- `GateBypassAuditor` service calls the bypass audit repository when a gate bypass occurs.
+
+**Reads:**
+- `GET /api/v1/governance/audit/events` + `GET /api/v1/governance/audit/events/{eventId}` -- governance audit events.
+- `GET /api/v1/policy/exceptions/{requestId}/audit` -- exception approval trail.
+- `GET /api/v1/policy/determinization/audit` -- determinization config audit history.
+- `GET /api/v1/policy/simulation/.../audit` -- simulation audit.
+- `PolicyAuditRepository.ListAsync()`, `.GetByResourceAsync()`, `.GetByCorrelationIdAsync()`.
+- `PostgresGateBypassAuditRepository` reads: `GetByIdAsync`, `GetByDecisionIdAsync`, `GetByActorAsync`, `GetByImageDigestAsync`, `ListRecentAsync`, `ListByTimeRangeAsync`, `CountByActorSinceAsync`.
+
+**What breaks if dropped:** Governance audit UI, exception audit trail, gate bypass forensics (security-critical: who overrode a blocked image?).
+
+**Dual-write path:** Gate bypass audit is domain-specific and has unique query patterns (by image digest, by decision ID, count by actor since a given time). These queries cannot be served efficiently from the generic unified audit store without custom indexes. **Recommendation**: Keep `policy.gate_bypass_audit` as a domain table (it is evidence, not just audit), but dual-write all entries to Timeline for cross-service visibility. Generic `policy.audit` can be fully migrated to Timeline.
+
+**Migration:** Phase 1: Add `AuditActionFilter` to all policy engine endpoints. Phase 2: Add Timeline emission in `PolicyAuditRepository.CreateAsync()`. Phase 3: Redirect generic audit reads to Timeline; keep bypass audit reads local. Phase 4: Drop the `policy.audit` table. Retain `policy.gate_bypass_audit` permanently (reclassified as domain evidence, not audit).
+
+### 2.3 Notify -- `notify.audit`
+
+**Writes:**
+- `NotifyAuditRepository` writes audit entries for template changes, rule changes, and incident acknowledgements.
+- Direct calls from endpoint handlers: `TemplateEndpoints`, `RuleEndpoints`, `NotifyApiEndpoints`, `IncidentEndpoints`.
+
+**Reads:**
+- `GET /api/v1/notify/audit` (in `Program.cs` line 1329) -- lists audit entries with limit/offset.
+
+**What breaks if dropped:** The Notify audit endpoint returns empty results or 404.
+
+**Dual-write path:** Notify audit is straightforward CRUD audit (who changed which template/rule) and is fully replaceable by `AuditActionFilter` emission. The local `NotifyAuditRepository` writes can be preserved as a dual write during the transition.
+
+**Migration:** Phase 1: Add `AuditActionFilter` to all notify endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` in `NotifyAuditRepository.CreateAsync()` for dual-write. Phase 3: Point `/api/v1/notify/audit` reads to Timeline (proxy or redirect). Phase 4: Drop the `notify.audit` table.
+
+### 2.4 Scheduler -- `scheduler.audit` (monthly partitioned)
+
+**Writes:**
+- The `ISchedulerAuditService` interface writes audit entries when schedules are created/updated/deleted.
+- Called from `ScheduleEndpoints` and `RunEndpoints`.
+
+**Reads:**
+- Per-schedule and per-run audit queries via `ISchedulerAuditService`.
+- No dedicated public audit endpoint found (consumed internally).
+
+**What breaks if dropped:** The internal schedule-change audit trail is lost.
+
+**Dual-write path:** Scheduler audit is straightforward. Monthly partitioning is its most advanced feature: retention is enforced by dropping whole monthly partitions instead of deleting rows. The unified Timeline store should adopt partitioning too (noted in AUDIT-004 risks). For now, dual-write is safe.
+
+**Migration:** Phase 1: Add `AuditActionFilter` to scheduler endpoints. Phase 2: Dual-write via `IAuditEventEmitter.EmitAsync()` in the `ISchedulerAuditService` implementation. Phase 3: Drop `scheduler.audit` partitions after Timeline verification. Phase 4: Remove the partition maintenance background service.
+
+### 2.5 Attestor -- `proofchain.audit_log`
+
+**Writes:**
+- EF Core entity `AuditLogEntity` mapped to `proofchain.audit_log`. Records operations (create/verify/revoke) on proof chain entities.
+
+**Reads:**
+- Internal only (no public audit endpoint found).
+
+**What breaks if dropped:** The proof chain operation audit trail is lost. However, the proof chain itself provides cryptographic evidence of operations.
+
+**Dual-write path:** Attestor audit is simple operation logging. It is fully replaceable by `AuditActionFilter` once Attestor endpoints are wired.
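Each table drop in this part (for example, 2.4's "after Timeline verification") needs a concrete pass/fail check over the verification window. The reconciliation pass below assumes every dual-written Timeline event carries the local entry's ID as a correlation key, which the plan does not yet specify. The row shapes, IDs and the `Reconcile` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows from one verification window.
var localEntries = new[]
{
    (Id: "n-1", Action: "template.update"),
    (Id: "n-2", Action: "rule.create"),
    (Id: "n-3", Action: "incident.ack"),
};
var timelineCopies = new[]
{
    (SourceId: "n-1", Action: "template.update"),
    (SourceId: "n-1", Action: "template.update"), // duplicate from a retried emission
    (SourceId: "n-3", Action: "incident.acknowledge"),
};

// Returns local IDs with no Timeline copy, and IDs whose recorded action differs.
(List<string> Missing, List<string> Mismatched) Reconcile(
    IEnumerable<(string Id, string Action)> local,
    IEnumerable<(string SourceId, string Action)> timeline)
{
    // Collapse duplicate deliveries so retries do not break the lookup.
    var timelineActions = timeline
        .GroupBy(e => e.SourceId)
        .ToDictionary(g => g.Key, g => g.First().Action);

    var missing = new List<string>();
    var mismatched = new List<string>();
    foreach (var (id, action) in local)
    {
        if (!timelineActions.TryGetValue(id, out var copied))
            missing.Add(id);
        else if (copied != action)
            mismatched.Add(id); // e.g. naming drift between repository and filter actions
    }
    return (missing, mismatched);
}

var (missingIds, mismatchedIds) = Reconcile(localEntries, timelineCopies);
var safeToDrop = missingIds.Count == 0 && mismatchedIds.Count == 0;
Console.WriteLine($"missing=[{string.Join(",", missingIds)}] mismatched=[{string.Join(",", mismatchedIds)}] safeToDrop={safeToDrop}");
```

A clean window reports no gaps. Here the sketch reports `n-2` as missing and `n-3` as mismatched, so the local table would stay.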
+
+**Note:** Attestor is NOT in the 9 services that currently call `AddAuditEmission()`. It needs to be wired first.
+
+**Migration:** Phase 1: Wire `AddAuditEmission()` in Attestor `Program.cs` + add `AuditActionFilter`. Phase 2: Dual-write via the emitter in the audit log write path. Phase 3: Drop the `proofchain.audit_log` table.
+
+### 2.6 JobEngine/ReleaseOrchestrator -- `audit_entries` + `audit_sequences` (hash chain)
+
+**Writes:**
+- `PostgresAuditRepository.AppendAsync()` in both JobEngine and ReleaseOrchestrator. Uses raw SQL with transactional hash chaining: get sequence -> compute hash -> insert entry -> update sequence hash.
+- `CanonicalJsonHasher` for deterministic content hashing.
+- Called from service layers when releases, deployments, approvals, etc. are modified.
+
+**Reads (ReleaseOrchestrator):**
+- `GET /api/v1/release-orchestrator/audit` -- list, get by ID, resource history, sequence range, latest, summary, verify chain.
+- Full REST API with cursor pagination, event type filtering, resource filtering, time range, actor filtering.
+- Chain verification endpoint (`VerifyAuditChain`) for tamper-evidence.
+
+**Reads (JobEngine):**
+- `PostgresAuditRepository.ListAsync()`, `.GetByIdAsync()`, `.GetByResourceAsync()`, `.GetBySequenceRangeAsync()`, `.GetLatestAsync()`, `.GetCountAsync()`, `.VerifyChainAsync()`, `.GetSummaryAsync()`.
+- PacksRegistry: `IAuditRepository` used by `PackService`, `AttestationService`, `LifecycleService`, `ParityService`, `MirrorService`, `ExportService`.
+
+**What breaks if dropped:** This is the most mature audit implementation in the system. Its REST API endpoints would return 404/500, chain verification would be lost, and the PacksRegistry audit trail would be lost.
+
+**Dual-write path:** This is the most complex case because:
+1. The local hash chain provides per-service tamper evidence.
+2. The Timeline unified store has its OWN hash chain (separate sequence).
+3. Both chains serve different purposes: the local chain proves service-level integrity; the unified chain proves cross-service integrity.
+
+**Recommendation:** Keep the ReleaseOrchestrator/JobEngine hash chain as the **service-level evidence chain** (reclassify as domain evidence, like the Policy gate bypass audit). Dual-write all entries to Timeline for the unified cross-service view. Eventually redirect LIST/SEARCH reads to Timeline but preserve the local chain verification endpoint.
+
+**Migration:** Phase 1: Add `AuditActionFilter` to all release-orchestrator and scheduler endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` in `PostgresAuditRepository.AppendAsync()` for dual-write. Phase 3: Redirect list/search/summary reads to Timeline (keep chain verify local). Phase 4: Evaluate whether the local chain can be removed after a 180-day parallel run. Phase 5: If chain integrity data is replicated in Timeline's own chain, drop the local tables.
+
+---
+
+## Delivery Tracker
+
+### FILTER-001 - Convention helper: `AuditedRouteGroupExtensions`
+Status: TODO
+Dependency: none
+Owners: Developer (backend)
+Task description:
+- Create a small extension method in `StellaOps.Audit.Emission` that applies the filter at the group level:
+  ```csharp
+  // Member of a static class (e.g. AuditedRouteGroupExtensions).
+  // Registers AuditActionFilter once for every endpoint in the group.
+  public static RouteGroupBuilder WithAuditFilter(this RouteGroupBuilder group)
+  {
+      group.AddEndpointFilter<AuditActionFilter>();
+      return group;
+  }
+  ```
+- This reduces per-file boilerplate: each endpoint file calls `.WithMetadata(new AuditActionAttribute("module", "action"))` only on state-changing endpoints, while the group registers the filter once.
+- Also create a convenience extension for the common case:
+  ```csharp
+  // Registers the filter and attaches the audit metadata on a single endpoint.
+  public static RouteHandlerBuilder Audited(this RouteHandlerBuilder builder, string module, string action, string? resourceType = null)
+  {
+      return builder
+          .AddEndpointFilter<AuditActionFilter>()
+          .WithMetadata(new AuditActionAttribute(module, action) { ResourceType = resourceType });
+  }
+  ```
+- The group-level approach is preferred for services with a single root group. The per-endpoint `.Audited()` method is a fallback for services with multiple independent groups. Do not call `.Audited()` on an endpoint whose group already uses `.WithAuditFilter()`: group and endpoint filters both run, so the event would be emitted twice unless the filter de-duplicates.
+
+Completion criteria:
+- [ ] Extension methods added to `StellaOps.Audit.Emission`
+- [ ] Unit test for `Audited()` extension verifying metadata is applied
+- [ ] Builds with no errors
+
+**Effort: 0.5 day**
+
+### FILTER-002 - Batch 1: Annotate simple services (Integrations, EvidenceLocker)
+Status: TODO
+Dependency: FILTER-001
+Owners: Developer (backend)
+Task description:
+- **Integrations** (6 endpoints, 1 file): Add `.WithAuditFilter()` on the group. Add `.WithMetadata(new AuditActionAttribute("integrations", "<action>"))` on each of the 6 state-changing endpoints: create, update, delete, test, discover, run_code_guard.
+- **EvidenceLocker** (7 endpoints, 3 files): Add the filter to endpoint groups. Annotate: store, snapshot, verify, hold, store_verdict, verify_verdict, export.
+- Test: start services, trigger each endpoint, verify events appear in Timeline `/api/v1/audit/events?modules=integrations,evidence`.
+
+Completion criteria:
+- [ ] All 13 endpoints annotated
+- [ ] Events visible in Timeline for both modules
+- [ ] No startup regressions
+
+**Effort: 1 day**
+
+### FILTER-003 - Batch 1 continued: Annotate Scanner
+Status: TODO
+Dependency: FILTER-001
+Owners: Developer (backend)
+Task description:
+- Scanner has ~65 state-changing endpoints across 30 files.
+- Add `.WithAuditFilter()` on the top-level `MapGroup` in each endpoint registration extension method.
+- Annotate each POST/PUT/PATCH/DELETE with `AuditActionAttribute("scanner", "<action>")`.
+- Action naming convention: use verb form matching the endpoint purpose (create, update, delete, submit, trigger, compute, verify, import, export, review, replay, etc.).
+- Resource type overrides: use explicit `ResourceType` for non-obvious resources (e.g., `ResourceType = "scan_policy"` for scan policy CRUD, `ResourceType = "source"` for sources CRUD).
+- Focus on CRUD and business operations; skip purely computational/query-like POSTs where the endpoint is idempotent and read-only (e.g., `/compare`, `/query`, `/current` batch).
+
+**Endpoints to SKIP** (read-only POST patterns, no state change):
+- `DeltaCompareEndpoints.HandleCompareAsync` (computation)
+- `CounterfactualEndpoints.HandleComputeAsync` (computation)
+- `EpssEndpoints.GetCurrentBatch` (batch read)
+- `SliceEndpoints.HandleQueryAsync` (query)
+- `ScoreReplayEndpoints` (replay verification, read-only)
+- `PolicyEndpoints` diagnostics/preview/runtime/overlay/linksets (read-only analysis)
+
+**Endpoints to ANNOTATE** (~50 after filtering):
+- Sources CRUD + lifecycle operations
+- Scan/SBOM submission
+- Scan policy CRUD
+- Approvals create/revoke
+- Triage status updates, VEX submissions
+- Secret detection settings CRUD
+- SmartDiff VEX candidate reviews
+- Webhooks (state-changing: trigger scans)
+- Reports, offline kit import, call graph submit, witness verify
+- Runtime events/reconcile, reachability compute
+
+Completion criteria:
+- [ ] ~50 endpoints annotated (with documented skip list)
+- [ ] Events visible in Timeline for module=scanner
+- [ ] No startup regressions
+
+**Effort: 2 days**
+
+### FILTER-004 - Batch 2: Annotate Platform
+Status: TODO
+Dependency: FILTER-001
+Owners: Developer (backend)
+Task description:
+- Platform has ~107 state-changing endpoints across 23 files.
+- Apply the group-level filter on each endpoint group.
+- Annotate with `AuditActionAttribute("platform", "<action>")`.
+- Use descriptive resource types: `identity_provider`, `trust_key`, `trust_issuer`, `trust_cert`, `script`, `environment`, `freeze_window`, `target`, `release_bundle`, `function_map`, `setup_session`, `localization`, `crypto_preference`, `environment_setting`, etc.
+- Skip read-only POSTs: score evaluate/verify (computational), AoC compatibility verify/validate (read-only checks), notify/signals/quota compatibility stubs that are proxied responses.
+- Pay special attention to `SetupEndpoints` (wizard steps) -- these are high-value audit targets (initial system configuration).
+
+Completion criteria:
+- [ ] ~90 endpoints annotated (with documented skip list)
+- [ ] Events visible in Timeline for module=platform
+- [ ] No startup regressions
+
+**Effort: 2.5 days**
+
+### FILTER-005 - Batch 2 continued: Annotate Authority
+Status: TODO
+Dependency: FILTER-001
+Owners: Developer (backend)
+Task description:
+- Authority has ~49 state-changing endpoints across 10 files plus Program.cs inline endpoints.
+- **Special consideration**: Authority runs its own auth middleware, not the standard gateway-propagated identity. The `AuditActionFilter` must correctly extract the actor from Authority's own `ClaimsPrincipal`.
+- Apply the filter to the admin group, console group, bootstrap group, and issuer directory groups.
+- Action mapping for admin operations: tenant (create, update, suspend, resume), user (create, update, enable, disable), role (create, update, preview_impact), client (create, update, rotate), token (revoke).
+- Action mapping for bootstrap: bootstrap_user, bootstrap_client, bootstrap_invite, revoke_service_account, rotate_signing, rotate_notifications, reload_plugins.
+- Action mapping for issuer directory: create_issuer, update_issuer, delete_issuer, create_key, rotate_key, revoke_key, set_trust, delete_trust.
+- Skip: OpenIddict protocol endpoints (token, introspect, revoke) -- these are auth protocol operations already captured by `AuthorityAuditSink`, not admin mutations. Skip the authorize endpoint for the same reason.
+- Skip: Notify ack-token endpoints, vuln workflow anti-forgery endpoints (internal crypto operations, not user-facing mutations).
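The special consideration above depends on which claim identifies the actor. Below is a minimal sketch using `System.Security.Claims`. The claim precedence (`sub`, then `ClaimTypes.NameIdentifier`, then `client_id`) is an assumption about Authority's tokens, not something the filter source is known to do. `ResolveActor` is a hypothetical name.

```csharp
using System;
using System.Security.Claims;

// Picks the actor for an audit event from the authenticated principal.
// Assumed precedence: OIDC "sub", then ClaimTypes.NameIdentifier (for handlers
// that map "sub" onto it), then "client_id" for client-credentials tokens.
string? ResolveActor(ClaimsPrincipal? principal)
{
    if (principal is null || principal.Identity?.IsAuthenticated != true)
        return null; // unauthenticated call: record no actor rather than guess
    return principal.FindFirst("sub")?.Value
        ?? principal.FindFirst(ClaimTypes.NameIdentifier)?.Value
        ?? principal.FindFirst("client_id")?.Value;
}

// Builds an authenticated principal; a non-empty authenticationType makes IsAuthenticated true.
ClaimsPrincipal MakePrincipal(params Claim[] claims) =>
    new ClaimsPrincipal(new ClaimsIdentity(claims, "Bearer"));

var adminUser = MakePrincipal(new Claim("sub", "user-42"), new Claim("client_id", "console"));
var serviceClient = MakePrincipal(new Claim("client_id", "scanner-worker"));
var anonymous = new ClaimsPrincipal(new ClaimsIdentity()); // no authentication type

Console.WriteLine(ResolveActor(adminUser));              // user-42
Console.WriteLine(ResolveActor(serviceClient));          // scanner-worker
Console.WriteLine(ResolveActor(anonymous) ?? "(none)");  // (none)
```

A unit test that builds principals shaped like real Authority tokens could pin this precedence down before FILTER-005 lands.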
+
+Completion criteria:
+- [ ] ~35 admin/bootstrap/issuer endpoints annotated
+- [ ] Events visible in Timeline for module=authority
+- [ ] AuthorityAuditSink continues to work independently (no interference)
+
+**Effort: 2 days**
+
+### FILTER-006 - Batch 2 continued: Annotate Notify
+Status: TODO
+Dependency: FILTER-001
+Owners: Developer (backend)
+Task description:
+- Notify has ~65 state-changing endpoints across 15 files.
+- Apply the filter at the group level on each endpoint group.
+- Module name: "notify".
+- Action mapping: rules (create, update, delete), templates (create, update, delete, preview, validate), incidents (acknowledge, resolve), escalation (create/update/delete policy, create/update/delete schedule, create/delete override, start, escalate, stop), quiet_hours, throttle, storm, fallback, localization, security, operator_override, simulation, observability.
+- Skip: `POST /tokens/sign`, `POST /tokens/verify`, `POST /html/sanitize`, `POST /html/validate`, `POST /html/strip` -- these are utility/computation endpoints that do not mutate state.
+- Focus on: CRUD operations, incident lifecycle, escalation lifecycle, dead letter management, chaos experiments, retention policies.
+
+Completion criteria:
+- [ ] ~50 endpoints annotated (with documented skip list)
+- [ ] Events visible in Timeline for module=notify
+- [ ] No conflict with existing `NotifyAuditRepository` writes
+
+**Effort: 2 days**
+
+### FILTER-007 - Batch 2 continued: Annotate Policy Engine + Gateway
+Status: TODO
+Dependency: FILTER-001
+Owners: Developer (backend)
+Task description:
+- Policy Engine has ~162 state-changing endpoints across 57 files. Policy Gateway duplicates ~56 of these.
+- **Strategy**: Annotate the Engine endpoints, then apply the same attributes to the matching Gateway endpoint files.
+- Module name: "policy".
+- The Gateway files under `src/Policy/StellaOps.Policy.Gateway/Endpoints/` mirror the Engine's `src/Policy/StellaOps.Policy.Engine/Endpoints/Gateway/` directory. Both need annotation.
+- High-priority groups (security-critical):
+  1. Gate endpoints (evaluate, force-pass) -- action: evaluate_gate, force_pass_gate
+  2. Exception approvals (approve, reject, escalate, delegate) -- action: approve_exception, reject_exception, escalate_exception, delegate_exception
+  3. Governance CRUD -- action: create_governance, update_governance, delete_governance
+  4. Sealed mode (enable, disable, emergency) -- action: enable_sealed, disable_sealed, emergency_unseal
+  5. Override CRUD -- action: create_override, expire_override
+- Lower-priority (operational):
+  6. Simulation endpoints (create, cancel, retry, preview)
+  7. Risk profile/budget CRUD
+  8. Verification policy CRUD
+  9. Snapshot create/restore
+  10. Compilation, lint, attestation reports
+- Skip: batch evaluation, policy decision, score gate evaluate (read-only evaluations that return computed results without mutating state).
+
+Completion criteria:
+- [ ] ~130 endpoints annotated across Engine and Gateway (with documented skip list)
+- [ ] Events visible in Timeline for module=policy
+- [ ] No conflict with existing `PolicyAuditRepository` writes
+
+**Effort: 4 days**
+
+### FILTER-008 - Batch 2 continued: Annotate Release-Orchestrator + Scheduler
+Status: TODO
+Dependency: FILTER-001
+Owners: Developer (backend)
+Task description:
+- **Release-Orchestrator** (~40 endpoints, 9 files): Module "release-orchestrator". High-value actions: create_release, promote, deploy, rollback, approve, reject. Skip: legacy stubs (`JobEngineLegacyEndpoints` returning 501).
+- **Scheduler** (~31 endpoints, 9 files): Module "scheduler". Actions: create_schedule, update_schedule, delete_schedule, pause, resume, create_run, cancel_run, retry_run, trigger_workflow.
+- PacksRegistry (part of the Scheduler service): Module "packs-registry". Actions: upload_pack, rotate_signature, upload_attestation, transition_lifecycle, check_parity, seed_export, create_mirror, sync_mirror.
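Every FILTER task above ends with a documented skip list, and FILTER-007 annotates Engine and Gateway files that mirror each other. Both properties can be checked automatically. The sketch below is written in Python for brevity and models the integration test proposed under Risks. It flags any POST/PUT/PATCH/DELETE endpoint that has neither `AuditActionAttribute` metadata nor a skip-list entry. It also flags mirrored routes whose module/action pair differs between Engine and Gateway. The `Endpoint` shape is hypothetical, and so is the assumption that mirrored routes share the same method and route template.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple

# The sprint defines "state-changing" as POST/PUT/PATCH/DELETE.
STATE_CHANGING = {"POST", "PUT", "PATCH", "DELETE"}

RouteKey = Tuple[str, str]  # (HTTP method, route template)


@dataclass(frozen=True)
class Endpoint:
    method: str
    route: str
    audit: Optional[Tuple[str, str]] = None  # (module, action) from AuditActionAttribute, if any


def _key(ep: Endpoint) -> RouteKey:
    return (ep.method.upper(), ep.route)


def coverage_gaps(endpoints: Iterable[Endpoint], skip_list: Set[RouteKey]) -> List[str]:
    """State-changing endpoints that have neither audit metadata nor a skip-list entry."""
    gaps = []
    for ep in endpoints:
        key = _key(ep)
        if key[0] in STATE_CHANGING and ep.audit is None and key not in skip_list:
            gaps.append(f"{key[0]} {key[1]}")
    return sorted(gaps)


def annotation_drift(engine: Iterable[Endpoint], gateway: Iterable[Endpoint]) -> List[str]:
    """Mirrored routes whose (module, action) metadata differs between Engine and Gateway."""
    engine_audit: Dict[RouteKey, Optional[Tuple[str, str]]] = {_key(e): e.audit for e in engine}
    drift = []
    for g in gateway:
        key = _key(g)
        if key in engine_audit and engine_audit[key] != g.audit:
            drift.append(f"{key[0]} {key[1]}")
    return sorted(drift)
```

In the C# test, the endpoint list could be built from the application's `EndpointDataSource`, reading `HttpMethodMetadata` and `AuditActionAttribute` from each endpoint's metadata. A Gateway route that is missing its annotation also counts as drift, because `None` differs from the Engine's pair.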
+
+Completion criteria:
+- [ ] All ~71 endpoints annotated, excluding the documented legacy stubs
+- [ ] Events visible in Timeline for modules: release-orchestrator, scheduler, packs-registry
+- [ ] No conflict with existing `PostgresAuditRepository` hash chain writes
+
+**Effort: 2 days**
+
+### DEPRECATE-001 - Batch 3: Dual-write for services with local audit tables
+Status: TODO
+Dependency: FILTER-002 through FILTER-008 (at minimum, the batch covering the service in question)
+Owners: Developer (backend)
+Task description:
+- For each service with an existing local audit table, add a secondary write path that emits to Timeline via `IAuditEventEmitter.EmitAsync()` inside the existing audit repository write methods:
+  1. **Authority**: Call `IAuditEventEmitter.EmitAsync()` from `AuthorityAuditSink.WriteAsync()` to emit auth events (login, token grant, lockout) to Timeline. Map `AuthEventRecord` to `AuditEventPayload`.
+  2. **Policy**: Add emission in `PolicyAuditRepository.CreateAsync()` and in `GateBypassAuditor` to emit bypass decisions to Timeline.
+  3. **Notify**: Add emission in the `NotifyAuditRepository` create method.
+  4. **Scheduler**: Add emission in the `ISchedulerAuditService` implementation.
+  5. **JobEngine/ReleaseOrchestrator**: Add emission in `PostgresAuditRepository.AppendAsync()`. Map `AuditEntry` fields to `AuditEventPayload`.
+  6. **Attestor**: Wire `AddAuditEmission()` in Program.cs (not yet wired). Add emission alongside `AuditLogEntity` inserts.
+- All emissions must be fire-and-forget (matching the existing `AuditActionFilter` pattern) -- failure to emit to Timeline must never break the local write.
+- Log a warning when emission fails (already built into `HttpAuditEventEmitter`).
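The ordering rule in this task (write locally first, then emit to Timeline on a best-effort basis) can be sketched as follows. This is a Python model using `asyncio`, not the C# repositories. `DualWriteAuditRepository`, `write_local`, and `emit_to_timeline` are hypothetical stand-ins for a repository write method and `IAuditEventEmitter.EmitAsync()`.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Set

log = logging.getLogger("audit.dual_write")

AuditRecord = Dict[str, Any]
Writer = Callable[[AuditRecord], Awaitable[None]]


class DualWriteAuditRepository:
    """Hypothetical wrapper: the local insert stays authoritative, Timeline emission is best-effort."""

    def __init__(self, write_local: Writer, emit_to_timeline: Writer) -> None:
        self._write_local = write_local
        self._emit = emit_to_timeline
        self._pending: Set["asyncio.Task[None]"] = set()  # strong refs so tasks are not garbage-collected

    async def append(self, record: AuditRecord) -> None:
        # 1. Local write first: its exceptions still reach the caller, exactly as today.
        await self._write_local(record)
        # 2. Schedule the Timeline emission without awaiting it, so caller latency is unchanged.
        task = asyncio.create_task(self._emit_safely(record))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def _emit_safely(self, record: AuditRecord) -> None:
        try:
            await self._emit(record)
        except Exception:  # a Timeline failure must never surface to the local write path
            log.warning("Timeline audit emission failed for event %s", record.get("id"), exc_info=True)

    async def drain(self) -> None:
        """Wait for in-flight emissions (e.g. at shutdown or in tests)."""
        if self._pending:
            await asyncio.gather(*list(self._pending))
```

In this sketch, emission only happens after the local insert succeeds. Timeline can therefore lag behind the authoritative table but never hold an event the table lacks, so verification only has to look for events missing from Timeline. A .NET version would also need to keep request-scoped services out of the background work, since the emission can outlive the request.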
+
+Completion criteria:
+- [ ] Dual-write verified for all 6 services (events appear in both the local table and Timeline)
+- [ ] Local audit write latency unchanged (emission is async/fire-and-forget)
+- [ ] No data loss: the local table remains the authoritative source during this phase
+
+**Effort: 3 days**
+
+### DEPRECATE-002 - Batch 4: Redirect reads to the Timeline unified sink
+Status: TODO
+Dependency: DEPRECATE-001, 30-day dual-write verification period
+Owners: Developer (backend)
+Task description:
+- After 30 days of verified dual-write with zero data discrepancies:
+  1. **Authority**: Update `ConsoleAdminEndpointExtensions.ListAuditEvents()` to query Timeline `/api/v1/audit/events?modules=authority` instead of the local `authority.audit` table. Add an `Obsolete` attribute and deprecation response headers to the local audit endpoint.
+  2. **Policy**: Update governance audit endpoints to query Timeline. Keep gate bypass audit endpoints reading from the local `policy.gate_bypass_audit` table (domain evidence, not generic audit).
+  3. **Notify**: Update `/api/v1/notify/audit` to proxy to Timeline.
+  4. **Scheduler**: Redirect internal audit reads to Timeline.
+  5. **ReleaseOrchestrator**: Update the `/api/v1/release-orchestrator/audit` LIST/SEARCH/SUMMARY endpoints to query Timeline. **Keep the chain verification endpoint reading from the local table** (service-level chain integrity is distinct from the unified chain).
+  6. **Attestor**: Redirect internal audit reads to Timeline.
+- Update `HttpUnifiedAuditEventProvider` to stop polling deprecated service-specific audit endpoints.
+- Add deprecation headers to the legacy endpoints: `Deprecation` (RFC 9745 defines its value as a structured date such as `@1767225600`, not `true`), `Sunset` (an HTTP-date at the end of the 90-day backward-compatibility window, per RFC 8594), and `Link` with `rel="successor-version"` pointing at the Timeline `/api/v1/audit/events` endpoint.
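All three headers can be derived from a single deprecation timestamp. The Python sketch below assumes the `Sunset` date is the deprecation time plus the 90-day backward-compatibility window. The function name and the example URL are hypothetical. Only the standard library is used: `email.utils.format_datetime(..., usegmt=True)` produces the `Wed, 01 Apr 2026 00:00:00 GMT` form that HTTP-date requires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def deprecation_headers(deprecated_at: datetime, successor_url: str,
                        compat_days: int = 90) -> Dict[str, str]:
    """Build headers for a legacy per-service audit endpoint (sketch).

    Deprecation: RFC 9745 structured date, "@" followed by Unix seconds.
    Sunset:      RFC 8594 HTTP-date, deprecated_at plus the compatibility window.
    Link:        successor-version relation pointing at the unified endpoint.
    """
    if deprecated_at.tzinfo is None:
        raise ValueError("deprecated_at must be timezone-aware")
    sunset = (deprecated_at + timedelta(days=compat_days)).astimezone(timezone.utc)
    return {
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        "Sunset": format_datetime(sunset, usegmt=True),  # usegmt requires a UTC datetime
        "Link": f'<{successor_url}>; rel="successor-version"',
    }
```

For example, a deprecation at 2026-01-01T00:00:00Z yields `Deprecation: @1767225600` and `Sunset: Wed, 01 Apr 2026 00:00:00 GMT`. Rejecting naive datetimes avoids silently interpreting a local time as UTC.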
+
+Completion criteria:
+- [ ] All deprecated service-specific audit read endpoints return deprecation headers (the retained gate-bypass and chain-verification endpoints are exempt)
+- [ ] Timeline is the primary read source for all generic audit queries
+- [ ] UI `AuditLogClient` uses the unified endpoint exclusively (no fallback to per-service endpoints)
+- [ ] Per-service audit endpoints still functional (backward compatibility for 90 days)
+
+**Effort: 3 days (implementation) + 30-day verification wait**
+
+### DEPRECATE-003 - Batch 5: Drop deprecated local audit tables
+Status: TODO
+Dependency: DEPRECATE-002, 90-day backward-compatibility period
+Owners: Developer (backend)
+Task description:
+- After 90 days with no clients reading from deprecated endpoints:
+  1. Remove local audit write code from repositories (stop dual-write).
+  2. Create SQL migrations to drop tables:
+     - `DROP TABLE IF EXISTS authority.audit CASCADE;`
+     - `DROP TABLE IF EXISTS authority.airgap_audit CASCADE;`
+     - `DROP TABLE IF EXISTS authority.offline_kit_audit CASCADE;`
+     - `DROP TABLE IF EXISTS policy.audit CASCADE;` (keep `policy.gate_bypass_audit`)
+     - `DROP TABLE IF EXISTS notify.audit CASCADE;`
+     - `DROP TABLE IF EXISTS scheduler.audit CASCADE;` (dropping the partitioned parent also drops all of its partitions)
+     - `DROP TABLE IF EXISTS proofchain.audit_log CASCADE;`
+  3. **Do NOT drop** `audit_entries` / `audit_sequences` in JobEngine/ReleaseOrchestrator yet -- the hash chain is service-level evidence. Reclassify them as domain tables, not audit tables. Evaluate them for removal in a future sprint, after a 180-day period of parallel chain verification between the local and Timeline chains.
+  4. Remove deprecated audit endpoint registrations.
+  5. Remove the `PolicyAuditRepository`, `NotifyAuditRepository`, and `AuthorityAuditSink` local DB write paths (keep structured logging).
+  6. Remove `HttpUnifiedAuditEventProvider` polling entirely (all data flows through emission now).
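The migration tooling can enforce this batch's precondition (90 days with no reads of the deprecated endpoints) and its retained-table exceptions directly. Below is a hypothetical Python sketch. `plan_table_drops` refuses to produce any statement while a deprecated endpoint has been read inside the window, and it never emits a `DROP` for a retained domain-evidence table. Where the last-read timestamps come from (access logs, metrics) is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Mapping, Optional, Set


class DropBlocked(Exception):
    """Raised when the no-reads precondition for dropping tables is not met."""


def plan_table_drops(tables: Iterable[str], retained: Set[str],
                     last_read_by_endpoint: Mapping[str, Optional[datetime]],
                     now: datetime, quiet_days: int = 90) -> List[str]:
    """Return DROP statements only if no deprecated endpoint was read inside the window."""
    cutoff = now - timedelta(days=quiet_days)
    # None means the endpoint has not been read since deprecation.
    recent = sorted(ep for ep, ts in last_read_by_endpoint.items()
                    if ts is not None and ts > cutoff)
    if recent:
        raise DropBlocked(f"deprecated endpoints read in the last {quiet_days} days: {recent}")
    statements = []
    for table in tables:
        if table in retained:
            continue  # domain-evidence tables such as policy.gate_bypass_audit are kept
        statements.append(f"DROP TABLE IF EXISTS {table} CASCADE;")
    return statements
```

`CASCADE` also drops dependent objects such as views and foreign-key constraints. It is therefore worth reviewing the dependency list before running the generated migration.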
+
+Completion criteria:
+- [ ] Local audit tables dropped (except JobEngine/ReleaseOrchestrator chain tables and Policy gate bypass)
+- [ ] No 500 errors from missing tables
+- [ ] Timeline is the sole audit data store
+- [ ] All generic audit read endpoints serve data from Timeline
+- [ ] Deprecated code removed, no dead references
+
+**Effort: 2 days (implementation) + 90-day wait from DEPRECATE-002**
+
+---
+
+## Effort Summary
+
+| Batch | Tasks | Effort | Timeline |
+|---|---|---|---|
+| **Batch 1**: Convention helper + simple services (Integrations, EvidenceLocker, Scanner) | FILTER-001, FILTER-002, FILTER-003 | 3.5 days | Week 1 |
+| **Batch 2**: Complex services (Platform, Authority, Notify, Policy, ReleaseOrchestrator, Scheduler) | FILTER-004 through FILTER-008 | 12.5 days | Weeks 2-4 |
+| **Batch 3**: Dual-write transition | DEPRECATE-001 | 3 days | Weeks 4-5 |
+| **Batch 4**: Read migration (after 30-day verification) | DEPRECATE-002 | 3 days + 30-day wait | Weeks 9-10 |
+| **Batch 5**: Drop local tables (after 90-day backward-compat) | DEPRECATE-003 | 2 days + 90-day wait | Weeks 22-23 |
+| **TOTAL** | | **24 days active work** + **120 days verification** | ~6 months end-to-end |
+
+---
+
+## Execution Log
+| Date (UTC) | Update | Owner |
+| --- | --- | --- |
+| 2026-04-08 | Sprint created. Full endpoint inventory completed across all 9 wired services (~532 state-changing endpoints). Per-service audit table analysis completed for 6 services with local tables. | Planning |
+
+## Decisions & Risks
+
+### Decisions
+
+1. **Group-level filter + per-endpoint metadata is the convention.** `AuditActionFilter` is a no-op without `AuditActionAttribute`, so applying it at the group level is safe and reduces boilerplate from 2 lines per endpoint to 1 line.
+
+2. **Policy `gate_bypass_audit` and JobEngine/ReleaseOrchestrator `audit_entries` are reclassified as domain evidence tables, not audit.** Their query patterns (by image digest, by decision ID, by chain sequence) and integrity guarantees (hash chains, attestation digests) serve domain-specific needs that the generic unified store cannot efficiently replace. They are retained alongside the unified audit sink. Any future removal of the JobEngine/ReleaseOrchestrator chain tables waits for the 180-day parallel chain verification described in DEPRECATE-003.
+
+3. **Read-only POST endpoints are excluded from audit annotation.** Endpoints like `/compare`, `/query`, `/evaluate` (when they compute a result without persisting state) do not produce meaningful audit events. Annotating them would create noise in the audit log.
+
+4. **Authority auth-protocol events require separate emission.** The `AuthorityAuditSink` captures login attempts, token grants, and lockouts -- events raised by the OpenIddict protocol flow (whose endpoints FILTER-005 deliberately skips), NOT by annotated admin mutations. These must be emitted to Timeline via a direct `IAuditEventEmitter.EmitAsync()` call, not via `AuditActionFilter`.
+
+5. **120-day verification pipeline.** Dual-write runs for 30 days before reads are redirected. Deprecated endpoints remain functional for 90 more days. Total 120 days from dual-write start to table drop. This is non-negotiable for a compliance-critical audit subsystem.
+
+### Risks
+
+1. **~532 endpoints is a large surface.** Risk of missed annotations or incorrect module/action strings. Mitigation: create an integration test that walks all registered endpoints and asserts that every POST/PUT/PATCH/DELETE endpoint has `AuditActionAttribute` metadata (or is in an explicit skip list).
+
+2. **Policy Engine/Gateway duplication.** The same endpoint logic exists in two places. Risk of annotation drift. Mitigation: consider extracting shared endpoint registration into a common library, or generating Gateway endpoints from Engine definitions.
+
+3. **Fire-and-forget emission can silently drop events.** If Timeline is down during the 30-day dual-write period, the local table will contain events that Timeline lacks. Mitigation: add a reconciliation job that compares local table event counts with Timeline for the same module/time range and alerts on discrepancies.
+
+4. **Performance impact of one extra HTTP call per audited request.** Every request to an annotated endpoint now triggers a fire-and-forget HTTP POST to Timeline. Under high load, this could create back-pressure. Mitigation: `HttpAuditEventEmitter` already uses `IHttpClientFactory` with connection pooling. Add a circuit breaker via Polly if needed. The emission is async and never blocks the response.
+
+5. **Existing Scheduler monthly partitioning is lost in Timeline.** The unified store does not partition by month. Retention will rely on row-level `DELETE ... WHERE timestamp < cutoff` instead of dropping whole monthly partitions (in PostgreSQL, `ALTER TABLE ... DETACH PARTITION` followed by `DROP TABLE`). Mitigation: AUDIT-004 (from the parent sprint) should add partitioning to the unified audit table.
+
+## Next Checkpoints
+
+- **Week 1**: Convention helper shipped, Integrations + EvidenceLocker + Scanner annotated
+- **Weeks 2-4**: All remaining services annotated
+- **Weeks 4-5**: Dual-write enabled, monitoring dashboard created
+- **Weeks 9-10**: Read migration after 30-day verification
+- **Weeks 22-23**: Table drop after 90-day backward-compat window
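The reconciliation job proposed for Risk 3 can be as simple as comparing per-module event counts in fixed time buckets. The Python sketch below assumes hourly buckets and takes `(module, timestamp)` pairs read from the local table and from Timeline. `reconcile` and `Discrepancy` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple

Event = Tuple[str, datetime]  # (module, event timestamp)


@dataclass(frozen=True)
class Discrepancy:
    module: str
    hour: datetime
    local_count: int
    timeline_count: int


def _hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)


def reconcile(local: Iterable[Event], timeline: Iterable[Event]) -> List[Discrepancy]:
    """Compare per-module, per-hour event counts; return every bucket that differs."""
    local_counts = Counter((module, _hour(ts)) for module, ts in local)
    timeline_counts = Counter((module, _hour(ts)) for module, ts in timeline)
    discrepancies = []
    for module, hour in sorted(set(local_counts) | set(timeline_counts)):
        lc, tc = local_counts[(module, hour)], timeline_counts[(module, hour)]
        if lc != tc:  # missing in Timeline (lc > tc) or duplicated there (tc > lc)
            discrepancies.append(Discrepancy(module, hour, lc, tc))
    return discrepancies
```

Comparing counts rather than event IDs keeps the queries cheap, but one lost event and one duplicate in the same bucket would cancel out. Clock skew near a bucket boundary can also cause transient mismatches, so re-checking a bucket before alerting is a reasonable refinement.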