docs(audit): sprint plan for endpoint filters + per-service table deprecation

- Map 532 state-changing endpoints across 9 services for AuditActionFilter
- Plan 5-batch migration: convention helper → complex services → dual-write → read migration → drop local tables
- Reclassify Authority auth-protocol and Policy gate-bypass audit as domain evidence
- 24 days active work + 120-day verification pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,677 @@
# Sprint 20260408-005 -- AuditActionFilter Endpoint Wiring & Per-Service Audit Table Deprecation

## Topic & Scope

- **Wire `AuditActionFilter` across all 9 services** that already call `AddAuditEmission()` in their `Program.cs`, annotating every state-changing endpoint with `AuditActionAttribute` so that every POST/PUT/PATCH/DELETE emits a structured audit event to the Timeline unified sink.
- **Deprecate per-service audit tables** in Authority, Policy, Notify, Scheduler, Attestor, and JobEngine/ReleaseOrchestrator through a phased dual-write -> read-migration -> drop pipeline.
- This sprint implements AUDIT-002 and AUDIT-005 from `SPRINT_20260408_004_Timeline_unified_audit_sink.md`.
- Working directory: `src/__Libraries/StellaOps.Audit.Emission/`, cross-module endpoint files, per-service persistence directories.
- Expected evidence: all state-changing endpoints decorated, audit events visible in Timeline `/api/v1/audit/events`, dual-write verified, deprecation headers on legacy endpoints, zero data loss.

## Dependencies & Concurrency

- **Upstream**: AUDIT-001 (DONE) -- PostgreSQL persistence for Timeline audit ingest is complete. `PostgresUnifiedAuditEventStore` with SHA-256 hash chain is operational.
- **Upstream**: `AddAuditEmission()` is already called in 9 services: Authority, Policy, Release-Orchestrator, EvidenceLocker, Notify, Scanner, Scheduler, Integrations, Platform. No DI wiring needed.
- Batches 1-2 (filter annotation) can run in parallel across services.
- Batch 3 (dual-write) can begin once Batch 1-2 is verified for a given service.
- Batches 4-5 (read migration, table drop) are sequential and must wait for verification periods.

## Documentation Prerequisites

- `src/__Libraries/StellaOps.Audit.Emission/AuditActionFilter.cs` -- filter behavior, no-op when attribute missing.
- `src/__Libraries/StellaOps.Audit.Emission/AuditActionAttribute.cs` -- module/action/resourceType parameters.
- `docs/implplan/SPRINT_20260408_004_Timeline_unified_audit_sink.md` -- parent sprint context.

---

## Part 1: Endpoint Filter Annotation Plan

### Convention Mode Assessment

**The `AuditActionFilter` supports a passive convention mode.** Reading the filter source:
- If `AuditActionAttribute` metadata is NOT present on the endpoint, the filter is a **no-op passthrough** (line 48: returns `result` unchanged).
- The filter can be added at the **RouteGroup level** (ASP.NET Core supports `group.AddEndpointFilter<T>()`), which applies it to all endpoints in the group.
- Only endpoints explicitly annotated with `.WithMetadata(new AuditActionAttribute("module", "action"))` will emit events.

**Recommended approach: hybrid group + per-endpoint annotation.**
1. Add `group.AddEndpointFilter<AuditActionFilter>()` once at each service's main API route group.
2. Add `.WithMetadata(new AuditActionAttribute("module", "action"))` only on state-changing endpoints.
3. GET endpoints remain unannotated and the filter passes through silently.

This minimizes the per-endpoint boilerplate (no `.AddEndpointFilter<AuditActionFilter>()` on each endpoint) while keeping explicit control over which actions are audited.
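
The hybrid approach can be sketched as a minimal ASP.NET Core program. This is only an illustration: the `AuditActionAttribute` and `AuditActionFilter` types below are simplified stand-ins whose shapes are assumed (the real ones live in `StellaOps.Audit.Emission`), the route prefix is hypothetical, and the stand-in filter logs to the console instead of emitting to Timeline.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Step 1: register the filter once for the whole route group (hypothetical prefix).
var group = app.MapGroup("/api/v1/integrations");
group.AddEndpointFilter<AuditActionFilter>();

// Step 3: read-only endpoint carries no metadata, so the filter passes through.
group.MapGet("/", () => Results.Ok(Array.Empty<string>()));

// Step 2: state-changing endpoint opts in through metadata.
group.MapPost("/", () => Results.NoContent())
     .WithMetadata(new AuditActionAttribute("integrations", "create"));

app.Run();

// Stand-in for the real attribute; property names are assumptions.
public sealed class AuditActionAttribute : Attribute
{
    public AuditActionAttribute(string module, string action)
    {
        Module = module;
        Action = action;
    }

    public string Module { get; }
    public string Action { get; }
    public string? ResourceType { get; init; }
}

// Stand-in for the real filter: no-op unless the endpoint carries the attribute.
public sealed class AuditActionFilter : IEndpointFilter
{
    public async ValueTask<object?> InvokeAsync(
        EndpointFilterInvocationContext context, EndpointFilterDelegate next)
    {
        var result = await next(context);

        var audit = context.HttpContext.GetEndpoint()?
            .Metadata.GetMetadata<AuditActionAttribute>();
        if (audit is null)
        {
            return result; // unannotated endpoint: passthrough
        }

        // The real filter would emit a structured event here; the sketch only logs.
        Console.WriteLine($"audit: {audit.Module}.{audit.Action}");
        return result;
    }
}
```

With this layout, adding a new GET endpoint to the group needs no audit-related code, and a new mutation needs a single `.WithMetadata(...)` call.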

### Per-Service Endpoint Inventory

#### 1. Scanner (module: "scanner") -- 30 endpoint files, ~65 state-changing endpoints

| Endpoint Group | Count | Action(s) |
|---|---|---|
| Sources CRUD | 8 | create, update, delete, test, pause, resume, activate, trigger_scan |
| Scan submission | 2 | submit, attach_entropy |
| SBOM submission/upload | 2 | submit_sbom, upload |
| Scan policy CRUD | 3 | create, update, delete |
| Approvals | 2 | create, revoke |
| Triage (status, VEX, batch, proof) | 5 | update_status, submit_vex, batch_action, generate_proof, bulk_query |
| Webhooks (generic + provider-specific) | 5 | receive_webhook |
| Reports | 1 | create |
| Reachability (compute, analyze, VEX) | 3 | compute, analyze, generate_vex |
| Secret detection settings | 5 | create, update, delete (settings + exceptions) |
| SmartDiff/VEX candidates review | 2 | review |
| Score replay/verify | 4 | replay, verify |
| Validation/fidelity | 3 | validate, analyze, upgrade |
| Offline kit | 2 | import, validate |
| Call graph | 1 | submit |
| Witness verify | 1 | verify |
| Runtime events | 2 | events, reconcile |
| Other (delta compare, EPSS batch, counterfactual, slice, replay attach, GitHub SARIF, policy diagnostics/preview/runtime/overlay/linksets, composition verify) | ~14 | various |

#### 2. Integrations (module: "integrations") -- 1 endpoint file, 6 state-changing endpoints

| Endpoint | Action |
|---|---|
| `POST /` | create |
| `PUT /{id}` | update |
| `DELETE /{id}` | delete |
| `POST /{id}/test` | test |
| `POST /{id}/discover` | discover |
| `POST /ai-code-guard/run` | run_code_guard |

#### 3. Platform (module: "platform") -- 23 endpoint files, ~107 state-changing endpoints

| Endpoint Group | Count | Action(s) |
|---|---|---|
| Setup wizard sessions/steps | 14 | create_session, resume, execute_step, skip_step, run_checks, prerequisites, update_config, finalize |
| Trust signing (keys, issuers, certs, transparency log) | 10 | create_key, rotate_key, revoke_key, create_issuer, block_issuer, unblock_issuer, create_cert, revoke_cert, update_transparency_log |
| Identity providers | 7 | create, update, delete, enable, disable, test, apply |
| Environment settings admin | 2 | update, delete |
| Scripts CRUD + validate + compatibility | 5 | create, update, delete, validate, check_compatibility |
| Release control (bundles, versions, materialize) | 3 | create_bundle, create_version, materialize |
| Release orchestrator environments (env CRUD, targets, freeze windows) | 12 | create, update, delete (env/target/freeze_window), update_settings, health_check |
| Function maps | 3 | create, delete, verify |
| Localization | 2 | update_bundles, delete_string |
| Crypto provider admin | 2 | update_preferences, delete_preferences |
| Context | 1 | update_preferences |
| Assistant (user state, tips, tours, glossary) | 5 | update_user_state, create_tip, delete_tip, create_tour, create_glossary |
| Federation telemetry | 3 | grant_consent, revoke_consent, trigger |
| Notify compatibility | 13 | create/delete (schedules, quiet_hours, throttle, escalation, localizations), simulate, ack_incident |
| Signals compatibility | 5 | create_trigger, update_trigger, delete_trigger, toggle_trigger, retry |
| Evidence threads | 3 | export, transcript, collect |
| Score | 2 | evaluate, verify |
| Policy interop | 4 | export, import, validate, evaluate |
| Quota/AoC compatibility, onboarding, profiles, seed | ~10 | various |
| Migration admin | 1 | run |

#### 4. Authority (module: "authority") -- 10 endpoint files, ~49 state-changing endpoints

| Endpoint Group | Count | Action(s) |
|---|---|---|
| Tenant CRUD + suspend/resume | 4 | create, update, suspend, resume |
| User CRUD + enable/disable | 4 | create, update, disable, enable |
| Role CRUD + preview impact | 3 | create, update, preview_impact |
| Client CRUD + rotate | 3 | create, update, rotate |
| Token revoke | 1 | revoke |
| Branding update/preview | 2 | update, preview |
| Airgap audit record | 1 | record |
| Bootstrap users/clients/invites/service-accounts/signing/notifications/plugins | 8 | bootstrap_create, rotate, reload |
| OpenIddict (token, introspect, revoke) | 3 | issue_token, introspect, revoke_token |
| Authorize | 1 | authorize |
| IssuerDirectory (issuer CRUD, key CRUD, trust CRUD) | 8 | create, update, delete (issuer/key/trust) |
| Notify ack-tokens + vuln workflow tokens + attachment tokens | 6 | rotate, issue, verify |
| Vulnerability tickets + advisory AI logs | 2 | create_ticket, log_inference |
| Console token introspect + vuln ticket | 2 | introspect, create_ticket |

#### 5. Policy (module: "policy") -- 57 endpoint files in Engine + 11 in Gateway, ~162+56 state-changing endpoints (many duplicated between Engine and Gateway)

**Note**: Policy Engine and Policy Gateway share nearly identical endpoint files (Gateway proxies to Engine). Annotation should target the Engine endpoints; Gateway endpoints should mirror the same attributes.

| Endpoint Group (Engine) | Count | Action(s) |
|---|---|---|
| Governance CRUD (policies, rules, thresholds) | 9 | create, update, delete, enable, disable, reorder, import, export, clone |
| Policy simulation (create, cancel, retry, preview, compare, what-if, etc.) | 20 | create, cancel, retry, preview, compare, simulate |
| Exception management (create, approve, reject, revoke, extend) | 6 | create, approve, reject, revoke, extend, batch |
| Exception approvals (approve, reject, escalate, delegate) | 4 | approve, reject, escalate, delegate |
| Gate operations (evaluate, force-pass) | 2 | evaluate, force_pass |
| Gates CRUD | 2 | create, delete |
| Score gate (evaluate, verify) | 2 | evaluate, verify |
| Risk profile CRUD + air-gap sync | 9 | create, update, delete, sync_airgap, import, export |
| Risk budget management | 3 | create, update, delete |
| Risk simulation (run, preview, batch, sensitivity, compare, rebase, budget) | 7 | run, preview, batch, compare |
| Policy pack CRUD | 5 | create, update, delete, activate, deactivate |
| Policy pack bundles | 1 | create |
| Override CRUD | 5 | create, update, delete, expire, batch |
| Verification policy CRUD + editor | 6 | create, update, delete, compile, validate |
| Scope attachment | 4 | attach, detach, reorder, bulk |
| Snapshots (create, restore) | 2 | create, restore |
| Violations (acknowledge, dismiss, reopen) | 5 | acknowledge, dismiss, reopen |
| Staleness (configure, reset) | 2 | configure, reset |
| Sealed mode (enable, disable, emergency) | 3 | enable, disable, emergency |
| Profile events (create, ack) | 2 | create, acknowledge |
| Conflict resolution | 3 | resolve, merge, override |
| Policy decision, batch evaluation, policy compilation, lint | 4 | evaluate, batch, compile, lint |
| Registry webhooks | 3 | register, update, delete |
| Deltas | 2 | compute, compare |
| Attestation reports + console | 6 | create, export, verify |
| CVSS receipts | 2 | submit, verify |
| Budget endpoints | 1 | allocate |
| Determinization config | 2 | update, audit |
| Other (tool lattice, advisory AI knobs, trust weighting, overlay sim, path scope sim, evidence summary, delta-if-present, air-gap notifications, profile export, console export, ledger export, orchestrator job, policy worker, console simulation, batch context, verify determinism, unknown tracking) | ~20 | various |

#### 6. Release-Orchestrator (module: "release-orchestrator") -- 9 endpoint files, ~40 state-changing endpoints (excluding legacy stubs)

| Endpoint Group | Count | Action(s) |
|---|---|---|
| Release CRUD | 4 | create, update, delete, clone |
| Release lifecycle (ready, promote, deploy, rollback) | 4 | mark_ready, promote, deploy, rollback |
| Release components CRUD | 3 | add, update, remove |
| Approvals (approve, reject, batch) | 4 | approve, reject, batch_approve, batch_reject |
| Release dashboard (approve/reject promotion) | 2 | approve_promotion, reject_promotion |
| Deployment operations (create, pause, resume, cancel, rollback, retry target) | 6 | create, pause, resume, cancel, rollback, retry |
| Release control v2 (approval decision, rollback) | 2 | approval_decision, rollback |
| Scripts CRUD + validate + compatibility | 5 | create, update, delete, validate, check_compatibility |
| Policy gate profiles CRUD + simulate | 9 | create, update, delete, set_default, validate, simulate, bundle_simulate, feed_freshness |
| Evidence verify | 1 | verify |

**Note**: `JobEngineLegacyEndpoints` contains catch-all `{**rest}` stubs that return 501; these do NOT need audit annotation.

#### 7. EvidenceLocker (module: "evidence") -- 2 endpoint files + Program.cs, ~7 state-changing endpoints

| Endpoint | Action |
|---|---|
| `POST /evidence` | store |
| `POST /evidence/snapshot` | snapshot |
| `POST /evidence/verify` | verify |
| `POST /evidence/hold/{caseId}` | hold |
| `POST /verdicts/` | store_verdict |
| `POST /verdicts/{id}/verify` | verify_verdict |
| `POST /exports/{bundleId}/export` | export |

#### 8. Notify (module: "notify") -- 15 endpoint files, ~65 state-changing endpoints

| Endpoint Group | Count | Action(s) |
|---|---|---|
| Rules CRUD (notify API + standalone) | 6 | create, update, delete |
| Templates CRUD + preview + validate (notify API + standalone) | 8 | create, update, delete, preview, validate |
| Incidents (ack, resolve) | 4 | acknowledge, resolve |
| Escalation policies CRUD + schedules CRUD + overrides | 10 | create, update, delete |
| Escalation operations (start, escalate, stop, ack, webhook) | 5 | start, escalate, stop, ack |
| Quiet hours (calendars CRUD + evaluate) | 4 | create, update, delete, evaluate |
| Throttle (config update/delete, evaluate) | 3 | update, delete, evaluate |
| Storm breaker (summary, clear) | 2 | summary, clear |
| Fallback chains + deliveries | 3 | update_chain, test, delete_delivery |
| Localization (format string, update bundles, delete bundle, validate) | 4 | format, update_bundles, delete_bundle, validate |
| Observability (dead letters retry/discard/purge, chaos, retention policies) | 8 | retry, discard, purge, start_experiment, stop_experiment, create_policy, update_policy, delete_policy |
| Security (tokens, keys, webhooks, HTML, tenants, grants) | 12 | sign, verify, rotate, register_webhook, validate, sanitize, strip, validate_tenant, fuzz_test, grant, revoke |
| Operator overrides (create, revoke, check) | 3 | create, revoke, check |
| Simulation (simulate, validate rule) | 2 | simulate, validate |

#### 9. Scheduler (module: "scheduler") -- 9 endpoint files, ~31 state-changing endpoints

| Endpoint Group | Count | Action(s) |
|---|---|---|
| Schedules CRUD + pause/resume | 5 | create, update, delete, pause, resume |
| Runs (create, cancel, retry, preview) | 4 | create, cancel, retry, preview |
| Workflow trigger | 1 | trigger |
| Graph jobs (build, overlay, complete hook) | 3 | build, overlay, complete |
| Event webhooks (conselier export, excitor export) | 2 | export |
| Policy runs | 1 | create |
| Policy simulations (create, preview, cancel, retry) | 4 | create, preview, cancel, retry |
| Resolver jobs | 1 | create |
| PacksRegistry (upload, signature, attestation, lifecycle, parity, offline seed, mirrors, mirror sync) | 9 | upload, rotate_signature, upload_attestation, transition_lifecycle, check_parity, seed_export, create_mirror, sync_mirror |

### Total Endpoint Count Summary

| Service | State-Changing Endpoints | Complexity |
|---|---|---|
| Scanner | ~65 | High (30 files) |
| Integrations | 6 | Low (1 file) |
| Platform | ~107 | High (23 files) |
| Authority | ~49 | Medium (10 files, multiple sub-services) |
| Policy | ~162 (Engine) | Very High (57 files, duplicated in Gateway) |
| Release-Orchestrator | ~40 | Medium (9 files) |
| EvidenceLocker | 7 | Low (3 files) |
| Notify | ~65 | High (15 files) |
| Scheduler | ~31 | Medium (9 files) |
| **TOTAL** | **~532** | |

---

## Part 2: Per-Service Audit Table Deprecation Plan

### 2.1 Authority -- `authority.audit`, `authority.airgap_audit`, `authority.offline_kit_audit`

**Writes:**
- `AuthorityAuditSink` (implements `IAuthEventSink`) writes login/auth events via `IAuthorityLoginAttemptStore.InsertAsync()` -- this is a specialized auth event pipeline, NOT a generic endpoint audit filter.
- `AirgapAuditEndpointExtensions` has `POST /authority/audit/airgap` that records airgap-specific audit entries.

**Reads:**
- `GET /console/admin/audit` -- `ConsoleAdminEndpointExtensions.ListAuditEvents()` reads from the authority.audit table.
- `GET /authority/audit/airgap` -- reads airgap audit entries.
- `GET /authority/incident-audit` -- reads incident audit entries.
- UI: Audit tab in Authority admin console.

**What breaks if dropped:** Admin audit log in the console loses historical auth event data. The specialized `ClassifiedString` PII classification would be lost.

**Dual-write path:** The `AuthorityAuditSink` pipeline is fundamentally different from `AuditActionFilter` (it captures auth protocol events like login success/failure, token issuance, not HTTP endpoint calls). **Both are needed**:
- `AuditActionFilter` for admin mutations (user CRUD, role CRUD, client CRUD, tenant management).
- `AuthorityAuditSink` for auth protocol events (login attempts, token grants, lockouts) -- should also emit to Timeline via `IAuditEventEmitter` directly.

**Migration:** Phase 1: Add `AuditActionFilter` to admin endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` call inside `AuthorityAuditSink.WriteAsync()` to dual-write auth events. Phase 3: Redirect admin audit reads to Timeline. Phase 4: Drop local tables after 90-day verification.
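
A rough sketch of the Phase 2 dual-write for auth protocol events follows. It is an assumption-heavy illustration: the `AuthEvent` record, both interface signatures, and the decorator class are hypothetical, and it wraps the existing sink rather than editing `AuthorityAuditSink.WriteAsync()` in place as the plan describes. The point it shows is the ordering: the local write stays authoritative, and a Timeline failure is reported but does not fail the auth flow.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes: the real IAuthEventSink and IAuditEventEmitter signatures may differ.
public sealed record AuthEvent(string EventType, string Subject, DateTimeOffset OccurredAt);

public interface IAuthEventSink
{
    Task WriteAsync(AuthEvent evt, CancellationToken ct);
}

public interface IAuditEventEmitter
{
    Task EmitAsync(string module, string action, string actor, DateTimeOffset occurredAt, CancellationToken ct);
}

// Writes to the local sink first, then mirrors the event to Timeline on a best-effort basis.
public sealed class DualWriteAuthEventSink : IAuthEventSink
{
    private readonly IAuthEventSink _local;
    private readonly IAuditEventEmitter _timeline;
    private readonly Action<Exception> _onEmitFailure;

    public DualWriteAuthEventSink(
        IAuthEventSink local, IAuditEventEmitter timeline, Action<Exception> onEmitFailure)
    {
        _local = local;
        _timeline = timeline;
        _onEmitFailure = onEmitFailure;
    }

    public async Task WriteAsync(AuthEvent evt, CancellationToken ct)
    {
        // The local table stays authoritative until the verification window closes.
        await _local.WriteAsync(evt, ct);

        try
        {
            await _timeline.EmitAsync("authority", evt.EventType, evt.Subject, evt.OccurredAt, ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // A Timeline outage is reported but must not fail the auth flow.
            _onEmitFailure(ex);
        }
    }
}

// In-memory doubles used only to exercise the sketch.
public sealed class InMemoryAuthEventSink : IAuthEventSink
{
    public List<AuthEvent> Events { get; } = new();

    public Task WriteAsync(AuthEvent evt, CancellationToken ct)
    {
        Events.Add(evt);
        return Task.CompletedTask;
    }
}

public sealed class CallbackEmitter : IAuditEventEmitter
{
    private readonly Func<string, Task> _onEmit;

    public CallbackEmitter(Func<string, Task> onEmit) => _onEmit = onEmit;

    public Task EmitAsync(string module, string action, string actor, DateTimeOffset occurredAt, CancellationToken ct)
        => _onEmit($"{module}.{action}");
}
```

Counting the failures passed to `onEmitFailure` during the verification window would give a direct measure of how far the Timeline copy lags the local tables before they are dropped.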

### 2.2 Policy -- `policy.audit` + `policy.gate_bypass_audit`

**Writes:**
- `PolicyAuditRepository.CreateAsync()` writes generic policy audit entries.
- `PostgresGateBypassAuditRepository.AddAsync()` writes gate bypass decisions (specialized: actor, decision override, justification, image digest, policy ID, attestation digest).
- `GateBypassAuditor` service calls the bypass audit repository when a gate bypass occurs.

**Reads:**
- `GET /api/v1/governance/audit/events` + `GET /api/v1/governance/audit/events/{eventId}` -- governance audit events.
- `GET /api/v1/policy/exceptions/{requestId}/audit` -- exception approval trail.
- `GET /api/v1/policy/determinization/audit` -- determinization config audit history.
- `GET /api/v1/policy/simulation/.../audit` -- simulation audit.
- `PolicyAuditRepository.ListAsync()`, `.GetByResourceAsync()`, `.GetByCorrelationIdAsync()`.
- `PostgresGateBypassAuditRepository` reads: `GetByIdAsync`, `GetByDecisionIdAsync`, `GetByActorAsync`, `GetByImageDigestAsync`, `ListRecentAsync`, `ListByTimeRangeAsync`, `CountByActorSinceAsync`.

**What breaks if dropped:** Governance audit UI, exception audit trail, gate bypass forensics (security-critical: who overrode a blocked image?).

**Dual-write path:** Gate bypass audit is domain-specific and has unique query patterns (by image digest, by decision ID, by actor count since a time). These queries cannot be efficiently served from the generic unified audit store without custom indexes. **Recommendation**: Keep `policy.gate_bypass_audit` as a domain table (it is evidence, not just audit), but dual-write all entries to Timeline for cross-service visibility. Generic `policy.audit` can be fully migrated to Timeline.

**Migration:** Phase 1: Add `AuditActionFilter` to all policy engine endpoints. Phase 2: Add Timeline emission in `PolicyAuditRepository.CreateAsync()`. Phase 3: Redirect generic audit reads to Timeline, keep bypass audit reads local. Phase 4: Drop `policy.audit` table. Retain `policy.gate_bypass_audit` permanently (reclassify as domain evidence, not audit).

### 2.3 Notify -- `notify.audit`

**Writes:**
- `NotifyAuditRepository` writes audit entries for template changes, rule changes, and incident acknowledgements.
- Direct calls from endpoint handlers: `TemplateEndpoints`, `RuleEndpoints`, `NotifyApiEndpoints`, `IncidentEndpoints`.

**Reads:**
- `GET /api/v1/notify/audit` (in `Program.cs` line 1329) -- lists audit entries with limit/offset.

**What breaks if dropped:** Notify audit endpoint returns empty or 404.

**Dual-write path:** Notify audit is straightforward CRUD audit (who changed which template/rule). Fully replaceable by `AuditActionFilter` emission. The local `NotifyAuditRepository` writes can be preserved as dual-write during transition.

**Migration:** Phase 1: Add `AuditActionFilter` to all notify endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` in `NotifyAuditRepository.CreateAsync()` for dual-write. Phase 3: Point `/api/v1/notify/audit` reads to Timeline (proxy or redirect). Phase 4: Drop `notify.audit` table.

### 2.4 Scheduler -- `scheduler.audit` (monthly partitioned)

**Writes:**
- `ISchedulerAuditService` interface writes audit entries when schedules are created/updated/deleted.
- Called from `ScheduleEndpoints` and `RunEndpoints`.

**Reads:**
- Per-schedule and per-run audit queries via `ISchedulerAuditService`.
- No dedicated public audit endpoint found (consumed internally).

**What breaks if dropped:** Internal schedule change audit trail lost.

**Dual-write path:** Scheduler audit is straightforward. The monthly partitioning is its most advanced feature (it enables efficient retention by dropping whole monthly partitions instead of deleting rows). The unified Timeline store should adopt partitioning too (noted in AUDIT-004 risks). For now, dual-write is safe.

**Migration:** Phase 1: Add `AuditActionFilter` to scheduler endpoints. Phase 2: Dual-write via `IAuditEventEmitter.EmitAsync()` in `ISchedulerAuditService` implementation. Phase 3: Drop `scheduler.audit` partitions after Timeline verification. Phase 4: Remove partition maintenance background service.

### 2.5 Attestor -- `proofchain.audit_log`

**Writes:**
- EF Core entity `AuditLogEntity` mapped to `proofchain.audit_log`. Records operations (create/verify/revoke) on proof chain entities.

**Reads:**
- Internal only (no public audit endpoint found).

**What breaks if dropped:** Proof chain operation audit trail lost. However, the proof chain itself provides cryptographic evidence of operations.

**Dual-write path:** Attestor audit is simple operation logging. Fully replaceable by `AuditActionFilter` if Attestor endpoints are wired.

**Note:** Attestor is NOT in the 9 services that currently call `AddAuditEmission()`. It needs to be wired first.

**Migration:** Phase 1: Wire `AddAuditEmission()` in Attestor `Program.cs` + add `AuditActionFilter`. Phase 2: Dual-write via emitter in audit log write path. Phase 3: Drop `proofchain.audit_log` table.

### 2.6 JobEngine/ReleaseOrchestrator -- `audit_entries` + `audit_sequences` (hash chain)

**Writes:**
- `PostgresAuditRepository.AppendAsync()` in both JobEngine and ReleaseOrchestrator. Uses raw SQL with transactional hash chaining: get sequence -> compute hash -> insert entry -> update sequence hash.
- `CanonicalJsonHasher` for deterministic content hashing.
- Called from service layers when releases, deployments, approvals, etc. are modified.

**Reads (ReleaseOrchestrator):**
- `GET /api/v1/release-orchestrator/audit` -- list, get by ID, resource history, sequence range, latest, summary, verify chain.
- Full REST API with cursor pagination, event type filtering, resource filtering, time range, actor filtering.
- Chain verification endpoint (`VerifyAuditChain`) for tamper-evidence.

**Reads (JobEngine):**
- `PostgresAuditRepository.ListAsync()`, `.GetByIdAsync()`, `.GetByResourceAsync()`, `.GetBySequenceRangeAsync()`, `.GetLatestAsync()`, `.GetCountAsync()`, `.VerifyChainAsync()`, `.GetSummaryAsync()`.
- PacksRegistry: `IAuditRepository` used by `PackService`, `AttestationService`, `LifecycleService`, `ParityService`, `MirrorService`, `ExportService`.

**What breaks if dropped:** This is the most mature audit implementation in the system. Its REST API endpoints would return 404/500, the chain verification capability would be lost, and the PacksRegistry audit trail would be lost.

**Dual-write path:** This is the most complex case because:
1. The local hash chain provides per-service tamper evidence.
2. The Timeline unified store has its OWN hash chain (separate sequence).
3. Both chains serve different purposes: local chain proves service-level integrity; unified chain proves cross-service integrity.

**Recommendation:** Keep the ReleaseOrchestrator/JobEngine hash chain as the **service-level evidence chain** (reclassify as domain evidence, like the Policy gate bypass audit). Dual-write all entries to Timeline for the unified cross-service view. Eventually redirect LIST/SEARCH reads to Timeline but preserve the local chain verification endpoint.

**Migration:** Phase 1: Add `AuditActionFilter` to all release-orchestrator and scheduler endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` in `PostgresAuditRepository.AppendAsync()` for dual-write. Phase 3: Redirect list/search/summary reads to Timeline (keep chain verify local). Phase 4: Evaluate whether local chain can be removed after 180-day parallel run. Phase 5: If chain integrity data is replicated in Timeline's own chain, drop local tables.
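
To make concrete why the local chain carries evidence that a plain copy of the rows in Timeline does not, here is a minimal in-memory sketch of the append sequence described under Writes (get sequence -> compute hash -> insert entry -> update sequence hash) together with verification. It is not the repository's implementation: the class and record names are hypothetical, SHA-256 and the hash input layout are assumptions, and a `List` stands in for the transactional SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical entry shape: each entry commits to its predecessor's hash.
public sealed record ChainedEntry(long Sequence, string Content, string PreviousHash, string ContentHash);

public sealed class InMemoryAuditChain
{
    private static readonly string GenesisHash = new string('0', 64);
    private string _headHash = GenesisHash;

    // Exposed as a mutable list only so that tampering can be simulated.
    public List<ChainedEntry> Entries { get; } = new();

    public ChainedEntry Append(string canonicalContent)
    {
        long sequence = Entries.Count + 1;                                // 1. get sequence
        string hash = ComputeHash(sequence, _headHash, canonicalContent); // 2. compute hash
        var entry = new ChainedEntry(sequence, canonicalContent, _headHash, hash);
        Entries.Add(entry);                                               // 3. insert entry
        _headHash = hash;                                                 // 4. update sequence hash
        return entry;
    }

    // Recomputes every hash from the genesis value; any edited, removed,
    // or reordered entry breaks the chain from that point on.
    public bool Verify()
    {
        string previous = GenesisHash;
        long expected = 1;
        foreach (var entry in Entries)
        {
            if (entry.Sequence != expected || entry.PreviousHash != previous)
            {
                return false;
            }
            if (entry.ContentHash != ComputeHash(entry.Sequence, entry.PreviousHash, entry.Content))
            {
                return false;
            }
            previous = entry.ContentHash;
            expected++;
        }
        return true;
    }

    private static string ComputeHash(long sequence, string previousHash, string content)
    {
        string input = string.Join(
            "\n", sequence.ToString(CultureInfo.InvariantCulture), previousHash, content);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(input)));
    }
}
```

Because each hash depends on the service's own sequence and predecessor, entries copied into Timeline are re-chained under Timeline's separate sequence. That is why Phase 3 keeps chain verification local and Phase 5 is conditional.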

---

## Delivery Tracker

### FILTER-001 - Convention helper: `AuditedRouteGroupExtensions`
Status: TODO
Dependency: none
Owners: Developer (backend)
Task description:
- Create a small extension method in `StellaOps.Audit.Emission` that applies the filter at the group level:
```csharp
public static RouteGroupBuilder WithAuditFilter(this RouteGroupBuilder group)
{
    group.AddEndpointFilter<AuditActionFilter>();
    return group;
}
```
- This reduces per-file boilerplate: each endpoint file calls `.WithMetadata(new AuditActionAttribute("module", "action"))` only on state-changing endpoints, while the group registers the filter once.
- Also create a convenience extension for the common case:
```csharp
public static RouteHandlerBuilder Audited(this RouteHandlerBuilder builder, string module, string action, string? resourceType = null)
{
    return builder
        .AddEndpointFilter<AuditActionFilter>()
        .WithMetadata(new AuditActionAttribute(module, action) { ResourceType = resourceType });
}
```
- The group-level approach is preferred for services with a single root group. The per-endpoint `.Audited()` method is a fallback for services with multiple independent groups.

Completion criteria:
- [x] Extension methods added to `StellaOps.Audit.Emission`
- [x] Unit test for `Audited()` extension verifying metadata is applied
- [x] Builds with no errors

**Effort: 0.5 day**

### FILTER-002 - Batch 1: Annotate simple services (Integrations, EvidenceLocker)
Status: TODO
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- **Integrations** (6 endpoints, 1 file): Add `.WithAuditFilter()` on the group. Add `.WithMetadata(new AuditActionAttribute("integrations", "<action>"))` on each of the 6 state-changing endpoints: create, update, delete, test, discover, run_code_guard.
- **EvidenceLocker** (7 endpoints, 3 files): Add filter to endpoint groups. Annotate: store, snapshot, verify, hold, store_verdict, verify_verdict, export.
- Test: start services, trigger each endpoint, verify events appear in Timeline `/api/v1/audit/events?modules=integrations,evidence`.

Completion criteria:
- [ ] All 13 endpoints annotated
- [ ] Events visible in Timeline for both modules
- [ ] No startup regressions

**Effort: 1 day**

### FILTER-003 - Batch 1 continued: Annotate Scanner
Status: TODO
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Scanner has ~65 state-changing endpoints across 30 files.
- Add `.WithAuditFilter()` on the top-level `MapGroup` in each endpoint registration extension method.
- Annotate each POST/PUT/PATCH/DELETE with `AuditActionAttribute("scanner", "<action>")`.
- Action naming convention: use verb form matching the endpoint purpose (create, update, delete, submit, trigger, compute, verify, import, export, review, replay, etc.).
- Resource type overrides: use explicit `ResourceType` for non-obvious resources (e.g., `ResourceType = "scan_policy"` for scan policy CRUD, `ResourceType = "source"` for sources CRUD).
- Focus on CRUD and business operations; skip purely computational/query-like POSTs where the endpoint is idempotent and read-only (e.g., `/compare`, `/query`, `/current` batch).

**Endpoints to SKIP** (read-only POST patterns, no state change):
- `DeltaCompareEndpoints.HandleCompareAsync` (computation)
- `CounterfactualEndpoints.HandleComputeAsync` (computation)
- `EpssEndpoints.GetCurrentBatch` (batch read)
- `SliceEndpoints.HandleQueryAsync` (query)
- `ScoreReplayEndpoints` (replay verification, read-only)
- `PolicyEndpoints` diagnostics/preview/runtime/overlay/linksets (read-only analysis)

**Endpoints to ANNOTATE** (~50 after filtering):
- Sources CRUD + lifecycle operations
- Scan/SBOM submission
- Scan policy CRUD
- Approvals create/revoke
- Triage status updates, VEX submissions
- Secret detection settings CRUD
- SmartDiff VEX candidate reviews
- Webhooks (state-changing: trigger scans)
- Reports, offline kit import, call graph submit, witness verify
- Runtime events/reconcile, reachability compute

Completion criteria:
- [ ] ~50 endpoints annotated (with documented skip list)
- [ ] Events visible in Timeline for module=scanner
- [ ] No startup regressions

**Effort: 2 days**

### FILTER-004 - Batch 2: Annotate Platform
Status: TODO
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Platform has ~107 state-changing endpoints across 23 files.
- Apply group-level filter on each endpoint group.
- Annotate with `AuditActionAttribute("platform", "<action>")`.
- Use descriptive resource types: `identity_provider`, `trust_key`, `trust_issuer`, `trust_cert`, `script`, `environment`, `freeze_window`, `target`, `release_bundle`, `function_map`, `setup_session`, `localization`, `crypto_preference`, `environment_setting`, etc.
- Skip read-only POSTs: score evaluate/verify (computational), AoC compatibility verify/validate (read-only checks), notify/signals/quota compatibility stubs that are proxied responses.
- Pay special attention to `SetupEndpoints` (wizard steps) -- these are high-value audit targets (initial system configuration).

Completion criteria:
- [ ] ~90 endpoints annotated (with documented skip list)
- [ ] Events visible in Timeline for module=platform
- [ ] No startup regressions

**Effort: 2.5 days**
|
||||
|
||||
### FILTER-005 - Batch 2 continued: Annotate Authority
Status: TODO
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Authority has ~49 state-changing endpoints across 10 files plus Program.cs inline endpoints.
- **Special consideration**: Authority runs its own auth middleware, not the standard gateway-propagated identity. The `AuditActionFilter` must correctly extract actor from Authority's own `ClaimsPrincipal`.
- Apply filter to admin group, console group, bootstrap group, and issuer directory groups.
- Action mapping for admin operations: tenant (create, update, suspend, resume), user (create, update, enable, disable), role (create, update, preview_impact), client (create, update, rotate), token (revoke).
- Action mapping for bootstrap: bootstrap_user, bootstrap_client, bootstrap_invite, revoke_service_account, rotate_signing, rotate_notifications, reload_plugins.
- Action mapping for issuer directory: create_issuer, update_issuer, delete_issuer, create_key, rotate_key, revoke_key, set_trust, delete_trust.
- Skip: OpenIddict protocol endpoints (token, introspect, revoke) -- these are auth protocol operations already captured by `AuthorityAuditSink`, not admin mutations. The authorize endpoint is skipped for the same reason.
- Skip: Notify ack-token endpoints, vuln workflow anti-forgery endpoints (internal crypto operations, not user-facing mutations).

Completion criteria:
- [ ] ~35 admin/bootstrap/issuer endpoints annotated
- [ ] Events visible in Timeline for module=authority
- [ ] AuthorityAuditSink continues to work independently (no interference)

**Effort: 2 days**

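The actor-extraction concern can be illustrated with a small fallback chain. This is a sketch in Python under stated assumptions: `sub` and `client_id` are standard OAuth/OIDC claim names, but the plan does not say which claims Authority issues, and the `tenant` claim and the returned shape are hypothetical.

```python
from typing import Mapping, Optional

# Assumed lookup order; which claims Authority actually issues is not
# stated in this plan.
ACTOR_CLAIMS = ("sub", "client_id")

def resolve_actor(claims: Optional[Mapping[str, str]]) -> dict:
    """Map a principal's claims to an audit actor, falling back to 'anonymous'."""
    if claims:
        for name in ACTOR_CLAIMS:
            value = claims.get(name, "").strip()
            if value:
                kind = "user" if name == "sub" else "service"
                # "tenant" is a hypothetical claim name used for illustration.
                return {"id": value, "type": kind, "tenant": claims.get("tenant")}
    return {"id": "anonymous", "type": "unknown", "tenant": None}

print(resolve_actor({"sub": "alice", "tenant": "t1"}))
print(resolve_actor({"client_id": "ci-bot"}))
print(resolve_actor(None))
```

The explicit `anonymous` fallback matters for endpoints that may run before any principal exists, so the audit event is still emitted with a recognizable placeholder instead of being dropped.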
### FILTER-006 - Batch 2 continued: Annotate Notify
Status: TODO
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Notify has ~65 state-changing endpoints across 15 files.
- Group-level filter on each endpoint group.
- Module name: "notify".
- Action mapping: rules (create, update, delete), templates (create, update, delete, preview, validate), incidents (acknowledge, resolve), escalation (create/update/delete policy, create/update/delete schedule, create/delete override, start, escalate, stop), quiet_hours, throttle, storm, fallback, localization, security, operator_override, simulation, observability.
- Skip: `POST /tokens/sign`, `POST /tokens/verify`, `POST /html/sanitize`, `POST /html/validate`, `POST /html/strip` -- these are utility/computation endpoints that do not mutate state.
- Focus on: CRUD operations, incident lifecycle, escalation lifecycle, dead letter management, chaos experiments, retention policies.

Completion criteria:
- [ ] ~50 endpoints annotated (with documented skip list)
- [ ] Events visible in Timeline for module=notify
- [ ] No conflict with existing `NotifyAuditRepository` writes

**Effort: 2 days**
### FILTER-007 - Batch 2 continued: Annotate Policy Engine + Gateway
Status: TODO
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Policy Engine has ~162 state-changing endpoints across 57 files. Policy Gateway duplicates ~56 of these.
- **Strategy**: Annotate Engine endpoints. For Gateway, apply the same attributes to the matching Gateway endpoint files.
- Module name: "policy".
- The Gateway files under `src/Policy/StellaOps.Policy.Gateway/Endpoints/` mirror the Engine's `src/Policy/StellaOps.Policy.Engine/Endpoints/Gateway/` directory. Both need annotation.
- High-priority groups (security-critical):
  1. Gate endpoints (evaluate, force-pass) -- action: evaluate_gate, force_pass_gate
  2. Exception approvals (approve, reject, escalate, delegate) -- action: approve_exception, reject_exception, escalate_exception, delegate_exception
  3. Governance CRUD -- action: create_governance, update_governance, delete_governance
  4. Sealed mode (enable, disable, emergency) -- action: enable_sealed, disable_sealed, emergency_unseal
  5. Override CRUD -- action: create_override, expire_override
- Lower-priority (operational):
  6. Simulation endpoints (create, cancel, retry, preview)
  7. Risk profile/budget CRUD
  8. Verification policy CRUD
  9. Snapshot create/restore
  10. Compilation, lint, attestation reports
- Skip: Batch evaluation, policy decision, score gate evaluate (read-only evaluations that return computed results without mutating state).

Completion criteria:
- [ ] ~130 endpoints annotated across Engine and Gateway (with documented skip list)
- [ ] Events visible in Timeline for module=policy
- [ ] No conflict with existing `PolicyAuditRepository` writes

**Effort: 4 days**

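Because the same attributes must be applied in both Engine and Gateway, a drift check over the two annotation sets is one way to keep them aligned. The sketch below is illustrative Python: the route strings are hypothetical, the action names come from the high-priority list above, and how the route-to-action maps would be extracted from each service is left open.

```python
def annotation_drift(engine: dict, gateway: dict) -> dict:
    """Return routes present in both services whose action strings differ."""
    shared = engine.keys() & gateway.keys()
    return {
        route: (engine[route], gateway[route])
        for route in sorted(shared)
        if engine[route] != gateway[route]
    }

# Hypothetical route -> action maps for illustration only.
engine = {
    "POST /gates/{id}/force-pass": "force_pass_gate",
    "POST /exceptions/{id}/approve": "approve_exception",
    "POST /sealed-mode/enable": "enable_sealed",
}
gateway = {
    "POST /gates/{id}/force-pass": "force_pass_gate",
    "POST /exceptions/{id}/approve": "approve",  # drifted action string
}

print(annotation_drift(engine, gateway))
```

Routes that exist only in the Engine are not reported, since the Gateway is expected to mirror only a subset of Engine endpoints.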
### FILTER-008 - Batch 2 continued: Annotate Release-Orchestrator + Scheduler
Status: TODO
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- **Release-Orchestrator** (~40 endpoints, 9 files): Module "release-orchestrator". High-value actions: create_release, promote, deploy, rollback, approve, reject. Skip: legacy stubs (`JobEngineLegacyEndpoints` returning 501).
- **Scheduler** (~31 endpoints, 9 files): Module "scheduler". Actions: create_schedule, update_schedule, delete_schedule, pause, resume, create_run, cancel_run, retry_run, trigger_workflow.
- **PacksRegistry** (part of Scheduler service): Module "packs-registry". Actions: upload_pack, rotate_signature, upload_attestation, transition_lifecycle, check_parity, seed_export, create_mirror, sync_mirror.

Completion criteria:
- [ ] All ~71 endpoints annotated
- [ ] Events visible in Timeline for modules: release-orchestrator, scheduler, packs-registry
- [ ] No conflict with existing `PostgresAuditRepository` hash chain writes

**Effort: 2 days**
### DEPRECATE-001 - Batch 3: Dual-write for services with local audit tables
Status: TODO
Dependency: FILTER-002 through FILTER-008 (at least the relevant service batch)
Owners: Developer (backend)
Task description:
- For each service with an existing local audit table, add a secondary write path that emits to Timeline via `IAuditEventEmitter.EmitAsync()` inside the existing audit repository write methods:
  1. **Authority**: Add `IAuditEventEmitter.EmitAsync()` in `AuthorityAuditSink.WriteAsync()` to emit auth events (login, token grant, lockout) to Timeline. Map `AuthEventRecord` to `AuditEventPayload`.
  2. **Policy**: Add emission in `PolicyAuditRepository.CreateAsync()` and in `GateBypassAuditor` to emit bypass decisions to Timeline.
  3. **Notify**: Add emission in `NotifyAuditRepository` create method.
  4. **Scheduler**: Add emission in `ISchedulerAuditService` implementation.
  5. **JobEngine/ReleaseOrchestrator**: Add emission in `PostgresAuditRepository.AppendAsync()`. Map `AuditEntry` fields to `AuditEventPayload`.
  6. **Attestor**: Wire `AddAuditEmission()` in Program.cs (not yet wired). Add emission alongside `AuditLogEntity` inserts.
- All emissions must be fire-and-forget (matching existing `AuditActionFilter` pattern) -- failure to emit to Timeline must never break the local write.
- Add a log warning when emission fails (already built into `HttpAuditEventEmitter`).

Completion criteria:
- [ ] Dual-write verified for all 6 services (events appear in both local table and Timeline)
- [ ] Local audit write latency unchanged (emission is async/fire-and-forget)
- [ ] No data loss: local table remains the authoritative source during this phase

**Effort: 3 days**

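The dual-write rule (local write first, best-effort emission second, emission failures logged but never propagated) can be modelled as follows. This is a Python sketch of the pattern, not the C# repositories named above: `DualWriteAuditRepository`, the in-memory `local_rows` list, and the `emit` callable are stand-ins invented for illustration.

```python
import asyncio
import logging

log = logging.getLogger("audit.dual_write")

class DualWriteAuditRepository:
    """Local write stays authoritative; Timeline emission is best-effort."""

    def __init__(self, local_rows: list, emit):
        self._local_rows = local_rows  # stand-in for the local audit table
        self._emit = emit              # async callable, stand-in for the emitter
        self._pending = set()          # hold references so tasks are not collected

    async def append(self, entry: dict) -> None:
        self._local_rows.append(entry)                        # 1. authoritative write
        task = asyncio.create_task(self._emit_safely(entry))  # 2. fire-and-forget
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def _emit_safely(self, entry: dict) -> None:
        try:
            await self._emit(entry)
        except Exception as exc:  # Timeline failures must never reach the caller
            log.warning("Timeline emission failed: %s", exc)

    async def drain(self) -> None:
        """Wait for in-flight emissions (useful at shutdown and in tests)."""
        await asyncio.gather(*self._pending)

async def demo():
    async def failing_emit(entry):
        raise ConnectionError("timeline unavailable")

    rows = []
    repo = DualWriteAuditRepository(rows, failing_emit)
    await repo.append({"module": "notify", "action": "create"})
    await repo.drain()
    return rows

rows = asyncio.run(demo())
print(len(rows))
```

Even with an emitter that always fails, the local row is written and `append` returns normally, which is the behaviour the completion criteria require during the dual-write phase.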
### DEPRECATE-002 - Batch 4: Redirect reads to Timeline unified sink
Status: TODO
Dependency: DEPRECATE-001, 30-day dual-write verification period
Owners: Developer (backend)
Task description:
- After 30 days of verified dual-write with zero data discrepancies:
  1. **Authority**: Update `ConsoleAdminEndpointExtensions.ListAuditEvents()` to query Timeline `/api/v1/audit/events?modules=authority` instead of local `authority.audit` table. Add `Obsolete` attribute and deprecation response headers to the local audit endpoint.
  2. **Policy**: Update governance audit endpoints to query Timeline. Keep gate bypass audit endpoints reading from local `policy.gate_bypass_audit` (domain evidence, not generic audit).
  3. **Notify**: Update `/api/v1/notify/audit` to proxy to Timeline.
  4. **Scheduler**: Internal audit reads redirected to Timeline.
  5. **ReleaseOrchestrator**: Update `/api/v1/release-orchestrator/audit` LIST/SEARCH/SUMMARY endpoints to query Timeline. **Keep chain verification endpoint reading from local table** (service-level chain integrity is different from unified chain).
  6. **Attestor**: Internal audit reads redirected to Timeline.
- Update `HttpUnifiedAuditEventProvider` to stop polling deprecated service-specific audit endpoints.
- Add deprecation headers: `Sunset: <date>`, `Deprecation: true`, `Link: <timeline-url>; rel="successor-version"`.

Completion criteria:
- [ ] All service-specific audit read endpoints return deprecation headers
- [ ] Timeline is the primary read source for all generic audit queries
- [ ] UI `AuditLogClient` uses unified endpoint exclusively (no fallback to per-service)
- [ ] Per-service audit endpoints still functional (backward compatibility for 90 days)

**Effort: 3 days (implementation) + 30-day verification wait**

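The three deprecation headers listed above can be built from the cut-over date and the 90-day backward-compatibility window. The sketch below is Python for illustration only: the cut-over date is a made-up example, the helper name is hypothetical, and the `Deprecation: true` value is copied from this plan as written.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def deprecation_headers(deprecated_at: datetime, successor_url: str,
                        grace_days: int = 90) -> dict:
    """Build response headers for a deprecated per-service audit endpoint."""
    sunset = deprecated_at + timedelta(days=grace_days)
    return {
        "Deprecation": "true",  # value as specified in this plan
        # Sunset carries an HTTP-date; usegmt=True requires a UTC datetime.
        "Sunset": format_datetime(sunset, usegmt=True),
        "Link": f'<{successor_url}>; rel="successor-version"',
    }

# Hypothetical cut-over date; the successor URL is the Timeline query from item 1.
headers = deprecation_headers(
    datetime(2026, 6, 1, tzinfo=timezone.utc),
    "/api/v1/audit/events?modules=authority",
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Deriving `Sunset` from the cut-over date rather than hard-coding it keeps the header consistent with the 90-day window if the read migration slips.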
### DEPRECATE-003 - Batch 5: Drop deprecated local audit tables
Status: TODO
Dependency: DEPRECATE-002, 90-day backward-compatibility period
Owners: Developer (backend)
Task description:
- After 90 days with no clients reading from deprecated endpoints:
  1. Remove local audit write code from repositories (stop dual-write).
  2. Create SQL migrations to drop tables:
     - `DROP TABLE IF EXISTS authority.audit CASCADE;`
     - `DROP TABLE IF EXISTS authority.airgap_audit CASCADE;`
     - `DROP TABLE IF EXISTS authority.offline_kit_audit CASCADE;`
     - `DROP TABLE IF EXISTS policy.audit CASCADE;` (keep `policy.gate_bypass_audit`)
     - `DROP TABLE IF EXISTS notify.audit CASCADE;`
     - `DROP TABLE IF EXISTS scheduler.audit CASCADE;` (drop all partitions)
     - `DROP TABLE IF EXISTS proofchain.audit_log CASCADE;`
  3. **Do NOT drop** `audit_entries` / `audit_sequences` in JobEngine/ReleaseOrchestrator yet -- the hash chain is service-level evidence. Reclassify as domain tables, not audit tables. Evaluate for removal in a future sprint after 180-day parallel chain verification between local and Timeline chains.
  4. Remove deprecated audit endpoint registrations.
  5. Remove `PolicyAuditRepository`, `NotifyAuditRepository`, `AuthorityAuditSink` local DB write paths (keep structured logging).
  6. Remove `HttpUnifiedAuditEventProvider` polling entirely (all data flows through emission now).

Completion criteria:
- [ ] Local audit tables dropped (except JobEngine/ReleaseOrchestrator chain tables and Policy gate bypass)
- [ ] No 500 errors from missing tables
- [ ] Timeline is the sole generic audit data store (the retained domain evidence tables excepted)
- [ ] All generic audit read endpoints serve data from Timeline
- [ ] Deprecated code removed, no dead references

**Effort: 2 days (implementation) + 90-day wait from DEPRECATE-002**

---

## Effort Summary

| Batch | Tasks | Effort | Timeline |
|---|---|---|---|
| **Batch 1**: Convention helper + simple services (Integrations, EvidenceLocker, Scanner) | FILTER-001, FILTER-002, FILTER-003 | 3.5 days | Week 1 |
| **Batch 2**: Complex services (Platform, Authority, Notify, Policy, ReleaseOrchestrator, Scheduler) | FILTER-004 through FILTER-008 | 12.5 days | Weeks 2-4 |
| **Batch 3**: Dual-write transition | DEPRECATE-001 | 3 days | Weeks 4-5 |
| **Batch 4**: Read migration (after 30-day verification) | DEPRECATE-002 | 3 days + 30-day wait | Weeks 9-10 |
| **Batch 5**: Drop local tables (after 90-day backward-compat) | DEPRECATE-003 | 2 days + 90-day wait | Weeks 22-23 |
| **TOTAL** | | **24 days active work** + **120 days verification** | ~6 months end-to-end |

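The totals in the Effort Summary table can be cross-checked against the per-task effort lines. A short Python calculation, using only figures stated in this plan:

```python
# Per-task effort for Batch 2, from the FILTER-004 .. FILTER-008 sections.
batch2_tasks = {
    "FILTER-004": 2.5,
    "FILTER-005": 2,
    "FILTER-006": 2,
    "FILTER-007": 4,
    "FILTER-008": 2,
}

# Active implementation days per batch, from the Effort Summary table.
active_days = {
    "Batch 1": 3.5,
    "Batch 2": sum(batch2_tasks.values()),
    "Batch 3": 3,
    "Batch 4": 3,
    "Batch 5": 2,
}

# Waiting periods between batches.
wait_days = {"dual-write verification": 30, "backward compatibility": 90}

total_active = sum(active_days.values())
total_wait = sum(wait_days.values())
print(f"Batch 2: {active_days['Batch 2']} days")
print(f"Active work: {total_active} days, verification: {total_wait} days")
```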
---

## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-08 | Sprint created. Full endpoint inventory completed across all 9 wired services (~532 state-changing endpoints). Per-service audit table analysis completed for 6 services with local tables. | Planning |

## Decisions & Risks

### Decisions

1. **Group-level filter + per-endpoint metadata is the convention.** `AuditActionFilter` is a no-op without `AuditActionAttribute`, so applying it at the group level is safe and reduces boilerplate from 2 lines per endpoint to 1 line.

2. **Policy `gate_bypass_audit` and JobEngine/ReleaseOrchestrator `audit_entries` are reclassified as domain evidence tables, not audit.** Their query patterns (by image digest, by decision ID, by chain sequence) and integrity guarantees (hash chains, attestation digests) serve domain-specific needs that the generic unified store cannot efficiently replace. They persist alongside the unified audit sink; any later removal of `audit_entries` is deferred to a future sprint (see DEPRECATE-003).

3. **Read-only POST endpoints are excluded from audit annotation.** Endpoints like `/compare`, `/query`, `/evaluate` (when they compute a result without persisting state) do not produce meaningful audit events. Annotating them would create noise in the audit log.

4. **Authority auth-protocol events require separate emission.** The `AuthorityAuditSink` captures login attempts, token grants, and lockouts -- events that are NOT HTTP endpoint mutations. These must be emitted to Timeline via a direct `IAuditEventEmitter.EmitAsync()` call, not via `AuditActionFilter`.

5. **120-day verification pipeline.** Dual-write runs for 30 days before reads are redirected. Deprecated endpoints remain functional for 90 more days. Total 120 days from dual-write start to table drop. This is non-negotiable for a compliance-critical audit subsystem.

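Decision 1 relies on the filter doing nothing for handlers that carry no audit metadata. The following Python model illustrates that behaviour; it is not the C# `AuditActionFilter`, and the `Group` class, the `audit_action` decorator, and the route strings are hypothetical stand-ins.

```python
def audit_action(module, action, resource_type=None):
    """Attach audit metadata to a handler (models the per-endpoint attribute)."""
    def mark(handler):
        handler.audit_meta = {
            "module": module,
            "action": action,
            "resource_type": resource_type,
        }
        return handler
    return mark

class Group:
    """Models a route group whose audit filter wraps every mapped handler."""

    def __init__(self, emit):
        self._emit = emit
        self.routes = {}

    def map(self, route, handler):
        meta = getattr(handler, "audit_meta", None)

        def filtered(*args, **kwargs):
            result = handler(*args, **kwargs)
            if meta is not None:  # no metadata -> the filter is a no-op
                self._emit(dict(meta))
            return result

        self.routes[route] = filtered
        return filtered

events = []
group = Group(events.append)

@audit_action("scanner", "create", resource_type="scan_policy")
def create_scan_policy(body):
    return {"created": body["name"]}

def compare(body):  # read-only POST, deliberately left unannotated
    return {"equal": body["a"] == body["b"]}

group.map("POST /scan-policies", create_scan_policy)
group.map("POST /compare", compare)
group.routes["POST /scan-policies"]({"name": "default"})
group.routes["POST /compare"]({"a": 1, "b": 1})
print(events)
```

Both handlers pass through the same group-level filter, but only the annotated one produces an event, which is why the skip lists need no per-endpoint opt-out.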
### Risks

1. **~532 endpoints is a large surface.** Risk of missed annotations or incorrect module/action strings. Mitigation: create an integration test that walks all registered endpoints and asserts that every non-GET endpoint has `AuditActionAttribute` metadata (or is in an explicit skip list).

2. **Policy Engine/Gateway duplication.** The same endpoint logic exists in two places. Risk of annotation drift. Mitigation: consider extracting shared endpoint registration into a common library, or generating Gateway endpoints from Engine definitions.

3. **Fire-and-forget emission can silently drop events.** If Timeline is down during the 30-day dual-write period, the local table has events that Timeline does not. Mitigation: add a reconciliation job that compares local table event counts with Timeline for the same module/time range and alerts on discrepancies.

4. **Performance impact of one additional HTTP call per audited request.** Every request to an annotated endpoint (up to ~532 endpoints) now triggers a fire-and-forget HTTP POST to Timeline. Under high load, this could create back-pressure. Mitigation: `HttpAuditEventEmitter` already uses `IHttpClientFactory` with connection pooling. Add circuit-breaker via Polly if needed. The emission is async and never blocks the response.

5. **Existing Scheduler monthly partitioning is lost in Timeline.** The unified store does not partition by month. Retention will rely on `DELETE WHERE timestamp < cutoff` instead of `DROP PARTITION`. Mitigation: AUDIT-004 (from parent sprint) should add partitioning to the unified audit table.

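The reconciliation job proposed for risk 3 reduces to comparing per-bucket counts. A minimal Python sketch, assuming counts are already aggregated per (module, day); the bucket keys and sample numbers are invented for illustration. It flags only buckets where Timeline has fewer events, on the assumption that Timeline may legitimately hold more because it also receives filter-emitted events.

```python
def reconcile(local_counts: dict, timeline_counts: dict) -> dict:
    """Return buckets where Timeline holds fewer events than the local table."""
    gaps = {}
    for bucket, local in local_counts.items():
        remote = timeline_counts.get(bucket, 0)
        if remote < local:
            gaps[bucket] = local - remote  # number of events missing in Timeline
    return gaps

# Hypothetical daily counts keyed by (module, day).
local_counts = {
    ("notify", "2026-05-01"): 120,
    ("notify", "2026-05-02"): 98,
    ("scheduler", "2026-05-01"): 40,
}
timeline_counts = {
    ("notify", "2026-05-01"): 120,
    ("notify", "2026-05-02"): 91,   # Timeline outage dropped 7 events
    ("scheduler", "2026-05-01"): 44,  # extra filter-emitted events are fine
}

print(reconcile(local_counts, timeline_counts))
```

Count comparison detects dropped events but not which ones; backfilling the gap would need a per-event identifier shared by both stores, which this plan does not specify.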
## Next Checkpoints

- **Week 1**: Convention helper shipped, Integrations + EvidenceLocker + Scanner annotated
- **Weeks 2-4**: All remaining services annotated
- **Weeks 4-5**: Dual-write enabled, monitoring dashboard created
- **Weeks 9-10**: Read migration after 30-day verification
- **Weeks 22-23**: Table drop after 90-day backward-compat window