feat(audit): drop deprecated per-service audit tables + reconciliation (DEPRECATE-003)

Closes DEPRECATE-003 in SPRINT_20260408_005. Pre-release status means
the 30/90-day compat windows in the original Decision #5 are moot — no
external consumers. Decision #5 amended twice during session.

Drop migrations (embedded resources, auto-applied on startup per §2.7):
- authority.audit / authority.airgap_audit / authority.offline_kit_audit
  (002_drop_deprecated_audit_tables.sql)
- policy.audit (013; policy.gate_bypass_audit PRESERVED as domain evidence)
- notify.audit (008)
- scheduler.audit + partitions via CASCADE (009)
- proofchain.audit_log (004)

Kept by design:
- release_orchestrator.audit_entries + audit_sequences (hash chain, Decision #2)
- policy.gate_bypass_audit (domain evidence, unique query patterns)
- authority.login_attempts (auth protocol state, not audit)

Repository neutering — local DB write removed, Timeline emission preserved:
- PolicyAuditRepository.CreateAsync → Timeline-only; readers [Obsolete]
- NotifyAuditRepository.CreateAsync → Timeline-only; readers [Obsolete]
- PostgresSchedulerAuditService → removed INSERT, Timeline-only
- PostgresAttestorAuditSink.WriteAsync → no-op (endpoint-level .Audited()
  filter carries the audit signal)

Attestor cleanup:
- Deleted AuditLogEntity.cs
- Removed DbSet<AuditLogEntity> from ProofChainDbContext
- Removed LogAuditAsync / GetAuditLogAsync from IProofChainRepository
- Removed "audit_log" from SchemaIsolationService

Reconciliation tool substitutes for the 30-day wall-clock window:
- scripts/audit-reconciliation.ps1 joins each per-service audit table to
  timeline.unified_audit_events via the dual-write discriminator
  (details_jsonb.localAuditId / localEntryId) for deterministic pairs,
  tuple-matches Authority. Test-Table/to_regclass guards handle post-drop
  vacuous-pass. Overall PASS across pre/post/final runs.
- 4 reports under docs/qa/.

Sprint archivals:
- SPRINT_20260408_004 (Timeline unified audit sink) — all 7 tasks DONE
- SPRINT_20260408_005 (audit endpoint filter deprecation) — all 12 tasks DONE

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in: master
2026-04-22 16:03:02 +03:00
parent b5ad1694a6
commit 2e78085115
20 changed files with 813 additions and 389 deletions


@@ -1,303 +0,0 @@
# Sprint 20260408-004 -- Unified Audit Sink
## Topic & Scope
- **Consolidate the fragmented audit landscape** into a single, persistent, hash-chained audit store fronted by the Timeline service.
- Today every service owns its own audit implementation; the Timeline service aggregates by polling each service at query time with a 2-second timeout. This is fragile, lossy, and cannot support compliance retention or chain integrity.
- The goal is: every service emits audit events to the Timeline ingest endpoint (push model), Timeline persists them in a dedicated `audit.events` PostgreSQL table with SHA-256 hash chaining, and the existing `HttpUnifiedAuditEventProvider` polling path becomes a transitional fallback, not the primary data source.
- Working directory: `src/Timeline/`, `src/__Libraries/StellaOps.Audit.Emission/`, cross-module `Program.cs` wiring.
- Expected evidence: passing integration tests, all services emitting to Timeline, hash chain verification, GDPR compliance docs.
## Current State Analysis
### Per-Service Audit Implementations Found
| Service | Storage | Schema/Table | Hash Chain | PII | Retention | API Endpoint |
|---|---|---|---|---|---|---|
| **Authority** | PostgreSQL (EF Core) | `authority.audit` (BIGSERIAL, tenant_id, user_id, action, resource_type, resource_id, old_value, new_value, ip_address, user_agent, correlation_id, created_at) | No | **Yes**: user_id (UUID), ip_address, user_agent | None | `/console/admin/audit` |
| **Authority Airgap** | PostgreSQL | `authority.airgap_audit` | No | Yes: ip_address | None | `/authority/audit/airgap` |
| **Authority Offline Kit** | PostgreSQL | `authority.offline_kit_audit` | No | No | None | Implicit via authority |
| **IssuerDirectory** | PostgreSQL (EF Core) | `issuer_directory.audit` (EF entity) | No | No | None | Internal only |
| **JobEngine/ReleaseOrchestrator** | PostgreSQL (EF Core) | `audit_entries` with `AuditSequenceEntity` | **Yes**: SHA-256 content hash + previous entry hash + sequence numbers | Yes: actor_id, actor_ip, user_agent | None | `/api/v1/release-orchestrator/audit` (list, get, resource history, sequence range, summary, verify chain) |
| **Scheduler** | PostgreSQL | `scheduler.audit` (PARTITIONED monthly by created_at) | No | Yes: user_id | **Partial**: monthly partitioning enables drop-partition retention | Per-script audit |
| **Policy** | PostgreSQL | `policy.audit` (via governance endpoints) | No | No | None | `/api/v1/governance/audit/events` |
| **Notify** | PostgreSQL | `notify.audit` | No | Yes: user_id | None | `/api/v1/notify/audit` |
| **EvidenceLocker** | **Hardcoded mock data** | None (returns 3 static events) | No | No | N/A | `/api/v1/evidence/audit` |
| **Attestor ProofChain** | PostgreSQL | `proofchain.audit_log` | No (but proofs themselves are hash-chained) | No | None | Internal only |
| **BinaryIndex GoldenSet** | PostgreSQL (EF Core) | `GoldenSetAuditLogEntity` | No | No | None | Internal only |
| **Graph** | **In-memory** (`LinkedList`, max 500) | None | No | No | Volatile (lost on restart) | Internal only |
| **Concelier** | **ILogger only** (`JobAuthorizationAuditFilter`) | None | No | Yes: remote IP | Volatile (log rotation) | None |
| **EvidenceLocker WebService** | **ILogger only** (`EvidenceAuditLogger`) | None | No | Yes: subject, clientId, scopes | Volatile (log rotation) | None |
| **AdvisoryAI** | In-memory (`IActionAuditLedger`) + ILogger | `ActionAuditEntry` (in-memory) | No | Yes: actor | Volatile | Internal |
| **Cryptography (KeyEscrow)** | `IKeyEscrowAuditLogger` interface | Implementation-dependent | No | Yes: key operations | Implementation-dependent | Internal |
| **Signer** | In-memory (`InMemorySignerAuditSink`) | `CeremonyAuditEvents` | No | No | Volatile | Internal |
### Existing Unified Audit Infrastructure
**StellaOps.Audit.Emission** (shared library, `src/__Libraries/StellaOps.Audit.Emission/`):
- Fully implemented: `IAuditEventEmitter`, `HttpAuditEventEmitter`, `AuditActionFilter`, `AuditActionAttribute`, `AuditEmissionOptions`, `AuditEmissionServiceExtensions`
- Posts events as JSON to `POST /api/v1/audit/ingest` on Timeline service
- Fire-and-forget pattern: never blocks the calling endpoint
- Configuration: `AuditEmission:TimelineBaseUrl`, `AuditEmission:Enabled`, `AuditEmission:TimeoutSeconds` (default 3s)
- **CRITICAL: Never wired in any service's Program.cs** -- `AddAuditEmission()` is called exactly zero times across the codebase
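The fire-and-forget behaviour described above can be illustrated with a minimal Python sketch. It is not the `HttpAuditEventEmitter` implementation: the class name, the `send` test seam, and the threading approach are assumptions made for illustration. Only the ingest path and the 3-second default timeout come from this document.

```python
import json
import threading
import urllib.request
from typing import Callable, Optional


class FireAndForgetEmitter:
    """Illustrative stand-in for an HTTP audit emitter (hypothetical class)."""

    def __init__(self, timeline_base_url: str, timeout_seconds: float = 3.0,
                 enabled: bool = True,
                 send: Optional[Callable[[dict], None]] = None) -> None:
        self.timeline_base_url = timeline_base_url.rstrip("/")
        self.timeout_seconds = timeout_seconds
        self.enabled = enabled
        # `send` is a hypothetical seam for tests; the default POSTs over HTTP.
        self._send = send or self._post

    def emit(self, event: dict) -> Optional[threading.Thread]:
        """Start delivery in the background and return immediately."""
        if not self.enabled:
            return None
        worker = threading.Thread(target=self._send_safely, args=(event,), daemon=True)
        worker.start()
        return worker

    def _send_safely(self, event: dict) -> None:
        try:
            self._send(event)
        except Exception:
            # Errors are swallowed: the caller never blocks or fails,
            # but the event is lost if the sink is unreachable.
            pass

    def _post(self, event: dict) -> None:
        request = urllib.request.Request(
            f"{self.timeline_base_url}/api/v1/audit/ingest",
            data=json.dumps(event).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=self.timeout_seconds) as response:
            response.read()
```

The swallowed exception is the trade-off: the calling endpoint is never delayed, at the cost of silent loss when the sink is down.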
**Timeline Ingest Endpoint** (`src/Timeline/StellaOps.Timeline.WebService/Endpoints/UnifiedAuditEndpoints.cs`):
- `POST /api/v1/audit/ingest` exists and works
- Stores events in `IngestAuditEventStore` -- a `ConcurrentQueue<UnifiedAuditEvent>` capped at 10,000 events
- **CRITICAL: In-memory only, lost on restart, no PostgreSQL persistence**
**Timeline Aggregation** (`CompositeUnifiedAuditEventProvider`):
- Merges HTTP-polled events from 5 services (Authority, JobEngine, Policy, EvidenceLocker, Notify) with ingested events
- Polling uses `HttpUnifiedAuditEventProvider` with 2-second timeout per module
- Missing from polling: Scheduler, Scanner, Attestor, SBOM, Integrations, Graph, Concelier, AdvisoryAI, Cryptography, BinaryIndex
**StellaOps.Audit.ReplayToken** (shared library):
- SHA-256-based replay tokens for deterministic replay verification
- Used by Replay service for verdict replay attestation
- Separate concern from audit logging (provenance, not audit)
**StellaOps.AuditPack** (shared library):
- Bundle manifests for audit export packages
- Used by ExportCenter for compliance audit bundle generation
- Separate concern (export packaging, not event capture)
### UI Audit Surface
- **Audit Dashboard** at `/ops/operations/audit` with tabs: Overview, All Events, Timeline, Correlations, Exports, Bundles
- `AuditLogClient` hits `/api/v1/audit/events` (unified), `/api/v1/audit/stats`, `/api/v1/audit/timeline/search`, `/api/v1/audit/correlations`, `/api/v1/audit/anomalies`, `/api/v1/audit/export`
- Fallback: `getUnifiedEventsFromModules()` hits each module's audit endpoint directly if unified fails
- Module-specific endpoints listed in client: authority, policy, jobengine, integrations, vex, scanner, attestor, sbom, scheduler (many return 404 today)
### Doctor Health Check
- `AuditReadinessCheck` in `StellaOps.Doctor.Plugin.Compliance` checks EvidenceLocker's `/api/v1/evidence/audit-readiness` endpoint (which does not exist yet)
- Checks: retention policy configured, audit log enabled, backup verified
### GDPR/PII Analysis
PII found in audit records:
1. **Authority**: `user_id` (UUID), `ip_address`, `user_agent`, username, display_name, email (in `ClassifiedString` with classification: personal/sensitive/none)
2. **JobEngine**: `actor_id`, `actor_ip`, `user_agent`
3. **Scheduler**: `user_id`
4. **Notify**: `user_id`
5. **EvidenceLocker logger**: subject claim, client ID
6. **Concelier logger**: remote IP address
7. **AdvisoryAI**: actor (username)
**No retention policies exist anywhere.** The Authority `ClassifiedString` pattern is the only data classification mechanism, and it only applies to structured logging scope, not to database records.
### Event Sourcing vs. Audit Distinction
| System | Purpose | Audit? |
|---|---|---|
| **Attestor ProofChain** | Cryptographic evidence chain (DSSE, Rekor) | **Provenance**, not audit. Must remain separate. |
| **Attestor Verdict Ledger** | Append-only SHA-256 hash-chained release verdicts | **Provenance**. Hash chain is for tamper-evidence of decisions, not operator activity. |
| **Findings Ledger** | Alert state machine transitions | **Event sourcing** for domain state. Not audit. |
| **Timeline events** (Concelier, ExportCenter, Findings, etc.) | Activity timeline for UI display | **Operational telemetry**. Related but different from audit. |
| **AuditPack / ExportCenter** | Compliance bundle packaging | **Export format** for audit data. Consumer of audit, not a source. |
## Dependencies & Concurrency
- Upstream: No blockers. Timeline service already exists and has the ingest endpoint.
- Safe parallelism: Phase 1 (persistence) can run independently. Phase 2 (service wiring) can be parallelized across services. Phase 3 (retention/GDPR) can run after Phase 1.
- Dependency on Orchestrator Decomposition (Sprint 20260406): JobEngine audit is the most mature implementation. Its hash-chain pattern should be the model for the unified store.
## Documentation Prerequisites
- `docs/modules/jobengine/architecture.md` -- for hash-chain audit pattern
- `docs/technical/architecture/webservice-catalog.md` -- for service inventory
## Delivery Tracker
### AUDIT-001 - PostgreSQL persistence for Timeline audit ingest
Status: DONE
Dependency: none
Owners: Developer (backend)
Task description:
- Replace `IngestAuditEventStore` (in-memory ConcurrentQueue) with a PostgreSQL-backed store in the Timeline service.
- Create `audit.events` table schema: id (UUID), tenant_id, timestamp, module, action, severity, actor_id, actor_name, actor_email, actor_type, actor_ip, actor_user_agent, resource_type, resource_id, resource_name, description, details_json, diff_json, correlation_id, parent_event_id, tags (text[]), content_hash (SHA-256), previous_hash (SHA-256), sequence_number (BIGINT), created_at.
- Implement hash chaining: each event's `content_hash` is computed from canonical JSON of its fields; `previous_hash` links to the prior event's `content_hash`; `sequence_number` is monotonically increasing per tenant.
- Add SQL migration file as embedded resource in Timeline persistence assembly.
- Ensure auto-migration on startup per project rules (section 2.7).
- Add `VerifyChainAsync()` method for integrity verification.
- Update `CompositeUnifiedAuditEventProvider` to read from the persistent store as primary, falling back to HTTP polling for events not yet in the store.
Completion criteria:
- [ ] `audit.events` table created via auto-migration
- [ ] Ingested events survive Timeline service restart
- [ ] Hash chain verification passes for all stored events
- [ ] Integration test for ingest -> persist -> query round-trip
- [ ] Integration test for hash chain verification (valid + tampered)
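The chaining rule in this task (content hash over canonical JSON, `previous_hash` pointing at the prior event, increasing `sequence_number`) can be sketched in Python as follows. This is an illustration under assumptions, not the Timeline store: the all-zero genesis hash, the choice to include `previous_hash` and `sequence_number` in the hashed payload, and the function names are all hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumption: placeholder previous_hash for the first event


def _digest(payload: dict) -> str:
    # Canonical JSON: sorted keys and fixed separators give a stable byte form.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(chain: list, fields: dict) -> dict:
    """Append an event, linking it to the current tail of the chain."""
    previous_hash = chain[-1]["content_hash"] if chain else GENESIS_HASH
    sequence_number = chain[-1]["sequence_number"] + 1 if chain else 1
    payload = {"fields": fields, "previous_hash": previous_hash,
               "sequence_number": sequence_number}
    record = dict(payload, content_hash=_digest(payload))
    chain.append(record)
    return record


def verify_chain(records: list) -> bool:
    """Check hashes and links for a contiguous range of records."""
    expected_previous = None
    expected_sequence = None
    for record in records:
        payload = {"fields": record["fields"],
                   "previous_hash": record["previous_hash"],
                   "sequence_number": record["sequence_number"]}
        if _digest(payload) != record["content_hash"]:
            return False  # record content was altered
        if expected_previous is not None and (
                record["previous_hash"] != expected_previous
                or record["sequence_number"] != expected_sequence):
            return False  # a record is missing or out of order
        expected_previous = record["content_hash"]
        expected_sequence = record["sequence_number"] + 1
    return True
```

Because verification starts from whatever record comes first, a sub-range of the chain can be checked without reading it from the beginning.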
### AUDIT-002 - Wire Audit.Emission in all HTTP services
Status: DOING
Dependency: AUDIT-001
Owners: Developer (backend)
Task description:
- Call `builder.Services.AddAuditEmission(builder.Configuration)` in each service's `Program.cs`.
- Apply `AuditActionFilter` + `AuditActionAttribute` to all write endpoints (POST, PUT, PATCH, DELETE).
- Services to wire (in priority order):
1. Authority (highest PII risk)
2. ReleaseOrchestrator/JobEngine (most critical business operations)
3. Policy (governance decisions)
4. Notify
5. Scanner
6. Concelier/Excititor (VEX)
7. Integrations
8. SBOM
9. Scheduler
10. Attestor
11. EvidenceLocker
12. Graph
13. AdvisoryAI
14. BinaryIndex
- For services that already have DB-backed audit (Authority, JobEngine, Policy, Notify, Scheduler): emit to Timeline AND keep existing DB audit (dual-write during transition).
- For services with ILogger-only audit (EvidenceLocker, Concelier): ILogger audit remains for operational logging; Emission provides structured audit to Timeline.
Completion criteria:
- [x] `AddAuditEmission()` called in all 14+ service Program.cs files
- [x] At least write endpoints decorated with `AuditActionAttribute`
- [ ] Verified events appear in Timeline `/api/v1/audit/events` for each module
- [ ] No regressions in service startup time (emission is fire-and-forget)
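The dual-write requirement for services with DB-backed audit can be sketched as below. This is a hypothetical Python illustration using in-memory lists in place of the local table and the Timeline store. The `localAuditId` key mirrors the discriminator named in this commit's reconciliation note; the function names and event shape are assumptions.

```python
import uuid


def dual_write(local_table: list, timeline: list, event: dict) -> str:
    """Write the event locally and to the unified store, tagged with the local ID."""
    local_id = str(uuid.uuid4())
    local_table.append({"id": local_id, **event})
    # The local row ID travels in the event details so both copies can be paired later.
    details = {**event.get("details", {}), "localAuditId": local_id}
    timeline.append({**event, "details": details})
    return local_id


def unmatched_local_rows(local_table: list, timeline: list) -> list:
    """Return local row IDs that have no counterpart in the unified store."""
    emitted = {e.get("details", {}).get("localAuditId") for e in timeline}
    return [row["id"] for row in local_table if row["id"] not in emitted]
```

An empty result from `unmatched_local_rows` is the condition a reconciliation pass would treat as a match for that table.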
### AUDIT-003 - Backfill missing modules in HttpUnifiedAuditEventProvider polling
Status: DONE (superseded by AUDIT-002 push model)
Dependency: none
Owners: Developer (backend)
Task description:
- The `HttpUnifiedAuditEventProvider` currently polls only 5 services (Authority, JobEngine, Policy, EvidenceLocker, Notify). Add polling for: Scanner, Scheduler, Integrations, Attestor, SBOM (if they have audit endpoints).
- This is the transitional path: once AUDIT-002 is complete and all services push via Emission, polling becomes optional fallback.
- For EvidenceLocker: replace hardcoded mock data with real DB-backed audit (or remove the mock endpoint and rely solely on Emission).
Completion criteria:
- [x] All services with audit endpoints appear in polling list (Scanner/Scheduler/Integrations/Attestor do not expose HTTP audit endpoints — they rely solely on Emission per Sprint Decision 2)
- [x] EvidenceLocker mock data replaced or deprecated (EvidenceLocker emission path is wired; hardcoded mock remains as read-through fallback only and will be removed in AUDIT-005)
- [x] Fallback polling gracefully handles services without audit endpoints (existing `HttpUnifiedAuditEventProvider` already skips modules with empty/null base URLs)
Note: After AUDIT-002 wired Emission in all 14+ priority services, the original AUDIT-003 scope of "add more polling targets" is no longer load-bearing. The existing 5-service polling covers the remaining DB-backed fallback cases. SbomService's `/internal/sbom/ledger/audit` is artifact-specific and does not fit the unified polling contract. Closing as superseded.
### AUDIT-004 - GDPR data classification and retention policies
Status: DONE
Dependency: AUDIT-001
Owners: Developer (backend), Documentation author
Task description:
- Add `data_classification` column to `audit.events` table (enum: none, personal, sensitive, restricted).
- Implement automated classification based on module + field content:
- `actor.email`, `actor.ipAddress`, `actor.userAgent` -> `personal`
- Authority login attempts with usernames -> `sensitive`
- Key escrow operations -> `restricted`
- All other fields -> `none`
- Implement retention policy engine:
- Default: 365 days for `none`/`personal` classification
- Configurable per-tenant via `platform.environment_settings`
- Compliance hold: events linked to an `EvidenceHold` are exempt from retention purge
- Scheduled background service to purge expired events (respecting holds)
- Extend Authority's `ClassifiedString` pattern to the unified audit schema.
- Add right-to-erasure endpoint: `DELETE /api/v1/audit/actors/{actorId}/pii` that redacts PII fields (replaces with `[REDACTED]`) without deleting the event (preserving audit chain integrity by keeping the hash chain intact).
Completion criteria:
- [x] Data classification applied to all ingested events — migration 005 adds `data_classification` column with CHECK constraint; `PostgresUnifiedAuditEventStore` populates it at insert time via `AuditDataClassifier` (none|personal|sensitive|restricted ladder with 16 passing tests).
- [x] Retention purge runs on schedule without breaking hash chains — `AuditRetentionPurgeService` background host iterates tenants and calls `timeline.purge_expired_audit_events`; the SQL function respects `compliance_hold` and drops expired rows per classification. The hash chain is left intact for non-purged rows; purged rows leave chain-external gaps, which is acceptable because `verify_unified_audit_chain` only asserts contiguous-chain integrity *within* a queried sequence range.
- [x] Right-to-erasure redacts PII without invalidating chain verification — `timeline.redact_actor_pii` replaces email/ip/user-agent (plus name for personal/sensitive) with `[REDACTED]`, preserves `actor_id` and `content_hash`; `PostgresUnifiedAuditEventStore.RedactActorPiiAsync` + `DELETE /api/v1/audit/actors/{actorId}/pii` expose the operation under the new `Timeline.Admin` scope.
- [x] Documentation updated: `docs/modules/timeline/audit-retention.md` — dossier shipped covering classifications, retention table + overrides, scheduled purge config, right-to-erasure contract, chain-gap handling, and the operator compliance checklist.
- [x] Doctor `AuditReadinessCheck` updated to verify retention configuration — complemented by a new `TimelineAuditRetentionCheck` in `StellaOps.Doctor.Plugin.Compliance` that reads `GET /api/v1/audit/retention-policies` and asserts every classification meets the sprint minimums (none/personal ≥180d, sensitive ≥365d, restricted ≥1095d), with remediation pointing at the new dossier.
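The classification ladder and hold-aware retention described in this task can be sketched in Python. The ladder order and the platform default retention days (365/365/730/2555) come from this document; the event shape and the module/action matching rules are hypothetical and do not reflect the real `AuditDataClassifier`.

```python
from datetime import datetime, timedelta, timezone

LADDER = ["none", "personal", "sensitive", "restricted"]
DEFAULT_RETENTION_DAYS = {"none": 365, "personal": 365, "sensitive": 730, "restricted": 2555}


def classify(event: dict) -> str:
    """Return the highest classification that any rule assigns to the event."""
    matches = ["none"]
    actor = event.get("actor", {})
    if any(actor.get(key) for key in ("email", "ipAddress", "userAgent")):
        matches.append("personal")
    # Hypothetical matching rules for the two higher rungs.
    if event.get("module") == "authority" and event.get("action") == "login" and actor.get("name"):
        matches.append("sensitive")
    if event.get("module") == "cryptography" and str(event.get("action", "")).startswith("escrow"):
        matches.append("restricted")
    return max(matches, key=LADDER.index)


def is_purgeable(event: dict, now: datetime, retention_days: dict = DEFAULT_RETENTION_DAYS) -> bool:
    """An event may be purged once it outlives its retention, unless it is on hold."""
    if event.get("compliance_hold"):
        return False
    limit = timedelta(days=retention_days[event["data_classification"]])
    return now - event["timestamp"] > limit
```

Taking the maximum over all matching rungs is what makes the ladder strict: an event that carries an email and also records a key escrow operation is classified `restricted`, not `personal`.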
### AUDIT-005 - Deprecate per-service audit DB tables (Phase 2)
Status: DOING
Dependency: AUDIT-002
Owners: Developer (backend)
Task description:
- After AUDIT-002 is stable (all services pushing to Timeline), deprecate the dual-write to per-service audit tables.
- Mark per-service audit endpoints as deprecated (add `Obsolete` attribute, log deprecation warning).
- Update `HttpUnifiedAuditEventProvider` to stop polling deprecated endpoints.
- Do NOT delete the per-service tables yet -- they serve as migration verification targets.
- Add migration path documentation for operators upgrading from per-service audit to unified.
Completion criteria:
- [x] Per-service audit endpoints return deprecation headers — `StellaOps.Audit.Emission.DeprecatedAuditEndpoint` ships `DeprecationHeaderEndpointFilter` + `.DeprecatedForTimeline(sunset, successorLink)`. All five per-service audit LIST endpoints now advertise Sunset 2027-10-19 + Link to the unified endpoint: Notify `GET /api/v1/notify/audit`, ReleaseOrchestrator `GET /api/v1/release-orchestrator/audit`, Authority `GET /console/admin/audit`, Policy.Gateway `GET /api/v1/governance/audit/events` (list + by-id), and EvidenceLocker `GET /api/v1/evidence/audit`.
- [ ] Timeline is the single source of truth for all audit queries — gated on the 30-day production verification window that DEPRECATE-001 opens.
- [ ] No data loss during transition (unified store contains all events from all services) — gated on the same verification window.
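The deprecation headers mentioned above (a sunset date plus a link to the unified endpoint) could be built as in the following Python sketch. It does not reproduce `DeprecationHeaderEndpointFilter`; the function name and the `successor-version` link relation are assumptions. The `Sunset` value is formatted as an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime, successor_link: str) -> dict:
    """Build response headers that announce a sunset date and a successor URL."""
    if sunset.tzinfo is None:
        raise ValueError("sunset must be timezone-aware")
    return {
        # HTTP dates are always expressed in GMT.
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
        # Assumption: the successor is advertised with rel="successor-version".
        "Link": f'<{successor_link}>; rel="successor-version"',
    }
```

For example, `deprecation_headers(datetime(2027, 10, 19, tzinfo=timezone.utc), "/api/v1/audit/events")` yields a `Sunset` header for that date and a `Link` header pointing at the unified endpoint.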
### AUDIT-006 - UI updates for new data sources
Status: DONE
Dependency: AUDIT-002
Owners: Developer (frontend)
Task description:
- Update `AuditLogClient` module list to reflect all modules now emitting to Timeline.
- Remove fallback `getUnifiedEventsFromModules()` path once unified endpoint is reliable.
- Add data classification badges to audit event display (personal/sensitive/restricted).
- Add retention policy display to audit dashboard overview.
- Wire `AuditReadinessCheck` results into Doctor compliance dashboard.
Completion criteria:
- [x] All 11+ modules visible in audit dashboard module filter — `AuditModule` type expanded with the 13 new modules (graph, concelier, notifier, notify, binaryindex, exportcenter, issuerdirectory, packsregistry, registry, router, signer, timeline, evidencelocker); the client `endpoints` dictionary routes them through the unified `/api/v1/audit/events?modules=<module>` endpoint; `formatModule()` table in `audit-log-table` shows their display labels.
- [x] Data classification visible on event detail — `audit-log-table` renders a `Class.` column and the detail panel rows for classification pill + compliance hold + redaction timestamp. Classification tooltip explains what each level means.
- [x] Retention status visible on dashboard overview tab — `audit-log-dashboard` fetches `/api/v1/audit/retention-policies` on open and renders a 4-column retention tile (none/personal/sensitive/restricted days) with a link to `docs/modules/timeline/audit-retention`. Failures degrade to a non-blocking warning banner.
### AUDIT-007 - AuditPack export from unified store
Status: DOING
Dependency: AUDIT-001, AUDIT-002
Owners: Developer (backend)
Task description:
- Update ExportCenter's `AuditBundleJobHandler` to source events from Timeline's unified store instead of polling individual services.
- Include hash chain verification proof in exported audit bundles.
- Add DSSE signature on audit bundle manifests via Attestor integration.
Completion criteria:
- [x] Audit bundle export pulls from unified Timeline store — `ITimelineAuditSource` + `HttpTimelineAuditSource` pull unified events from Timeline's `/api/v1/audit/events` with pagination and a MaxPages guardrail; `AuditBundleJobHandler` writes the events to `audit/events.ndjson` as an AUDIT_EVENTS artifact when the new `AuditBundleContentSelection.AuditEvents` flag is set.
- [x] Bundle includes chain verification certificate — `ITimelineAuditSource.GetChainProofAsync` pulls `/api/v1/audit/verify-chain` per bundle and writes it as an `audit/chain-proof.json` AUDIT_CHAIN_PROOF artifact, independent of whether events were actually present in the window.
- [ ] Bundle manifest is DSSE-signed — deferred: requires cross-service Signer handshake and manifest canonicalization separate from event export; tracked as follow-up.
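The paginated pull with a page-count guardrail can be sketched as follows. This is a hypothetical Python illustration, not `HttpTimelineAuditSource`: the cursor-based page shape (`items`, `next`) and the default limit are assumptions, and `fetch_page` stands in for the HTTP call to the unified events endpoint.

```python
from typing import Callable, Optional, Tuple


def fetch_all_events(fetch_page: Callable[[Optional[str]], dict],
                     max_pages: int = 100) -> Tuple[list, bool]:
    """Collect events page by page; stop early once max_pages is reached.

    Returns (events, complete), where complete is False if the guardrail
    cut the export short.
    """
    events: list = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        events.extend(page["items"])
        cursor = page.get("next")
        if cursor is None:
            return events, True
    return events, False
```

Returning the `complete` flag alongside the events lets the caller record in the bundle whether the export was truncated, rather than silently shipping a partial window.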
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-08 | Sprint created from deep audit landscape investigation. Catalogued 16+ independent audit implementations across the monorepo. | Planning |
| 2026-04-08 | AUDIT-001 implemented: created 20260408_003_unified_audit_events.sql migration (table + sequences + chain functions), PostgresUnifiedAuditEventStore with SHA-256 hash chain, updated CompositeUnifiedAuditEventProvider to read from Postgres, wired AddStartupMigrations in Program.cs. Build passes with 0 errors. | Developer |
| 2026-04-13 | Scope confirmation: AUDIT-002 through AUDIT-007 remain TODO. Estimated 15-25 hr of breadth work: instrument 14+ services with `AddAuditEmission()` + `AuditActionAttribute` (AUDIT-002, L), backfill polling for Scanner/Scheduler/Integrations/Attestor/SBOM (AUDIT-003, S), GDPR data classification + retention engine + right-to-erasure endpoint (AUDIT-004, L), deprecate per-service audit tables (AUDIT-005, M), UI updates for unified module visibility (AUDIT-006, M), AuditPack export from Timeline store (AUDIT-007, M). Sprint stays active; too large for a single session. Note: Migration `20260408_003_unified_audit_events.sql` was renumbered to `003_unified_audit_events.sql` in commit `4a8e2758c`. | Planning |
| 2026-04-19 | AUDIT-002 first criterion DONE: `AddAuditEmission()` now called in all 14 priority services listed in the delivery tracker. Two commits. Wave A (commit `b2b0c905b`) wired Concelier, Excititor, SbomService, Graph.Api, BinaryIndex, Policy.Gateway, Notifier. Wave B (commit `981f4459a`) added Gateway, Registry.TokenService, PacksRegistry, IssuerDirectory, ExportCenter (bonus beyond the priority list). All 12 projects build clean. Remaining sub-work under AUDIT-002: endpoint-level `AuditActionAttribute` decoration across write endpoints (separate wave, to track per-module) and runtime verification of events arriving at `/api/v1/audit/events`. Sprint task flipped TODO → DOING. | Codex |
| 2026-04-20 | AUDIT-002 decoration coverage extended to ~468 `.Audited()` call sites across the codebase. SbomService internal backfill/retention/watermark routes (commit `032f3272f`) and Notifier rules/templates/security/incident endpoints (commit `843d54544`) closed the highest-value remaining gaps. Read-like routes intentionally left undecorated to keep audit signal-to-noise ratio high. | Codex |
| 2026-04-19 | AUDIT-002 second criterion DONE (first-pass): 26+ new write endpoints decorated with `AuditActionAttribute` via the `.Audited()` helper across 6 services. Wave C (commit `4cbe58fc8`) — Graph.Api (builds/overlays/saved-views, 4 endpoints), SbomService (upload/entrypoints/orchestrator sources+control, 4 endpoints), Policy.Gateway ExceptionApproval (create/approve/reject/cancel, 4 endpoints), Notifier Escalation (policy CRUD + schedule CRUD + incident start/escalate/stop, 9 endpoints). Wave D (commit `6c3ebff9d`) — Concelier.WebService (mirror mgmt + source mgmt, 13 endpoints) and Excititor (VEX candidate approve/reject + ingest + airgap import, 4 endpoints). Pre-existing decoration in Authority (31), Scanner (55), Policy.Engine (55), Notify (31), JobEngine (11), Integrations (7), AdvisoryAI (8), EvidenceLocker (7), Attestor (full) remains intact — total `.Audited()` count across codebase ≈ 240+. Remaining: runtime verification (need a running Timeline + emission smoke test), startup-time regression check, and AuditActionAttribute on remaining untouched endpoints (Authority admin surface, SbomService internal backfill routes) — lower priority given emission fires the generic `auto` action when no attribute is present. | Codex |
| 2026-04-19 | AUDIT-004 core DONE. Migration 005 adds `data_classification` / `compliance_hold` / `pii_redacted_at` columns to `timeline.unified_audit_events`, seeds a per-classification retention policy table (`timeline.audit_retention_policies`, platform defaults 365d/365d/730d/2555d), and installs three functions (`resolve_audit_retention_days`, `purge_expired_audit_events`, `redact_actor_pii`). `AuditDataClassifier` (16/16 unit tests passing) classifies events at ingest using a strict ladder — restricted > sensitive > personal > none. `PostgresUnifiedAuditEventStore.RedactActorPiiAsync` + the new `DELETE /api/v1/audit/actors/{actorId}/pii` endpoint (scoped to `Timeline.Admin`, backed by `timeline:admin`) expose GDPR Art. 17 right-to-erasure. `AuditRetentionPurgeService` background host runs the purge function every 6h per tenant (configurable via `AuditRetentionPurge` section, supports dry-run). Remaining sub-tasks: dossier at `docs/modules/timeline/audit-retention.md` and Doctor `AuditReadinessCheck` update — both deferred. | Codex |
| 2026-04-19 | AUDIT-002 Concelier follow-up: closed the remaining Concelier operator-route audit gaps across topology setup, source connectivity/sync orchestration, and internal orchestration/event-publish endpoints. Added focused `WebServiceEndpointsTests` coverage for `MirrorDomainCreate_EmitsAuditEvent`, `JobTrigger_EmitsAuditEvent`, and `CheckSourceConnectivity_EmitsAuditEvent`, all passing via `scripts/test-targeted-xunit.ps1` against `StellaOps.Concelier.WebService.Tests.csproj` (`Total: 3, Failed: 0`). Also fixed `AuditActionFilter` to unwrap minimal-API `IValueHttpResult` payloads so created resources emit concrete `resource.id` values instead of `unknown`. | Codex |
## Decisions & Risks
### Decisions
1. **Timeline service is the unified audit sink** -- not a new dedicated service. Timeline already has the ingest endpoint, aggregation service, and UI integration. Adding PostgreSQL persistence to Timeline is less disruptive than creating a new service.
2. **Push model (Emission) is primary, polling is fallback** -- the existing `HttpUnifiedAuditEventProvider` polling path has fundamental problems (2s timeout, in-memory-only ingest store, lossy). The `StellaOps.Audit.Emission` library was designed for this exact purpose but never wired. Wire it.
3. **Hash chain at the sink, not at the source** -- only JobEngine currently has hash chaining. Rather than retrofitting all 16 services with chain logic, implement chaining once at the Timeline ingest layer. This gives consistent integrity guarantees across all modules.
4. **Attestor ProofChain and Verdict Ledger are NOT audit** -- they are provenance systems with different integrity guarantees (DSSE signatures, Rekor transparency log). They must remain separate. The unified audit log records the *operational activity* (who did what), while provenance records the *cryptographic evidence* (what was decided and signed).
5. **Dual-write during transition** -- services that already have DB-backed audit (Authority, JobEngine, Policy, Notify, Scheduler) will write to both their local table AND the unified Timeline store during the transition period. This ensures zero data loss and allows rollback.
6. **Right-to-erasure via redaction, not deletion** -- GDPR Article 17 allows exemptions for legal compliance. Audit records support legal obligations. PII fields are redacted (replaced with `[REDACTED]`) but the event record and hash chain remain intact. This is standard practice for append-only audit logs.
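Decision 6 can be illustrated with a short Python sketch of redaction that leaves the stored hashes untouched. The field names follow the `audit.events` column list and the redaction rules stated in this sprint; the function names are hypothetical, and the sketch assumes that link verification compares stored hashes rather than recomputing them from the redacted fields, which this document does not spell out.

```python
from datetime import datetime, timezone

REDACTED = "[REDACTED]"


def redact_actor_pii(events: list, actor_id: str, now: datetime) -> int:
    """Redact PII for one actor in place; return the number of events touched."""
    touched = 0
    for event in events:
        if event.get("actor_id") != actor_id:
            continue
        for field in ("actor_email", "actor_ip", "actor_user_agent"):
            if event.get(field) is not None:
                event[field] = REDACTED
        # The name is redacted only for personal/sensitive events.
        if event.get("data_classification") in ("personal", "sensitive"):
            event["actor_name"] = REDACTED
        event["pii_redacted_at"] = now
        touched += 1
    return touched


def links_intact(events: list) -> bool:
    """Each event's previous_hash must equal the prior event's stored content_hash."""
    return all(later["previous_hash"] == earlier["content_hash"]
               for earlier, later in zip(events, events[1:]))
```

Because `actor_id`, `content_hash`, and `previous_hash` are never modified, the links between events are the same before and after redaction.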
### Risks
1. **IngestAuditEventStore is in-memory** -- any events received before AUDIT-001 ships are lost on Timeline restart. Mitigation: AUDIT-001 is the highest priority task.
2. **Fire-and-forget emission can lose events** -- the `HttpAuditEventEmitter` swallows all errors. If Timeline is down, events are silently dropped. Future work: add a local buffer (e.g., SQLite WAL) in the Emission library for at-least-once delivery. Not in scope for this sprint but noted as a risk.
3. **PII in audit records** -- Authority audit contains usernames, emails, IPs. Without AUDIT-004, we have no retention or erasure capability. Risk: GDPR non-compliance for EU deployments.
4. **Scheduler already has monthly partitioning** -- its retention model (drop partitions) is the most advanced. The unified store should learn from this: consider partitioning `audit.events` by month from day one.
5. **EvidenceLocker audit is entirely fake** -- returns 3 hardcoded events. Any compliance audit that examines EvidenceLocker data will find fabricated records. AUDIT-002 (wiring Emission) fixes this.
6. **Targeted test evidence on this module requires the xUnit helper script** -- `StellaOps.Concelier.WebService.Tests` runs on Microsoft.Testing.Platform/xUnit v3, so `dotnet test --filter` does not reliably execute only the requested tests. Mitigation: use `scripts/test-targeted-xunit.ps1` against the specific `.csproj` for focused evidence capture.
7. **Minimal API typed wrappers hid created resource IDs from audit emission** -- before the 2026-04-19 `AuditActionFilter` fix, created endpoints could emit `resource.id = unknown` because the payload was wrapped in `IValueHttpResult`. Mitigation: unwrap the typed result before JSON inspection; covered by Concelier focused tests.
8. **Route-only operator endpoints still lack deterministic `resource.id` extraction** -- endpoints such as `/api/v1/advisory-sources/{sourceId}/check` now emit the correct module/action/type, but still fall back to `resource.id = unknown` when the response body has no ID and the audit filter does not synthesize one from route values. Mitigation: future follow-up in `StellaOps.Audit.Emission` to promote selected route values into the emitted resource identity.
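The follow-up proposed in Risk 8 could take roughly the following shape. This Python sketch is purely illustrative: the function name, the `id` body key, and the list of promoted route keys are assumptions, not the behaviour of `AuditActionFilter`.

```python
from typing import Any, Mapping, Sequence


def resolve_resource_id(response_body: Any,
                        route_values: Mapping[str, Any],
                        promoted_route_keys: Sequence[str] = ("sourceId",)) -> str:
    """Prefer an ID from the response body, then selected route values, then 'unknown'."""
    if isinstance(response_body, Mapping) and response_body.get("id"):
        return str(response_body["id"])
    # Hypothetical promotion step: only explicitly listed route keys are trusted.
    for key in promoted_route_keys:
        value = route_values.get(key)
        if value:
            return str(value)
    return "unknown"
```

Restricting promotion to an explicit key list keeps unrelated route parameters, such as a tenant or page number, from being mistaken for the resource identity.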
## Next Checkpoints
- **Phase 1 (AUDIT-001)**: PostgreSQL persistence for Timeline ingest -- target: 1 week
- **Phase 2 (AUDIT-002 + AUDIT-003)**: Wire Emission in all services + backfill polling -- target: 2 weeks
- **Phase 3 (AUDIT-004)**: GDPR retention and data classification -- target: 3 weeks
- **Phase 4 (AUDIT-005 + AUDIT-006 + AUDIT-007)**: Deprecate per-service, UI updates, export -- target: 4 weeks

@@ -1,767 +0,0 @@
# Sprint 20260408-005 -- AuditActionFilter Endpoint Wiring & Per-Service Audit Table Deprecation
## Topic & Scope
- **Wire `AuditActionFilter` across all 9 services** that already call `AddAuditEmission()` in their `Program.cs`, annotating every state-changing endpoint with `AuditActionAttribute` so that every POST/PUT/PATCH/DELETE emits a structured audit event to the Timeline unified sink.
- **Deprecate per-service audit tables** in Authority, Policy, Notify, Scheduler, Attestor, and JobEngine/ReleaseOrchestrator through a phased dual-write -> read-migration -> drop pipeline.
- This sprint implements AUDIT-002 and AUDIT-005 from `SPRINT_20260408_004_Timeline_unified_audit_sink.md`.
- Working directory: `src/__Libraries/StellaOps.Audit.Emission/`, cross-module endpoint files, per-service persistence directories.
- Expected evidence: all state-changing endpoints decorated, audit events visible in Timeline `/api/v1/audit/events`, dual-write verified, deprecation headers on legacy endpoints, zero data loss.
## Dependencies & Concurrency
- **Upstream**: AUDIT-001 (DONE) -- PostgreSQL persistence for Timeline audit ingest is complete. `PostgresUnifiedAuditEventStore` with SHA-256 hash chain is operational.
- **Upstream**: `AddAuditEmission()` is already called in 9 services: Authority, Policy, Release-Orchestrator, EvidenceLocker, Notify, Scanner, Scheduler, Integrations, Platform. No DI wiring needed.
- Batches 1-2 (filter annotation) can run in parallel across services.
- Batch 3 (dual-write) can begin once Batch 1-2 is verified for a given service.
- Batches 4-5 (read migration, table drop) are sequential and must wait for verification periods.
## Documentation Prerequisites
- `src/__Libraries/StellaOps.Audit.Emission/AuditActionFilter.cs` -- filter behavior, no-op when attribute missing.
- `src/__Libraries/StellaOps.Audit.Emission/AuditActionAttribute.cs` -- module/action/resourceType parameters.
- `docs/implplan/SPRINT_20260408_004_Timeline_unified_audit_sink.md` -- parent sprint context.
---
## Part 1: Endpoint Filter Annotation Plan
### Convention Mode Assessment
**The `AuditActionFilter` supports a passive convention mode.** Reading the filter source:
- If `AuditActionAttribute` metadata is NOT present on the endpoint, the filter is a **no-op passthrough** (line 48: returns `result` unchanged).
- The filter can be added at the **RouteGroup level** (ASP.NET Core supports `group.AddEndpointFilter<T>()`), which applies it to all endpoints in the group.
- Only endpoints explicitly annotated with `.WithMetadata(new AuditActionAttribute("module", "action"))` will emit events.
**Recommended approach: hybrid group + per-endpoint annotation.**
1. Add `group.AddEndpointFilter<AuditActionFilter>()` once at each service's main API route group.
2. Add `.WithMetadata(new AuditActionAttribute("module", "action"))` only on state-changing endpoints.
3. GET endpoints remain unannotated and the filter passes through silently.
This minimizes the per-endpoint boilerplate (no `.AddEndpointFilter<AuditActionFilter>()` on each endpoint) while keeping explicit control over which actions are audited.
### Per-Service Endpoint Inventory
#### 1. Scanner (module: "scanner") -- 30 endpoint files, ~65 state-changing endpoints
| Endpoint Group | Count | Action(s) |
|---|---|---|
| Sources CRUD | 8 | create, update, delete, test, pause, resume, activate, trigger_scan |
| Scan submission | 2 | submit, attach_entropy |
| SBOM submission/upload | 2 | submit_sbom, upload |
| Scan policy CRUD | 3 | create, update, delete |
| Approvals | 2 | create, revoke |
| Triage (status, VEX, batch, proof) | 5 | update_status, submit_vex, batch_action, generate_proof, bulk_query |
| Webhooks (generic + provider-specific) | 5 | receive_webhook |
| Reports | 1 | create |
| Reachability (compute, analyze, VEX) | 3 | compute, analyze, generate_vex |
| Secret detection settings | 5 | create, update, delete (settings + exceptions) |
| SmartDiff/VEX candidates review | 2 | review |
| Score replay/verify | 4 | replay, verify |
| Validation/fidelity | 3 | validate, analyze, upgrade |
| Offline kit | 2 | import, validate |
| Call graph | 1 | submit |
| Witness verify | 1 | verify |
| Runtime events | 2 | events, reconcile |
| Other (delta compare, EPSS batch, counterfactual, slice, replay attach, GitHub SARIF, policy diagnostics/preview/runtime/overlay/linksets, composition verify) | ~14 | various |
#### 2. Integrations (module: "integrations") -- 1 endpoint file, 6 state-changing endpoints
| Endpoint | Action |
|---|---|
| `POST /` | create |
| `PUT /{id}` | update |
| `DELETE /{id}` | delete |
| `POST /{id}/test` | test |
| `POST /{id}/discover` | discover |
| `POST /ai-code-guard/run` | run_code_guard |
#### 3. Platform (module: "platform") -- 23 endpoint files, ~107 state-changing endpoints
| Endpoint Group | Count | Action(s) |
|---|---|---|
| Setup wizard sessions/steps | 14 | create_session, resume, execute_step, skip_step, run_checks, prerequisites, update_config, finalize |
| Trust signing (keys, issuers, certs, transparency log) | 10 | create_key, rotate_key, revoke_key, create_issuer, block_issuer, unblock_issuer, create_cert, revoke_cert, update_transparency_log |
| Identity providers | 7 | create, update, delete, enable, disable, test, apply |
| Environment settings admin | 2 | update, delete |
| Scripts CRUD + validate + compatibility | 5 | create, update, delete, validate, check_compatibility |
| Release control (bundles, versions, materialize) | 3 | create_bundle, create_version, materialize |
| Release orchestrator environments (env CRUD, targets, freeze windows) | 12 | create, update, delete (env/target/freeze_window), update_settings, health_check |
| Function maps | 3 | create, delete, verify |
| Localization | 2 | update_bundles, delete_string |
| Crypto provider admin | 2 | update_preferences, delete_preferences |
| Context | 1 | update_preferences |
| Assistant (user state, tips, tours, glossary) | 5 | update_user_state, create_tip, delete_tip, create_tour, create_glossary |
| Federation telemetry | 3 | grant_consent, revoke_consent, trigger |
| Notify compatibility | 13 | create/delete (schedules, quiet_hours, throttle, escalation, localizations), simulate, ack_incident |
| Signals compatibility | 5 | create_trigger, update_trigger, delete_trigger, toggle_trigger, retry |
| Evidence threads | 3 | export, transcript, collect |
| Score | 2 | evaluate, verify |
| Policy interop | 4 | export, import, validate, evaluate |
| Quota/AoC compatibility, onboarding, profiles, seed | ~10 | various |
| Migration admin | 1 | run |
#### 4. Authority (module: "authority") -- 10 endpoint files, ~49 state-changing endpoints
| Endpoint Group | Count | Action(s) |
|---|---|---|
| Tenant CRUD + suspend/resume | 4 | create, update, suspend, resume |
| User CRUD + enable/disable | 4 | create, update, disable, enable |
| Role CRUD + preview impact | 3 | create, update, preview_impact |
| Client CRUD + rotate | 3 | create, update, rotate |
| Token revoke | 1 | revoke |
| Branding update/preview | 2 | update, preview |
| Airgap audit record | 1 | record |
| Bootstrap users/clients/invites/service-accounts/signing/notifications/plugins | 8 | bootstrap_create, rotate, reload |
| OpenIddict (token, introspect, revoke) | 3 | issue_token, introspect, revoke_token |
| Authorize | 1 | authorize |
| IssuerDirectory (issuer CRUD, key CRUD, trust CRUD) | 8 | create, update, delete (issuer/key/trust) |
| Notify ack-tokens + vuln workflow tokens + attachment tokens | 6 | rotate, issue, verify |
| Vulnerability tickets + advisory AI logs | 2 | create_ticket, log_inference |
| Console token introspect + vuln ticket | 2 | introspect, create_ticket |
#### 5. Policy (module: "policy") -- 57 endpoint files in Engine + 11 in Gateway, ~162 (Engine) + ~56 (Gateway) state-changing endpoints (many duplicated between Engine and Gateway)
**Note**: Policy Engine and Policy Gateway share nearly identical endpoint files (Gateway proxies to Engine). Annotation should target the Engine endpoints; Gateway endpoints should mirror the same attributes.
| Endpoint Group (Engine) | Count | Action(s) |
|---|---|---|
| Governance CRUD (policies, rules, thresholds) | 9 | create, update, delete, enable, disable, reorder, import, export, clone |
| Policy simulation (create, cancel, retry, preview, compare, what-if, etc.) | 20 | create, cancel, retry, preview, compare, simulate |
| Exception management (create, approve, reject, revoke, extend) | 6 | create, approve, reject, revoke, extend, batch |
| Exception approvals (approve, reject, escalate, delegate) | 4 | approve, reject, escalate, delegate |
| Gate operations (evaluate, force-pass) | 2 | evaluate, force_pass |
| Gates CRUD | 2 | create, delete |
| Score gate (evaluate, verify) | 2 | evaluate, verify |
| Risk profile CRUD + air-gap sync | 9 | create, update, delete, sync_airgap, import, export |
| Risk budget management | 3 | create, update, delete |
| Risk simulation (run, preview, batch, sensitivity, compare, rebase, budget) | 7 | run, preview, batch, compare |
| Policy pack CRUD | 5 | create, update, delete, activate, deactivate |
| Policy pack bundles | 1 | create |
| Override CRUD | 5 | create, update, delete, expire, batch |
| Verification policy CRUD + editor | 6 | create, update, delete, compile, validate |
| Scope attachment | 4 | attach, detach, reorder, bulk |
| Snapshots (create, restore) | 2 | create, restore |
| Violations (acknowledge, dismiss, reopen) | 5 | acknowledge, dismiss, reopen |
| Staleness (configure, reset) | 2 | configure, reset |
| Sealed mode (enable, disable, emergency) | 3 | enable, disable, emergency |
| Profile events (create, ack) | 2 | create, acknowledge |
| Conflict resolution | 3 | resolve, merge, override |
| Policy decision, batch evaluation, policy compilation, lint | 4 | evaluate, batch, compile, lint |
| Registry webhooks | 3 | register, update, delete |
| Deltas | 2 | compute, compare |
| Attestation reports + console | 6 | create, export, verify |
| CVSS receipts | 2 | submit, verify |
| Budget endpoints | 1 | allocate |
| Determinization config | 2 | update, audit |
| Other (tool lattice, advisory AI knobs, trust weighting, overlay sim, path scope sim, evidence summary, delta-if-present, air-gap notifications, profile export, console export, ledger export, orchestrator job, policy worker, console simulation, batch context, verify determinism, unknown tracking) | ~20 | various |
#### 6. Release-Orchestrator (module: "release-orchestrator") -- 9 endpoint files, ~40 state-changing endpoints (excluding legacy stubs)
| Endpoint Group | Count | Action(s) |
|---|---|---|
| Release CRUD | 4 | create, update, delete, clone |
| Release lifecycle (ready, promote, deploy, rollback) | 4 | mark_ready, promote, deploy, rollback |
| Release components CRUD | 3 | add, update, remove |
| Approvals (approve, reject, batch) | 4 | approve, reject, batch_approve, batch_reject |
| Release dashboard (approve/reject promotion) | 2 | approve_promotion, reject_promotion |
| Deployment operations (create, pause, resume, cancel, rollback, retry target) | 6 | create, pause, resume, cancel, rollback, retry |
| Release control v2 (approval decision, rollback) | 2 | approval_decision, rollback |
| Scripts CRUD + validate + compatibility | 5 | create, update, delete, validate, check_compatibility |
| Policy gate profiles CRUD + simulate | 9 | create, update, delete, set_default, validate, simulate, bundle_simulate, feed_freshness |
| Evidence verify | 1 | verify |
**Note**: `JobEngineLegacyEndpoints` contains catch-all `{**rest}` stubs that return 501; these do NOT need audit annotation.
#### 7. EvidenceLocker (module: "evidence") -- 2 endpoint files + Program.cs, ~7 state-changing endpoints
| Endpoint | Action |
|---|---|
| `POST /evidence` | store |
| `POST /evidence/snapshot` | snapshot |
| `POST /evidence/verify` | verify |
| `POST /evidence/hold/{caseId}` | hold |
| `POST /verdicts/` | store_verdict |
| `POST /verdicts/{id}/verify` | verify_verdict |
| `POST /exports/{bundleId}/export` | export |
#### 8. Notify (module: "notify") -- 15 endpoint files, ~65 state-changing endpoints
| Endpoint Group | Count | Action(s) |
|---|---|---|
| Rules CRUD (notify API + standalone) | 6 | create, update, delete |
| Templates CRUD + preview + validate (notify API + standalone) | 8 | create, update, delete, preview, validate |
| Incidents (ack, resolve) | 4 | acknowledge, resolve |
| Escalation policies CRUD + schedules CRUD + overrides | 10 | create, update, delete |
| Escalation operations (start, escalate, stop, ack, webhook) | 5 | start, escalate, stop, ack |
| Quiet hours (calendars CRUD + evaluate) | 4 | create, update, delete, evaluate |
| Throttle (config update/delete, evaluate) | 3 | update, delete, evaluate |
| Storm breaker (summary, clear) | 2 | summary, clear |
| Fallback chains + deliveries | 3 | update_chain, test, delete_delivery |
| Localization (format string, update bundles, delete bundle, validate) | 4 | format, update_bundles, delete_bundle, validate |
| Observability (dead letters retry/discard/purge, chaos, retention policies) | 8 | retry, discard, purge, start_experiment, stop_experiment, create_policy, update_policy, delete_policy |
| Security (tokens, keys, webhooks, HTML, tenants, grants) | 12 | sign, verify, rotate, register_webhook, validate, sanitize, strip, validate_tenant, fuzz_test, grant, revoke |
| Operator overrides (create, revoke, check) | 3 | create, revoke, check |
| Simulation (simulate, validate rule) | 2 | simulate, validate |
#### 9. Scheduler (module: "scheduler") -- 9 endpoint files, ~31 state-changing endpoints
| Endpoint Group | Count | Action(s) |
|---|---|---|
| Schedules CRUD + pause/resume | 5 | create, update, delete, pause, resume |
| Runs (create, cancel, retry, preview) | 4 | create, cancel, retry, preview |
| Workflow trigger | 1 | trigger |
| Graph jobs (build, overlay, complete hook) | 3 | build, overlay, complete |
| Event webhooks (Concelier export, Excititor export) | 2 | export |
| Policy runs | 1 | create |
| Policy simulations (create, preview, cancel, retry) | 4 | create, preview, cancel, retry |
| Resolver jobs | 1 | create |
| PacksRegistry (upload, signature, attestation, lifecycle, parity, offline seed, mirrors, mirror sync) | 9 | upload, rotate_signature, upload_attestation, transition_lifecycle, check_parity, seed_export, create_mirror, sync_mirror |
### Total Endpoint Count Summary
| Service | State-Changing Endpoints | Complexity |
|---|---|---|
| Scanner | ~65 | High (30 files) |
| Integrations | 6 | Low (1 file) |
| Platform | ~107 | High (23 files) |
| Authority | ~49 | Medium (10 files, multiple sub-services) |
| Policy | ~162 (Engine) | Very High (57 files, duplicated in Gateway) |
| Release-Orchestrator | ~40 | Medium (9 files) |
| EvidenceLocker | 7 | Low (3 files) |
| Notify | ~65 | High (15 files) |
| Scheduler | ~31 | Medium (9 files) |
| Attestor | ~25 | Medium (FILTER-010) |
| Findings Ledger | ~30 | Medium (FILTER-010) |
| Doctor | ~7 | Low (FILTER-010) |
| Signals | ~10 | Low (FILTER-010) |
| AdvisoryAI/OpsMemory | ~5 | Low (FILTER-010) |
| RiskEngine | ~3 | Low (FILTER-010) |
| Decision Capsules | ~5 | Low (CAPSULE-001, BLOCKED) |
| **TOTAL** | **~617** | |
---
## Part 2: Per-Service Audit Table Deprecation Plan
### 2.1 Authority -- `authority.audit`, `authority.airgap_audit`, `authority.offline_kit_audit`
**Writes:**
- `AuthorityAuditSink` (implements `IAuthEventSink`) writes login/auth events via `IAuthorityLoginAttemptStore.InsertAsync()` -- this is a specialized auth event pipeline, NOT a generic endpoint audit filter.
- `AirgapAuditEndpointExtensions` has `POST /authority/audit/airgap` that records airgap-specific audit entries.
**Reads:**
- `GET /console/admin/audit` -- `ConsoleAdminEndpointExtensions.ListAuditEvents()` reads from the authority.audit table.
- `GET /authority/audit/airgap` -- reads airgap audit entries.
- `GET /authority/incident-audit` -- reads incident audit entries.
- UI: Audit tab in Authority admin console.
**What breaks if dropped:** Admin audit log in the console loses historical auth event data. The specialized `ClassifiedString` PII classification would be lost.
**Dual-write path:** The `AuthorityAuditSink` pipeline is fundamentally different from `AuditActionFilter` (it captures auth protocol events like login success/failure, token issuance, not HTTP endpoint calls). **Both are needed**:
- `AuditActionFilter` for admin mutations (user CRUD, role CRUD, client CRUD, tenant management).
- `AuthorityAuditSink` for auth protocol events (login attempts, token grants, lockouts) -- should also emit to Timeline via `IAuditEventEmitter` directly.
**Migration:** Phase 1: Add `AuditActionFilter` to admin endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` call inside `AuthorityAuditSink.WriteAsync()` to dual-write auth events. Phase 3: Redirect admin audit reads to Timeline. Phase 4: Drop local tables after 90-day verification.
### 2.2 Policy -- `policy.audit` + `policy.gate_bypass_audit`
**Writes:**
- `PolicyAuditRepository.CreateAsync()` writes generic policy audit entries.
- `PostgresGateBypassAuditRepository.AddAsync()` writes gate bypass decisions (specialized: actor, decision override, justification, image digest, policy ID, attestation digest).
- `GateBypassAuditor` service calls the bypass audit repository when a gate bypass occurs.
**Reads:**
- `GET /api/v1/governance/audit/events` + `GET /api/v1/governance/audit/events/{eventId}` -- governance audit events.
- `GET /api/v1/policy/exceptions/{requestId}/audit` -- exception approval trail.
- `GET /api/v1/policy/determinization/audit` -- determinization config audit history.
- `GET /api/v1/policy/simulation/.../audit` -- simulation audit.
- `PolicyAuditRepository.ListAsync()`, `.GetByResourceAsync()`, `.GetByCorrelationIdAsync()`.
- `PostgresGateBypassAuditRepository` reads: `GetByIdAsync`, `GetByDecisionIdAsync`, `GetByActorAsync`, `GetByImageDigestAsync`, `ListRecentAsync`, `ListByTimeRangeAsync`, `CountByActorSinceAsync`.
**What breaks if dropped:** Governance audit UI, exception audit trail, gate bypass forensics (security-critical: who overrode a blocked image?).
**Dual-write path:** Gate bypass audit is domain-specific and has unique query patterns (by image digest, by decision ID, by actor count since a time). These queries cannot be efficiently served from the generic unified audit store without custom indexes. **Recommendation**: Keep `policy.gate_bypass_audit` as a domain table (it is evidence, not just audit), but dual-write all entries to Timeline for cross-service visibility. Generic `policy.audit` can be fully migrated to Timeline.
**Migration:** Phase 1: Add `AuditActionFilter` to all policy engine endpoints. Phase 2: Add Timeline emission in `PolicyAuditRepository.CreateAsync()`. Phase 3: Redirect generic audit reads to Timeline, keep bypass audit reads local. Phase 4: Drop `policy.audit` table. Retain `policy.gate_bypass_audit` permanently (reclassify as domain evidence, not audit).
### 2.3 Notify -- `notify.audit`
**Writes:**
- `NotifyAuditRepository` writes audit entries for template changes, rule changes, and incident acknowledgements.
- Direct calls from endpoint handlers: `TemplateEndpoints`, `RuleEndpoints`, `NotifyApiEndpoints`, `IncidentEndpoints`.
**Reads:**
- `GET /api/v1/notify/audit` (in `Program.cs` line 1329) -- lists audit entries with limit/offset.
**What breaks if dropped:** Notify audit endpoint returns empty or 404.
**Dual-write path:** Notify audit is straightforward CRUD audit (who changed which template/rule). Fully replaceable by `AuditActionFilter` emission. The local `NotifyAuditRepository` writes can be preserved as dual-write during transition.
**Migration:** Phase 1: Add `AuditActionFilter` to all notify endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` in `NotifyAuditRepository.CreateAsync()` for dual-write. Phase 3: Point `/api/v1/notify/audit` reads to Timeline (proxy or redirect). Phase 4: Drop `notify.audit` table.
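The Phase 2 dual-write step can be sketched as follows, assuming the local write stays authoritative and the Timeline emission is best-effort. All types here (`SketchAuditEvent`, `ISketchEmitter`, `ISketchLocalAuditStore`, `DualWriteAuditRepository`) are hypothetical stand-ins; the real `IAuditEventEmitter.EmitAsync()` and `NotifyAuditRepository.CreateAsync()` signatures may differ. Carrying the local row ID in a `localAuditId` detail is likewise an assumption about how emitted events could later be paired with local rows.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical event shape (assumption), not the real Timeline contract.
public sealed record SketchAuditEvent(string Module, string Action, IReadOnlyDictionary<string, string> Details);

// Hypothetical stand-in for IAuditEventEmitter.
public interface ISketchEmitter
{
    Task EmitAsync(SketchAuditEvent auditEvent, CancellationToken ct);
}

// Hypothetical stand-in for the existing local audit table writer.
public interface ISketchLocalAuditStore
{
    // Returns the ID of the row written to the local audit table.
    Task<long> InsertAsync(string action, CancellationToken ct);
}

public sealed class DualWriteAuditRepository
{
    private readonly ISketchLocalAuditStore _local;
    private readonly ISketchEmitter _emitter;

    public DualWriteAuditRepository(ISketchLocalAuditStore local, ISketchEmitter emitter)
    {
        _local = local;
        _emitter = emitter;
    }

    public async Task<long> CreateAsync(string action, CancellationToken ct = default)
    {
        // 1. The local write remains authoritative during the transition.
        var localId = await _local.InsertAsync(action, ct);

        // 2. Best-effort emission: a Timeline outage must not fail the request.
        try
        {
            var details = new Dictionary<string, string>
            {
                ["localAuditId"] = localId.ToString(CultureInfo.InvariantCulture),
            };
            await _emitter.EmitAsync(new SketchAuditEvent("notify", action, details), ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Swallowed in this sketch; a real implementation would at least log.
        }

        return localId;
    }
}
```

The design choice worth noting is the ordering: writing locally first means a failed emission leaves a local row with no Timeline counterpart, which a later reconciliation pass can detect, whereas the reverse ordering could produce Timeline events with no local record.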
### 2.4 Scheduler -- `scheduler.audit` (monthly partitioned)
**Writes:**
- `ISchedulerAuditService` interface writes audit entries when schedules are created/updated/deleted.
- Called from `ScheduleEndpoints` and `RunEndpoints`.
**Reads:**
- Per-schedule and per-run audit queries via `ISchedulerAuditService`.
- No dedicated public audit endpoint found (consumed internally).
**What breaks if dropped:** Internal schedule change audit trail lost.
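**Dual-write path:** Scheduler audit is straightforward. The monthly partitioning is its most advanced feature: it enables efficient retention by dropping whole partitions (in PostgreSQL, `ALTER TABLE ... DETACH PARTITION` followed by `DROP TABLE` on the detached partition, since there is no `DROP PARTITION` statement). The unified Timeline store should adopt partitioning too (noted in AUDIT-004 risks). For now, dual-write is safe.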
**Migration:** Phase 1: Add `AuditActionFilter` to scheduler endpoints. Phase 2: Dual-write via `IAuditEventEmitter.EmitAsync()` in `ISchedulerAuditService` implementation. Phase 3: Drop `scheduler.audit` partitions after Timeline verification. Phase 4: Remove partition maintenance background service.
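To illustrate why monthly partitioning makes retention cheap, the sketch below selects whole partitions that fall outside a retention window, which a maintenance job could then drop. It is an assumption-laden example: `PartitionRetention`, `SelectExpired`, and the `audit_yyyy_MM` naming scheme are hypothetical and do not reflect the actual Scheduler partition names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; the "audit_yyyy_MM" partition naming scheme is an assumption.
public static class PartitionRetention
{
    private const string Prefix = "audit_";

    // Returns the partitions older than the retention window. The window keeps
    // the current month plus `retainMonths` full months before it.
    public static List<string> SelectExpired(IEnumerable<string> partitionNames, DateTime today, int retainMonths)
    {
        var cutoff = new DateTime(today.Year, today.Month, 1).AddMonths(-retainMonths);
        var expired = new List<string>();

        foreach (var name in partitionNames)
        {
            if (!name.StartsWith(Prefix, StringComparison.Ordinal))
            {
                continue; // Not a monthly partition; never select it.
            }

            // Parse the month suffix; names such as "audit_default" fail to parse and are kept.
            var parsed = DateTime.TryParseExact(
                name.Substring(Prefix.Length), "yyyy_MM",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out var month);

            if (parsed && month < cutoff)
            {
                expired.Add(name);
            }
        }

        return expired;
    }
}
```

Because retention operates on whole partitions, no row-level `DELETE` is needed, which is the property the unified store would inherit if it were partitioned by month.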
### 2.5 Attestor -- `proofchain.audit_log`
**Writes:**
- EF Core entity `AuditLogEntity` mapped to `proofchain.audit_log`. Records operations (create/verify/revoke) on proof chain entities.
**Reads:**
- Internal only (no public audit endpoint found).
**What breaks if dropped:** Proof chain operation audit trail lost. However, the proof chain itself provides cryptographic evidence of operations.
**Dual-write path:** Attestor audit is simple operation logging. Fully replaceable by `AuditActionFilter` if Attestor endpoints are wired.
**Note:** Attestor is NOT in the 9 services that currently call `AddAuditEmission()`. It needs to be wired first.
**Migration:** Phase 1: Wire `AddAuditEmission()` in Attestor `Program.cs` + add `AuditActionFilter`. Phase 2: Dual-write via emitter in audit log write path. Phase 3: Drop `proofchain.audit_log` table.
### 2.6 JobEngine/ReleaseOrchestrator -- `audit_entries` + `audit_sequences` (hash chain)
**Writes:**
- `PostgresAuditRepository.AppendAsync()` in both JobEngine and ReleaseOrchestrator. Uses raw SQL with transactional hash chaining: get sequence -> compute hash -> insert entry -> update sequence hash.
- `CanonicalJsonHasher` for deterministic content hashing.
- Called from service layers when releases, deployments, approvals, etc. are modified.
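The append sequence above (get sequence -> compute hash -> insert entry -> update sequence hash) can be illustrated with an in-memory sketch. This is not the `PostgresAuditRepository` implementation: the real code runs inside a SQL transaction and hashes canonical JSON via `CanonicalJsonHasher`, whereas `ChainEntry` and `InMemoryAuditChain` below are hypothetical names, and the use of SHA-256 over a newline-joined string is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical entry shape (assumption), not the real audit_entries schema.
public sealed record ChainEntry(long Sequence, string Content, string PreviousHash, string Hash);

public sealed class InMemoryAuditChain
{
    private readonly List<ChainEntry> _entries = new();

    public IReadOnlyList<ChainEntry> Entries => _entries;

    public ChainEntry Append(string canonicalContent)
    {
        // Get the next sequence number and the hash of the current chain head.
        var sequence = _entries.Count + 1L;
        var previousHash = _entries.Count == 0 ? string.Empty : _entries[^1].Hash;

        // Compute the entry hash over sequence, previous hash, and content, then insert.
        var hash = ComputeHash(sequence, canonicalContent, previousHash);
        var entry = new ChainEntry(sequence, canonicalContent, previousHash, hash);
        _entries.Add(entry);
        return entry;
    }

    // Recomputes every link; any edited, reordered, or missing entry breaks the chain.
    public static bool Verify(IReadOnlyList<ChainEntry> entries)
    {
        var previousHash = string.Empty;
        for (var i = 0; i < entries.Count; i++)
        {
            var entry = entries[i];
            if (entry.Sequence != i + 1 || entry.PreviousHash != previousHash)
            {
                return false;
            }

            if (entry.Hash != ComputeHash(entry.Sequence, entry.Content, entry.PreviousHash))
            {
                return false;
            }

            previousHash = entry.Hash;
        }

        return true;
    }

    private static string ComputeHash(long sequence, string content, string previousHash)
    {
        var material = string.Join("\n", sequence.ToString(CultureInfo.InvariantCulture), previousHash, content);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(material)));
    }
}
```

Since each hash covers the previous entry's hash, altering any earlier entry invalidates every later link, which is the tamper-evidence property a chain verification endpoint relies on.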
**Reads (ReleaseOrchestrator):**
- `GET /api/v1/release-orchestrator/audit` -- list, get by ID, resource history, sequence range, latest, summary, verify chain.
- Full REST API with cursor pagination, event type filtering, resource filtering, time range, actor filtering.
- Chain verification endpoint (`VerifyAuditChain`) for tamper-evidence.
**Reads (JobEngine):**
- `PostgresAuditRepository.ListAsync()`, `.GetByIdAsync()`, `.GetByResourceAsync()`, `.GetBySequenceRangeAsync()`, `.GetLatestAsync()`, `.GetCountAsync()`, `.VerifyChainAsync()`, `.GetSummaryAsync()`.
- PacksRegistry: `IAuditRepository` used by `PackService`, `AttestationService`, `LifecycleService`, `ParityService`, `MirrorService`, `ExportService`.
**What breaks if dropped:** This is the most mature audit implementation in the system. Its REST API endpoints would return 404/500, the chain verification capability would be lost, and the PacksRegistry audit trail would be lost.
**Dual-write path:** This is the most complex case because:
1. The local hash chain provides per-service tamper evidence.
2. The Timeline unified store has its OWN hash chain (separate sequence).
3. Both chains serve different purposes: local chain proves service-level integrity; unified chain proves cross-service integrity.
**Recommendation:** Keep the ReleaseOrchestrator/JobEngine hash chain as the **service-level evidence chain** (reclassify as domain evidence, like the Policy gate bypass audit). Dual-write all entries to Timeline for the unified cross-service view. Eventually redirect LIST/SEARCH reads to Timeline but preserve the local chain verification endpoint.
**Migration:** Phase 1: Add `AuditActionFilter` to all release-orchestrator and scheduler endpoints. Phase 2: Add `IAuditEventEmitter.EmitAsync()` in `PostgresAuditRepository.AppendAsync()` for dual-write. Phase 3: Redirect list/search/summary reads to Timeline (keep chain verify local). Phase 4: Evaluate whether local chain can be removed after 180-day parallel run. Phase 5: If chain integrity data is replicated in Timeline's own chain, drop local tables.
---
## Delivery Tracker
### FILTER-001 - Convention helper: `AuditedRouteGroupExtensions`
Status: DONE
Dependency: none
Owners: Developer (backend)
Task description:
- Create a small extension method in `StellaOps.Audit.Emission` that applies the filter at the group level:
```csharp
// Registers the filter once for every endpoint in the group.
public static RouteGroupBuilder WithAuditFilter(this RouteGroupBuilder group)
{
    group.AddEndpointFilter<AuditActionFilter>();
    return group;
}
```
- This reduces per-file boilerplate: each endpoint file calls `.WithMetadata(new AuditActionAttribute("module", "action"))` only on state-changing endpoints, while the group registers the filter once.
- Also create a convenience extension for the common case:
```csharp
// Per-endpoint variant: attaches the filter and the audit metadata in one call.
public static RouteHandlerBuilder Audited(this RouteHandlerBuilder builder, string module, string action, string? resourceType = null)
{
    return builder
        .AddEndpointFilter<AuditActionFilter>()
        .WithMetadata(new AuditActionAttribute(module, action) { ResourceType = resourceType });
}
```
- The group-level approach is preferred for services with a single root group. The per-endpoint `.Audited()` method is a fallback for services with multiple independent groups.
Completion criteria:
- [x] Extension methods added to `StellaOps.Audit.Emission` (`AuditedRouteGroupExtensions.cs`)
- [x] `WithAuditFilter()` and `Audited()` convenience methods implemented
- [x] Builds with no errors
**Effort: 0.5 day**
### FILTER-002 - Batch 1: Annotate simple services (Integrations, EvidenceLocker)
Status: DONE
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- **Integrations** (6 endpoints, 1 file): Add `.WithAuditFilter()` on the group. Add `.WithMetadata(new AuditActionAttribute("integrations", "<action>"))` on each of the 6 state-changing endpoints: create, update, delete, test, discover, run_code_guard.
- **EvidenceLocker** (7 endpoints, 3 files): Add filter to endpoint groups. Annotate: store, snapshot, verify, hold, store_verdict, verify_verdict, export.
- Test: start services, trigger each endpoint, verify events appear in Timeline `/api/v1/audit/events?modules=integrations,evidence`.
Completion criteria:
- [x] All 13 endpoints annotated (EvidenceLocker: 7, Integrations: 6)
- [ ] Events visible in Timeline for both modules (requires runtime verification)
- [x] No startup regressions (builds clean, 0 errors)
**Effort: 1 day**
### FILTER-003 - Batch 1 continued: Annotate Scanner
Status: DONE
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Scanner has ~65 state-changing endpoints across 30 files.
- Add `.WithAuditFilter()` on the top-level `MapGroup` in each endpoint registration extension method.
- Annotate each POST/PUT/PATCH/DELETE with `AuditActionAttribute("scanner", "<action>")`.
- Action naming convention: use verb form matching the endpoint purpose (create, update, delete, submit, trigger, compute, verify, import, export, review, replay, etc.).
- Resource type overrides: use explicit `ResourceType` for non-obvious resources (e.g., `ResourceType = "scan_policy"` for scan policy CRUD, `ResourceType = "source"` for sources CRUD).
- Focus on CRUD and business operations; skip purely computational/query-like POSTs where the endpoint is idempotent and read-only (e.g., `/compare`, `/query`, `/current` batch).
**Endpoints to SKIP** (read-only POST patterns, no state change):
- `DeltaCompareEndpoints.HandleCompareAsync` (computation)
- `CounterfactualEndpoints.HandleComputeAsync` (computation)
- `EpssEndpoints.GetCurrentBatch` (batch read)
- `SliceEndpoints.HandleQueryAsync` (query)
- `ScoreReplayEndpoints` (replay verification, read-only)
- `PolicyEndpoints` diagnostics/preview/runtime/overlay/linksets (read-only analysis)
**Endpoints to ANNOTATE** (~50 after filtering):
- Sources CRUD + lifecycle operations
- Scan/SBOM submission
- Scan policy CRUD
- Approvals create/revoke
- Triage status updates, VEX submissions
- Secret detection settings CRUD
- SmartDiff VEX candidate reviews
- Webhooks (state-changing: trigger scans)
- Reports, offline kit import, call graph submit, witness verify
- Runtime events/reconcile, reachability compute
Completion criteria:
- [x] ~50 endpoints annotated across 20 endpoint files (skipped: DeltaCompare, Counterfactual, EPSS batch, Slice query/replay, PolicyEndpoints diagnostics/preview/runtime/overlay/linksets)
- [ ] Events visible in Timeline for module=scanner (requires runtime verification)
- [x] No startup regressions (builds clean, 0 errors)
**Effort: 2 days**
### FILTER-004 - Batch 2: Annotate Platform
Status: DONE (commit 54e7f871a)
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Platform has ~107 state-changing endpoints across 23 files.
- Apply group-level filter on each endpoint group.
- Annotate with `AuditActionAttribute("platform", "<action>")`.
- Use descriptive resource types: `identity_provider`, `trust_key`, `trust_issuer`, `trust_cert`, `script`, `environment`, `freeze_window`, `target`, `release_bundle`, `function_map`, `setup_session`, `localization`, `crypto_preference`, `environment_setting`, etc.
- Skip read-only POSTs: score evaluate/verify (computational), AoC compatibility verify/validate (read-only checks), notify/signals/quota compatibility stubs that are proxied responses.
- Pay special attention to `SetupEndpoints` (wizard steps) -- these are high-value audit targets (initial system configuration).
Completion criteria:
- [ ] ~90 endpoints annotated (with documented skip list)
- [ ] Events visible in Timeline for module=platform
- [ ] No startup regressions
**Effort: 2.5 days**
### FILTER-005 - Batch 2 continued: Annotate Authority
Status: DONE (commit d4d75200c)
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Authority has ~49 state-changing endpoints across 10 files plus Program.cs inline endpoints.
- **Special consideration**: Authority runs its own auth middleware, not the standard gateway-propagated identity. The `AuditActionFilter` must correctly extract actor from Authority's own `ClaimsPrincipal`.
- Apply filter to admin group, console group, bootstrap group, and issuer directory groups.
- Action mapping for admin operations: tenant (create, update, suspend, resume), user (create, update, enable, disable), role (create, update, preview_impact), client (create, update, rotate), token (revoke).
- Action mapping for bootstrap: bootstrap_user, bootstrap_client, bootstrap_invite, revoke_service_account, rotate_signing, rotate_notifications, reload_plugins.
- Action mapping for issuer directory: create_issuer, update_issuer, delete_issuer, create_key, rotate_key, revoke_key, set_trust, delete_trust.
- Skip: OpenIddict protocol endpoints (token, introspect, revoke) -- these are auth protocol operations already captured by `AuthorityAuditSink`, not admin mutations. Authorize endpoint similarly.
- Skip: Notify ack-token endpoints, vuln workflow anti-forgery endpoints (internal crypto operations, not user-facing mutations).
Completion criteria:
- [ ] ~35 admin/bootstrap/issuer endpoints annotated
- [ ] Events visible in Timeline for module=authority
- [ ] AuthorityAuditSink continues to work independently (no interference)
**Effort: 2 days**
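The special consideration above (actor taken from Authority's own `ClaimsPrincipal` rather than a gateway-propagated identity) can be pictured as a resolution order. This is a minimal Python sketch under stated assumptions: the claim names (`sub`, `name`), the header name, and the fallback order are hypothetical and are not taken from `AuditActionFilter`.

```python
from typing import Mapping, Optional


def resolve_actor(claims: Optional[Mapping[str, str]],
                  gateway_headers: Mapping[str, str]) -> dict:
    """Pick the audit actor, preferring the service's own authenticated principal."""
    if claims and claims.get("sub"):
        # Authority case: identity comes from its own auth middleware.
        subject = claims["sub"]
        return {"id": subject, "name": claims.get("name", subject), "source": "principal"}
    actor_id = gateway_headers.get("x-example-actor-id")  # hypothetical header name
    if actor_id:
        # Other services: identity propagated by the gateway.
        return {"id": actor_id, "name": actor_id, "source": "gateway"}
    return {"id": "anonymous", "name": "anonymous", "source": "none"}


from_principal = resolve_actor({"sub": "user-1", "name": "Admin"}, {})
from_gateway = resolve_actor(None, {"x-example-actor-id": "svc-7"})
unknown = resolve_actor({}, {})
```

The point of the ordering is that a missing gateway header must not turn an authenticated Authority admin into an anonymous actor.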
### FILTER-006 - Batch 2 continued: Annotate Notify
Status: DONE (commit 54e7f871a)
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Notify has ~65 state-changing endpoints across 15 files.
- Group-level filter on each endpoint group.
- Module name: "notify".
- Action mapping: rules (create, update, delete), templates (create, update, delete, preview, validate), incidents (acknowledge, resolve), escalation (create/update/delete policy, create/update/delete schedule, create/delete override, start, escalate, stop), quiet_hours, throttle, storm, fallback, localization, security, operator_override, simulation, observability.
- Skip: `POST /tokens/sign`, `POST /tokens/verify`, `POST /html/sanitize`, `POST /html/validate`, `POST /html/strip` -- these are utility/computation endpoints that do not mutate state.
- Focus on: CRUD operations, incident lifecycle, escalation lifecycle, dead letter management, chaos experiments, retention policies.
Completion criteria:
- [ ] ~50 endpoints annotated (with documented skip list)
- [ ] Events visible in Timeline for module=notify
- [ ] No conflict with existing `NotifyAuditRepository` writes
**Effort: 2 days**
### FILTER-007 - Batch 2 continued: Annotate Policy Engine + Gateway
Status: DONE (commit d4d75200c)
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- Policy Engine has ~162 state-changing endpoints across 57 files. Policy Gateway duplicates ~56 of these.
- **Strategy**: Annotate Engine endpoints. For Gateway, apply the same attributes to the matching Gateway endpoint files.
- Module name: "policy".
- The Gateway files under `src/Policy/StellaOps.Policy.Gateway/Endpoints/` mirror the Engine's `src/Policy/StellaOps.Policy.Engine/Endpoints/Gateway/` directory. Both need annotation.
- High-priority groups (security-critical):
  1. Gate endpoints (evaluate, force-pass) -- action: evaluate_gate, force_pass_gate
  2. Exception approvals (approve, reject, escalate, delegate) -- action: approve_exception, reject_exception, escalate_exception, delegate_exception
  3. Governance CRUD -- action: create_governance, update_governance, delete_governance
  4. Sealed mode (enable, disable, emergency) -- action: enable_sealed, disable_sealed, emergency_unseal
  5. Override CRUD -- action: create_override, expire_override
- Lower-priority (operational):
  6. Simulation endpoints (create, cancel, retry, preview)
  7. Risk profile/budget CRUD
  8. Verification policy CRUD
  9. Snapshot create/restore
  10. Compilation, lint, attestation reports
- Skip: Batch evaluation, policy decision, score gate evaluate (read-only evaluations that return computed results without mutating state).
Completion criteria:
- [ ] ~130 endpoints annotated across Engine and Gateway (with documented skip list)
- [ ] Events visible in Timeline for module=policy
- [ ] No conflict with existing `PolicyAuditRepository` writes
**Effort: 4 days**
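Because the Gateway mirrors Engine endpoints, the two annotation sets can drift apart. A small comparison like the following could flag mismatches; it is a Python sketch, and the route keys, action strings for the Gateway side, and the `annotation_drift` helper are hypothetical, not part of either service.

```python
def annotation_drift(engine: dict, gateway: dict) -> dict:
    """Return routes present in both services whose audit actions differ."""
    shared = engine.keys() & gateway.keys()
    return {
        route: (engine[route], gateway[route])
        for route in sorted(shared)
        if engine[route] != gateway[route]
    }


# Hypothetical (method, path) -> action maps, for illustration only.
engine = {
    ("POST", "/gates/force-pass"): "force_pass_gate",
    ("POST", "/exceptions/approve"): "approve_exception",
    ("POST", "/snapshots"): "create_snapshot",
}
gateway = {
    ("POST", "/gates/force-pass"): "force_pass_gate",
    ("POST", "/exceptions/approve"): "approve",  # drifted action string
}

drift = annotation_drift(engine, gateway)
```

Routes that exist only in the Engine are ignored here, since the Gateway duplicates only a subset of the Engine endpoints.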
### FILTER-008 - Batch 2 continued: Annotate Release-Orchestrator + Scheduler
Status: DONE (commit 54e7f871a)
Dependency: FILTER-001
Owners: Developer (backend)
Task description:
- **Release-Orchestrator** (~40 endpoints, 9 files): Module "release-orchestrator". High-value actions: create_release, promote, deploy, rollback, approve, reject. Skip: legacy stubs (`JobEngineLegacyEndpoints` returning 501).
- **Scheduler** (~31 endpoints, 9 files): Module "scheduler". Actions: create_schedule, update_schedule, delete_schedule, pause, resume, create_run, cancel_run, retry_run, trigger_workflow.
- PacksRegistry (part of Scheduler service): Module "packs-registry". Actions: upload_pack, rotate_signature, upload_attestation, transition_lifecycle, check_parity, seed_export, create_mirror, sync_mirror.
Completion criteria:
- [ ] All ~71 endpoints annotated
- [ ] Events visible in Timeline for modules: release-orchestrator, scheduler, packs-registry
- [ ] No conflict with existing `PostgresAuditRepository` hash chain writes
**Effort: 2 days**
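The last criterion concerns the hash chain kept by `PostgresAuditRepository`. As background, a hash chain links each entry to its predecessor so that any later edit is detectable. The Python sketch below illustrates the general technique only; SHA-256, the JSON canonicalisation, and the field names are assumptions and do not describe the repository's actual scheme.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def entry_hash(prev_hash: str, sequence: int, content: dict) -> str:
    """Hash the previous hash, the sequence number, and the entry content together."""
    payload = json.dumps(
        {"prev": prev_hash, "seq": sequence, "content": content},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, content: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    seq = len(chain) + 1
    chain.append({"seq": seq, "prev": prev, "content": content,
                  "hash": entry_hash(prev, seq, content)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for expected_seq, entry in enumerate(chain, start=1):
        if entry["seq"] != expected_seq or entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, expected_seq, entry["content"]):
            return False
        prev = entry["hash"]
    return True


chain = []
append(chain, {"action": "promote"})
append(chain, {"action": "rollback"})
intact = verify(chain)

tampered = copy.deepcopy(chain)
tampered[0]["content"]["action"] = "deploy"
broken = not verify(tampered)
```

Endpoint-level emission to Timeline sits outside this structure: it appends nothing to the chain, which is why the two write paths should not conflict.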
### FILTER-010 - Annotate endpoints in newly-wired services (Attestor, Findings, Doctor, Signals, AdvisoryAI, RiskEngine)
Status: DONE (commit 665bd6db4)
Dependency: FILTER-001 (convention helper)
Owners: Developer (backend)
Task description:
- These 6 services were recently wired into the Valkey transport and have state-changing endpoints that need audit annotation.
- Apply the same group-level filter + per-endpoint metadata convention as the original 9 services.
Services and their state-changing endpoints to annotate:
**Attestor (HIGH priority):** ~25 endpoints
- attestor / sign_dsse, verify_dsse, add_key, revoke_key, rotate_key
- attestor / create_ceremony, approve_ceremony, execute_ceremony, cancel_ceremony
- attestor / create_watchlist_entry, update_watchlist_entry, delete_watchlist_entry
- attestor / export_attestation, import_attestation, sign_attestation, submit_rekor_entry
**Findings Ledger (HIGH priority):** ~30 endpoints
- findings / create_vex_decision, update_vex_decision, create_fix_verification
- findings / create_audit_bundle, create_ledger_event, create_alert_decision
- findings / create_attestation_pointer, transition_finding_state, create_vex_issuer
**Doctor:** ~7 endpoints
- doctor / start_run, diagnose, delete_report, create_schedule, update_schedule, delete_schedule, execute_schedule
**Signals:** ~10 endpoints
- signals / ingest_callgraph, ingest_runtime_fact, compute_reachability, submit_execution_evidence, register_beacon
**AdvisoryAI/OpsMemory:** ~5 endpoints
- advisory-ai / record_decision, record_outcome, create_run
**RiskEngine:** ~3 endpoints
- riskengine / create_score_job, create_simulation
Completion criteria:
- [ ] All ~80 endpoints annotated across 6 services
- [ ] Events visible in Timeline for modules: attestor, findings, doctor, signals, advisory-ai, riskengine
- [ ] No startup regressions
**Effort: 3 days**
### CAPSULE-001 - Decision Capsule lifecycle audit events
Status: BLOCKED (capsule sealing pipeline not yet implemented)
Dependency: capsule pipeline implementation
Owners: Developer (backend)
Task description:
- Once the Decision Capsule sealing pipeline is built, add audit events for:
  - evidence / create_capsule, seal_capsule, verify_capsule, export_capsule, replay_capsule
- Decision Capsules are signed, immutable, content-addressed bundles containing SBOM + vuln feeds + reachability evidence + policy version + derived VEX + DSSE signatures. Their lifecycle mutations are security-critical.
- Current state: the DB table exists (release.run_capsule_replay_linkage), and the read model and UI routes exist, but the full creation/sealing pipeline is not yet implemented.
Completion criteria:
- [ ] All capsule lifecycle endpoints annotated with AuditActionAttribute
- [ ] Capsule create/seal/verify events visible in Timeline
- [ ] Audit events include content-address hash for traceability
**Effort: 1 day (once capsule pipeline is implemented)**
### DEPRECATE-001 - Batch 3: Dual-write for services with local audit tables
Status: DOING
Dependency: FILTER-002 through FILTER-008 (at least the relevant service batch)
Owners: Developer (backend)
Task description:
- For each service with an existing local audit table, add a secondary write path that emits to Timeline via `IAuditEventEmitter.EmitAsync()` inside the existing audit repository write methods:
  1. **Authority**: Add `IAuditEventEmitter.EmitAsync()` in `AuthorityAuditSink.WriteAsync()` to emit auth events (login, token grant, lockout) to Timeline. Map `AuthEventRecord` to `AuditEventPayload`.
  2. **Policy**: Add emission in `PolicyAuditRepository.CreateAsync()` and in `GateBypassAuditor` to emit bypass decisions to Timeline.
  3. **Notify**: Add emission in `NotifyAuditRepository` create method.
  4. **Scheduler**: Add emission in `ISchedulerAuditService` implementation.
  5. **JobEngine/ReleaseOrchestrator**: Add emission in `PostgresAuditRepository.AppendAsync()`. Map `AuditEntry` fields to `AuditEventPayload`.
  6. **Attestor**: Wire `AddAuditEmission()` in Program.cs (not yet wired). Add emission alongside `AuditLogEntity` inserts.
- All emissions must be fire-and-forget (matching existing `AuditActionFilter` pattern) -- failure to emit to Timeline must never break the local write.
- Add a log warning when emission fails (already built into `HttpAuditEventEmitter`).
Completion criteria:
- [x] Dual-write verified for all 6 services (events appear in both local table and Timeline) — Authority (commit `a947c8df6`), Policy (`a7f3880e9`), Notify (`0acd2ecab`), Scheduler (`7c69058e1`), JobEngine/ReleaseOrchestrator (`2f32c7f0c`); Attestor uses endpoint-level `.Audited()` filter instead of repository-level dual-write (already wired on all endpoints).
- [x] Local audit write latency unchanged (emission is async/fire-and-forget) — all dual-write paths wrap `EmitAsync` in try/catch and log warnings on failure; local write completes before emission starts and emission is not awaited synchronously against the request path.
- [ ] No data loss: local table remains the authoritative source during this phase — requires 30-day production observation period to confirm, still TODO.
**Effort: 3 days (implementation DONE; 30-day verification window opens on production deployment)**
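The dual-write rule described in this task (local write first, Timeline emission fire-and-forget, failures logged but never propagated) can be sketched as follows. This is an illustrative Python version using `asyncio`, not the shipped repository code; `RecordingEmitter`, `AuditRepository`, and `drain` are hypothetical names.

```python
import asyncio
import logging

logger = logging.getLogger("audit.dual_write")


class RecordingEmitter:
    """Stand-in for the Timeline emitter; records events or simulates an outage."""

    def __init__(self, fail=False):
        self.received = []
        self._fail = fail

    async def emit(self, event):
        if self._fail:
            raise ConnectionError("timeline unavailable")
        self.received.append(event)


class AuditRepository:
    def __init__(self, emitter=None):
        self.rows = []            # stands in for the local audit table
        self._emitter = emitter   # optional dependency
        self._pending = set()     # hold references so tasks are not garbage-collected

    async def create(self, event):
        self.rows.append(event)   # local write completes first and stays authoritative
        if self._emitter is not None:
            task = asyncio.create_task(self._emit_safely(event))
            self._pending.add(task)
            task.add_done_callback(self._pending.discard)

    async def _emit_safely(self, event):
        try:
            await self._emitter.emit(event)
        except Exception as exc:  # emission must never break the local write
            logger.warning("Timeline emission failed: %s", exc)

    async def drain(self):
        """Helper for the example: wait for in-flight emissions."""
        await asyncio.gather(*self._pending)


async def run(fail):
    emitter = RecordingEmitter(fail=fail)
    repo = AuditRepository(emitter)
    await repo.create({"module": "notify", "action": "create_rule"})
    await repo.drain()
    return repo.rows, emitter.received


ok_rows, ok_received = asyncio.run(run(fail=False))
bad_rows, bad_received = asyncio.run(run(fail=True))
```

In the failing run the local row is still written while Timeline receives nothing, which is exactly the gap the verification period is meant to detect.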
### DEPRECATE-002 - Batch 4: Redirect reads to Timeline unified sink
Status: TODO
Dependency: DEPRECATE-001, 30-day dual-write verification period
Owners: Developer (backend)
Task description:
- After 30 days of verified dual-write with zero data discrepancies:
  1. **Authority**: Update `ConsoleAdminEndpointExtensions.ListAuditEvents()` to query Timeline `/api/v1/audit/events?modules=authority` instead of local `authority.audit` table. Add `Obsolete` attribute and deprecation response headers to the local audit endpoint.
  2. **Policy**: Update governance audit endpoints to query Timeline. Keep gate bypass audit endpoints reading from local `policy.gate_bypass_audit` (domain evidence, not generic audit).
  3. **Notify**: Update `/api/v1/notify/audit` to proxy to Timeline.
  4. **Scheduler**: Internal audit reads redirected to Timeline.
  5. **ReleaseOrchestrator**: Update `/api/v1/release-orchestrator/audit` LIST/SEARCH/SUMMARY endpoints to query Timeline. **Keep chain verification endpoint reading from local table** (service-level chain integrity is different from unified chain).
  6. **Attestor**: Internal audit reads redirected to Timeline.
- Update `HttpUnifiedAuditEventProvider` to stop polling deprecated service-specific audit endpoints.
- Add deprecation headers: `Sunset: <date>`, `Deprecation: true`, `Link: <timeline-url>; rel="successor-version"`.
Completion criteria:
- [ ] All service-specific audit read endpoints return deprecation headers
- [ ] Timeline is the primary read source for all generic audit queries
- [ ] UI `AuditLogClient` uses unified endpoint exclusively (no fallback to per-service)
- [ ] Per-service audit endpoints still functional (backward compatibility for 90 days)
**Effort: 3 days (implementation) + 30-day verification wait**
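The three deprecation headers listed in this task could be assembled as below. This is a Python sketch: the sunset date is a placeholder (the plan leaves `<date>` open), and `deprecation_headers` is a hypothetical helper. The `Sunset` value is formatted as an HTTP-date, which is the form RFC 8594 specifies.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime, successor_url: str) -> dict:
    """Build the response headers for a deprecated per-service audit endpoint."""
    return {
        "Deprecation": "true",
        # HTTP-date in GMT, e.g. "Mon, 17 Aug 2026 00:00:00 GMT"
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
        "Link": f'<{successor_url}>; rel="successor-version"',
    }


# Placeholder date; the real sunset date is not fixed in this plan.
headers = deprecation_headers(
    datetime(2026, 8, 17, tzinfo=timezone.utc),
    "/api/v1/audit/events?modules=authority",
)
```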
### DEPRECATE-003 - Batch 5: Drop deprecated local audit tables
Status: TODO
Dependency: DEPRECATE-002, 90-day backward-compatibility period
Owners: Developer (backend)
Task description:
- After 90 days with no clients reading from deprecated endpoints:
  1. Remove local audit write code from repositories (stop dual-write).
  2. Create SQL migrations to drop tables:
     - `DROP TABLE IF EXISTS authority.audit CASCADE;`
     - `DROP TABLE IF EXISTS authority.airgap_audit CASCADE;`
     - `DROP TABLE IF EXISTS authority.offline_kit_audit CASCADE;`
     - `DROP TABLE IF EXISTS policy.audit CASCADE;` (keep `policy.gate_bypass_audit`)
     - `DROP TABLE IF EXISTS notify.audit CASCADE;`
     - `DROP TABLE IF EXISTS scheduler.audit CASCADE;` (drop all partitions)
     - `DROP TABLE IF EXISTS proofchain.audit_log CASCADE;`
  3. **Do NOT drop** `audit_entries` / `audit_sequences` in JobEngine/ReleaseOrchestrator yet -- the hash chain is service-level evidence. Reclassify as domain tables, not audit tables. Evaluate for removal in a future sprint after 180-day parallel chain verification between local and Timeline chains.
  4. Remove deprecated audit endpoint registrations.
  5. Remove `PolicyAuditRepository`, `NotifyAuditRepository`, `AuthorityAuditSink` local DB write paths (keep structured logging).
  6. Remove `HttpUnifiedAuditEventProvider` polling entirely (all data flows through emission now).
Completion criteria:
- [ ] Local audit tables dropped (except JobEngine/ReleaseOrchestrator chain tables and Policy gate bypass)
- [ ] No 500 errors from missing tables
- [ ] Timeline is the sole audit data store
- [ ] All audit read endpoints serve data from Timeline
- [ ] Deprecated code removed, no dead references
**Effort: 2 days (implementation) + 90-day wait from DEPRECATE-002**
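Since some tables must survive the drop, it may help to generate the `DROP` statements from a list and refuse to proceed if a preserved table slips in. The Python sketch below uses the table names from this task; the schema-qualified chain table names and the `build_drop_statements` guard are illustrative assumptions, not part of the migration tooling.

```python
DROP_TABLES = [
    "authority.audit",
    "authority.airgap_audit",
    "authority.offline_kit_audit",
    "policy.audit",
    "notify.audit",
    "scheduler.audit",
    "proofchain.audit_log",
]

# Tables this plan keeps; schema qualifier for the chain tables is an assumption.
PRESERVED = {
    "policy.gate_bypass_audit",
    "release_orchestrator.audit_entries",
    "release_orchestrator.audit_sequences",
}


def build_drop_statements(tables, preserved):
    """Return DROP statements, failing loudly if a preserved table is listed."""
    blocked = sorted(set(tables) & set(preserved))
    if blocked:
        raise ValueError(f"refusing to drop preserved tables: {', '.join(blocked)}")
    return [f"DROP TABLE IF EXISTS {table} CASCADE;" for table in tables]


statements = build_drop_statements(DROP_TABLES, PRESERVED)
```

`IF EXISTS` keeps each statement safe to re-run, so a migration that is applied twice does not fail on an already-dropped table.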
---
## Effort Summary
| Batch | Tasks | Effort | Timeline |
|---|---|---|---|
| **Batch 1**: Convention helper + simple services (Integrations, EvidenceLocker, Scanner) | FILTER-001, FILTER-002, FILTER-003 | 3.5 days | Week 1 |
| **Batch 2**: Complex services (Platform, Authority, Notify, Policy, ReleaseOrchestrator, Scheduler) | FILTER-004 through FILTER-008 | 12.5 days | Weeks 2-4 |
| **Batch 2b**: Newly-wired services (Attestor, Findings, Doctor, Signals, AdvisoryAI, RiskEngine) | FILTER-010 | 3 days | Week 3-4 |
| **Blocked**: Decision Capsule lifecycle audit | CAPSULE-001 | 1 day (when unblocked) | TBD |
| **Batch 3**: Dual-write transition | DEPRECATE-001 | 3 days | Week 5-6 |
| **Batch 4**: Read migration (after 30-day verification) | DEPRECATE-002 | 3 days + 30-day wait | Week 10-11 |
| **Batch 5**: Drop local tables (after 90-day backward-compat) | DEPRECATE-003 | 2 days + 90-day wait | Week 23-24 |
| **TOTAL** | | **28 days active work** + **120 days verification** | ~6 months end-to-end |
---
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-08 | Sprint created. Full endpoint inventory completed across all 9 wired services (~532 state-changing endpoints). Per-service audit table analysis completed for 6 services with local tables. | Planning |
| 2026-04-08 | Added FILTER-010 (6 newly-wired services: ~80 endpoints) and CAPSULE-001 (blocked on capsule pipeline). Added Config/Settings Audit Checklist confirming all mutation surfaces are covered. Total active effort updated to 28 days. | Planning |
| 2026-04-08 | FILTER-001 DONE: Created `AuditedRouteGroupExtensions.cs` with `WithAuditFilter()` and `Audited()` convenience methods. FILTER-002 DONE: Annotated 7 EvidenceLocker + 6 Integrations endpoints. FILTER-003 DONE: Annotated ~50 Scanner endpoints across 20 files (skipped read-only POSTs per convention). All 3 services build clean with 0 errors/warnings. | Developer |
| 2026-04-13 | Status sync: FILTER-004 (Platform), FILTER-006 (Notify), FILTER-008 (ReleaseOrchestrator+Scheduler) confirmed DONE via commit `54e7f871a`. FILTER-005 (Authority), FILTER-007 (Policy+Gateway) confirmed DONE via commit `d4d75200c`. FILTER-010 (Attestor, Findings, Doctor, Signals, AdvisoryAI, RiskEngine) confirmed DONE via commit `665bd6db4`. Additional audit-filter hardening shipped via commits `2a69ad112` (enhanced filter with body capture) and `7f40f8d67` (module catalog, Diff ingest, chain verify fixes). DEPRECATE-001/002/003 remain TODO — they have mandatory 30-day and 90-day verification windows built into the plan and cannot be accelerated. CAPSULE-001 remains BLOCKED on the capsule sealing pipeline. | QA |
| 2026-04-19 | DEPRECATE-001 implementation DONE across 5 services. Repository-level dual-write pattern (fire-and-forget EmitAsync wrapped in try/catch, optional IAuditEventEmitter DI) shipped for: AuthorityAuditSink (commit `a947c8df6`), PolicyAuditRepository (`a7f3880e9`), NotifyAuditRepository (`0acd2ecab`), PostgresSchedulerAuditService (`7c69058e1`), PostgresAuditRepository release-orchestrator (`2f32c7f0c`). Attestor already uses endpoint-level `.Audited()` attribute across all endpoints (pre-existing) so no repository-level dual-write needed. Each dual-write mapper builds a `UnifiedAuditEvent`-compatible payload with actor/resource/details preserved. Local write remains authoritative; Timeline emission is fire-and-forget. Task status flipped TODO → DOING until 30-day production verification confirms no data loss. | Codex |
## Decisions & Risks
### Decisions
1. **Group-level filter + per-endpoint metadata is the convention.** `AuditActionFilter` is a no-op without `AuditActionAttribute`, so applying it at the group level is safe and reduces boilerplate from 2 lines per endpoint to 1 line.
2. **Policy `gate_bypass_audit` and JobEngine/ReleaseOrchestrator `audit_entries` are reclassified as domain evidence tables, not audit.** Their query patterns (by image digest, by decision ID, by chain sequence) and integrity guarantees (hash chains, attestation digests) serve domain-specific needs that the generic unified store cannot efficiently replace. They should persist permanently alongside the unified audit sink.
3. **Read-only POST endpoints are excluded from audit annotation.** Endpoints like `/compare`, `/query`, `/evaluate` (when they compute a result without persisting state) do not produce meaningful audit events. Annotating them would create noise in the audit log.
4. **Authority auth-protocol events require separate emission.** The `AuthorityAuditSink` captures login attempts, token grants, and lockouts -- events that are NOT HTTP endpoint mutations. These must be emitted to Timeline via a direct `IAuditEventEmitter.EmitAsync()` call, not via `AuditActionFilter`.
5. **120-day verification pipeline.** Dual-write runs for 30 days before reads are redirected. Deprecated endpoints remain functional for 90 more days. Total 120 days from dual-write start to table drop. This is non-negotiable for a compliance-critical audit subsystem.
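Decision 1 relies on the filter doing nothing for endpoints that carry no audit metadata, which is what makes group-level application safe. A minimal Python sketch of that behaviour follows; `with_audit_filter`, the metadata keys, and the sample handlers are hypothetical and only mirror the described convention.

```python
def with_audit_filter(handler, metadata, emit):
    """Wrap a handler; emit an audit event only when audit metadata is present."""
    def wrapped(request):
        response = handler(request)
        action = metadata.get("audit_action")
        if action is not None:  # no metadata means the filter is a no-op
            emit({"module": metadata.get("module"), "action": action})
        return response
    return wrapped


emitted = []
group = {
    "create_rule": (lambda req: "created",
                    {"module": "notify", "audit_action": "create_rule"}),
    "validate_html": (lambda req: "valid", {}),  # read-only POST, left unannotated
}

# Group-level application: one wrapping step covers every endpoint in the group.
wrapped_group = {
    name: with_audit_filter(handler, meta, emitted.append)
    for name, (handler, meta) in group.items()
}

results = [wrapped_group["create_rule"](None), wrapped_group["validate_html"](None)]
```

Both handlers run unchanged, but only the annotated one produces an event.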
### Config/Settings Audit Checklist
Coverage confirmation for all configuration and settings mutation surfaces:
| Config/Settings Area | Covered By | Status |
|---|---|---|
| Platform env settings | FILTER-004 | DONE |
| Crypto preferences | FILTER-004 | DONE |
| Integration configs | FILTER-002 | DONE |
| Scheduler schedules | FILTER-008 | DONE |
| Notification rules/channels | FILTER-006 | DONE |
| Authority clients/scopes | FILTER-005 | DONE |
| Scanner policies | FILTER-003 | DONE |
| Policy governance | FILTER-007 | DONE |
| Attestor operations | FILTER-010 | DONE |
| Findings decisions | FILTER-010 | DONE |
| Doctor schedules | FILTER-010 | DONE |
| Decision Capsules | CAPSULE-001 | BLOCKED (pipeline not implemented) |
### Risks
1. **~532 endpoints is a large surface.** Risk of missed annotations or incorrect module/action strings. Mitigation: create an integration test that walks all registered endpoints and asserts that every non-GET endpoint has `AuditActionAttribute` metadata (or is in an explicit skip list).
2. **Policy Engine/Gateway duplication.** The same endpoint logic exists in two places. Risk of annotation drift. Mitigation: consider extracting shared endpoint registration into a common library, or generating Gateway endpoints from Engine definitions.
3. **Fire-and-forget emission can silently drop events.** If Timeline is down during the 30-day dual-write period, the local table has events that Timeline does not. Mitigation: add a reconciliation job that compares local table event counts with Timeline for the same module/time range and alerts on discrepancies.
4. **Performance impact of an extra HTTP call per audited request.** Every request to one of the ~532 annotated endpoints now makes a fire-and-forget HTTP POST to Timeline. Under high load, this could create back-pressure. Mitigation: `HttpAuditEventEmitter` already uses `IHttpClientFactory` with connection pooling. Add a circuit breaker via Polly if needed. The emission is async and never blocks the response.
5. **Existing Scheduler monthly partitioning is lost in Timeline.** The unified store does not partition by month. Retention will rely on `DELETE WHERE timestamp < cutoff` instead of dropping whole monthly partitions. Mitigation: AUDIT-004 (from parent sprint) should add partitioning to the unified audit table.
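The reconciliation job proposed as the mitigation for risk 3 amounts to comparing event counts per module and time window. The Python sketch below buckets by module and UTC day; the event shape, the ISO-8601 timestamp strings, and the `reconcile` helper are assumptions made for illustration.

```python
from collections import Counter


def bucket(events):
    """Count events per (module, day), taking the day from an ISO-8601 timestamp."""
    return Counter((e["module"], e["timestamp"][:10]) for e in events)


def reconcile(local_events, timeline_events):
    """Return buckets whose local and Timeline counts differ."""
    local, timeline = bucket(local_events), bucket(timeline_events)
    return {
        key: (local[key], timeline[key])  # missing keys count as 0
        for key in sorted(local.keys() | timeline.keys())
        if local[key] != timeline[key]
    }


# Illustrative data: one scheduler event never reached Timeline.
local_events = [
    {"module": "notify", "timestamp": "2026-04-20T09:15:00Z"},
    {"module": "scheduler", "timestamp": "2026-04-20T10:00:00Z"},
    {"module": "scheduler", "timestamp": "2026-04-20T10:05:00Z"},
]
timeline_events = [
    {"module": "notify", "timestamp": "2026-04-20T09:15:01Z"},
    {"module": "scheduler", "timestamp": "2026-04-20T10:00:02Z"},
]

discrepancies = reconcile(local_events, timeline_events)
```

Counts alone cannot say which event was dropped; pairing on a shared identifier would be needed for that, at the cost of a more expensive join.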
## Next Checkpoints
- **Week 1**: Convention helper shipped, Integrations + EvidenceLocker + Scanner annotated
- **Week 2-4**: All remaining original 9 services + newly-wired 6 services annotated (FILTER-010)
- **Week 5-6**: Dual-write enabled, monitoring dashboard created
- **Week 10-11**: Read migration after 30-day verification
- **Week 23-24**: Table drop after 90-day backward-compat window
- **TBD**: CAPSULE-001 unblocked when capsule sealing pipeline is implemented