diff --git a/docs/implplan/SPRINT_20260408_004_Timeline_unified_audit_sink.md b/docs/implplan/SPRINT_20260408_004_Timeline_unified_audit_sink.md
index 4ea0f09b0..93129cd03 100644
--- a/docs/implplan/SPRINT_20260408_004_Timeline_unified_audit_sink.md
+++ b/docs/implplan/SPRINT_20260408_004_Timeline_unified_audit_sink.md
@@ -199,7 +199,7 @@ Completion criteria:
 - [x] Data classification applied to all ingested events — migration 005 adds `data_classification` column with CHECK constraint; `PostgresUnifiedAuditEventStore` populates it at insert time via `AuditDataClassifier` (none|personal|sensitive|restricted ladder with 16 passing tests).
 - [x] Retention purge runs on schedule without breaking hash chains — `AuditRetentionPurgeService` background host iterates tenants and calls `timeline.purge_expired_audit_events`; the SQL function respects `compliance_hold` and drops expired rows per classification. The hash chain is left intact for non-purged rows; purged rows leave chain-external gaps, which is acceptable because `verify_unified_audit_chain` only asserts contiguous-chain integrity *within* a queried sequence range.
 - [x] Right-to-erasure redacts PII without invalidating chain verification — `timeline.redact_actor_pii` replaces email/ip/user-agent (plus name for personal/sensitive) with `[REDACTED]`, preserves `actor_id` and `content_hash`; `PostgresUnifiedAuditEventStore.RedactActorPiiAsync` + `DELETE /api/v1/audit/actors/{actorId}/pii` expose the operation under the new `Timeline.Admin` scope.
-- [ ] Documentation updated: `docs/modules/timeline/audit-retention.md` — deferred.
+- [x] Documentation updated: `docs/modules/timeline/audit-retention.md` — dossier shipped covering classifications, retention table + overrides, scheduled purge config, right-to-erasure contract, chain-gap handling, and the operator compliance checklist.
 - [ ] Doctor `AuditReadinessCheck` updated to verify retention configuration — deferred.
 
 ### AUDIT-005 - Deprecate per-service audit DB tables (Phase 2)
diff --git a/docs/modules/timeline/audit-retention.md b/docs/modules/timeline/audit-retention.md
new file mode 100644
index 000000000..ee3eb1af9
--- /dev/null
+++ b/docs/modules/timeline/audit-retention.md
@@ -0,0 +1,162 @@
+# Timeline Unified Audit — Data Classification, Retention, and Right-to-Erasure
+
+> Sprint: `SPRINT_20260408_004_Timeline_unified_audit_sink` (AUDIT-004).
+> Scope: how Timeline classifies, retains, and redacts unified audit events.
+
+## 1. Classifications
+
+Every row in `timeline.unified_audit_events` carries a `data_classification`
+value drawn from the four-rung ladder below. The most restrictive applicable
+class wins: the classifier (`StellaOps.Timeline.WebService.Audit.AuditDataClassifier`)
+evaluates the rungs from most to least restrictive and stops at the first match.
+
+| Classification | When it applies | Examples |
+|---|---|---|
+| `restricted` | Key-escrow / signing-ceremony / trust-anchor operations in `signer` or `attestor`; any action whose name contains `key_escrow`, `signing_key`, or `rotate_signing_key` regardless of module. | `signer.key_rotate`, `signer.ceremony_open`, `attestor.rekor_submit`, `attestor.trust_anchor_update`, `platform.key_escrow_grant`. |
+| `sensitive` | Authority auth-protocol events with subject context: logins, token grants, lockouts, MFA, and password-reset flows. | `authority.login`, `authority.token_grant`, `authority.lockout`, `authority.mfa_challenge`, `authority.password_reset`. |
+| `personal` | Actor PII present: `actor.email`, `actor.ip_address`, or `actor.user_agent`. | `notify.update` with `actor.email` set; `jobengine.execute` from a user with `actor.ip`. |
+| `none` | No actor PII and no sensitive/restricted signal. | Pure system events, service-to-service heartbeats. |
+
+The ladder is deliberately strict: a `signer.key_rotate` event with both a
+user email and an IP still classifies as `restricted`, and an
+`authority.login` event with `actor.ip` set still classifies as `sensitive`
+(not `personal`).
+
+Classification happens automatically at ingest when the incoming
+`AuditEventPayload.DataClassification` is null or whitespace. Producers that
+already know the class can set it explicitly (to one of the four values above)
+to skip automatic classification.
+
+## 2. Retention policy
+
+Retention windows live in `timeline.audit_retention_policies`. The table is
+keyed on `(tenant_id, data_classification)`. A row with `tenant_id = '*'` is
+the platform default; tenant-specific rows override the default.
+
+Platform defaults seeded by migration 005:
+
+| Classification | Retention |
+|---|---|
+| `none` | 365 days |
+| `personal` | 365 days |
+| `sensitive` | 730 days |
+| `restricted` | 2555 days (~7 years) |
+
+`timeline.resolve_audit_retention_days(tenant_id, classification)` resolves the
+effective window in order: the tenant-specific row, then the platform default
+(`'*'`), then a 365-day fallback.
+
+### Overriding retention for a tenant
+
+```sql
+-- Shorten personal-data retention for one tenant to 180 days (upsert).
+INSERT INTO timeline.audit_retention_policies (tenant_id, data_classification, retention_days)
+VALUES ('acme-prod', 'personal', 180)
+ON CONFLICT (tenant_id, data_classification)
+DO UPDATE SET retention_days = EXCLUDED.retention_days, updated_at = NOW();
+```
+
+Tenant overrides take effect on the next purge cycle. Shortening a retention
+window means the first cycle after the change deletes every row whose
+`timestamp` already falls outside the new window, unless the row is under a
+compliance hold.
+
+### Legal holds
+
+Set `compliance_hold = TRUE` on any row that must survive retention-driven
+deletion. The purge function only deletes rows with `compliance_hold = FALSE`,
+so held rows are never purged, even after their retention window has expired.
+Use this for rows linked to an active investigation or legal request.
+Clear the flag with a targeted `UPDATE` once the hold is released; the row
+becomes eligible for purge again on the next cycle.
+
+## 3. Scheduled purge
+
+`StellaOps.Timeline.WebService.Audit.AuditRetentionPurgeService` is an
+`IHostedService` registered in `Program.cs`. On every cycle it:
+
+1. Enumerates `DISTINCT tenant_id` values in `timeline.unified_audit_events`.
+2. Calls `timeline.purge_expired_audit_events(tenant_id, dry_run)` for each tenant.
+3. Logs one line per classification that actually deleted rows.
+
+Tune the service through the `AuditRetentionPurge` configuration section:
+
+```yaml
+AuditRetentionPurge:
+  Enabled: true             # master toggle, default true
+  DryRun: false             # when true, counts candidates without deleting
+  InitialDelay: "00:05:00"  # wait 5 minutes after startup before the first cycle
+  Interval: "06:00:00"      # 6-hour gap between cycles
+```
+
+### Operating recommendations
+
+- In air-gapped deployments, leave the defaults in place: the 6-hour cadence
+  keeps table growth bounded without adding I/O pressure.
+- When onboarding a new tenant with large historical imports, set
+  `DryRun: true` for one cycle to measure candidate counts before letting the
+  purge delete rows. `DryRun` is service-wide, so deletion pauses for every
+  tenant during that cycle.
+- If a future migration adds a new classification value, seed a
+  platform-default row in `audit_retention_policies` before the next cycle;
+  otherwise the function falls back to 365 days.
+
+## 4. Right-to-erasure (GDPR Article 17)
+
+Endpoint:
+
+```
+DELETE /api/v1/audit/actors/{actorId}/pii
+```
+
+- Requires the `timeline:admin` scope (policy name `Timeline.Admin`).
+- Resolves the tenant from the `x-tenant-id` header or the `TenantId` item on
+  the HTTP context.
+- Calls `timeline.redact_actor_pii(tenant_id, actor_id)`, which replaces
+  `actor_email`, `actor_ip`, and `actor_user_agent` with `[REDACTED]` on every
+  matching row. For rows classified as `personal` or `sensitive`, `actor_name`
+  is also redacted. `actor_id` is never touched because it feeds into the
+  `content_hash` that anchors each row in the tenant's hash chain.
+- Response body:
+  ```json
+  {
+    "tenantId": "acme-prod",
+    "actorId": "user-123",
+    "redactedCount": 42,
+    "redactedAt": "2026-04-19T15:30:00+00:00"
+  }
+  ```
+- The request is idempotent: rows that already have `pii_redacted_at` set
+  are skipped, so on a replay `redactedCount` reports only rows that were not
+  already redacted.
+
+### Chain integrity after redaction
+
+The content hash is computed from the canonical JSON of the event at
+ingest time and is not recomputed during redaction. `verify_unified_audit_chain`
+continues to pass because:
+
+- `previous_entry_hash` links are unmodified.
+- `content_hash` values are unmodified.
+- The redactable PII fields are excluded from the canonical JSON that feeds the hash.
+
+Auditors can therefore still verify that the chain has not been tampered
+with, even though some rows now show `[REDACTED]` in place of PII.
+
+## 5. Sequence-chain gaps after purge
+
+`timeline.purge_expired_audit_events` deletes rows, and the surviving rows
+keep their original `sequence_number` values, so verifying a range that spans
+purged rows reports gaps. This is intentional:
+
+- `verify_unified_audit_chain(tenant_id, start_seq, end_seq)` verifies
+  contiguous ranges, so retention-aware verification must query a range that
+  contains no purged sequence numbers.
+- Because retention differs per classification, gaps are not confined to the
+  oldest end of the chain: a `restricted` row (2555 days by default) can
+  outlive the `none` and `personal` rows (365 days) around it. The window
+  `[oldest_surviving_seq, MAX(seq)]` can therefore still contain gaps; a range
+  that starts above the highest purged sequence number is gap-free, and so is
+  any sub-range between two gaps.
+- The Timeline UI's chain-verify tile should start verification above the
+  most recent purge gap so users never see a spurious "chain break at
+  sequence N" warning caused by routine purge.
+
+## 6. Compliance checklist for operators
+
+- [ ] Verify `AuditRetentionPurge:Enabled` is `true` and `DryRun` is `false`
+  in production.
+- [ ] Confirm tenant-specific retention overrides are in place for any
+  regulated tenant whose requirements are stricter or looser than the
+  platform defaults.
+- [ ] Wire alerting on the `AuditRetentionPurge` log scope so purge failures
+  surface before a retention breach.
+- [ ] Document the right-to-erasure runbook: who can invoke the endpoint,
+  what evidence to collect, and how to confirm completion.
+- [ ] Review `audit_retention_policies` annually alongside the organisation's
+  data-processing records.
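+
+## 7. Appendix: chain-verification sketch
+
+The interplay between per-classification purge (sections 2 and 3),
+right-to-erasure (section 4), and range-based verification (section 5) can
+be checked against a small in-memory model. The sketch below is
+hypothetical: the `Row` fields, the hash layout (SHA-256 over canonical JSON
+of `seq`, `actor_id`, `action`, and the previous hash), and the helper names
+are assumptions for illustration, not the Timeline schema or the
+`timeline.*` SQL functions. It purges one expired `personal` row that sits
+between a long-lived `restricted` row and newer rows, then shows which
+verification ranges pass and that redaction leaves the hashes intact.
+
+```python
+import hashlib
+import json
+from dataclasses import dataclass, field
+
+# Hypothetical model: field names, the hash layout, and the retention math are
+# illustrative assumptions, not the Timeline schema or its SQL functions.
+RETENTION_DAYS = {"none": 365, "personal": 365, "sensitive": 730, "restricted": 2555}
+REDACTED = "[REDACTED]"
+
+
+@dataclass
+class Row:
+    seq: int
+    actor_id: str
+    action: str
+    classification: str
+    age_days: int
+    pii: dict = field(default_factory=dict)
+    compliance_hold: bool = False
+    pii_redacted: bool = False
+    previous_entry_hash: str = ""
+    content_hash: str = ""
+
+
+def compute_hash(row: Row) -> str:
+    # Assumed hash input: PII fields are left out, so redaction cannot change it.
+    payload = {"seq": row.seq, "actor_id": row.actor_id,
+               "action": row.action, "prev": row.previous_entry_hash}
+    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
+
+
+def append(chain, actor_id, action, classification, age_days, pii=None):
+    """Append a row linked to the previous row's content hash."""
+    prev = chain[-1] if chain else None
+    row = Row(seq=prev.seq + 1 if prev else 1, actor_id=actor_id, action=action,
+              classification=classification, age_days=age_days,
+              pii=dict(pii or {}),
+              previous_entry_hash=prev.content_hash if prev else "")
+    row.content_hash = compute_hash(row)
+    chain.append(row)
+
+
+def purge_expired(chain):
+    """Delete expired, un-held rows; survivors keep their sequence numbers."""
+    expired = [r.seq for r in chain if not r.compliance_hold
+               and r.age_days > RETENTION_DAYS[r.classification]]
+    chain[:] = [r for r in chain if r.seq not in expired]
+    return expired
+
+
+def redact_actor_pii(chain, actor_id):
+    """Redact one actor's PII; actor_id and both hash columns stay untouched."""
+    count = 0
+    for row in chain:
+        if row.actor_id != actor_id or row.pii_redacted:
+            continue  # already-redacted rows are skipped, so a replay counts 0
+        fields = ["actor_email", "actor_ip", "actor_user_agent"]
+        if row.classification in ("personal", "sensitive"):
+            fields.append("actor_name")
+        for name in fields:
+            if name in row.pii:
+                row.pii[name] = REDACTED
+        row.pii_redacted = True
+        count += 1
+    return count
+
+
+def verify_range(chain, start_seq, end_seq):
+    """Contiguous-range check: every seq present, hashes recompute, links match."""
+    rows = [r for r in chain if start_seq <= r.seq <= end_seq]
+    if [r.seq for r in rows] != list(range(start_seq, end_seq + 1)):
+        return False  # a purge gap inside the range reads as a chain break
+    for i, row in enumerate(rows):
+        if compute_hash(row) != row.content_hash:
+            return False
+        if i > 0 and row.previous_entry_hash != rows[i - 1].content_hash:
+            return False
+    return True
+
+
+chain = []
+append(chain, "svc-signer", "signer.key_rotate", "restricted", age_days=800)
+append(chain, "user-123", "notify.update", "personal", age_days=500,
+       pii={"actor_email": "ann@example.com", "actor_name": "Ann"})
+append(chain, "user-123", "notify.update", "personal", age_days=10,
+       pii={"actor_email": "ann@example.com", "actor_name": "Ann"})
+append(chain, "svc-jobs", "jobengine.execute", "none", age_days=5)
+
+purged = purge_expired(chain)                   # seq 2: personal, 500 > 365 days
+print(purged, [r.seq for r in chain])           # [2] [1, 3, 4]
+print(verify_range(chain, 1, 4))                # False: gap at seq 2
+print(verify_range(chain, max(purged) + 1, 4))  # True: range starts above the gap
+print(redact_actor_pii(chain, "user-123"))      # 1
+print(redact_actor_pii(chain, "user-123"))      # 0 (replay)
+print(verify_range(chain, 3, 4))                # True: hashes survive redaction
+```
+
+Because the model treats any missing sequence number inside a range as a
+break, a range that starts at the oldest surviving row fails as soon as a
+shorter-retention neighbour has been purged, while a range that starts above
+the highest purged sequence number passes.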