docs(timeline): audit retention + erasure dossier
Sprint SPRINT_20260408_004 AUDIT-004 documentation criterion.

docs/modules/timeline/audit-retention.md covers:
- Four-rung classification ladder and the "narrowest wins" rule
- Retention table structure, platform defaults, per-tenant overrides, and legal holds via compliance_hold
- AuditRetentionPurgeService config + operator recommendations
- Right-to-erasure endpoint contract, hash-chain integrity guarantees, and the idempotency semantics via pii_redacted_at
- Sequence-chain gap behaviour after purge and how chain verification should window its checks
- Compliance checklist for operators

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
162
docs/modules/timeline/audit-retention.md
Normal file
@@ -0,0 +1,162 @@
# Timeline Unified Audit: Data Classification, Retention, and Right-to-Erasure

> Sprint: `SPRINT_20260408_004_Timeline_unified_audit_sink` (AUDIT-004).
> Scope: how Timeline classifies, retains, and redacts unified audit events.

## 1. Classifications

Every row in `timeline.unified_audit_events` carries a `data_classification`
value drawn from the four-rung ladder below. The narrowest applicable class
wins: the classifier (`StellaOps.Timeline.WebService.Audit.AuditDataClassifier`)
evaluates from most to least restrictive.

| Classification | When it applies | Examples |
|---|---|---|
| `restricted` | Key-escrow / signing-ceremony / trust-anchor operations in `signer` or `attestor`; any action whose name contains `key_escrow`, `signing_key`, or `rotate_signing_key` regardless of module. | `signer.key_rotate`, `signer.ceremony_open`, `attestor.rekor_submit`, `attestor.trust_anchor_update`, `platform.key_escrow_grant`. |
| `sensitive` | Authority auth-protocol events with subject context: logins, token grants, lockouts, MFA and password reset flows. | `authority.login`, `authority.token_grant`, `authority.lockout`, `authority.mfa_challenge`, `authority.password_reset`. |
| `personal` | Actor PII present: `actor.email`, `actor.ip_address`, or `actor.user_agent`. | `notify.update` with `actor.email` set; `jobengine.execute` from a user with `actor.ip`. |
| `none` | No actor PII and no sensitive/restricted signal. | Pure system events, service-to-service heartbeats. |

The ladder is deliberately strict: a `signer.key_rotate` event with both a
user email and an IP still classifies as `restricted`, and an
`authority.login` event with `actor.ip` set still classifies as `sensitive`
(not `personal`).

Classification happens automatically at ingest if the incoming
`AuditEventPayload.DataClassification` is null or whitespace. Producers that
already know the class can set it explicitly to bypass classification.
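
The ladder and the explicit-value bypass can be illustrated with a minimal Python sketch. It is not the `AuditDataClassifier` implementation: the action sets below are assumptions copied from the examples in the table, and `AuditEvent` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: substrings taken from the `restricted` row of the table.
RESTRICTED_MARKERS = ("key_escrow", "signing_key", "rotate_signing_key")
# Assumption: these sets only mirror the table's examples, not the real rules.
RESTRICTED_ACTIONS = {
    "signer.key_rotate", "signer.ceremony_open",
    "attestor.rekor_submit", "attestor.trust_anchor_update",
}
SENSITIVE_ACTIONS = {
    "authority.login", "authority.token_grant", "authority.lockout",
    "authority.mfa_challenge", "authority.password_reset",
}


@dataclass
class AuditEvent:  # hypothetical stand-in for AuditEventPayload
    action: str
    actor_email: Optional[str] = None
    actor_ip: Optional[str] = None
    actor_user_agent: Optional[str] = None
    data_classification: Optional[str] = None


def classify(event: AuditEvent) -> str:
    """Return the classification, checking the most restrictive rung first."""
    explicit = (event.data_classification or "").strip()
    if explicit:
        return explicit  # producer-supplied value bypasses classification
    if event.action in RESTRICTED_ACTIONS or any(
        marker in event.action for marker in RESTRICTED_MARKERS
    ):
        return "restricted"
    if event.action in SENSITIVE_ACTIONS:
        return "sensitive"
    if event.actor_email or event.actor_ip or event.actor_user_agent:
        return "personal"
    return "none"
```

Because the checks run from most to least restrictive and return on the first match, an event that carries actor PII never drops below `restricted` or `sensitive` when its action matches one of those rungs.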

## 2. Retention policy

Retention windows live in `timeline.audit_retention_policies`. The table is
keyed on `(tenant_id, data_classification)`. A row with `tenant_id = '*'` is
the platform default; tenant-specific rows override the default.

Platform defaults seeded by migration 005:

| Classification | Retention |
|---|---|
| `none` | 365 days |
| `personal` | 365 days |
| `sensitive` | 730 days |
| `restricted` | 2555 days (~7 years) |

`timeline.resolve_audit_retention_days(tenant_id, classification)` resolves the
effective window: tenant-specific → platform default → 365-day fallback.
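
The lookup order can be sketched in Python as follows. This only illustrates the precedence described above; the real resolution happens in the SQL function, and the in-memory `policies` dictionary and the name `resolve_retention_days` are assumptions made for the example.

```python
PLATFORM_TENANT = "*"   # tenant_id used for platform-default rows
FALLBACK_DAYS = 365     # used when neither a tenant nor a platform row exists


def resolve_retention_days(policies, tenant_id, classification):
    """Resolve retention: tenant-specific, then platform default, then fallback.

    `policies` maps (tenant_id, data_classification) -> retention_days and
    stands in for timeline.audit_retention_policies (an assumption).
    """
    for key in ((tenant_id, classification), (PLATFORM_TENANT, classification)):
        if key in policies:
            return policies[key]
    return FALLBACK_DAYS


# Platform defaults from the table above plus one tenant override.
policies = {
    ("*", "none"): 365,
    ("*", "personal"): 365,
    ("*", "sensitive"): 730,
    ("*", "restricted"): 2555,
    ("acme-prod", "personal"): 180,
}
```

With this data, `acme-prod` resolves `personal` to 180 days, any other tenant resolves it to 365 days, and a classification with no row at all falls back to 365 days.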

### Overriding retention for a tenant

```sql
INSERT INTO timeline.audit_retention_policies (tenant_id, data_classification, retention_days)
VALUES ('acme-prod', 'personal', 180)
ON CONFLICT (tenant_id, data_classification)
DO UPDATE SET retention_days = EXCLUDED.retention_days, updated_at = NOW();
```

Tenant overrides take effect at the next purge cycle. Shortening a
retention window deletes rows whose `timestamp` already falls outside the
new window on the first cycle after the change.

### Legal holds

Set `compliance_hold = TRUE` on any row that must survive retention-driven
deletion. The purge function filters `compliance_hold = FALSE`, so held rows
never get purged even if their retention window has expired. Use this for
rows linked to an active investigation or legal request. Clear the flag with
a targeted `UPDATE` once the hold is released; the row becomes eligible for
the next purge cycle automatically.
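
How the retention window and the hold flag interact can be shown with a small Python sketch. The function name `is_purge_candidate` is hypothetical, and the exact boundary comparison used by the purge function (strictly older than the cutoff, as assumed here) is not stated in this document.

```python
from datetime import datetime, timedelta, timezone


def is_purge_candidate(row_timestamp, compliance_hold, retention_days, now):
    """Return True when a row may be purged.

    Held rows are never candidates; otherwise the row must be older than
    the retention window. The strict `<` comparison is an assumption.
    """
    if compliance_hold:
        return False
    cutoff = now - timedelta(days=retention_days)
    return row_timestamp < cutoff


now = datetime(2026, 4, 19, tzinfo=timezone.utc)
row_ts = now - timedelta(days=200)  # a row written 200 days ago
```

A 200-day-old row is kept under the 365-day default, becomes a candidate as soon as the tenant window is shortened to 180 days, and stops being a candidate again while `compliance_hold` is set.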

## 3. Scheduled purge

`StellaOps.Timeline.WebService.Audit.AuditRetentionPurgeService` is an
`IHostedService` registered in `Program.cs`. Every cycle it:

1. Enumerates `DISTINCT tenant_id` values in `timeline.unified_audit_events`.
2. Calls `timeline.purge_expired_audit_events(tenant_id, dry_run)` for each.
3. Logs one line for each classification from which rows were actually deleted.

Bind the following configuration section to tune behaviour:

```yaml
AuditRetentionPurge:
  Enabled: true           # master toggle, default true
  DryRun: false           # when true, counts candidates without deleting
  InitialDelay: 00:05:00  # wait 5 minutes after startup before first cycle
  Interval: 06:00:00      # 6-hour gap between cycles
```
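
The cycle and the duration settings can be sketched in Python. This is an illustration under assumptions, not the service's code: `purge` is a hypothetical callable standing in for `timeline.purge_expired_audit_events` and is assumed to return a classification-to-count mapping, the log line format is invented, logging candidate counts in dry-run mode is a guess, and `parse_timespan` handles only the plain `hh:mm:ss` form shown above.

```python
from datetime import timedelta


def parse_timespan(value: str) -> timedelta:
    """Parse an 'hh:mm:ss' duration such as '06:00:00'."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def run_cycle(tenant_ids, purge, dry_run, log):
    """Run one purge cycle over the distinct tenants.

    `purge(tenant_id, dry_run)` is assumed to return {classification: count}.
    Only classifications with a non-zero count produce a log line.
    """
    for tenant_id in sorted(set(tenant_ids)):
        for classification, count in purge(tenant_id, dry_run).items():
            if count > 0:
                verb = "would_delete" if dry_run else "deleted"
                log(f"tenant={tenant_id} classification={classification} {verb}={count}")
```

A host loop would sleep for `parse_timespan(InitialDelay)` once, then call `run_cycle` every `parse_timespan(Interval)` while `Enabled` is true.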

### Operating recommendations

- In air-gap deployments leave defaults in place; the 6-hour cadence keeps
  row growth bounded without pressuring I/O.
- When onboarding a new tenant with large historical imports, set
  `DryRun: true` for one cycle to measure candidate counts before enabling
  real deletions.
- If a migration adds a new classification value in the future, seed a
  platform-default row in `audit_retention_policies` before the next cycle;
  otherwise the function falls back to 365 days.

## 4. Right-to-erasure (GDPR Article 17)

Endpoint:

```
DELETE /api/v1/audit/actors/{actorId}/pii
```

- Requires the `timeline:admin` scope (policy name `Timeline.Admin`).
- Resolves the tenant from the `x-tenant-id` header or `TenantId` HTTP item.
- Calls `timeline.redact_actor_pii(tenant_id, actor_id)` which replaces
  `actor_email`, `actor_ip`, `actor_user_agent` with `[REDACTED]` for every
  matching row. For rows classified as `personal` or `sensitive`, `actor_name`
  is also redacted. `actor_id` is never touched because it feeds into the
  `content_hash` that anchors each row in the tenant's hash chain.
- Response body:
  ```json
  {
    "tenantId": "acme-prod",
    "actorId": "user-123",
    "redactedCount": 42,
    "redactedAt": "2026-04-19T15:30:00+00:00"
  }
  ```
- The request is idempotent: rows that already have `pii_redacted_at` set
  are skipped, so a replayed request counts only rows it newly redacts
  instead of repeating the earlier count.
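
The redaction and idempotency rules in the list above can be sketched in Python over in-memory rows. This is not the body of `timeline.redact_actor_pii`: rows are plain dictionaries, the function signature is hypothetical, and overwriting fields that were already empty is an assumption.

```python
REDACTED = "[REDACTED]"
PII_FIELDS = ("actor_email", "actor_ip", "actor_user_agent")


def redact_actor_pii(rows, tenant_id, actor_id, now):
    """Redact PII for one actor and return the number of rows changed.

    Rows that already carry `pii_redacted_at` are skipped, which is what
    makes a replay a no-op. `actor_id` is deliberately left untouched.
    """
    redacted_count = 0
    for row in rows:
        if row["tenant_id"] != tenant_id or row["actor_id"] != actor_id:
            continue
        if row.get("pii_redacted_at") is not None:
            continue  # already redacted by an earlier request
        for field in PII_FIELDS:
            row[field] = REDACTED
        if row["data_classification"] in ("personal", "sensitive"):
            row["actor_name"] = REDACTED
        row["pii_redacted_at"] = now
        redacted_count += 1
    return redacted_count
```

Calling the function twice with the same arguments changes rows only on the first call; the second call finds `pii_redacted_at` set everywhere and returns `0`.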

### Chain integrity after redaction

The content hash is computed from the canonical JSON of the event at
ingest time and is not recomputed during redaction. `verify_unified_audit_chain`
continues to pass because:

- `previous_entry_hash` links are unmodified.
- `content_hash` values are unmodified.
- The redacted PII fields are not part of the hash input.

Auditors can still verify that the chain has not been tampered with, even
though some rows now contain `[REDACTED]` PII.
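
A toy hash chain in Python shows why redaction leaves verification intact. The hashed field set, the SHA-256 choice, and the canonical-JSON encoding here are all assumptions made for illustration; the actual hash input of `verify_unified_audit_chain` is not specified in this document.

```python
import hashlib
import json

# Assumption: the hash covers these fields only; PII columns are excluded.
HASHED_FIELDS = ("sequence_number", "action", "actor_id")


def content_hash(row):
    """Hash a canonical JSON rendering of the hashed fields."""
    payload = {field: row[field] for field in HASHED_FIELDS}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain, row):
    """Append a row, linking it to the previous row's content hash."""
    previous = chain[-1]["content_hash"] if chain else None
    entry = dict(row, previous_entry_hash=previous)
    entry["content_hash"] = content_hash(entry)
    chain.append(entry)


def verify(chain):
    """Check every link and recompute every content hash."""
    previous = None
    for row in chain:
        if row["previous_entry_hash"] != previous:
            return False
        if row["content_hash"] != content_hash(row):
            return False
        previous = row["content_hash"]
    return True
```

In this model, overwriting `actor_email` with `[REDACTED]` leaves `verify` returning `True`, while changing `actor_id` makes it return `False`, which matches the reason given earlier for never touching `actor_id`.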

## 5. Sequence-chain gaps after purge

`timeline.purge_expired_audit_events` deletes rows; the remaining rows
keep their original `sequence_number` values, so a chain verification run
on the surviving sequence ranges will show gaps. This is intentional:

- `verify_unified_audit_chain(tenant_id, start_seq, end_seq)` verifies
  contiguous ranges. For retention-aware verification, query the chain in
  the window `[oldest_surviving_seq, MAX(seq)]` (or pick a sub-range).
- The Timeline UI's chain-verify tile should filter chain verification to
  the window above the oldest purge cutoff so users never see a spurious
  "chain break at sequence N" warning caused by routine purge.
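
One way to pick sub-ranges is to split the surviving sequence numbers into contiguous runs and verify each run separately. The Python helper below is a hypothetical sketch of that idea, not part of Timeline; each `(start, end)` pair it returns could serve as the `start_seq` and `end_seq` arguments described above.

```python
def contiguous_ranges(sequence_numbers):
    """Group surviving sequence numbers into inclusive (start, end) runs."""
    ranges = []
    for seq in sorted(set(sequence_numbers)):
        if ranges and seq == ranges[-1][1] + 1:
            ranges[-1][1] = seq          # extend the current run
        else:
            ranges.append([seq, seq])    # a gap starts a new run
    return [tuple(run) for run in ranges]
```

For surviving sequence numbers 7, 8, 9, 12, 13 and 20, the helper yields `(7, 9)`, `(12, 13)` and `(20, 20)`, so purge gaps between runs are never reported as chain breaks.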

## 6. Compliance checklist for operators

- [ ] Verify `AuditRetentionPurge:Enabled` is `true` in production.
- [ ] Confirm tenant-specific retention overrides are in place for any
  regulated tenant with tighter/looser needs than the platform defaults.
- [ ] Wire alerting on the `AuditRetentionPurge` log scope so purge failures
  surface before a retention breach.
- [ ] Document the right-to-erasure runbook: who can invoke the endpoint,
  what evidence to collect, and how to confirm completion.
- [ ] Review `audit_retention_policies` annually alongside the data-processing
  record of the organisation.