git.stella-ops.org/docs/modules/timeline/audit-retention.md
master 44195cd7af docs(timeline): audit retention + erasure dossier
Sprint SPRINT_20260408_004 AUDIT-004 documentation criterion.

docs/modules/timeline/audit-retention.md covers:
- Four-rung classification ladder and the "narrowest wins" rule
- Retention table structure, platform defaults, per-tenant overrides,
  and legal holds via compliance_hold
- AuditRetentionPurgeService config + operator recommendations
- Right-to-erasure endpoint contract, hash-chain integrity guarantees,
  and the idempotency semantics via pii_redacted_at
- Sequence-chain gap behaviour after purge and how chain verification
  should window its checks
- Compliance checklist for operators

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 23:39:28 +03:00


Timeline Unified Audit — Data Classification, Retention, and Right-to-Erasure

Sprint: SPRINT_20260408_004_Timeline_unified_audit_sink (AUDIT-004). Scope: how Timeline classifies, retains, and redacts unified audit events.

1. Classifications

Every row in timeline.unified_audit_events carries a data_classification value drawn from the four-rung ladder below. The narrowest (most restrictive) applicable class wins: the classifier (StellaOps.Timeline.WebService.Audit.AuditDataClassifier) evaluates the rungs from most to least restrictive and assigns the first one that matches.

| Classification | When it applies | Examples |
|---|---|---|
| restricted | Key-escrow / signing-ceremony / trust-anchor operations in signer or attestor; any action whose name contains key_escrow, signing_key, or rotate_signing_key regardless of module. | signer.key_rotate, signer.ceremony_open, attestor.rekor_submit, attestor.trust_anchor_update, platform.key_escrow_grant. |
| sensitive | Authority auth-protocol events with subject context: logins, token grants, lockouts, MFA and password reset flows. | authority.login, authority.token_grant, authority.lockout, authority.mfa_challenge, authority.password_reset. |
| personal | Actor PII present: actor.email, actor.ip_address, or actor.user_agent. | notify.update with actor.email set; jobengine.execute from a user with actor.ip. |
| none | No actor PII and no sensitive/restricted signal. | Pure system events, service-to-service heartbeats. |

The ladder is deliberately strict: a signer.key_rotate event with both a user email and an IP still classifies as restricted, and an authority.login event with actor.ip set still classifies as sensitive (not personal).

Classification happens automatically at ingest if the incoming AuditEventPayload.DataClassification is null or whitespace. Producers that already know the class can set it explicitly to bypass classification.
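The ladder can be sketched in a few lines of Python. This is an illustrative model only, not the AuditDataClassifier source: the function name classify, its parameters, and the constant names are hypothetical. It also makes two simplifying assumptions: every action in the signer and attestor modules is treated as restricted, and the sensitive set is limited to the example actions listed in the table.

```python
from typing import Optional

# Assumption: the whole signer/attestor module is treated as restricted.
RESTRICTED_MODULES = {"signer", "attestor"}
RESTRICTED_MARKERS = ("key_escrow", "signing_key", "rotate_signing_key")
# Assumption: the sensitive set is just the examples from the table.
SENSITIVE_ACTIONS = {
    "authority.login", "authority.token_grant", "authority.lockout",
    "authority.mfa_challenge", "authority.password_reset",
}


def classify(action: str,
             email: Optional[str] = None,
             ip: Optional[str] = None,
             user_agent: Optional[str] = None,
             explicit: Optional[str] = None) -> str:
    """Return the data classification for one audit event."""
    # A producer-supplied class bypasses classification unless it is blank.
    if explicit is not None and explicit.strip():
        return explicit
    module = action.split(".", 1)[0]
    # Most restrictive rung first, so it wins over any PII signal.
    if module in RESTRICTED_MODULES or any(m in action for m in RESTRICTED_MARKERS):
        return "restricted"
    if action in SENSITIVE_ACTIONS:
        return "sensitive"
    if email or ip or user_agent:
        return "personal"
    return "none"
```

Because the checks run from most to least restrictive, the presence of an email or IP address can never downgrade a restricted or sensitive event to personal.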

2. Retention policy

Retention windows live in timeline.audit_retention_policies. The table is keyed on (tenant_id, data_classification). A row with tenant_id = '*' is the platform default; tenant-specific rows override the default.

Platform defaults seeded by migration 005:

| Classification | Retention |
|---|---|
| none | 365 days |
| personal | 365 days |
| sensitive | 730 days |
| restricted | 2555 days (~7 years) |

timeline.resolve_audit_retention_days(tenant_id, classification) resolves the effective window: tenant-specific → platform default → 365-day fallback.
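The resolution order can be modelled as two dictionary lookups followed by a constant. The sketch below is a Python stand-in for the SQL function, not its actual body; the dictionary layout and the name resolve_retention_days are hypothetical, and the 'acme-prod' entry is an illustrative tenant override.

```python
FALLBACK_DAYS = 365

# Keys mirror the (tenant_id, data_classification) table key; '*' is the platform default.
policies = {
    ("*", "none"): 365,
    ("*", "personal"): 365,
    ("*", "sensitive"): 730,
    ("*", "restricted"): 2555,
    ("acme-prod", "personal"): 180,  # illustrative tenant override
}


def resolve_retention_days(policies, tenant_id, classification):
    """Tenant-specific row, then platform default, then the 365-day fallback."""
    for key in ((tenant_id, classification), ("*", classification)):
        if key in policies:
            return policies[key]
    return FALLBACK_DAYS
```

A classification with no row at all, for example one introduced by a later migration, resolves to the fallback rather than raising an error.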

Overriding retention for a tenant

INSERT INTO timeline.audit_retention_policies (tenant_id, data_classification, retention_days)
VALUES ('acme-prod', 'personal', 180)
ON CONFLICT (tenant_id, data_classification)
DO UPDATE SET retention_days = EXCLUDED.retention_days, updated_at = NOW();

Tenant overrides take effect at the next purge cycle. Shortening a retention window deletes, on the first cycle after the change, every row whose timestamp already falls outside the new window.

Set compliance_hold = TRUE on any row that must survive retention-driven deletion. The purge function only deletes rows where compliance_hold = FALSE, so held rows are never purged, even after their retention window has expired. Use this for rows linked to an active investigation or legal request. Clear the flag with a targeted UPDATE once the hold is released; the row then becomes eligible at the next purge cycle automatically.
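The interaction between the retention window and the hold flag comes down to a single predicate. The Python sketch below illustrates it under assumptions: the function name is hypothetical, and the exact cutoff comparison (strictly older than now minus the retention window) is a guess at what the purge function does at the boundary.

```python
from datetime import datetime, timedelta, timezone


def is_purge_candidate(event_time, retention_days, compliance_hold, now):
    """True when a row would be deleted by a retention-driven purge."""
    # A held row is never a candidate, however old it is.
    if compliance_hold:
        return False
    # Assumption: rows strictly older than the cutoff are eligible.
    return event_time < now - timedelta(days=retention_days)


now = datetime(2026, 4, 19, tzinfo=timezone.utc)
old_event = now - timedelta(days=400)
```

With these values, old_event is a candidate under a 365-day window, stops being one as soon as compliance_hold is set, and becomes one again once the flag is cleared.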

3. Scheduled purge

StellaOps.Timeline.WebService.Audit.AuditRetentionPurgeService is an IHostedService registered in Program.cs. Every cycle it:

  1. Enumerates DISTINCT tenant_id values in timeline.unified_audit_events.
  2. Calls timeline.purge_expired_audit_events(tenant_id, dry_run) for each.
  3. Logs a line per classification that actually deleted rows.
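The three steps above can be sketched as a plain loop. This Python model is not the hosted service itself: run_purge_cycle is a hypothetical name, the purge callable stands in for timeline.purge_expired_audit_events, and the assumption that it returns a per-classification count is inferred from the logging step rather than confirmed by the function's signature.

```python
from typing import Callable, Dict, Iterable, List


def run_purge_cycle(tenant_ids: Iterable[str],
                    purge: Callable[[str, bool], Dict[str, int]],
                    dry_run: bool = False) -> List[str]:
    """Run one purge cycle and return the log lines it would emit."""
    log_lines = []
    # Step 1: de-duplicate tenant ids (the service uses SELECT DISTINCT).
    for tenant_id in sorted(set(tenant_ids)):
        # Step 2: one purge call per tenant.
        deleted = purge(tenant_id, dry_run)
        # Step 3: log only classifications with a non-zero count.
        for classification, count in sorted(deleted.items()):
            if count > 0:
                log_lines.append(
                    f"tenant={tenant_id} classification={classification} "
                    f"rows={count} dry_run={dry_run}"
                )
    return log_lines


def fake_purge(tenant_id: str, dry_run: bool) -> Dict[str, int]:
    # Hypothetical stand-in for the database call.
    return {"personal": 3, "none": 0} if tenant_id == "acme-prod" else {}
```

Passing the purge step in as a callable keeps the loop testable without a database connection.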

Bind the following configuration section to tune behaviour:

AuditRetentionPurge:
  Enabled: true       # master toggle, default true
  DryRun: false       # when true, counts candidates without deleting
  InitialDelay: 00:05:00   # wait 5 minutes after startup before first cycle
  Interval: 06:00:00       # 6-hour gap between cycles

Operating recommendations

  • In air-gapped deployments, leave the defaults in place: the 6-hour cadence keeps row growth bounded without pressuring I/O.
  • When onboarding a new tenant with large historical imports, set DryRun: true for one cycle to measure candidate counts before letting the purge run hot.
  • If a migration adds a new classification value in the future, seed a platform-default row in audit_retention_policies before the next cycle; otherwise the function falls back to 365 days.

4. Right-to-erasure (GDPR Article 17)

Endpoint:

DELETE /api/v1/audit/actors/{actorId}/pii
  • Requires the timeline:admin scope (policy name Timeline.Admin).
  • Resolves the tenant from the x-tenant-id header or TenantId HTTP item.
  • Calls timeline.redact_actor_pii(tenant_id, actor_id) which replaces actor_email, actor_ip, actor_user_agent with [REDACTED] for every matching row. For rows classified as personal or sensitive, actor_name is also redacted. actor_id is never touched because it feeds into the content_hash that anchors each row in the tenant's hash chain.
  • Response body:
    {
      "tenantId": "acme-prod",
      "actorId": "user-123",
      "redactedCount": 42,
      "redactedAt": "2026-04-19T15:30:00+00:00"
    }
    
  • The request is idempotent: rows that already have pii_redacted_at set are skipped, so replaying the request leaves previously redacted rows untouched and counts only rows that still needed redaction.
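The redaction and idempotency rules can be modelled over in-memory rows. The sketch below is a Python illustration of the behaviour described above, not the body of timeline.redact_actor_pii: rows are plain dictionaries, the function signature is hypothetical, and it assumes the three PII columns are overwritten even when they were empty.

```python
from datetime import datetime, timezone

REDACTED = "[REDACTED]"
PII_FIELDS = ("actor_email", "actor_ip", "actor_user_agent")


def redact_actor_pii(rows, tenant_id, actor_id, now=None):
    """Redact PII in place for one actor and return the number of rows changed."""
    now = now or datetime.now(timezone.utc)
    redacted_count = 0
    for row in rows:
        if row["tenant_id"] != tenant_id or row["actor_id"] != actor_id:
            continue
        # Idempotency: rows redacted by an earlier request are skipped.
        if row.get("pii_redacted_at") is not None:
            continue
        for field in PII_FIELDS:
            row[field] = REDACTED
        # actor_name is redacted only for personal and sensitive rows.
        if row["data_classification"] in ("personal", "sensitive"):
            row["actor_name"] = REDACTED
        # actor_id is deliberately left alone: it feeds the content hash.
        row["pii_redacted_at"] = now
        redacted_count += 1
    return redacted_count
```

Calling the function twice for the same actor returns the number of affected rows the first time and 0 the second time.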

Chain integrity after redaction

The content hash is computed from the canonical JSON of the event at ingest time and is not recomputed during redaction. verify_unified_audit_chain continues to pass because:

  • previous_entry_hash links are unmodified.
  • content_hash values are unmodified.
  • The redacted PII fields are not part of the hash input.

Auditors can still verify that the chain has not been tampered with, even though some rows now contain [REDACTED] PII.
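A toy hash chain shows why redaction leaves verification intact. Everything specific in this Python sketch is an assumption made for illustration: the set of hashed fields, the use of SHA-256, the inclusion of previous_entry_hash in the hash input, and the function names. The real canonicalisation and verify_unified_audit_chain may differ.

```python
import hashlib
import json

# Assumption: only these non-PII fields are part of the hash input.
HASHED_FIELDS = ("tenant_id", "sequence_number", "actor_id", "action")


def content_hash(row):
    payload = {key: row[key] for key in HASHED_FIELDS}
    payload["previous_entry_hash"] = row["previous_entry_hash"]
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_chain(events):
    chain, previous = [], None
    for event in events:
        row = dict(event)
        row["previous_entry_hash"] = previous
        row["content_hash"] = content_hash(row)
        previous = row["content_hash"]
        chain.append(row)
    return chain


def verify_chain(chain):
    previous = None
    for row in chain:
        # Both the link and the stored hash must still match.
        if row["previous_entry_hash"] != previous:
            return False
        if row["content_hash"] != content_hash(row):
            return False
        previous = row["content_hash"]
    return True


chain = build_chain([
    {"tenant_id": "acme-prod", "sequence_number": 1, "actor_id": "user-123",
     "action": "notify.update", "actor_email": "a@example.com"},
    {"tenant_id": "acme-prod", "sequence_number": 2, "actor_id": "user-123",
     "action": "jobengine.execute", "actor_email": "a@example.com"},
])
```

In this model, overwriting actor_email with [REDACTED] leaves verify_chain(chain) true, while changing actor_id or action on any row makes it false.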

5. Sequence-chain gaps after purge

timeline.purge_expired_audit_events deletes rows; the remaining rows keep their original sequence_number values, so a chain verification run that spans a purged range will report gaps. This is intentional:

  • verify_unified_audit_chain(tenant_id, start_seq, end_seq) verifies contiguous ranges. For retention-aware verification, query the chain in the window [oldest_surviving_seq, MAX(sequence_number)] (or pick a sub-range of it).
  • The Timeline UI's chain-verify tile should filter chain verification to the window above the oldest purge cutoff so users never see a spurious "chain break at sequence N" warning caused by routine purge.
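One way to pick verification windows is to group the surviving sequence numbers into contiguous runs and verify each run on its own. The helper below is a hypothetical Python sketch, not part of Timeline; it assumes the caller has already fetched the surviving sequence_number values for a tenant.

```python
def surviving_ranges(sequence_numbers):
    """Group sequence numbers into contiguous (start_seq, end_seq) ranges."""
    ranges = []
    for seq in sorted(sequence_numbers):
        if ranges and seq == ranges[-1][1] + 1:
            # Extend the current run.
            ranges[-1] = (ranges[-1][0], seq)
        else:
            # A gap (or the first value) starts a new run.
            ranges.append((seq, seq))
    return ranges
```

Each resulting pair could then be passed as start_seq and end_seq. Because classifications have different retention windows, a purge can leave several runs rather than a single cutoff, which is why the helper returns a list.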

6. Compliance checklist for operators

  • Verify AuditRetentionPurge:Enabled is true in production.
  • Confirm tenant-specific retention overrides are in place for any regulated tenant with tighter/looser needs than the platform defaults.
  • Wire alerting on the AuditRetentionPurge log scope so purge failures surface before a retention breach.
  • Document the right-to-erasure runbook: who can invoke the endpoint, what evidence to collect, and how to confirm completion.
  • Review audit_retention_policies annually alongside the data-processing record of the organisation.