git.stella-ops.org/docs/airgap/runbooks/quarantine-investigation.md
2025-12-16 10:44:00 +02:00


AirGap Quarantine Investigation Runbook

Purpose

Quarantine preserves failed bundle imports for offline forensic analysis. It keeps the original bundle and the verification context (reason + logs) so operators can diagnose tampering, trust-root drift, or packaging issues without re-running the import in an online environment.

Location & Structure

Default root: /updates/quarantine

Per-tenant layout: /updates/quarantine/<tenantId>/<timestamp>-<reason>-<id>/

Removal staging: /updates/quarantine/<tenantId>/.removed/<quarantineId>/
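
As a rough illustration of the layout above, the following Python sketch lists quarantine entries per tenant and skips the .removed staging directory. The function name list_entries is hypothetical and not part of any AirGap tooling; the sketch only reads the directory tree and assumes every directory directly under the root is a tenant.

```python
from pathlib import Path

# Default root as documented in this runbook.
DEFAULT_ROOT = Path("/updates/quarantine")


def list_entries(root: Path = DEFAULT_ROOT) -> dict[str, list[str]]:
    """Map each tenant directory name to its sorted quarantine entry names.

    Hypothetical helper: assumes every directory directly under the root
    is a tenant, and ignores the .removed staging directory.
    """
    result: dict[str, list[str]] = {}
    if not root.is_dir():
        return result
    for tenant_dir in sorted(root.iterdir()):
        if not tenant_dir.is_dir():
            continue
        result[tenant_dir.name] = sorted(
            p.name
            for p in tenant_dir.iterdir()
            if p.is_dir() and p.name != ".removed"
        )
    return result


if __name__ == "__main__":
    for tenant, entries in list_entries().items():
        print(tenant, len(entries), "entries")
```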

Files in a quarantine entry

  • bundle.tar.zst - the original bundle as provided
  • manifest.json - bundle manifest (when available)
  • verification.log - validation step output (TUF/DSSE/Merkle/rotation/monotonicity, etc.)
  • failure-reason.txt - human-readable failure summary (reason + timestamp + metadata)
  • quarantine.json - structured metadata for listing/automation
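
Before digging into an entry, it can help to confirm which of the files listed above are actually present. This is a minimal sketch: missing_files is a hypothetical helper, and treating only manifest.json as optional is an assumption drawn from the "when available" note.

```python
from pathlib import Path

# File names as listed in this runbook.
EXPECTED_FILES = [
    "bundle.tar.zst",
    "manifest.json",
    "verification.log",
    "failure-reason.txt",
    "quarantine.json",
]
# Assumption: only manifest.json may legitimately be absent ("when available").
OPTIONAL_FILES = {"manifest.json"}


def missing_files(entry_dir: Path) -> list[str]:
    """Return the required files that are absent from a quarantine entry."""
    return [
        name
        for name in EXPECTED_FILES
        if name not in OPTIONAL_FILES and not (entry_dir / name).is_file()
    ]
```

An empty result means the entry has everything needed for the investigation steps; a missing file is itself worth recording in the investigation notes.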

Investigation steps (offline)

  1. Identify the tenant and locate the quarantine root on the importer host.
  2. Pick the newest quarantine entry for the tenant, using the timestamp prefix of the entry directory name.
  3. Read failure-reason.txt first to capture the top-level reason and metadata.
  4. Review verification.log for the precise failing step.
  5. If needed, extract and inspect bundle.tar.zst in an isolated workspace (no network).
  6. Decide whether the entry should be retained (for audit) or removed after investigation.
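
Steps 2 to 4 can be sketched in Python as follows. Both function names are hypothetical. The sketch assumes the timestamp prefix sorts lexicographically (for example an ISO-8601-style stamp), which this runbook does not confirm; if it does not, sort on the timestamp recorded in quarantine.json or on file-system times instead. It makes no assumption about the log format and simply shows the tail of verification.log.

```python
from pathlib import Path
from typing import Optional


def newest_entry(tenant_dir: Path) -> Optional[Path]:
    """Return the entry whose directory name sorts last, or None if empty.

    Assumption: the <timestamp> prefix is lexicographically sortable.
    Hidden directories such as .removed are skipped.
    """
    entries = [
        p for p in tenant_dir.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    ]
    return max(entries, key=lambda p: p.name, default=None)


def summarize(entry: Path, tail_lines: int = 20) -> str:
    """Combine failure-reason.txt with the last lines of verification.log."""
    reason_path = entry / "failure-reason.txt"
    reason = "(failure-reason.txt missing)"
    if reason_path.is_file():
        reason = reason_path.read_text(encoding="utf-8", errors="replace").strip()

    log_path = entry / "verification.log"
    log_tail: list[str] = []
    if log_path.is_file():
        lines = log_path.read_text(encoding="utf-8", errors="replace").splitlines()
        log_tail = lines[-tail_lines:]

    return "\n".join(
        [f"entry: {entry.name}", f"reason: {reason}", "log tail:", *log_tail]
    )
```

The sketch only reads files. Extracting bundle.tar.zst (step 5) is deliberately left out, since it should happen in an isolated workspace with whatever zstd-capable tooling is approved on the importer host.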

Removal & Retention

  • Removal requires a human-provided reason (audit trail). Implementations should use the quarantine service's remove operation, which moves entries under .removed/.
  • Retention and quota controls are configured via AirGap:Quarantine settings (root, TTL, max size); TTL cleanup can remove entries older than the retention period.
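
To preview what a TTL cleanup might touch, a read-only report along the following lines can be useful. This is a sketch under stated assumptions, not the service's cleanup logic: expired_entries is a hypothetical helper, and it uses each entry directory's modification time as a stand-in for the entry's age, whereas the real service may rely on the timestamp stored in quarantine.json. It never deletes or moves anything; removal should still go through the quarantine service so the reason is recorded.

```python
import time
from pathlib import Path
from typing import Optional


def expired_entries(
    tenant_dir: Path, ttl_seconds: float, now: Optional[float] = None
) -> list[Path]:
    """List entry directories older than ttl_seconds (report only).

    Assumption: directory mtime approximates when the entry was quarantined.
    Hidden directories such as .removed are ignored.
    """
    current = time.time() if now is None else now
    return sorted(
        p
        for p in tenant_dir.iterdir()
        if p.is_dir()
        and not p.name.startswith(".")
        and current - p.stat().st_mtime > ttl_seconds
    )
```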

Common failure categories

  • tuf:* - invalid/expired metadata or snapshot hash mismatch
  • dsse:* - signature invalid or trust root mismatch
  • merkle-* - payload entry set invalid or empty
  • rotation:* - root rotation policy failure (dual approval, no-op rotation, etc.)
  • version-non-monotonic:* - rollback prevention triggered (force activation requires a justification)
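
For triage scripts, the prefixes above can be mapped to their descriptions with a simple lookup. This is a minimal sketch: categorize is a hypothetical helper, the descriptions are copied from the list above, and the sample reason strings in the usage lines are made up for illustration rather than taken from real importer output.

```python
# Prefixes and descriptions as listed in this runbook.
FAILURE_CATEGORIES = {
    "tuf:": "invalid/expired metadata or snapshot hash mismatch",
    "dsse:": "signature invalid or trust root mismatch",
    "merkle-": "payload entry set invalid or empty",
    "rotation:": "root rotation policy failure",
    "version-non-monotonic:": "rollback prevention triggered",
}


def categorize(reason: str) -> str:
    """Return the description for a failure reason, or 'unknown'."""
    for prefix, description in FAILURE_CATEGORIES.items():
        if reason.startswith(prefix):
            return description
    return "unknown"


if __name__ == "__main__":
    # Hypothetical reason strings, for illustration only.
    for sample in ("tuf:example", "merkle-example", "something-else"):
        print(sample, "->", categorize(sample))
```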