Offline and Air-Gap Advisory Implementation Roadmap

Source Advisory: 14-Dec-2025 - Offline and Air-Gap Technical Reference
Document Version: 1.0
Last Updated: 2025-12-14


Executive Summary

This document outlines the implementation roadmap for gaps identified between the 14-Dec-2025 Offline and Air-Gap Technical Reference advisory and the current StellaOps codebase. The implementation is organized into 5 sprints addressing security-critical, high-priority, and enhancement-level improvements.


Implementation Overview

Sprint Summary

| Sprint | Topic | Priority | Gaps | Effort | Dependencies |
| --- | --- | --- | --- | --- | --- |
| 0338 | AirGap Importer Core | P0 | G6, G7 | Medium | None |
| 0339 | CLI Offline Commands | P1 | G4 | Medium | 0338 |
| 0340 | Scanner Offline Config | P2 | G5 | Medium | 0338 |
| 0341 | Observability & Audit | P1-P2 | G11-G14 | Medium | 0338 |
| 0342 | Evidence Reconciliation | P3 | G10 | High | 0338, 0340 |

Dependency Graph

            ┌─────────────────────────────────────────────┐
            │                                             │
            │   Sprint 0338: AirGap Importer Core (P0)   │
            │   - Monotonicity enforcement (G6)           │
            │   - Quarantine handling (G7)                │
            │                                             │
            └──────────────────┬──────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
┌────────────────┐   ┌────────────────┐   ┌────────────────┐
│ Sprint 0339    │   │ Sprint 0340    │   │ Sprint 0341    │
│ CLI Commands   │   │ Scanner Config │   │ Observability  │
│ (P1)           │   │ (P2)           │   │ (P1-P2)        │
│ - G4           │   │ - G5           │   │ - G11-G14      │
└────────────────┘   └───────┬────────┘   └────────────────┘
                             │
                             ▼
                    ┌────────────────┐
                    │ Sprint 0342    │
                    │ Evidence Recon │
                    │ (P3)           │
                    │ - G10          │
                    └────────────────┘
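
The dependency graph can also be checked mechanically. The sketch below copies the Dependencies column from the Sprint Summary and groups the sprints into waves that could run in parallel. It is illustrative only: the `execution_waves` helper is a hypothetical name, and the roadmap's own phase ordering (which schedules 0340 after 0339 and 0341) is a planning choice, not something the graph forces.

```python
from graphlib import TopologicalSorter

# Sprint -> set of sprints it depends on, copied from the Sprint Summary table.
DEPENDENCIES = {
    "0338": set(),
    "0339": {"0338"},
    "0340": {"0338"},
    "0341": {"0338"},
    "0342": {"0338", "0340"},
}


def execution_waves(deps):
    """Group sprints into waves; sprints within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Only 0338 is ready in the first wave, which matches its P0 placement at the root of the graph.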

Gap-to-Sprint Mapping

P0 - Critical (Must Implement First)

| Gap ID | Description | Sprint | Rationale |
| --- | --- | --- | --- |
| G6 | Monotonicity enforcement | 0338 | Rollback prevention is security-critical; prevents replay attacks |
| G7 | Quarantine directory handling | 0338 | Essential for forensic analysis of failed imports |

P1 - High Priority

| Gap ID | Description | Sprint | Rationale |
| --- | --- | --- | --- |
| G4 | CLI offline command group | 0339 | Primary operator interface; competitive parity |
| G11 | Prometheus metrics | 0341 | Operational visibility in air-gap environments |
| G13 | Error reason codes | 0341 | Automation and troubleshooting |

P2 - Important

| Gap ID | Description | Sprint | Rationale |
| --- | --- | --- | --- |
| G5 | Scanner offline config surface | 0340 | Enterprise trust anchor management |
| G12 | Structured logging fields | 0341 | Log aggregation and correlation |
| G14 | Audit schema enhancement | 0341 | Compliance and chain-of-custody |

P3 - Lower Priority

| Gap ID | Description | Sprint | Rationale |
| --- | --- | --- | --- |
| G10 | Evidence reconciliation algorithm | 0342 | Complex but valuable; VEX-first decisioning |

Deferred (Not Implementing)

| Gap ID | Description | Rationale |
| --- | --- | --- |
| G9 | YAML verification policy schema | Over-engineering; existing JSON/code config sufficient |

Technical Architecture

New Components

src/AirGap/
├── StellaOps.AirGap.Importer/
│   ├── Versioning/
│   │   ├── BundleVersion.cs                 # Sprint 0338
│   │   ├── IVersionMonotonicityChecker.cs   # Sprint 0338
│   │   └── IBundleVersionStore.cs           # Sprint 0338
│   ├── Quarantine/
│   │   ├── IQuarantineService.cs            # Sprint 0338
│   │   ├── FileSystemQuarantineService.cs   # Sprint 0338
│   │   └── QuarantineOptions.cs             # Sprint 0338
│   ├── Telemetry/
│   │   ├── OfflineKitMetrics.cs             # Sprint 0341
│   │   └── OfflineKitLogFields.cs           # Sprint 0341
│   ├── Audit/
│   │   └── OfflineKitAuditEmitter.cs        # Sprint 0341
│   ├── Reconciliation/
│   │   ├── ArtifactIndex.cs                 # Sprint 0342
│   │   ├── EvidenceCollector.cs             # Sprint 0342
│   │   ├── DocumentNormalizer.cs            # Sprint 0342
│   │   ├── PrecedenceLattice.cs             # Sprint 0342
│   │   └── EvidenceGraphEmitter.cs          # Sprint 0342
│   └── OfflineKitReasonCodes.cs             # Sprint 0341

src/Scanner/
├── __Libraries/StellaOps.Scanner.Core/
│   ├── Configuration/
│   │   ├── OfflineKitOptions.cs             # Sprint 0340
│   │   ├── TrustAnchorConfig.cs             # Sprint 0340
│   │   └── OfflineKitOptionsValidator.cs    # Sprint 0340
│   └── TrustAnchors/
│       ├── PurlPatternMatcher.cs            # Sprint 0340
│       ├── ITrustAnchorRegistry.cs          # Sprint 0340
│       └── TrustAnchorRegistry.cs           # Sprint 0340

src/Cli/
├── StellaOps.Cli/
│   └── Commands/
│       ├── Offline/
│       │   ├── OfflineCommandGroup.cs       # Sprint 0339
│       │   ├── OfflineImportHandler.cs      # Sprint 0339
│       │   ├── OfflineStatusHandler.cs      # Sprint 0339
│       │   └── OfflineExitCodes.cs          # Sprint 0339
│       └── Verify/
│           └── VerifyOfflineHandler.cs      # Sprint 0339

src/Authority/
├── __Libraries/StellaOps.Authority.Storage.Postgres/
│   └── Migrations/
│       └── 003_offline_kit_audit.sql        # Sprint 0341

Database Changes

| Table | Schema | Sprint | Purpose |
| --- | --- | --- | --- |
| airgap.bundle_versions | New | 0338 | Track active bundle versions per tenant/type |
| airgap.bundle_version_history | New | 0338 | Version history for audit trail |
| authority.offline_kit_audit | New | 0341 | Enhanced audit with Rekor/DSSE fields |

Configuration Changes

| Section | Sprint | Fields |
| --- | --- | --- |
| AirGap:Quarantine | 0338 | QuarantineRoot, RetentionPeriod, MaxQuarantineSizeBytes |
| Scanner:OfflineKit | 0340 | RequireDsse, RekorOfflineMode, TrustAnchors[] |
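
To show how the `AirGap:Quarantine` fields might interact, here is a minimal sketch of a cleanup policy. The three option names mirror the table. Everything else is an assumption made for illustration: the `QuarantineEntry` type, the `select_for_removal` helper, and the rule of removing expired entries first and then the oldest entries until the quota is met. The actual service is planned in C# and may behave differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class QuarantineOptions:
    # Field names follow the AirGap:Quarantine section; values are set by the caller.
    quarantine_root: str
    retention_period: timedelta
    max_quarantine_size_bytes: int


@dataclass(frozen=True)
class QuarantineEntry:
    # Hypothetical record describing one quarantined bundle.
    name: str
    size_bytes: int
    quarantined_at: datetime


def select_for_removal(entries, options, now):
    """Return entry names to delete: expired entries first, then oldest until under quota."""
    by_age = sorted(entries, key=lambda e: (e.quarantined_at, e.name))
    expired = [e for e in by_age if now - e.quarantined_at > options.retention_period]
    kept = [e for e in by_age if e not in expired]
    removed = list(expired)
    total = sum(e.size_bytes for e in kept)
    while kept and total > options.max_quarantine_size_bytes:
        oldest = kept.pop(0)  # kept is sorted oldest-first
        total -= oldest.size_bytes
        removed.append(oldest)
    return [e.name for e in removed]


if __name__ == "__main__":
    now = datetime(2025, 12, 14, tzinfo=timezone.utc)
    options = QuarantineOptions("/var/lib/quarantine", timedelta(days=30), 100)
    entries = [
        QuarantineEntry("a", 60, now - timedelta(days=40)),
        QuarantineEntry("b", 70, now - timedelta(days=10)),
        QuarantineEntry("c", 50, now - timedelta(days=1)),
    ]
    print(select_for_removal(entries, options, now))
```

Keeping the selection a pure function of the entries, the options, and the clock makes the quota and TTL behaviour easy to unit-test without touching the file system.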

CLI Commands

| Command | Sprint | Description |
| --- | --- | --- |
| stellaops offline import | 0339 | Import offline kit with verification |
| stellaops offline status | 0339 | Display current kit status |
| stellaops verify offline | 0339 | Offline evidence verification |

Metrics

| Metric | Type | Sprint | Labels |
| --- | --- | --- | --- |
| offlinekit_import_total | Counter | 0341 | status, tenant_id |
| offlinekit_attestation_verify_latency_seconds | Histogram | 0341 | attestation_type, success |
| attestor_rekor_success_total | Counter | 0341 | mode |
| attestor_rekor_retry_total | Counter | 0341 | reason |
| rekor_inclusion_latency | Histogram | 0341 | success |
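
As a rough illustration of what a labelled counter such as `offlinekit_import_total` looks like when scraped, the sketch below renders samples in the Prometheus text exposition format using only the standard library. The `LabeledCounter` class and the label values (`success`, `failure`, `tenant-a`) are assumptions for the example. A real service would use an established Prometheus client library instead of hand-rolling this.

```python
from collections import defaultdict


def _escape(value):
    """Escape a label value as required by the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LabeledCounter:
    """Tiny in-memory counter keyed by label values (illustrative only)."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {sorted(labels)}")
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(str(labels[name]) for name in self.label_names)
        self._values[key] += amount

    def render(self):
        """Render all samples in the text exposition format, sorted for stable output."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key in sorted(self._values):
            pairs = ",".join(
                f'{name}="{_escape(value)}"' for name, value in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {self._values[key]}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    imports = LabeledCounter(
        "offlinekit_import_total", "Offline kit import attempts.", ["status", "tenant_id"]
    )
    imports.inc(status="success", tenant_id="tenant-a")
    imports.inc(status="success", tenant_id="tenant-a")
    imports.inc(status="failure", tenant_id="tenant-a")
    print(imports.render(), end="")
```

Because `tenant_id` is a label, every tenant creates a separate time series per status, so the cardinality of this counter grows with the tenant count.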

Implementation Sequence

Phase 1: Foundation (Sprint 0338)

Duration: 1 sprint
Focus: Security-critical infrastructure

  1. Implement BundleVersion model with semver parsing
  2. Create IVersionMonotonicityChecker and Postgres store
  3. Integrate monotonicity check into ImportValidator
  4. Implement --force-activate with audit trail
  5. Create IQuarantineService and file-system implementation
  6. Integrate quarantine into all import failure paths
  7. Write comprehensive tests

Exit Criteria:

  • Rollback attacks are prevented
  • Failed bundles are preserved for investigation
  • Force activation requires justification
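
The monotonicity rule and the force-activation escape hatch can be sketched as follows. This is a simplified model, not the planned `IVersionMonotonicityChecker`. It assumes that an incoming bundle must be strictly newer than the active one under Semantic Versioning precedence, that an equal version is rejected, and that a forced activation needs a non-empty justification. The reason-code strings are hypothetical placeholders.

```python
import re
from dataclasses import dataclass

_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


@dataclass(frozen=True)
class BundleVersion:
    major: int
    minor: int
    patch: int
    prerelease: tuple = ()

    @classmethod
    def parse(cls, text):
        match = _SEMVER.match(text)
        if match is None:
            raise ValueError(f"not a semantic version: {text!r}")
        major, minor, patch, pre = match.groups()
        return cls(int(major), int(minor), int(patch), tuple(pre.split(".")) if pre else ())

    def sort_key(self):
        """Key that orders versions by SemVer precedence (build metadata is ignored)."""
        # Numeric prerelease identifiers sort below alphanumeric ones.
        ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in self.prerelease)
        # A release (no prerelease part) outranks any prerelease of the same version.
        return (self.major, self.minor, self.patch, 0 if self.prerelease else 1, ids)


def check_monotonic(current, incoming, force_activate=False, force_reason=None):
    """Return (allowed, reason_code); reason codes here are hypothetical placeholders."""
    if current is None:
        return True, "FIRST_ACTIVATION"
    if incoming.sort_key() > current.sort_key():
        return True, "VERSION_ADVANCED"
    if force_activate:
        if not force_reason or not force_reason.strip():
            return False, "FORCE_REASON_REQUIRED"
        return True, "FORCE_ACTIVATED"  # caller is expected to write an audit record
    return False, "VERSION_NON_MONOTONIC"


if __name__ == "__main__":
    active = BundleVersion.parse("2.0.0")
    older = BundleVersion.parse("1.9.0")
    print(check_monotonic(active, older))
    print(check_monotonic(active, older, force_activate=True, force_reason="DR restore"))
```

Returning a reason code alongside the decision lets the caller record why an import was refused or forced, which is what the audit-trail requirement in step 4 needs.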

Phase 2: Operator Experience (Sprints 0339, 0341)

Duration: 1-2 sprints (can parallelize)
Focus: CLI and observability

Sprint 0339 (CLI):

  1. Create offline command group
  2. Implement offline import with all flags
  3. Implement offline status with output formats
  4. Implement verify offline with policy loading
  5. Add exit code standardization
  6. Write CLI integration tests

Sprint 0341 (Observability):

  1. Add Prometheus metrics infrastructure
  2. Implement offline kit metrics
  3. Standardize structured logging fields
  4. Complete error reason codes
  5. Create audit schema migration
  6. Implement audit repository and emitter
  7. Create Grafana dashboard

Exit Criteria:

  • Operators can import/verify kits via CLI
  • Metrics are visible in Prometheus/Grafana
  • All operations are auditable
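
One way to picture standardized log fields combined with machine-readable reason codes is a single JSON line per event, as in the sketch below. The field names, the event name, and the `ReasonCode` members are all hypothetical. The advisory's actual field and code lists are not reproduced in this roadmap.

```python
import json
from enum import Enum


class ReasonCode(str, Enum):
    # Hypothetical examples; the real set would live in OfflineKitReasonCodes.
    HASH_MISMATCH = "HASH_MISMATCH"
    SIGNATURE_INVALID = "SIGNATURE_INVALID"
    VERSION_NON_MONOTONIC = "VERSION_NON_MONOTONIC"


def import_failed_event(tenant_id, bundle_type, bundle_digest, reason):
    """Build one structured log line for a failed import (field names are assumptions)."""
    fields = {
        "event": "offlinekit.import.failed",
        "tenant_id": tenant_id,
        "bundle_type": bundle_type,
        "bundle_digest": bundle_digest,
        "reason_code": reason.value,
    }
    # Sorted keys and fixed separators keep the line stable for log diffing.
    return json.dumps(fields, sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    print(
        import_failed_event(
            "tenant-a", "offline-kit", "sha256:abc123", ReasonCode.VERSION_NON_MONOTONIC
        )
    )
```

Using an enumerated reason code instead of free text is what lets automation branch on failures and lets dashboards group them.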

Phase 3: Configuration (Sprint 0340)

Duration: 1 sprint
Focus: Trust anchor management

  1. Create OfflineKitOptions configuration class
  2. Implement PURL pattern matcher
  3. Create TrustAnchorRegistry with precedence resolution
  4. Add options validation
  5. Integrate trust anchors with DSSE verification
  6. Update Helm chart values
  7. Write configuration tests

Exit Criteria:

  • Trust anchors configurable per ecosystem
  • DSSE verification uses configured anchors
  • Invalid configuration fails startup

Phase 4: Advanced Features (Sprint 0342)

Duration: 1-2 sprints
Focus: Evidence reconciliation

  1. Design artifact indexing
  2. Implement evidence collection
  3. Create document normalization
  4. Implement VEX precedence lattice
  5. Create evidence graph emitter
  6. Integrate with CLI verify offline
  7. Write golden-file determinism tests

Exit Criteria:

  • Evidence reconciliation is deterministic
  • VEX conflicts resolved by precedence
  • Graph output is signed and verifiable
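
To make "VEX conflicts resolved by precedence" concrete, the sketch below resolves conflicting statements with a total order, which is the simplest possible lattice. The source ranking (`vendor` over `distro` over `community`), the tie-break that prefers the more conservative status, and the `VexStatement` shape are all assumptions for illustration. The advisory's actual lattice may be richer. The status names are the common VEX status values.

```python
from dataclasses import dataclass

# Assumed source precedence: higher number wins.
SOURCE_RANK = {"community": 0, "distro": 1, "vendor": 2}
# Assumed tie-break between equally ranked sources: the more conservative status wins.
STATUS_RANK = {"not_affected": 0, "fixed": 1, "under_investigation": 2, "affected": 3}


@dataclass(frozen=True)
class VexStatement:
    statement_id: str
    source: str
    status: str


def precedence_key(statement):
    """Total order over statements; the statement id makes ties fully deterministic."""
    return (
        SOURCE_RANK[statement.source],
        STATUS_RANK[statement.status],
        statement.statement_id,
    )


def resolve(statements):
    """Pick the winning statement; the result does not depend on input order."""
    statements = list(statements)
    if not statements:
        raise ValueError("no statements to resolve")
    return max(statements, key=precedence_key)


if __name__ == "__main__":
    conflicting = [
        VexStatement("s1", "community", "affected"),
        VexStatement("s2", "vendor", "not_affected"),
        VexStatement("s3", "distro", "fixed"),
    ]
    winner = resolve(conflicting)
    print(winner.statement_id, winner.source, winner.status)
```

Because the winner is the maximum under a total order, the merge is commutative, associative, and idempotent. That is the property the determinism exit criterion depends on.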

Testing Strategy

Unit Tests

  • All new classes have corresponding test classes
  • Mock dependencies for isolation
  • Property-based tests for lattice operations

Integration Tests

  • Testcontainers for PostgreSQL
  • Full import → verification → audit flow
  • CLI command execution tests

Determinism Tests

  • Golden-file tests for evidence reconciliation
  • Cross-platform validation (Windows, Linux, macOS)
  • Reproducibility across runs
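
A golden-file determinism check usually rests on a canonical byte representation. The sketch below uses sorted keys, fixed separators, and UTF-8 output, then compares bytes and digests. It is an assumption about how such tests could be built, not the project's test harness, and it is not a full implementation of a JSON canonicalization standard. List order is preserved, so callers must sort any unordered collections themselves.

```python
import hashlib
import json


def canonical_bytes(document):
    """Serialize to a stable byte form: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(document, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(document):
    """Content digest of the canonical form."""
    return "sha256:" + hashlib.sha256(canonical_bytes(document)).hexdigest()


def matches_golden(document, golden_bytes):
    """Byte-for-byte comparison, so platform line-ending translation cannot hide a diff."""
    return canonical_bytes(document) == golden_bytes


if __name__ == "__main__":
    first = {"edges": [["a", "b"]], "nodes": ["a", "b"]}
    second = {"nodes": ["a", "b"], "edges": [["a", "b"]]}  # same content, different key order
    print(canonical_bytes(first).decode("utf-8"))
    print(digest(first) == digest(second))
```

Golden files written this way should be read and written in binary mode so that Windows, Linux, and macOS runners compare identical bytes.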

Security Tests

  • Monotonicity bypass attempts
  • Signature verification edge cases
  • Trust anchor configuration validation

Documentation Updates

| Document | Sprint | Updates |
| --- | --- | --- |
| docs/airgap/importer-scaffold.md | 0338 | Add monotonicity, quarantine sections |
| docs/airgap/runbooks/quarantine-investigation.md | 0338 | New runbook |
| docs/modules/cli/commands/offline.md | 0339 | New command reference |
| docs/modules/cli/guides/airgap.md | 0339 | Update with CLI examples |
| docs/modules/scanner/configuration.md | 0340 | Add offline kit config section |
| docs/airgap/observability.md | 0341 | Metrics and logging reference |
| docs/airgap/evidence-reconciliation.md | 0342 | Algorithm documentation |

Risk Register

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Monotonicity breaks existing workflows | High | Provide --force-activate escape hatch |
| Quarantine disk exhaustion | Medium | Implement quota and TTL cleanup |
| Trust anchor config complexity | Medium | Provide sensible defaults, validate at startup |
| Evidence reconciliation performance | Medium | Streaming processing, caching |
| Cross-platform determinism failures | High | CI matrix, golden-file tests |

Success Metrics

| Metric | Target | Sprint |
| --- | --- | --- |
| Rollback attack prevention | 100% | 0338 |
| Failed bundle quarantine rate | 100% | 0338 |
| CLI command adoption | 50% operators | 0339 |
| Metric collection uptime | 99.9% | 0341 |
| Audit completeness | 100% events | 0341 |
| Reconciliation determinism | 100% | 0342 |

References