StellaOps Bot e6c47c8f50 save progress

2025-12-28 23:49:56 +02:00

8.3 KiB

component_architecture_replay.md - Stella Ops Replay (2025Q4)

Deterministic replay engine for vulnerability verdict reproducibility.

Scope. Implementation-ready architecture for Replay: the deterministic replay engine ensuring vulnerability assessments can be reproduced byte-for-byte given the same inputs. Covers replay tokens, manifests, feed snapshots, and verification workflows.

0) Mission & boundaries

Mission. Enable deterministic reproducibility of vulnerability verdicts. Given identical inputs (SBOM, policy, feeds, toolchain), the system MUST produce identical outputs. Replay provides the infrastructure to capture, store, and verify these deterministic execution chains.

Boundaries.

Replay does not make vulnerability decisions. It captures the inputs and outputs of decision-making services.
Replay does not store SBOMs or vulnerability data. It stores references (digests) to content-addressed artifacts.
Replay tokens are cryptographically bound to input digests.
All timestamps are UTC ISO-8601 with microsecond precision.
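
The timestamp rule above can be illustrated with a short Python sketch. Python is used here only for illustration, and the helper name is hypothetical rather than part of the Replay codebase. It converts any timezone-aware value to UTC and renders six fractional digits with a trailing Z, which matches the format used in the manifest examples in this document.

```python
from datetime import datetime, timedelta, timezone


def format_replay_timestamp(value: datetime) -> str:
    """Render a timezone-aware datetime as UTC ISO-8601 with microseconds.

    Hypothetical helper for illustration; not part of the Replay codebase.
    """
    if value.tzinfo is None:
        # Naive datetimes are ambiguous, so reject them instead of guessing.
        raise ValueError("timestamp must be timezone-aware")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


# 12:30 at UTC+02:00 is 10:30 UTC.
local = datetime(2025, 1, 15, 12, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(format_replay_timestamp(local))  # 2025-01-15T10:30:00.000000Z
```

Rejecting naive values is a design choice of this sketch: silently assuming a local timezone would make the rendered string depend on the host, which defeats reproducibility.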

1) Solution & project layout

src/Replay/
 ├─ StellaOps.Replay.WebService/         # Token issuance and verification API
 │   ├─ Program.cs                       # ASP.NET Core host
 │   └─ VerdictReplayEndpoints.cs        # Minimal API endpoints
 └─ __Tests/
     └─ StellaOps.Replay.Core.Tests/     # Unit tests

src/__Libraries/
 ├─ StellaOps.Replay.Core/               # Core replay manifest and validation
 │   ├─ ReplayManifest.cs                # Manifest schema (v1, v2)
 │   ├─ ReplayManifestValidator.cs       # Validation logic
 │   ├─ DeterministicHash.cs             # Hash computation utilities
 │   ├─ PolicySimulationInputLock.cs     # Input pinning for simulation
 │   └─ FeedSnapshot/
 │       └─ FeedSnapshotCoordinatorService.cs
 │
 ├─ StellaOps.Audit.ReplayToken/         # Token generation and verification
 │   ├─ ReplayToken.cs                   # Token model
 │   ├─ ReplayTokenRequest.cs            # Token request DTO
 │   ├─ IReplayTokenGenerator.cs         # Generator interface
 │   └─ Sha256ReplayTokenGenerator.cs    # SHA-256 based implementation
 │
 └─ StellaOps.Replay/                    # Shared replay utilities

2) External dependencies

PostgreSQL - Token storage and manifest persistence
Authority - Authentication for token issuance/verification
Cryptography - Hash computation (SHA-256, BLAKE3)
CAS (Content-Addressed Storage) - Artifact storage for replay bundles
Policy Engine - Consumes replay manifests for deterministic simulation

3) Contracts & data model

3.1 ReplayManifest

The manifest captures all inputs required to reproduce a verdict:

{
  "schemaVersion": "2.0",
  "scan": {
    "id": "scan-2025-01-15T10:30:00Z-abc123",
    "time": "2025-01-15T10:30:00.000000Z",
    "policyDigest": "sha256:abc123...",
    "scorePolicyDigest": "sha256:def456...",
    "feedSnapshot": "sha256:789abc...",
    "toolchain": "stellaops/scanner:1.7.3",
    "analyzerSetDigest": "sha256:feed12..."
  },
  "reachability": {
    "analysisId": "reach-xyz",
    "graphs": [
      {
        "kind": "static",
        "casUri": "cas://reachability/graphs/abc123",
        "hash": "blake3:a1b2c3d4...",
        "hashAlg": "blake3-256",
        "analyzer": "elf-callgraph",
        "version": "1.2.0"
      }
    ],
    "runtimeTraces": [],
    "code_id_coverage": {
      "total_nodes": 1500,
      "nodes_with_symbol_id": 1200,
      "nodes_with_code_id": 1100,
      "coverage_percent": 73.33
    }
  },
  "proofSpines": [
    {
      "spineId": "spine-abc123",
      "artifactId": "pkg:npm/lodash@4.17.21",
      "vulnerabilityId": "CVE-2021-23337",
      "verdict": "not_affected",
      "segmentCount": 4,
      "rootHash": "sha256:fedcba...",
      "casUri": "cas://proofs/spines/abc123",
      "createdAt": "2025-01-15T10:30:05.000000Z"
    }
  ]
}
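
As a rough illustration of what structural validation of such a manifest might look like, the Python sketch below checks a few of the fields shown above. It is not the logic of ReplayManifestValidator.cs: the accepted version strings ("1.0", "2.0"), the digest pattern, and the function name are all assumptions made for this example.

```python
import re

# Assumed digest shape "<algorithm>:<lowercase hex>"; the real validator may differ.
DIGEST_RE = re.compile(r"^(sha256|blake3):[0-9a-f]+$")
SCAN_DIGEST_FIELDS = ("policyDigest", "scorePolicyDigest", "feedSnapshot", "analyzerSetDigest")


def validate_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    errors = []
    # Assumed version strings for the v1 and v2 schemas.
    if manifest.get("schemaVersion") not in ("1.0", "2.0"):
        errors.append("unsupported schemaVersion")
    scan = manifest.get("scan", {})
    for field in SCAN_DIGEST_FIELDS:
        if not DIGEST_RE.match(str(scan.get(field, ""))):
            errors.append(f"scan.{field} is not a valid digest")
    graphs = manifest.get("reachability", {}).get("graphs", [])
    for i, graph in enumerate(graphs):
        if not str(graph.get("casUri", "")).startswith("cas://"):
            errors.append(f"graphs[{i}].casUri must be a cas:// URI")
        if not DIGEST_RE.match(str(graph.get("hash", ""))):
            errors.append(f"graphs[{i}].hash is not a valid digest")
    return errors


manifest = {
    "schemaVersion": "2.0",
    "scan": {
        "policyDigest": "sha256:abc123",
        "scorePolicyDigest": "sha256:def456",
        "feedSnapshot": "sha256:789abc",
        "analyzerSetDigest": "sha256:feed12",
    },
    "reachability": {
        "graphs": [{"casUri": "cas://reachability/graphs/abc123", "hash": "blake3:a1b2c3d4"}]
    },
}
broken = {**manifest, "scan": {**manifest["scan"], "policyDigest": "md5:xyz"}}
print(validate_manifest(manifest))  # []
print(validate_manifest(broken))    # ['scan.policyDigest is not a valid digest']
```

The digests in the example are shortened placeholders, in the same spirit as the elided values in the manifest above.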

3.2 ReplayToken

Cryptographic token binding inputs to outputs:

public sealed record ReplayToken
{
    public required string TokenId { get; init; }           // Unique token ID
    public required string InputDigest { get; init; }       // Hash of all inputs
    public required string OutputDigest { get; init; }      // Hash of verdict output
    public required DateTimeOffset IssuedAt { get; init; }  // UTC timestamp
    public required string Issuer { get; init; }            // Service that issued
    public string? Signature { get; init; }                 // DSSE signature
}
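
To make the binding concrete, here is a minimal Python sketch of how a token could be derived from input and output digests. It mirrors the fields of the record above but is not the implementation in Sha256ReplayTokenGenerator.cs: the token ID derivation, the canonicalization step, and the issuer value are assumptions chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def sha256_digest(value) -> str:
    """Hash a JSON-serializable value in a key-order-independent form."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


@dataclass(frozen=True)
class ReplayToken:
    token_id: str
    input_digest: str
    output_digest: str
    issued_at: datetime
    issuer: str
    signature: Optional[str] = None  # DSSE signature, left unset in this sketch


def issue_token(inputs: dict, output: dict, issuer: str, issued_at: datetime) -> ReplayToken:
    input_digest = sha256_digest(inputs)
    output_digest = sha256_digest(output)
    # Assumed scheme: derive the ID from both digests, so identical inputs
    # and outputs always map to the same token ID.
    token_id = hashlib.sha256(f"{input_digest}|{output_digest}".encode("utf-8")).hexdigest()[:32]
    return ReplayToken(token_id, input_digest, output_digest, issued_at, issuer)


ts = datetime(2025, 1, 15, 10, 30, 0, tzinfo=timezone.utc)
t1 = issue_token({"policyDigest": "sha256:abc123", "feedSnapshot": "sha256:789abc"},
                 {"verdict": "not_affected"}, "replay-service", ts)
t2 = issue_token({"feedSnapshot": "sha256:789abc", "policyDigest": "sha256:abc123"},
                 {"verdict": "not_affected"}, "replay-service", ts)
print(t1.token_id == t2.token_id)  # True
```

Because the inputs are canonicalized before hashing, reordering keys does not change the token, while any change to an input or output digest does.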

3.3 PolicySimulationInputLock

Captures pinned versions for deterministic policy simulation:

public sealed record PolicySimulationInputLock
{
    public required string PolicyDigest { get; init; }
    public required string FeedSnapshotDigest { get; init; }
    public required string ScorePolicyDigest { get; init; }
    public required DateTimeOffset LockedAt { get; init; }
    public required IReadOnlyList<AnalyzerPin> AnalyzerPins { get; init; }
}
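
One way such a lock might be used is to confirm that a manifest was produced from the pinned inputs before a simulation runs. The Python sketch below assumes that FeedSnapshotDigest corresponds to the manifest's scan.feedSnapshot field. It omits LockedAt and AnalyzerPins, since the shape of AnalyzerPin is not shown in this document. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputLock:
    """Subset of PolicySimulationInputLock used by this sketch."""
    policy_digest: str
    feed_snapshot_digest: str
    score_policy_digest: str


def lock_mismatches(lock: InputLock, scan: dict) -> list:
    """Return the manifest scan fields whose digests differ from the lock."""
    expected = {
        "policyDigest": lock.policy_digest,
        "feedSnapshot": lock.feed_snapshot_digest,  # assumed field mapping
        "scorePolicyDigest": lock.score_policy_digest,
    }
    return [name for name, digest in expected.items() if scan.get(name) != digest]


lock = InputLock("sha256:abc123", "sha256:789abc", "sha256:def456")
scan = {
    "policyDigest": "sha256:abc123",
    "feedSnapshot": "sha256:000000",  # a different feed snapshot than the lock pins
    "scorePolicyDigest": "sha256:def456",
}
print(lock_mismatches(lock, scan))  # ['feedSnapshot']
```

An empty result means every pinned digest matches, so the simulation would run against the same policy, score policy, and feed snapshot as the original scan.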

4) REST API (Replay.WebService)

All endpoints are served under /api/v1/replay. Auth: OpTok (DPoP/mTLS).

POST /tokens                    { request: ReplayTokenRequest } → { token: ReplayToken }
GET  /tokens/{tokenId}          → { token: ReplayToken, status }
POST /tokens/{tokenId}/verify   { manifest: ReplayManifest } → { valid: bool, details }

GET  /manifests/{scanId}        → { manifest: ReplayManifest }
POST /manifests                 { manifest: ReplayManifest } → { manifestId }

GET  /healthz | /readyz

Authorization Policies:

replay.token.read - Read tokens and manifests
replay.token.write - Issue new tokens
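
For orientation, the Python sketch below builds (but does not send) a request for the verification endpoint listed above. The host name is hypothetical, and authentication headers are deliberately left out because the OpTok wire format is not described in this document. Only the path and the { manifest } body shape come from the endpoint listing.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://replay.example.internal/api/v1/replay"  # hypothetical host


def build_verify_request(token_id: str, manifest: dict) -> urllib.request.Request:
    """Build POST /tokens/{tokenId}/verify without sending it."""
    body = json.dumps({"manifest": manifest}).encode("utf-8")
    # Percent-encode the token ID so it cannot alter the path structure.
    path = f"/tokens/{urllib.parse.quote(token_id, safe='')}/verify"
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_verify_request("tok-123", {"schemaVersion": "2.0"})
print(req.get_method(), req.full_url)
```

A real client would attach the OpTok credentials required by the service before passing the request to urllib.request.urlopen.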

5) Configuration (YAML)

Replay:
  Authority:
    Issuer: "https://authority.stellaops.local"
    RequireHttpsMetadata: true
    MetadataAddress: "https://authority.stellaops.local/.well-known/openid-configuration"
    Audiences: ["replay-service"]
    RequiredScopes: ["vuln:operate"]

  Storage:
    ConnectionString: "Host=postgres;Database=replay;..."
    CasEndpoint: "http://rustfs:8080"

  Tokens:
    Algorithm: "SHA256"
    ExpirationHours: 8760  # 1 year

  Determinism:
    EnforceCanonicalJson: true
    HashAlgorithm: "blake3-256"
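
As a small worked example of the Tokens settings above, 8760 hours is 365 days, so a token issued at 2025-01-15T10:30:00Z would expire at 2026-01-15T10:30:00Z. The Python sketch below shows that arithmetic. The function name is hypothetical, and treating the expiry instant itself as expired is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone

EXPIRATION_HOURS = 8760  # Replay:Tokens:ExpirationHours from the configuration above


def is_expired(issued_at: datetime, now: datetime,
               expiration_hours: int = EXPIRATION_HOURS) -> bool:
    """True once `now` reaches issued_at + expiration_hours (boundary assumed inclusive)."""
    return now >= issued_at + timedelta(hours=expiration_hours)


issued = datetime(2025, 1, 15, 10, 30, 0, tzinfo=timezone.utc)
print(issued + timedelta(hours=EXPIRATION_HOURS))  # 2026-01-15 10:30:00+00:00
print(is_expired(issued, datetime(2025, 6, 1, tzinfo=timezone.utc)))  # False
```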

6) Determinism guarantees

6.1 Input pinning

All inputs that affect verdict output are captured:

Input	Pinning Method	Storage
Policy YAML	Content digest	CAS
Score policy	Content digest	CAS
Feed snapshot	Snapshot digest + timestamp	CAS
Toolchain	Image digest	Manifest
Analyzers	Version + digest	Manifest
Reachability graphs	BLAKE3 hash	CAS
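
The "content digest" pinning method in the table can be sketched in a few lines of Python: an artifact fetched from CAS is accepted only if its bytes hash to the pinned value. The function names are hypothetical. Only SHA-256 is shown because BLAKE3 is not in the Python standard library, so the reachability-graph case would need a third-party package.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the digest string in the "sha256:<hex>" form used by the manifest."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_digest: str) -> bool:
    """Accept the artifact only if its bytes match the pinned digest."""
    return content_digest(data) == pinned_digest


pinned = content_digest(b"abc")
print(pinned)
# sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(verify_artifact(b"abc", pinned))   # True
print(verify_artifact(b"abd", pinned))   # False
```

Because the digest is computed over the exact bytes, even a one-byte change in a policy file or feed snapshot yields a different pin and fails verification.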

6.2 Output determinism

Guarantee	Implementation
Canonical JSON	Sorted keys, no whitespace variation
Stable ordering	Deterministic sort on all collections
UTC timestamps	Microsecond precision, always UTC
Hash stability	Same input → same hash
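
The first two guarantees in the table can be illustrated with a Python sketch: serialize with sorted keys and fixed separators, and sort collections before serializing. This is a simplified stand-in and not a full RFC 8785 (JCS) implementation, since number formatting, for example, is not normalized. The sort key used for proof spines is an assumption, as the document does not say which fields define the order.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    """Sorted keys, no insignificant whitespace, UTF-8 bytes."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def stable_hash(value) -> str:
    return "sha256:" + hashlib.sha256(canonical_json(value)).hexdigest()


def sort_spines(spines: list) -> list:
    # Assumed sort key for illustration only.
    return sorted(spines, key=lambda s: (s["artifactId"], s["vulnerabilityId"]))


a = {"verdict": "not_affected", "vulnerabilityId": "CVE-2021-23337"}
b = {"vulnerabilityId": "CVE-2021-23337", "verdict": "not_affected"}
print(canonical_json(a))
# b'{"verdict":"not_affected","vulnerabilityId":"CVE-2021-23337"}'
print(stable_hash(a) == stable_hash(b))  # True
```

Key order in the source dictionaries differs, yet both serialize to the same bytes and therefore the same hash.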

7) Security & compliance

Token binding: Tokens are cryptographically bound to input digests
Non-repudiation: DSSE signatures on tokens (optional)
Audit trail: All token operations logged with tenant context
Immutability: Manifests and tokens are append-only
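
For readers unfamiliar with DSSE, the bytes that get signed are not the raw payload but its pre-authentication encoding (PAE), which also covers the payload type. The Python sketch below implements PAE as defined by the DSSE specification. The signing step itself is omitted, and this document does not state which payload type Replay uses, so the example uses the sample values from the DSSE specification.

```python
def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding.

    Layout: "DSSEv1" SP len(type) SP type SP len(payload) SP payload,
    with lengths written as ASCII decimal byte counts.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


print(pae("http://example.com/HelloWorld", b"hello world"))
# b'DSSEv1 29 http://example.com/HelloWorld 11 hello world'
```

Signing the PAE bytes rather than the bare payload prevents a signature made for one payload type from being replayed against another.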

8) Performance targets

Token issuance: < 50ms P95
Token verification: < 100ms P95
Manifest storage: < 200ms P95
Manifest retrieval: < 50ms P95
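
To show how a P95 target of this kind might be checked against measured latencies, the Python sketch below uses the nearest-rank percentile definition. The choice of percentile method, the operation keys, and the sample values are assumptions for illustration. Only the millisecond thresholds come from the list above.

```python
TARGETS_MS = {
    "token_issuance": 50,
    "token_verification": 100,
    "manifest_storage": 200,
    "manifest_retrieval": 50,
}


def p95(samples_ms: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def meets_target(operation: str, samples_ms: list) -> bool:
    return p95(samples_ms) < TARGETS_MS[operation]


samples = [float(v) for v in range(1, 21)]  # synthetic latencies: 1.0 .. 20.0 ms
print(p95(samples))                            # 19.0
print(meets_target("token_issuance", samples))  # True
```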

9) Observability

Metrics:

replay.tokens.issued_total{issuer}
replay.tokens.verified_total{result=valid|invalid}
replay.manifests.stored_total
replay.verification.duration_seconds

Tracing: Spans for token operations, manifest storage, verification workflows.

10) Testing matrix

Determinism tests: Same inputs produce identical tokens/manifests
Round-trip tests: Store → retrieve → verify produces same result
Hash stability: Canonical JSON hashing is stable across serialization round-trips
Integration tests: Full token lifecycle with Policy Engine
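
The round-trip and hash-stability rows can be sketched as plain Python test functions against an in-memory stand-in for manifest storage. The store class, its append-only behavior, and the test names are hypothetical. They illustrate the shape of such tests and are not the contents of StellaOps.Replay.Core.Tests.

```python
import hashlib
import json


def canonical_hash(value) -> str:
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


class InMemoryManifestStore:
    """Append-only stand-in for manifest persistence (hypothetical)."""

    def __init__(self):
        self._rows = {}

    def put(self, scan_id: str, manifest: dict) -> None:
        if scan_id in self._rows:
            raise KeyError(f"manifest for {scan_id} already exists")
        self._rows[scan_id] = json.dumps(manifest)  # persisted as serialized text

    def get(self, scan_id: str) -> dict:
        return json.loads(self._rows[scan_id])


def test_round_trip_preserves_hash():
    manifest = {"schemaVersion": "2.0", "scan": {"id": "scan-1", "policyDigest": "sha256:abc123"}}
    store = InMemoryManifestStore()
    store.put("scan-1", manifest)
    assert canonical_hash(store.get("scan-1")) == canonical_hash(manifest)


def test_store_is_append_only():
    store = InMemoryManifestStore()
    store.put("scan-1", {"schemaVersion": "2.0"})
    try:
        store.put("scan-1", {"schemaVersion": "2.0"})
    except KeyError:
        return
    raise AssertionError("second put for the same scan ID should be rejected")


test_round_trip_preserves_hash()
test_store_is_append_only()
print("round-trip checks passed")
```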

Related documentation:

Scanner determinism: ../scanner/deterministic-execution.md
Policy simulation: ../policy/architecture.md
Evidence attestation: ../attestor/architecture.md
Replay protocol: ../../replay/DETERMINISTIC_REPLAY.md
