Tile-Proxy Service Design

Overview

The Tile-Proxy service acts as an intermediary between StellaOps clients and upstream Rekor transparency log APIs. It provides centralized tile caching, request coalescing, and offline support for air-gapped environments.

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  CI/CD Agents   │────►│   Tile Proxy    │────►│   Rekor API     │
│  (StellaOps)    │     │   (StellaOps)   │     │   (Upstream)    │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Tile Cache     │     │  TUF Metadata   │     │  Checkpoint     │
│  (CAS Store)    │     │  (TrustRepo)    │     │  Cache          │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Core Responsibilities

  1. Tile Proxying: Forward tile requests to upstream Rekor, caching responses locally
  2. Content-Addressed Storage: Store tiles by hash for deduplication and immutability
  3. TUF Integration: Optionally validate metadata using TUF trust anchors
  4. Request Coalescing: Deduplicate concurrent requests for the same tile
  5. Checkpoint Caching: Cache and serve recent checkpoints
  6. Offline Mode: Serve from cache when upstream is unavailable

API Surface

Proxy Endpoints (Passthrough)

| Endpoint | Description |
| --- | --- |
| GET /tile/{level}/{index} | Proxy tile request (cache-through) |
| GET /tile/{level}/{index}.p/{partialWidth} | Proxy partial tile |
| GET /checkpoint | Proxy checkpoint request |
| GET /api/v1/log/entries/{uuid} | Proxy entry lookup |
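The tile routes above encode a tile's coordinates in the path. A minimal Python sketch of recognizing them follows, assuming level and index appear as plain decimal integers exactly as written in the table; `TileRequest` and `parse_tile_path` are hypothetical names. It also rejects partial widths outside 1..255, since a full tile holds 256 hashes.

```python
import re
from dataclasses import dataclass
from typing import Optional

FULL_TILE_WIDTH = 256  # a full tile holds 256 hashes

# Hypothetical route pattern for the passthrough tile endpoints; assumes
# level and index are plain decimal integers, as written in the table.
_TILE_RE = re.compile(r"/tile/([0-9]+)/([0-9]+)(?:\.p/([0-9]+))?")


@dataclass(frozen=True)
class TileRequest:
    level: int
    index: int
    partial_width: Optional[int]  # None for a full tile

    @property
    def cache_key(self) -> str:
        # Partial tiles get their own key so they never alias the full tile.
        suffix = "" if self.partial_width is None else f".p/{self.partial_width}"
        return f"{self.level}/{self.index}{suffix}"


def parse_tile_path(path: str) -> Optional[TileRequest]:
    """Return tile coordinates for a /tile/... path, or None if it is not a tile route."""
    m = _TILE_RE.fullmatch(path)
    if m is None:
        return None
    width = None if m.group(3) is None else int(m.group(3))
    if width is not None and not 1 <= width < FULL_TILE_WIDTH:
        return None  # a partial tile must hold 1..255 hashes
    return TileRequest(int(m.group(1)), int(m.group(2)), width)
```

Non-tile routes such as /checkpoint return None and can fall through to their own handlers.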

Admin Endpoints

| Endpoint | Description |
| --- | --- |
| GET /_admin/cache/stats | Cache statistics (hits, misses, size) |
| POST /_admin/cache/sync | Trigger manual sync job |
| DELETE /_admin/cache/prune | Prune old tiles |
| GET /_admin/health | Health check |
| GET /_admin/ready | Readiness check |
| GET /_admin/metrics | Prometheus metrics |

Caching Strategy

Content-Addressed Tile Storage

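Tiles are stored under a per-log directory named by a SHA-256 hash of the log origin ({origin_hash}) and keyed by level and index; each tile's SHA-256 content hash is recorded in its metadata file: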

{cache_root}/
├── tiles/
│   ├── {origin_hash}/
│   │   ├── {level}/
│   │   │   ├── {index}.tile
│   │   │   └── {index}.meta.json
│   │   └── checkpoints/
│   │       └── {tree_size}.checkpoint
│   └── ...
└── metadata/
    └── cache_stats.json
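Resolving a tile to its on-disk location follows directly from the layout. The sketch below assumes {origin_hash} is the lowercase hex SHA-256 of the origin string; the helper names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Tuple


def origin_dir_name(origin: str) -> str:
    # Assumption: {origin_hash} is the hex SHA-256 of the log origin string.
    return hashlib.sha256(origin.encode("utf-8")).hexdigest()


def tile_paths(cache_root: str, origin: str, level: int,
               index: int) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (tile file, metadata file) paths following the layout above."""
    base = PurePosixPath(cache_root) / "tiles" / origin_dir_name(origin) / str(level)
    return base / f"{index}.tile", base / f"{index}.meta.json"


def checkpoint_path(cache_root: str, origin: str, tree_size: int) -> PurePosixPath:
    """Return the path of a cached checkpoint for the given tree size."""
    return (PurePosixPath(cache_root) / "tiles" / origin_dir_name(origin)
            / "checkpoints" / f"{tree_size}.checkpoint")
```

Hashing the origin keeps directory names fixed-length and filesystem-safe regardless of the characters in the origin string.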

Tile Metadata

Each tile has an associated {index}.meta.json file, for example:

{
  "cachedAt": "2026-01-25T10:00:00Z",
  "treeSize": 1050000,
  "isPartial": false,
  "contentHash": "sha256:abc123...",
  "upstreamUrl": "https://rekor.sigstore.dev"
}
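A sketch of producing and checking this record follows. The field names come from the example above; the function names are hypothetical. The length check assumes a SHA-256 log, where each node hash is 32 bytes, so a full tile is 256 × 32 = 8,192 bytes.

```python
import hashlib
from datetime import datetime, timezone

HASH_SIZE = 32  # assumption: SHA-256 node hashes


def build_tile_metadata(tile: bytes, tree_size: int, is_partial: bool,
                        upstream_url: str) -> dict:
    """Build a metadata record with the field names shown above."""
    return {
        "cachedAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "treeSize": tree_size,
        "isPartial": is_partial,
        "contentHash": "sha256:" + hashlib.sha256(tile).hexdigest(),
        "upstreamUrl": upstream_url,
    }


def tile_matches_metadata(tile: bytes, metadata: dict) -> bool:
    """Re-hash a cached tile and compare it with the recorded contentHash."""
    algo, _, expected = metadata["contentHash"].partition(":")
    return algo == "sha256" and hashlib.sha256(tile).hexdigest() == expected


def has_valid_length(tile: bytes, width: int = 256) -> bool:
    """A tile of `width` entries must contain exactly `width` hashes."""
    return len(tile) == width * HASH_SIZE
```

Re-hashing on read lets the proxy detect on-disk corruption before serving a tile.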

Eviction Policy

  1. LRU by Access Time: Least recently accessed tiles evicted first
  2. Max Size Limit: Configurable maximum cache size
  3. TTL Override: Force re-fetch after configurable time (for checkpoints)
  4. Immutability Preservation: Full tiles (width=256) never evicted unless explicitly pruned
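These rules can be modelled with an access-ordered map. The following is a minimal in-memory sketch, assuming a byte budget and a per-entry "pinned" flag for full tiles; `TileLru` and its methods are hypothetical. If pinned tiles alone exceed the budget, the cache stays over budget until an explicit prune.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class TileLru:
    """LRU by access time with a byte budget; pinned (full) tiles are never auto-evicted."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.size = 0
        self._entries: "OrderedDict[str, Tuple[bytes, bool]]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key: str, data: bytes, pinned: bool) -> None:
        if key in self._entries:
            self.size -= len(self._entries.pop(key)[0])
        self._entries[key] = (data, pinned)
        self.size += len(data)
        self._evict()

    def _evict(self) -> None:
        # Walk from least to most recently used, skipping pinned full tiles.
        for key in list(self._entries):
            if self.size <= self.max_bytes:
                break
            data, pinned = self._entries[key]
            if not pinned:
                del self._entries[key]
                self.size -= len(data)

    def prune(self, key: str) -> None:
        """Explicit prune: the only way a pinned tile leaves the cache."""
        entry = self._entries.pop(key, None)
        if entry is not None:
            self.size -= len(entry[0])
```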

Request Coalescing

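Concurrent requests for the same tile are coalesced: the first request fetches the tile from upstream, and later requests for the same key await that in-flight fetch instead of issuing their own: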

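// Request coalescing (C# sketch). _inflightRequests is assumed to be a
// ConcurrentDictionary<string, Task<byte[]>> so check-and-insert is atomic.
var key = $"{origin}/{level}/{index}";
var tcs = new TaskCompletionSource<byte[]>(TaskCreationOptions.RunContinuationsAsynchronously);

// GetOrAdd returns the existing task if another request is already in flight.
var inflight = _inflightRequests.GetOrAdd(key, tcs.Task);
if (inflight != tcs.Task)
{
    return await inflight; // Share the in-flight request's result (or exception)
}

try
{
    var tile = await FetchFromUpstream(origin, level, index);
    tcs.SetResult(tile);
    return tile;
}
catch (Exception ex)
{
    tcs.SetException(ex); // Fail the waiters too, instead of leaving them hanging
    throw;
}
finally
{
    _inflightRequests.TryRemove(key, out _);
}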

TUF Integration Point

When TufValidationEnabled is true:

  1. Load the service map from TUF to discover the Rekor URL
  2. Validate the Rekor public key against the TUF targets
  3. Verify checkpoint signatures using the TUF-loaded keys
  4. Reject tiles if the checkpoint signature is invalid
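Step 3 operates on Rekor checkpoints, which use the signed-note text format: an origin line, the tree size, the base64 root hash, optional extension lines, a blank line, then signature lines that begin with U+2014 (em dash) and a space. The sketch below parses that format and accepts a checkpoint only if a signature from a trusted key name verifies. `verify` is a hypothetical callback standing in for the real Ed25519/ECDSA check, and `trusted_keys` stands in for keys loaded from TUF targets.

```python
import base64
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

SIG_PREFIX = "\u2014 "  # signature lines start with U+2014 and a space


@dataclass(frozen=True)
class Checkpoint:
    origin: str
    tree_size: int
    root_hash: bytes
    body: bytes                              # signed text, including its final newline
    signatures: Tuple[Tuple[str, bytes], ...]  # (key name, decoded blob)


def parse_checkpoint(text: str) -> Checkpoint:
    body, sep, sig_block = text.partition("\n\n")
    if not sep:
        raise ValueError("missing blank line before signatures")
    lines = body.split("\n")
    if len(lines) < 3:
        raise ValueError("body needs origin, tree size and root hash")
    root = base64.b64decode(lines[2], validate=True)
    if len(root) != 32:
        raise ValueError("root hash must be 32 bytes")
    sigs = []
    for line in sig_block.splitlines():
        if not line.startswith(SIG_PREFIX):
            raise ValueError(f"malformed signature line: {line!r}")
        name, _, b64 = line[len(SIG_PREFIX):].rpartition(" ")
        sigs.append((name, base64.b64decode(b64, validate=True)))
    return Checkpoint(lines[0], int(lines[1]), root, (body + "\n").encode(), tuple(sigs))


def verify_checkpoint(cp: Checkpoint, trusted_keys: Dict[str, object],
                      verify: Callable[[object, bytes, bytes], bool]) -> bool:
    """True if any signature from a trusted key name verifies over the body."""
    for name, blob in cp.signatures:
        key = trusted_keys.get(name)
        if key is None or len(blob) <= 4:
            continue
        # blob = 4-byte key hash || signature; a full check would also compare the key hash.
        if verify(key, cp.body, blob[4:]):
            return True
    return False
```

Tiles fetched for a tree size would be served only after the corresponding checkpoint passes `verify_checkpoint`.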

Upstream Failover

The proxy supports multiple upstream sources with failover:

tile_proxy:
  upstreams:
    - url: https://rekor.sigstore.dev
      priority: 1
      timeout: 30s
    - url: https://rekor-mirror.internal
      priority: 2
      timeout: 10s

Failover behavior:

  1. Try the primary upstream first
  2. On timeout or error, try the next upstream in priority order
  3. Remember the upstream that succeeded and try it first for subsequent requests
  4. Reset the failover state on an explicit refresh
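The failover steps can be sketched as a small client. This assumes each upstream is a (url, priority, timeout_seconds) tuple mirroring the YAML above, and that `fetch(url, path, timeout)` is a hypothetical injected transport that returns bytes or raises on timeout or error. The config's retry_count and retry_delay_ms are not modelled here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Upstream = Tuple[str, int, float]  # (url, priority, timeout_seconds)


class UpstreamError(Exception):
    pass


class FailoverClient:
    def __init__(self, upstreams: Sequence[Upstream],
                 fetch: Callable[[str, str, float], bytes]) -> None:
        self._ordered: List[Upstream] = sorted(upstreams, key=lambda u: u[1])
        self._fetch = fetch
        self._preferred: Optional[str] = None  # last upstream that succeeded

    def get(self, path: str) -> bytes:
        # Stable sort: the remembered upstream first, the rest in priority order.
        order = sorted(self._ordered, key=lambda u: u[0] != self._preferred)
        errors = []
        for url, _priority, timeout in order:
            try:
                data = self._fetch(url, path, timeout)
            except Exception as exc:  # timeout or error: try the next upstream
                errors.append((url, exc))
                continue
            self._preferred = url
            return data
        raise UpstreamError(f"all upstreams failed: {errors}")

    def refresh(self) -> None:
        """Explicit refresh: forget the remembered source and start from the primary."""
        self._preferred = None
```

Remembering the last good source avoids paying the primary's 30s timeout on every request while it is down.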

Deployment Model

Standalone Service

Run as a dedicated service with persistent volumes:

services:
  tile-proxy:
    image: stellaops/tile-proxy:latest
    ports:
      - "8090:8080"
    volumes:
      - tile-cache:/var/cache/stellaops/tiles
      - tuf-cache:/var/cache/stellaops/tuf
    environment:
      - TILE_PROXY__UPSTREAM_URL=https://rekor.sigstore.dev
      - TILE_PROXY__TUF_URL=https://trust.stella-ops.org/tuf/

Sidecar Mode

Run alongside the attestor service, sharing its network namespace:

services:
  attestor:
    image: stellaops/attestor:latest
    environment:
      - ATTESTOR__REKOR_URL=http://localhost:8080  # Sidecar shares the attestor's network namespace; the proxy listens on its container port 8080

  tile-proxy:
    image: stellaops/tile-proxy:latest
    network_mode: "service:attestor"

Metrics

Prometheus metrics are exposed at /_admin/metrics:

| Metric | Type | Description |
| --- | --- | --- |
| tile_proxy_cache_hits_total | Counter | Total cache hits |
| tile_proxy_cache_misses_total | Counter | Total cache misses |
| tile_proxy_cache_size_bytes | Gauge | Current cache size |
| tile_proxy_upstream_requests_total | Counter | Upstream requests by status |
| tile_proxy_request_duration_seconds | Histogram | Request latency |
| tile_proxy_sync_last_success_timestamp | Gauge | Last successful sync time |

Configuration

tile_proxy:
  # Upstream Rekor configuration
  upstream_url: https://rekor.sigstore.dev
  tile_base_url: https://rekor.sigstore.dev/tile/

  # TUF integration (optional)
  tuf:
    enabled: true
    url: https://trust.stella-ops.org/tuf/
    validate_checkpoint_signature: true

  # Cache configuration
  cache:
    base_path: /var/cache/stellaops/tiles
    max_size_gb: 10
    eviction_policy: lru
    checkpoint_ttl_minutes: 5

  # Sync job configuration
  sync:
    enabled: true
    schedule: "0 */6 * * *"   # every 6 hours
    depth: 10000

  # Request handling
  coalescing:
    enabled: true
    max_wait_ms: 5000

  # Failover
  failover:
    enabled: true
    retry_count: 2
    retry_delay_ms: 1000
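The cache.checkpoint_ttl_minutes setting bounds how long a cached checkpoint is served before it is re-fetched. A minimal sketch follows, assuming a single log and an injected `fetch` callable (both hypothetical). Falling back to a stale copy when the refresh fails is an assumption drawn from the offline-mode goal.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("tile_proxy")


class CheckpointCache:
    def __init__(self, ttl_minutes: float, fetch: Callable[[], bytes],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_minutes * 60.0
        self._fetch = fetch
        self._clock = clock
        self._value: Optional[bytes] = None
        self._fetched_at = 0.0

    def get(self) -> bytes:
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self._ttl:
            return self._value  # still within the TTL: serve from cache
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            if self._value is None:
                raise  # nothing cached to fall back on
            log.warning("checkpoint refresh failed; serving stale copy")
        return self._value
```

With checkpoint_ttl_minutes: 5, a checkpoint fetched at t=0 is served until t=300 seconds, and the first request at or after that point triggers a refresh.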

Security Considerations

  1. No Authentication by Default: Designed for internal network use
  2. Optional mTLS: Can enable client certificate validation
  3. Rate Limiting: Optional rate limiting per client IP
  4. Audit Logging: Log all cache operations for compliance
  5. Immutable Tiles: Full tiles are never modified after caching
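The design leaves the rate-limiting algorithm open. One common choice is a token bucket per client IP, sketched below; the class name and parameters are hypothetical, and a production version would also expire idle buckets so the map does not grow without bound.

```python
import time
from typing import Callable, Dict, Tuple


class PerIpRateLimiter:
    """Token bucket per IP: `burst` tokens, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._burst = float(burst)
        self._clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last_refill)

    def allow(self, ip: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(ip, (self._burst, now))
        tokens = min(self._burst, tokens + (now - last) * self._rate)  # refill
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

A token bucket allows short bursts (for example, a verifier fetching several tiles at once) while capping each client's sustained request rate.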

Error Handling

| Scenario | Behavior |
| --- | --- |
| Upstream unavailable | Serve from cache if available; otherwise return 503 |
| Invalid tile data | Reject, do not cache, log an error |
| Cache full | Evict LRU tiles, continue serving |
| TUF validation fails | Reject the request, return 502 |
| Checkpoint stale | Refresh from upstream, log a warning |
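For a cache-through tile request, several rows of the table reduce to one handler. The sketch below uses hypothetical exception types and injected `fetch_upstream` and `validate` callables. The table does not name a status for invalid tile data, so returning 502 there is an assumption.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict

log = logging.getLogger("tile_proxy")


class UpstreamUnavailable(Exception):
    """Every upstream timed out or errored."""


class InvalidTileData(Exception):
    """Upstream bytes failed basic tile checks."""


class TufValidationFailed(Exception):
    """The checkpoint signature did not verify against TUF-loaded keys."""


@dataclass
class ProxyResponse:
    status: int
    body: bytes = b""


def handle_tile_request(key: str, cache: Dict[str, bytes],
                        fetch_upstream: Callable[[str], bytes],
                        validate: Callable[[bytes], None]) -> ProxyResponse:
    if key in cache:
        return ProxyResponse(200, cache[key])  # cache hit: upstream not needed
    try:
        data = fetch_upstream(key)
    except UpstreamUnavailable:
        return ProxyResponse(503)  # upstream down and nothing cached
    try:
        validate(data)
    except TufValidationFailed:
        return ProxyResponse(502)  # reject, return 502
    except InvalidTileData as exc:
        log.error("rejecting invalid tile %s: %s", key, exc)
        return ProxyResponse(502)  # assumption: status not specified in the table
    cache[key] = data  # only validated data is cached
    return ProxyResponse(200, data)
```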

Future Enhancements

  1. Tile Prefetching: Prefetch tiles for known verification patterns
  2. Multi-Log Support: Support multiple transparency logs
  3. Replication: Sync cache between proxy instances
  4. Compression: Optional tile compression for storage