Tile-Proxy Service Design
Overview
The Tile-Proxy service acts as an intermediary between StellaOps clients and upstream Rekor transparency log APIs. It provides centralized tile caching, request coalescing, and offline support for air-gapped environments.
Architecture
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  CI/CD Agents   │────►│   Tile Proxy    │────►│    Rekor API    │
│   (StellaOps)   │     │   (StellaOps)   │     │   (Upstream)    │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Tile Cache    │     │  TUF Metadata   │     │   Checkpoint    │
│   (CAS Store)   │     │   (TrustRepo)   │     │      Cache      │
└─────────────────┘     └─────────────────┘     └─────────────────┘
Core Responsibilities
- Tile Proxying: Forward tile requests to upstream Rekor, caching responses locally
- Content-Addressed Storage: Store tiles by hash for deduplication and immutability
- TUF Integration: Optionally validate metadata using TUF trust anchors
- Request Coalescing: Deduplicate concurrent requests for the same tile
- Checkpoint Caching: Cache and serve recent checkpoints
- Offline Mode: Serve from cache when upstream is unavailable
API Surface
Proxy Endpoints (Passthrough)
| Endpoint | Description |
|---|---|
| GET /tile/{level}/{index} | Proxy tile request (cache-through) |
| GET /tile/{level}/{index}.p/{partialWidth} | Proxy partial tile |
| GET /checkpoint | Proxy checkpoint request |
| GET /api/v1/log/entries/{uuid} | Proxy entry lookup |
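Every proxied tile route carries the coordinates needed to build a cache key. The C# sketch below parses the two tile routes in the table. `TryParseTilePath` is a hypothetical helper. It assumes `{index}` is a plain decimal segment, as written above; the C2SP tlog-tiles layout splits large indexes into `x`-prefixed three-digit path segments, which this sketch does not handle. Partial widths are limited to 1..255, because a width of 256 is a full tile:

```csharp
using System;

// Hypothetical helper: parses "/tile/{level}/{index}" and "/tile/{level}/{index}.p/{partialWidth}".
// Returns false for anything else so the caller can fall through to other routes.
static bool TryParseTilePath(string path, out int level, out long index, out int? partialWidth)
{
    level = 0; index = 0; partialWidth = null;
    var parts = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
    // Expected shapes: ["tile", level, index] or ["tile", level, "{index}.p", width]
    if (parts.Length is not (3 or 4) || parts[0] != "tile") return false;
    if (!int.TryParse(parts[1], out level) || level < 0) return false;

    var indexPart = parts[2];
    if (parts.Length == 4)
    {
        // Partial tile: the index segment ends in ".p" and the width follows.
        if (!indexPart.EndsWith(".p", StringComparison.Ordinal)) return false;
        indexPart = indexPart[..^2];
        // A partial tile holds 1..255 hashes; 256 would be a full tile.
        if (!int.TryParse(parts[3], out var width) || width is < 1 or > 255) return false;
        partialWidth = width;
    }
    return long.TryParse(indexPart, out index) && index >= 0;
}

var ok = TryParseTilePath("/tile/0/1234.p/17", out var lvl, out var idx, out var w);
Console.WriteLine($"{ok} level={lvl} index={idx} width={w}"); // True level=0 index=1234 width=17
```

Rejecting malformed paths up front keeps invalid keys from ever reaching the cache or the upstream.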
Admin Endpoints
| Endpoint | Description |
|---|---|
| GET /_admin/cache/stats | Cache statistics (hits, misses, size) |
| POST /_admin/cache/sync | Trigger manual sync job |
| DELETE /_admin/cache/prune | Prune old tiles |
| GET /_admin/health | Health check |
| GET /_admin/ready | Readiness check |
Caching Strategy
Content-Addressed Tile Storage
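Tiles are stored in a content-addressed layout keyed by SHA-256. The log origin is hashed to form {origin_hash}, tiles sit beneath it by level and index, and each tile's own SHA-256 digest is recorded as contentHash in its metadata file: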
{cache_root}/
├── tiles/
│   ├── {origin_hash}/
│   │   ├── {level}/
│   │   │   ├── {index}.tile
│   │   │   └── {index}.meta.json
│   │   └── checkpoints/
│   │       └── {tree_size}.checkpoint
│   └── ...
└── metadata/
    └── cache_stats.json
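A minimal sketch of resolving paths in this layout. It assumes `{origin_hash}` is the lowercase hex SHA-256 of the log's origin string; the encoding is not specified above. `OriginHash`, `TilePath`, and `CheckpointPath` are hypothetical helper names:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Assumption: {origin_hash} = lowercase hex SHA-256 of the UTF-8 origin string.
static string OriginHash(string origin) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(origin))).ToLowerInvariant();

// Builds {cache_root}/tiles/{origin_hash}/{level}/{index}.tile and its .meta.json sibling.
static (string Tile, string Meta) TilePath(string cacheRoot, string origin, int level, long index)
{
    var dir = Path.Combine(cacheRoot, "tiles", OriginHash(origin), level.ToString());
    return (Path.Combine(dir, $"{index}.tile"), Path.Combine(dir, $"{index}.meta.json"));
}

// Checkpoints live beside the level directories, keyed by tree size.
static string CheckpointPath(string cacheRoot, string origin, long treeSize) =>
    Path.Combine(cacheRoot, "tiles", OriginHash(origin), "checkpoints", $"{treeSize}.checkpoint");

var (tile, meta) = TilePath("cache", "rekor.sigstore.dev", 0, 1234);
Console.WriteLine(tile);
Console.WriteLine(meta);
```

Hashing the origin yields a fixed-length, filesystem-safe directory name no matter what characters the origin string contains.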
Tile Metadata
Each tile has associated metadata:
{
"cachedAt": "2026-01-25T10:00:00Z",
"treeSize": 1050000,
"isPartial": false,
"contentHash": "sha256:abc123...",
"upstreamUrl": "https://rekor.sigstore.dev"
}
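A sketch of how this metadata file could be produced when a tile is fetched. It assumes hash tiles made of 32-byte SHA-256 entries, so a full tile is 256 × 32 = 8192 bytes, and uses `System.Text.Json` for serialization. `BuildTileMetadata` is hypothetical. Its length check is one way to implement the "invalid tile data" rejection listed under Error Handling:

```csharp
using System;
using System.Security.Cryptography;
using System.Text.Json;

// Hypothetical: builds the {index}.meta.json body for a fetched tile.
// Assumes each tile entry is a 32-byte SHA-256 hash and a full tile holds 256 of them.
static string BuildTileMetadata(byte[] tile, int? partialWidth, long treeSize,
                                string upstreamUrl, DateTime cachedAtUtc)
{
    int expectedBytes = (partialWidth ?? 256) * 32;
    if (tile.Length != expectedBytes)
        // Invalid tile data: reject and never write it to the cache.
        throw new InvalidOperationException($"tile is {tile.Length} bytes, expected {expectedBytes}");

    var metadata = new
    {
        cachedAt = cachedAtUtc,
        treeSize,
        isPartial = partialWidth is not null,
        contentHash = "sha256:" + Convert.ToHexString(SHA256.HashData(tile)).ToLowerInvariant(),
        upstreamUrl,
    };
    return JsonSerializer.Serialize(metadata, new JsonSerializerOptions { WriteIndented = true });
}

var json = BuildTileMetadata(new byte[256 * 32], null, 1_050_000,
    "https://rekor.sigstore.dev", new DateTime(2026, 1, 25, 10, 0, 0, DateTimeKind.Utc));
Console.WriteLine(json);
```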
Eviction Policy
- LRU by Access Time: Least recently accessed tiles evicted first
- Max Size Limit: Configurable maximum cache size
- TTL Override: Force re-fetch after configurable time (for checkpoints)
- Immutability Preservation: Full tiles (width=256) never evicted unless explicitly pruned
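The LRU and immutability rules interact: LRU only ever chooses among partial tiles and checkpoints. Below is a minimal in-memory sketch. The entry tuple shape and `EvictToFit` are illustrative assumptions; a real cache would track access times alongside the on-disk files:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry: key, size on disk, last access time, and whether it is a full (width=256) tile.
// Evicts least recently accessed non-full entries until the total size fits maxBytes.
static List<string> EvictToFit(
    IReadOnlyList<(string Key, long SizeBytes, DateTime LastAccess, bool IsFullTile)> entries,
    long maxBytes)
{
    long total = entries.Sum(x => x.SizeBytes);
    var evicted = new List<string>();
    // Full tiles are immutable and never LRU candidates; only an explicit prune removes them.
    foreach (var entry in entries.Where(x => !x.IsFullTile).OrderBy(x => x.LastAccess))
    {
        if (total <= maxBytes) break;
        total -= entry.SizeBytes;
        evicted.Add(entry.Key);
    }
    return evicted;
}

var t0 = new DateTime(2026, 1, 25, 10, 0, 0, DateTimeKind.Utc);
var sample = new List<(string Key, long SizeBytes, DateTime LastAccess, bool IsFullTile)>
{
    ("full:0/1", 256 * 32, t0, true),                 // oldest, but a full tile
    ("partial:0/2", 17 * 32, t0.AddMinutes(1), false),
    ("checkpoint:100", 300, t0.AddMinutes(2), false),
    ("partial:0/3", 5 * 32, t0.AddMinutes(3), false),
};
Console.WriteLine(string.Join(", ", EvictToFit(sample, 8392))); // partial:0/2, checkpoint:100
```

In the sample, the oldest entry is a full tile, so it survives. The two least recently used non-full entries are evicted until the total fits. If full tiles alone exceed the limit, the sketch stops with the cache still over budget and leaves reclamation to an explicit prune.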
Request Coalescing
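Concurrent requests for the same tile are coalesced into one upstream fetch. Every waiter receives the same result, or the same exception if the fetch fails:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
// Request coalescing sketch; the class name is illustrative and FetchFromUpstream is implemented elsewhere.
public abstract class CoalescingTileSource
{
    // One in-flight fetch per tile key.
    private readonly ConcurrentDictionary<string, Lazy<Task<byte[]>>> _inflightRequests = new();
    protected abstract Task<byte[]> FetchFromUpstream(string origin, int level, long index);
    public async Task<byte[]> GetTileAsync(string origin, int level, long index)
    {
        var key = $"{origin}/{level}/{index}";
        // GetOrAdd may run the factory more than once under contention, but every caller gets
        // the single stored Lazy, so the upstream fetch starts only once.
        var inflight = _inflightRequests.GetOrAdd(key,
            _ => new Lazy<Task<byte[]>>(() => FetchFromUpstream(origin, level, index)));
        try
        {
            // Waiters share one task, so a failure reaches all of them instead of leaving them hanging.
            return await inflight.Value;
        }
        finally
        {
            // Remove only this exact entry, so a newer fetch for the same key is not dropped.
            _inflightRequests.TryRemove(new KeyValuePair<string, Lazy<Task<byte[]>>>(key, inflight));
        }
    }
}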
TUF Integration Point
When TufValidationEnabled is true:
- Load service map from TUF to discover Rekor URL
- Validate Rekor public key from TUF targets
- Verify checkpoint signatures using TUF-loaded keys
- Reject tiles if checkpoint signature invalid
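The last two steps amount to a signature gate in front of the tile cache. The simplified sketch below assumes the Rekor key loaded from TUF targets is an ECDSA P-256 key that signs the checkpoint body with SHA-256. It deliberately leaves out the signed-note envelope around a real checkpoint (signature lines, key-hash prefix, signature encoding). The body layout (origin line, decimal tree size, base64 root hash) follows the C2SP tlog-checkpoint format. `ParseCheckpoint` and `VerifyCheckpoint` are hypothetical names:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Parses a checkpoint body: origin, decimal tree size, base64 root hash (extension lines ignored).
static (string Origin, long TreeSize, byte[] RootHash)? ParseCheckpoint(string body)
{
    var lines = body.Split('\n');
    if (lines.Length < 3 || !long.TryParse(lines[1], out var treeSize) || treeSize < 0) return null;
    byte[] rootHash;
    try { rootHash = Convert.FromBase64String(lines[2]); }
    catch (FormatException) { return null; }
    if (rootHash.Length != 32) return null; // SHA-256 Merkle root
    return (lines[0], treeSize, rootHash);
}

// Assumption: the TUF-distributed Rekor key is ECDSA P-256 and signs the body bytes with SHA-256.
// Returns null (reject tiles) unless the signature verifies and the body parses.
static (string Origin, long TreeSize, byte[] RootHash)? VerifyCheckpoint(
    string body, byte[] signature, ECDsa tufLoadedKey)
{
    var data = Encoding.UTF8.GetBytes(body);
    if (!tufLoadedKey.VerifyData(data, signature, HashAlgorithmName.SHA256)) return null;
    return ParseCheckpoint(body);
}

// Demo: a locally generated key stands in for the key loaded from TUF targets.
using var rekorKey = ECDsa.Create(ECCurve.NamedCurves.nistP256);
var checkpointBody = $"log.example/origin\n1050000\n{Convert.ToBase64String(new byte[32])}\n";
var sig = rekorKey.SignData(Encoding.UTF8.GetBytes(checkpointBody), HashAlgorithmName.SHA256);
Console.WriteLine(VerifyCheckpoint(checkpointBody, sig, rekorKey)?.TreeSize); // 1050000
```

Verifying before parsing means an unsigned or tampered checkpoint never yields a tree size or root hash that tiles could be checked against.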
Upstream Failover
Support multiple upstream sources with failover:
tile_proxy:
  upstreams:
    - url: https://rekor.sigstore.dev
      priority: 1
      timeout: 30s
    - url: https://rekor-mirror.internal
      priority: 2
      timeout: 10s
Failover behavior:
- Try primary upstream first
- On timeout/error, try next upstream
- Remember the upstream that last succeeded and try it first on subsequent requests
- Reset failover state on explicit refresh
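Below is a minimal sketch of that loop. It assumes upstreams are tried in ascending `priority` order, each bounded by its own `timeout`, with the last successful upstream moved to the front. `FetchWithFailoverAsync`, `ResetFailover`, and the `fetch` delegate (a stand-in for the real HTTP call) are hypothetical names. The separate `failover.retry_count` and `retry_delay_ms` settings are not modeled:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

string? preferred = null; // last upstream that succeeded

async Task<byte[]> FetchWithFailoverAsync(
    IReadOnlyList<(string Url, int Priority, TimeSpan Timeout)> upstreams,
    Func<string, CancellationToken, Task<byte[]>> fetch)
{
    // Priority order, with the last known-good upstream moved to the front.
    var ordered = upstreams.OrderBy(x => x.Url == preferred ? 0 : 1).ThenBy(x => x.Priority).ToList();
    var errors = new List<Exception>();
    foreach (var upstream in ordered)
    {
        using var cts = new CancellationTokenSource(upstream.Timeout); // per-upstream timeout
        try
        {
            var result = await fetch(upstream.Url, cts.Token);
            preferred = upstream.Url;
            return result;
        }
        catch (Exception ex) { errors.Add(ex); } // timeout or error: try the next upstream
    }
    throw new AggregateException("all upstreams failed", errors);
}

// Explicit refresh: forget the remembered upstream and fall back to plain priority order.
void ResetFailover() => preferred = null;

var upstreams = new List<(string Url, int Priority, TimeSpan Timeout)>
{
    ("https://rekor.sigstore.dev", 1, TimeSpan.FromSeconds(30)),
    ("https://rekor-mirror.internal", 2, TimeSpan.FromSeconds(10)),
};
var attempts = new List<string>();
// Simulated transport: the primary is down, the mirror answers.
Task<byte[]> FakeFetch(string url, CancellationToken ct)
{
    attempts.Add(url);
    return url.Contains("sigstore")
        ? Task.FromException<byte[]>(new TimeoutException("primary down"))
        : Task.FromResult(new byte[] { 1, 2, 3 });
}

var tile = await FetchWithFailoverAsync(upstreams, FakeFetch);
Console.WriteLine($"{tile.Length} bytes after trying: {string.Join(", ", attempts)}");
```

Trying the mirror first after a success avoids waiting out the primary's 30-second timeout on every request while the primary is down. `ResetFailover` restores plain priority order.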
Deployment Model
Standalone Service
Run as a dedicated service with persistent volumes:
services:
  tile-proxy:
    image: stellaops/tile-proxy:latest
    ports:
      - "8090:8080"
    volumes:
      - tile-cache:/var/cache/stellaops/tiles
      - tuf-cache:/var/cache/stellaops/tuf
    environment:
      - TILE_PROXY__UPSTREAM_URL=https://rekor.sigstore.dev
      - TILE_PROXY__TUF_URL=https://trust.stella-ops.org/tuf/
# Named volumes must be declared at the top level
volumes:
  tile-cache:
  tuf-cache:
Sidecar Mode
Run alongside the attestor service, sharing its network namespace:
services:
  attestor:
    image: stellaops/attestor:latest
    environment:
      - ATTESTOR__REKOR_URL=http://localhost:8090  # Use sidecar
  tile-proxy:
    image: stellaops/tile-proxy:latest
    network_mode: "service:attestor"
Metrics
Prometheus metrics exposed at /_admin/metrics:
| Metric | Type | Description |
|---|---|---|
| tile_proxy_cache_hits_total | Counter | Total cache hits |
| tile_proxy_cache_misses_total | Counter | Total cache misses |
| tile_proxy_cache_size_bytes | Gauge | Current cache size |
| tile_proxy_upstream_requests_total | Counter | Upstream requests by status |
| tile_proxy_request_duration_seconds | Histogram | Request latency |
| tile_proxy_sync_last_success_timestamp | Gauge | Last successful sync time |
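A sketch of defining these instruments with .NET's `System.Diagnostics.Metrics` API (.NET 6 and later), assuming a .NET implementation. The meter name `StellaOps.TileProxy`, the `status` tag, and `RecordRequest` are hypothetical. Exposing the values at `/_admin/metrics` would also need a Prometheus exporter, which is not shown:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter name; instrument names match the table above.
var meter = new Meter("StellaOps.TileProxy");
var cacheHits = meter.CreateCounter<long>("tile_proxy_cache_hits_total");
var cacheMisses = meter.CreateCounter<long>("tile_proxy_cache_misses_total");
var upstreamRequests = meter.CreateCounter<long>("tile_proxy_upstream_requests_total");
var requestDuration = meter.CreateHistogram<double>("tile_proxy_request_duration_seconds", unit: "s");

long cacheSizeBytes = 0;      // kept current by the cache layer
long lastSyncUnixSeconds = 0; // set by the sync job on success
// Observable gauges are sampled when metrics are collected rather than pushed on every change.
meter.CreateObservableGauge("tile_proxy_cache_size_bytes", () => cacheSizeBytes);
meter.CreateObservableGauge("tile_proxy_sync_last_success_timestamp", () => lastSyncUnixSeconds);

// Records one served tile request; the upstream status is tagged only on a cache miss.
void RecordRequest(bool cacheHit, int? upstreamStatus, TimeSpan elapsed)
{
    if (cacheHit)
    {
        cacheHits.Add(1);
    }
    else
    {
        cacheMisses.Add(1);
        if (upstreamStatus is int status)
            upstreamRequests.Add(1, new KeyValuePair<string, object?>("status", status));
    }
    requestDuration.Record(elapsed.TotalSeconds);
}

RecordRequest(cacheHit: false, upstreamStatus: 200, elapsed: TimeSpan.FromMilliseconds(42));
```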
Configuration
tile_proxy:
  # Upstream Rekor configuration
  upstream_url: https://rekor.sigstore.dev
  tile_base_url: https://rekor.sigstore.dev/tile/
  # TUF integration (optional)
  tuf:
    enabled: true
    url: https://trust.stella-ops.org/tuf/
    validate_checkpoint_signature: true
  # Cache configuration
  cache:
    base_path: /var/cache/stellaops/tiles
    max_size_gb: 10
    eviction_policy: lru
    checkpoint_ttl_minutes: 5
  # Sync job configuration
  sync:
    enabled: true
    schedule: "0 */6 * * *"  # cron: minute 0 of every sixth hour
    depth: 10000
  # Request handling
  coalescing:
    enabled: true
    max_wait_ms: 5000
  # Failover
  failover:
    enabled: true
    retry_count: 2
    retry_delay_ms: 1000
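Two of these settings reduce to simple arithmetic. The sketch below assumes `checkpoint_ttl_minutes` is compared against the age of the cached checkpoint and that `max_size_gb` means binary gigabytes (GiB); the document does not say which. Both helpers are hypothetical:

```csharp
using System;

// Hypothetical helper for cache.checkpoint_ttl_minutes: a cached checkpoint is served
// while it is younger than the TTL; after that it is re-fetched from upstream.
static bool IsCheckpointFresh(DateTimeOffset cachedAt, DateTimeOffset now, int ttlMinutes) =>
    now - cachedAt < TimeSpan.FromMinutes(ttlMinutes);

// Assumption: max_size_gb is interpreted as binary gigabytes (GiB).
static long MaxCacheBytes(int maxSizeGb) => maxSizeGb * 1024L * 1024L * 1024L;

var fetchedAt = new DateTimeOffset(2026, 1, 25, 10, 0, 0, TimeSpan.Zero);
Console.WriteLine(IsCheckpointFresh(fetchedAt, fetchedAt.AddMinutes(4), 5)); // True
Console.WriteLine(IsCheckpointFresh(fetchedAt, fetchedAt.AddMinutes(5), 5)); // False
Console.WriteLine(MaxCacheBytes(10));                                         // 10737418240
```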
Security Considerations
- No Authentication by Default: Designed for internal network use
- Optional mTLS: Can enable client certificate validation
- Rate Limiting: Optional rate limiting per client IP
- Audit Logging: Log all cache operations for compliance
- Immutable Tiles: Full tiles are never modified after caching
Error Handling
| Scenario | Behavior |
|---|---|
| Upstream unavailable | Serve from cache if available; 503 otherwise |
| Invalid tile data | Reject, don't cache, log error |
| Cache full | Evict LRU tiles, continue serving |
| TUF validation fails | Reject request, return 502 |
| Checkpoint stale | Refresh from upstream, warn in logs |
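The table can be read as a decision function over the state of a tile request. The sketch below shows one possible ordering of the checks, where `StatusFor` and its boolean inputs are hypothetical. The 503 and the TUF-failure 502 come from the table. The 502 for invalid tile data and the 200 codes are assumptions. The "Cache full" and "Checkpoint stale" rows are treated as not affecting the status here:

```csharp
using System;

// Hypothetical: maps the state of one tile request to an HTTP status code.
// Checks run in order: cache hit, upstream reachability, TUF checkpoint validation, tile validity.
static int StatusFor(bool cacheHit, bool upstreamUp, bool tufValid, bool tileValid) =>
    (cacheHit, upstreamUp, tufValid, tileValid) switch
    {
        (true, _, _, _) => 200,            // served from cache, even if upstream is down
        (false, false, _, _) => 503,       // upstream unavailable and nothing cached
        (false, true, false, _) => 502,    // TUF validation fails: reject
        (false, true, true, false) => 502, // invalid tile data: reject, don't cache (status assumed)
        (false, true, true, true) => 200,  // fetched, validated, cached, served
    };

Console.WriteLine(StatusFor(cacheHit: false, upstreamUp: false, tufValid: true, tileValid: true)); // 503
```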
Future Enhancements
- Tile Prefetching: Prefetch tiles for known verification patterns
- Multi-Log Support: Support multiple transparency logs
- Replication: Sync cache between proxy instances
- Compression: Optional tile compression for storage