fix tests. new product advisories enhancements
262
docs/modules/attestor/tile-proxy-design.md
Normal file
@@ -0,0 +1,262 @@
# Tile-Proxy Service Design

## Overview

The Tile-Proxy service acts as an intermediary between StellaOps clients and upstream Rekor transparency log APIs. It provides centralized tile caching, request coalescing, and offline support for air-gapped environments.

## Architecture

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  CI/CD Agents   │────►│   Tile Proxy    │────►│   Rekor API     │
│  (StellaOps)    │     │   (StellaOps)   │     │   (Upstream)    │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Tile Cache    │     │  TUF Metadata   │     │   Checkpoint    │
│   (CAS Store)   │     │  (TrustRepo)    │     │     Cache       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

## Core Responsibilities

1. **Tile Proxying**: Forward tile requests to upstream Rekor, caching responses locally
2. **Content-Addressed Storage**: Store tiles by hash for deduplication and immutability
3. **TUF Integration**: Optionally validate metadata using TUF trust anchors
4. **Request Coalescing**: Deduplicate concurrent requests for the same tile
5. **Checkpoint Caching**: Cache and serve recent checkpoints
6. **Offline Mode**: Serve from cache when upstream is unavailable

## API Surface

### Proxy Endpoints (Passthrough)

| Endpoint | Description |
|----------|-------------|
| `GET /tile/{level}/{index}` | Proxy tile request (cache-through) |
| `GET /tile/{level}/{index}.p/{partialWidth}` | Proxy partial tile |
| `GET /checkpoint` | Proxy checkpoint request |
| `GET /api/v1/log/entries/{uuid}` | Proxy entry lookup |

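
The two tile routes in the table can be told apart with a small parser. The following is only a sketch: `TileRequest` and `TileRoute.TryParse` are hypothetical names that do not come from the StellaOps codebase, and the 1 to 255 bound on `partialWidth` is an assumption based on full tiles having width 256.

```csharp
using System;
using System.Globalization;

// Hypothetical value type for a parsed tile route (illustrative only).
public readonly record struct TileRequest(int Level, long Index, int? PartialWidth);

public static class TileRoute
{
    // Parses "/tile/{level}/{index}" and "/tile/{level}/{index}.p/{partialWidth}".
    public static bool TryParse(string path, out TileRequest request)
    {
        request = default;
        var parts = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 3 || parts.Length > 4 || parts[0] != "tile") return false;
        if (!TryParseNumber(parts[1], out var level) || level > int.MaxValue) return false;

        var indexPart = parts[2];
        int? width = null;
        if (parts.Length == 4)
        {
            // Partial tiles carry a ".p" suffix on the index segment.
            if (!indexPart.EndsWith(".p", StringComparison.Ordinal)) return false;
            indexPart = indexPart[..^2];
            // Assumption: a partial tile holds between 1 and 255 entries.
            if (!TryParseNumber(parts[3], out var w) || w < 1 || w > 255) return false;
            width = (int)w;
        }

        if (!TryParseNumber(indexPart, out var index)) return false;
        request = new TileRequest((int)level, index, width);
        return true;
    }

    // Digits only: no sign, whitespace, or thousands separators.
    private static bool TryParseNumber(string s, out long value) =>
        long.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out value);
}
```

A handler could call `TileRoute.TryParse(path, out var req)` and treat `req.PartialWidth is null` as a request for a full tile.
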
### Admin Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /_admin/cache/stats` | Cache statistics (hits, misses, size) |
| `POST /_admin/cache/sync` | Trigger manual sync job |
| `DELETE /_admin/cache/prune` | Prune old tiles |
| `GET /_admin/health` | Health check |
| `GET /_admin/ready` | Readiness check |

## Caching Strategy

### Content-Addressed Tile Storage

Tiles are stored on disk under paths derived from SHA-256 hashes, grouped by log origin, tile level, and tile index:

```
{cache_root}/
├── tiles/
│   ├── {origin_hash}/
│   │   ├── {level}/
│   │   │   ├── {index}.tile
│   │   │   └── {index}.meta.json
│   │   └── checkpoints/
│   │       └── {tree_size}.checkpoint
│   └── ...
└── metadata/
    └── cache_stats.json
```

### Tile Metadata

Each tile has an associated `{index}.meta.json` file that records when and from where it was cached:

```json
{
  "cachedAt": "2026-01-25T10:00:00Z",
  "treeSize": 1050000,
  "isPartial": false,
  "contentHash": "sha256:abc123...",
  "upstreamUrl": "https://rekor.sigstore.dev"
}
```

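
To make the layout and the `contentHash` field concrete, here is a minimal sketch of how the paths and hash might be computed. `TileCachePaths` is a hypothetical helper. The document does not say how `{origin_hash}` is derived, so hashing the origin string with SHA-256 is an assumption made only for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper mirroring the cache layout described above.
public static class TileCachePaths
{
    // Assumption: {origin_hash} is the lowercase hex SHA-256 of the origin string.
    public static string OriginHash(string origin) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(origin))).ToLowerInvariant();

    // Produces the "sha256:<hex>" form shown in the contentHash metadata field.
    public static string ContentHash(byte[] tile) =>
        "sha256:" + Convert.ToHexString(SHA256.HashData(tile)).ToLowerInvariant();

    // Builds {cache_root}/tiles/{origin_hash}/{level}/{index}.tile
    public static string TilePath(string cacheRoot, string origin, int level, long index) =>
        Path.Combine(
            cacheRoot,
            "tiles",
            OriginHash(origin),
            level.ToString(CultureInfo.InvariantCulture),
            index.ToString(CultureInfo.InvariantCulture) + ".tile");

    // Metadata lives next to the tile as {index}.meta.json
    public static string MetaPath(string cacheRoot, string origin, int level, long index) =>
        Path.ChangeExtension(TilePath(cacheRoot, origin, level, index), ".meta.json");
}
```

Recomputing `ContentHash` when a tile is read back and comparing it with the stored `contentHash` is one way to detect on-disk corruption before serving a cached tile.
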
### Eviction Policy

1. **LRU by Access Time**: The least recently accessed tiles are evicted first
2. **Max Size Limit**: The maximum cache size is configurable
3. **TTL Override**: Entries are re-fetched after a configurable time (used for checkpoints)
4. **Immutability Preservation**: Full tiles (width=256) are never evicted unless explicitly pruned

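
The interaction between rules 1, 2, and 4 can be sketched as a planning function that picks eviction victims. `CacheEntry` and `EvictionPlanner` are hypothetical names, and the in-memory entry list stands in for whatever index the real cache keeps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory view of one cached tile (illustrative only).
public sealed record CacheEntry(string Path, long SizeBytes, DateTimeOffset LastAccess, bool IsPartial);

public static class EvictionPlanner
{
    // Returns the entries to evict, least recently accessed first, until the
    // cache fits under maxBytes. Full tiles are never selected (rule 4).
    public static IReadOnlyList<CacheEntry> Plan(IEnumerable<CacheEntry> entries, long maxBytes)
    {
        var all = entries.ToList();
        long total = all.Sum(e => e.SizeBytes);
        var victims = new List<CacheEntry>();

        foreach (var entry in all.Where(e => e.IsPartial).OrderBy(e => e.LastAccess))
        {
            if (total <= maxBytes) break;
            victims.Add(entry);
            total -= entry.SizeBytes;
        }

        return victims;
    }
}
```

Under this reading of the rules, a cache made up mostly of full tiles can stay above the size limit until an explicit prune runs, because full tiles are skipped by the planner.
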
## Request Coalescing

Concurrent requests for the same tile are coalesced so that only one upstream fetch is issued; later callers await the in-flight request:

```csharp
// Pseudo-code for request coalescing. _inflightRequests is assumed to be a
// ConcurrentDictionary<string, Lazy<Task<byte[]>>> (System.Collections.Concurrent).
var key = $"{origin}/{level}/{index}";

// GetOrAdd plus Lazy ensures that only one caller starts the upstream fetch;
// concurrent callers for the same key share the same task.
var lazy = _inflightRequests.GetOrAdd(
    key,
    _ => new Lazy<Task<byte[]>>(() => FetchFromUpstream(origin, level, index)));

try
{
    // Every waiter observes the same result, or the same exception on failure.
    return await lazy.Value;
}
finally
{
    // Remove only this entry so that a newer request for the key is not dropped.
    _inflightRequests.TryRemove(
        new KeyValuePair<string, Lazy<Task<byte[]>>>(key, lazy));
}
```

## TUF Integration Point

When `TufValidationEnabled` is true:

1. Load the service map from TUF to discover the Rekor URL
2. Validate the Rekor public key from TUF targets
3. Verify checkpoint signatures using TUF-loaded keys
4. Reject tiles if the checkpoint signature is invalid

## Upstream Failover

Multiple upstream sources can be configured, with failover between them:

```yaml
tile_proxy:
  upstreams:
    - url: https://rekor.sigstore.dev
      priority: 1
      timeout: 30s
    - url: https://rekor-mirror.internal
      priority: 2
      timeout: 10s
```

Failover behavior:

1. Try the primary upstream first
2. On timeout or error, try the next upstream
3. Remember the upstream that succeeded and use it for subsequent requests
4. Reset failover state on explicit refresh

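
The four failover steps could be implemented along the following lines. This is a sketch under stated assumptions: `Upstream` and `FailoverFetcher` are hypothetical names, and the actual HTTP call is injected as a delegate so that the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of one entry in the upstreams list.
public sealed record Upstream(Uri Url, int Priority, TimeSpan Timeout);

public sealed class FailoverFetcher
{
    private readonly IReadOnlyList<Upstream> _upstreams;
    private readonly Func<Upstream, string, CancellationToken, Task<byte[]>> _fetch;
    private Upstream? _preferred; // last upstream that succeeded (step 3)

    public FailoverFetcher(
        IEnumerable<Upstream> upstreams,
        Func<Upstream, string, CancellationToken, Task<byte[]>> fetch)
    {
        _upstreams = upstreams.OrderBy(u => u.Priority).ToList();
        _fetch = fetch;
    }

    // Step 4: forget the remembered upstream on an explicit refresh.
    public void Reset() => _preferred = null;

    public async Task<byte[]> GetAsync(string path, CancellationToken ct = default)
    {
        var preferred = _preferred;
        // OrderBy is stable, so the remaining upstreams keep their priority order.
        var order = preferred is null
            ? _upstreams
            : _upstreams.OrderBy(u => ReferenceEquals(u, preferred) ? 0 : 1).ToList();

        var errors = new List<Exception>();
        foreach (var upstream in order)
        {
            // Per-upstream timeout, linked to the caller's cancellation token.
            using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
            cts.CancelAfter(upstream.Timeout);
            try
            {
                var body = await _fetch(upstream, path, cts.Token);
                _preferred = upstream;
                return body;
            }
            catch (Exception ex) when (!ct.IsCancellationRequested)
            {
                errors.Add(ex); // step 2: timeout or error, try the next upstream
            }
        }

        throw new AggregateException("All upstreams failed", errors);
    }
}
```

The sketch does not synchronize access to `_preferred`; a production version shared across requests would need to decide how concurrent updates to that field should behave.
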
## Deployment Model

### Standalone Service

Run as a dedicated service with a persistent volume:

```yaml
services:
  tile-proxy:
    image: stellaops/tile-proxy:latest
    ports:
      - "8090:8080"
    volumes:
      - tile-cache:/var/cache/stellaops/tiles
      - tuf-cache:/var/cache/stellaops/tuf
    environment:
      - TILE_PROXY__UPSTREAM_URL=https://rekor.sigstore.dev
      - TILE_PROXY__TUF_URL=https://trust.stella-ops.org/tuf/
```

### Sidecar Mode

Run alongside the attestor service, sharing its network namespace:

```yaml
services:
  attestor:
    image: stellaops/attestor:latest
    environment:
      - ATTESTOR__REKOR_URL=http://localhost:8090  # Use sidecar

  tile-proxy:
    image: stellaops/tile-proxy:latest
    network_mode: "service:attestor"
```

## Metrics

Prometheus metrics are exposed at `/_admin/metrics`:

| Metric | Type | Description |
|--------|------|-------------|
| `tile_proxy_cache_hits_total` | Counter | Total cache hits |
| `tile_proxy_cache_misses_total` | Counter | Total cache misses |
| `tile_proxy_cache_size_bytes` | Gauge | Current cache size |
| `tile_proxy_upstream_requests_total` | Counter | Upstream requests by status |
| `tile_proxy_request_duration_seconds` | Histogram | Request latency |
| `tile_proxy_sync_last_success_timestamp` | Gauge | Last successful sync time |

## Configuration

```yaml
tile_proxy:
  # Upstream Rekor configuration
  upstream_url: https://rekor.sigstore.dev
  tile_base_url: https://rekor.sigstore.dev/tile/

  # TUF integration (optional)
  tuf:
    enabled: true
    url: https://trust.stella-ops.org/tuf/
    validate_checkpoint_signature: true

  # Cache configuration
  cache:
    base_path: /var/cache/stellaops/tiles
    max_size_gb: 10
    eviction_policy: lru
    checkpoint_ttl_minutes: 5

  # Sync job configuration
  sync:
    enabled: true
    schedule: "0 */6 * * *"
    depth: 10000

  # Request handling
  coalescing:
    enabled: true
    max_wait_ms: 5000

  # Failover
  failover:
    enabled: true
    retry_count: 2
    retry_delay_ms: 1000
```

## Security Considerations

1. **No Authentication by Default**: The service is designed for internal network use
2. **Optional mTLS**: Client certificate validation can be enabled
3. **Rate Limiting**: Rate limiting per client IP is optional
4. **Audit Logging**: All cache operations are logged for compliance
5. **Immutable Tiles**: Full tiles are never modified after caching

## Error Handling

| Scenario | Behavior |
|----------|----------|
| Upstream unavailable | Serve from cache if available; return 503 otherwise |
| Invalid tile data | Reject, do not cache, log an error |
| Cache full | Evict LRU tiles, continue serving |
| TUF validation fails | Reject request, return 502 |
| Checkpoint stale | Refresh from upstream, log a warning |

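
The status codes named in the table can be captured in one mapping function. This is a sketch with hypothetical names (`ProxyFailure`, `ErrorMapping`). The table gives no status code for invalid tile data, so the 502 used for that case is an assumption, not documented behavior.

```csharp
using System;

// Hypothetical failure categories taken from the scenarios in the table.
public enum ProxyFailure
{
    UpstreamUnavailable,
    InvalidTileData,
    TufValidationFailed,
}

public static class ErrorMapping
{
    // Returns the HTTP status code the proxy would answer with.
    public static int StatusFor(ProxyFailure failure, bool cachedCopyAvailable) => failure switch
    {
        // Offline mode: serve the cached copy when there is one, otherwise 503.
        ProxyFailure.UpstreamUnavailable => cachedCopyAvailable ? 200 : 503,
        // Stated in the table: TUF validation failures return 502.
        ProxyFailure.TufValidationFailed => 502,
        // Assumption: the table does not name a status for invalid tile data.
        ProxyFailure.InvalidTileData => 502,
        _ => throw new ArgumentOutOfRangeException(nameof(failure)),
    };
}
```
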
## Future Enhancements

1. **Tile Prefetching**: Prefetch tiles for known verification patterns
2. **Multi-Log Support**: Support multiple transparency logs
3. **Replication**: Sync cache between proxy instances
4. **Compression**: Optional tile compression for storage