# Time-to-First-Signal (TTFS) Architecture

> Derived from Product Advisory (14-Dec-2025): UX and Time-to-Evidence Technical Reference. This document details the TTFS subsystem, which provides immediate feedback on run and job status.

## 1) Overview

Time-to-First-Signal (TTFS) measures the latency from a user action (opening a run, starting a scan, invoking the CLI) to the first meaningful signal being displayed or logged. The architecture is designed so that users receive immediate feedback regardless of how long the job itself takes to complete.

### 1.1 Design Goals

- **Instant Feedback:** P50 < 2s, P95 < 5s across all surfaces (UI, CLI, CI)
- **Graceful Degradation:** Skeleton → Cached Signal → Live Data progression
- **Offline-First:** Full functionality in air-gapped environments using PostgreSQL NOTIFY/LISTEN
- **Predictive Context:** Provide "last known outcome" and ETA estimates for in-progress jobs

### 1.2 Signal Flow

```
                          TTFS Signal Flow

 User Action          API Layer            Cache Layer        Data Layer
 ───────────          ─────────            ───────────        ──────────

 [Route Enter] ──┬──► /first-signal ─────► Valkey/Redis ─────────────┐
 [CLI Start]   ──┤          │                   │                    │
 [CI Job]      ──┘          │                   │                    ▼
                            │                   │           ┌──────────────────┐
                            ▼                   │           │    PostgreSQL    │
                     ┌─────────────┐            │           │  first_signal_   │
                     │    ETag     │◄───────────┤           │    snapshots     │
                     │  Validation │            │           └──────────────────┘
                     └─────────────┘            │
                            │                   │
                            ▼                   ▼
                     ┌─────────────────────────────────┐
                     │        Response Assembly        │
                     │  • kind (status indicator)      │
                     │  • phase (current stage)        │
                     │  • summary (human text)         │
                     │  • eta_seconds (estimate)       │
                     │  • last_known_outcome           │
                     │  • next_actions                 │
                     └─────────────────────────────────┘
                                      │
                                      ▼
                     ┌─────────────────────────────────┐
                     │      SSE / Polling Client       │
                     └─────────────────────────────────┘
```

## 2) Component Budgets

The 5-second P95 budget is allocated across components:

| Component | P50 Budget | P95 Budget | Notes |
|-----------|------------|------------|-------|
| Frontend (skeleton + hydration) | 100ms | 150ms | Network-independent |
| Edge API (auth + routing) | 150ms | 250ms | JWT validation, rate limiting |
| Core Services (lookup + assembly) | 700ms | 1,500ms | Cache hit vs cold path |
| SSE/WebSocket establishment | — | 300ms | Fallback to polling if exceeded |
| **Total (warm path)** | **700ms** | **2,500ms** | Cache hit scenario |
| **Total (cold path)** | **1,200ms** | **4,000ms** | Cache miss, compute required |

## 3) Signal Kinds

The `kind` field indicates the current signal state:

| Kind | Description | Typical Duration | Icon |
|------|-------------|------------------|------|
| `queued` | Job waiting in queue | 0-30s | Queue |
| `started` | Job has begun execution | — | Play |
| `phase` | Job in specific phase | Varies | Progress |
| `blocked` | Waiting on dependency/policy | — | Pause |
| `failed` | Job has failed | — | Error |
| `succeeded` | Job completed successfully | — | Check |
| `canceled` | Job was canceled | — | Cancel |
| `unavailable` | Signal cannot be determined | — | Unknown |

## 4) Signal Phases

The `phase` field indicates the current execution phase:

| Phase | Description | SLO Target |
|-------|-------------|------------|
| `resolve` | Dependency/artifact resolution | P95 < 30s |
| `fetch` | Data retrieval (registry, advisories) | P95 < 45s |
| `restore` | Cache/snapshot restoration | P95 < 10s |
| `analyze` | Analysis execution (scan, policy) | P95 < 120s |
| `policy` | Policy evaluation | P95 < 15s |
| `report` | Report generation/upload | P95 < 30s |
| `unknown` | Phase cannot be determined | — |

## 5) API Contracts

### 5.1 First Signal Endpoint

```http
GET /api/v1/orchestrator/jobs/{jobId}/first-signal
Accept: application/json
If-None-Match: "{etag}"

200 OK
ETag: "job-{id}-{updated_at.unix_ms}"
Cache-Control: private, max-age=1, stale-while-revalidate=5
X-Signal-Source: snapshot | cold_start | failure_index

{
  "kind": "started",
  "phase": "analyze",
  "summary": "Scanning image layers (47%)",
  "eta_seconds": 38,
  "last_known_outcome": {
    "status": "succeeded",
    "finished_at": "2025-12-13T10:15:00Z",
    "findings_count": 12
  },
  "next_actions": [
    {"label": "View previous run", "href": "/runs/abc-123"}
  ],
  "diagnostics": {
    "queue_position": null,
    "worker_id": "worker-7"
  }
}

304 Not Modified (if ETag matches)
```

### 5.2 SSE Stream

```http
GET /api/v1/orchestrator/stream/jobs/{jobId}/first-signal
Accept: text/event-stream

event: signal
data: {"kind":"started","phase":"analyze",...}

event: signal
data: {"kind":"phase","phase":"policy",...}

event: done
data: {"kind":"succeeded",...}
```

### 5.3 CLI Integration

```bash
# Job status with immediate signal
stella job status --watch

# Output progression:
# [queued] Waiting in queue (position: 3)
# [started] Job started on worker-7
# [phase:analyze] Scanning image layers (47%)
# [succeeded] Completed in 2m 34s
```

## 6) Caching Strategy

### 6.1 Cache Tiers

| Tier | Storage | TTL | Use Case |
|------|---------|-----|----------|
| L1 | In-memory (per-instance) | 1s | Hot path, same-instance requests |
| L2 | Valkey/Redis | 5s | Cross-instance, active jobs |
| L3 | PostgreSQL | 24h | Persistent snapshots, air-gap mode |

### 6.2 Cache Keys

```
ttfs:job:{tenant_id}:{job_id}:signal    # Current signal
ttfs:job:{tenant_id}:{job_id}:eta       # ETA prediction
ttfs:run:{tenant_id}:{run_id}:signals   # Run-level aggregation
ttfs:tenant:{tenant_id}:failure_sig     # Failure signatures
```

### 6.3 Air-Gap Mode

In air-gapped environments without Valkey/Redis:

1. **PostgreSQL NOTIFY/LISTEN** replaces pub/sub for real-time updates
2. **Polling fallback** with 2-second intervals
3. **first_signal_snapshots** table serves as the L2 cache
4. All SSE endpoints gracefully degrade to long-polling

## 7) Telemetry & Observability

### 7.1 Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `ttfs_latency_seconds` | Histogram | End-to-end signal latency |
| `ttfs_cache_latency_seconds` | Histogram | Cache lookup time |
| `ttfs_cold_latency_seconds` | Histogram | Cold path computation time |
| `ttfs_signal_total` | Counter | Signals by kind/surface |
| `ttfs_cache_hit_total` | Counter | Cache hits |
| `ttfs_cache_miss_total` | Counter | Cache misses |
| `ttfs_slo_breach_total` | Counter | SLO breaches |
| `ttfs_error_total` | Counter | Errors by type |

### 7.2 Labels

All metrics include the following labels:

- `surface`: `ui` | `cli` | `ci`
- `cache_hit`: `true` | `false`
- `signal_source`: `snapshot` | `cold_start` | `failure_index`
- `kind`: Signal kind enum
- `tenant_id`: Tenant identifier (for multi-tenant deployments)

### 7.3 SLO Definitions

```yaml
# Prometheus rule file; the group name is illustrative.
groups:
  - name: ttfs-slo
    rules:
      # Recording rules
      - record: ttfs:slo:p50_target
        expr: 2.0 # seconds
      - record: ttfs:slo:p95_target
        expr: 5.0 # seconds
      - record: ttfs:slo:compliance
        # `bool` yields 1 when P95 is under 5s and 0 otherwise, instead of dropping the series
        expr: |
          histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le)) < bool 5.0

      # Alerting rules
      - alert: TtfsSloBreachP95
        expr: histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le)) > 5.0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "TTFS P95 exceeds 5s SLO"
      - alert: TtfsHighErrorRate
        expr: rate(ttfs_error_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
```

## 8) Frontend Integration

### 8.1 Component Hierarchy

```
FirstSignalCard (Smart Component)
├── FirstSignalStore (Signal-based State)
│   ├── SSE subscription
│   ├── Polling fallback
│   └── ETag caching
├── StatusIndicator (Dumb Component)
│   └── kind → icon + color mapping
├── PhaseProgress (Dumb Component)
│   └── phase → progress bar
└── ActionButtons (Dumb Component)
    └── next_actions rendering
```

### 8.2 State Machine

```typescript
type FirstSignalLoadState = 'idle' | 'loading' | 'streaming' | 'error' | 'done';

// State transitions:
// idle → loading (initial fetch)
// loading → streaming (SSE connected) | error (fetch failed)
// streaming → done (terminal signal) | error (connection lost)
// error → loading (retry)
```

### 8.3 Animation Tokens

| Token | Value | Usage |
|-------|-------|-------|
| `--motion-duration-quick` | 150ms | Skeleton fade, icon transitions |
| `--motion-duration-normal` | 250ms | Card expansion, phase transitions |
| `--motion-duration-slow` | 400ms | Success/failure celebrations |
| `--motion-easing-standard` | cubic-bezier(0.4, 0, 0.2, 1) | Default easing |
| `--motion-easing-decelerate` | cubic-bezier(0, 0, 0.2, 1) | Entries |
| `--motion-easing-accelerate` | cubic-bezier(0.4, 0, 1, 1) | Exits |

## 9) Failure Signatures

Failure signatures enable a predictive "last known outcome" by pattern-matching the current job against historical failures.

### 9.1 Signature Schema

```json
{
  "signature_hash": "sha256:abc123...",
  "pattern": {
    "phase": "analyze",
    "error_code": "LAYER_EXTRACT_FAILED",
    "image_pattern": "registry.io/.*:v1.*"
  },
  "outcome": {
    "likely_cause": "Registry rate limiting",
    "mttr_p50_seconds": 300,
    "suggested_action": "Wait 5 minutes and retry"
  },
  "confidence": 0.87,
  "sample_count": 42
}
```

### 9.2 Usage

When a job enters a known failure pattern:

1. **Match** the current job state against the `failure_signatures` table
2. **Enrich** the signal with `last_known_outcome.likely_cause`
3. **Predict** the ETA based on historical MTTR
4. **Suggest** remediation via `next_actions`

## 10) Database Schema

See `docs/db/schemas/ttfs.sql` for the complete schema definition.

### 10.1 Core Tables

| Table | Purpose |
|-------|---------|
| `scheduler.first_signal_snapshots` | Cached signal state per job |
| `scheduler.ttfs_events` | Telemetry event log |
| `scheduler.failure_signatures` | Historical failure patterns |

### 10.2 Hourly Rollup View

The `scheduler.ttfs_hourly_summary` view provides pre-aggregated metrics for dashboard performance.

## 11) Testing Requirements

### 11.1 Unit Tests

- Signal store state machine transitions
- ETag generation and validation
- Cache hit/miss scenarios
- Failure signature matching

### 11.2 Integration Tests

- End-to-end API latency measurement
- SSE connection lifecycle
- Air-gap mode fallback
- Multi-tenant isolation

### 11.3 Deterministic Fixtures

```typescript
// tests/fixtures/ttfs/
export const TTFS_FIXTURES = {
  FROZEN_TIMESTAMP: '2025-12-04T12:00:00.000Z',
  DETERMINISTIC_SEED: 0x5EED2025,
  SAMPLE_JOB_ID: '550e8400-e29b-41d4-a716-446655440000',
  SAMPLE_TENANT_ID: 'tenant-test-001'
};
```

## 12) Observability

### 12.1 Grafana Dashboard

The TTFS observability dashboard provides real-time visibility into signal latency, cache performance, and SLO compliance.

- **Dashboard file**: `docs/modules/telemetry/operations/dashboards/ttfs-observability.json`
- **UID**: `ttfs-overview`

**Key panels:**

- TTFS P50/P95/P99 by Surface (timeseries)
- Cache Hit Rate (stat)
- SLO Breaches (stat with threshold coloring)
- Signal Source Distribution (piechart)
- Signals by Kind (stacked timeseries)
- Error Rate (timeseries)
- TTFS Latency Heatmap
- Top Failure Signatures (table)

### 12.2 Alert Rules

TTFS alerts are defined in `docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml`.

**Critical alerts:**

| Alert | Threshold | For |
|-------|-----------|-----|
| `TtfsP95High` | P95 > 5s | 5m |
| `TtfsSloBreach` | >10 breaches in 5m | 1m |
| `FirstSignalEndpointDown` | Orchestrator unavailable | 2m |

**Warning alerts:**

| Alert | Threshold | For |
|-------|-----------|-----|
| `TtfsCacheHitRateLow` | <70% | 10m |
| `TtfsErrorRateHigh` | >1% | 5m |
| `FirstSignalEndpointLatencyHigh` | P95 > 500ms | 5m |

### 12.3 Load Testing

Load tests validate TTFS performance under realistic conditions.
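
k6 scripts normally declare pass/fail limits through the `thresholds` option. Independently of k6, the arithmetic behind this suite's criteria (cache-hit P95 ≤ 250ms, cold-path P95 ≤ 500ms, error rate < 0.1%) can be sketched in TypeScript as follows. The `LoadSample` shape and the `evaluateThresholds` helper are hypothetical and are not part of `tests/load/ttfs-load-test.js`; how a sample is classified as a cache hit is left to the test harness.

```typescript
// Hypothetical per-request record collected during a load-test run.
interface LoadSample {
  latencyMs: number;
  cacheHit: boolean; // assumption: the harness decides how a cache hit is detected
  ok: boolean;       // false for transport errors or unexpected HTTP statuses
}

interface ThresholdReport {
  cacheHitP95Ms: number;
  coldPathP95Ms: number;
  errorRate: number;
  passed: boolean;
}

// Nearest-rank P95; an empty series reports 0, which a stricter harness may treat as a failure.
function p95(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  return sorted[Math.ceil(0.95 * sorted.length) - 1];
}

function evaluateThresholds(samples: LoadSample[]): ThresholdReport {
  // Errored requests count toward the error rate but not toward latency percentiles.
  const succeeded = samples.filter(s => s.ok);
  const cacheHitP95Ms = p95(succeeded.filter(s => s.cacheHit).map(s => s.latencyMs));
  const coldPathP95Ms = p95(succeeded.filter(s => !s.cacheHit).map(s => s.latencyMs));
  const errorRate = samples.length === 0 ? 0 : (samples.length - succeeded.length) / samples.length;

  // Limits: cache-hit P95 <= 250ms, cold-path P95 <= 500ms, error rate < 0.1%.
  const passed = cacheHitP95Ms <= 250 && coldPathP95Ms <= 500 && errorRate < 0.001;
  return { cacheHitP95Ms, coldPathP95Ms, errorRate, passed };
}
```

Excluding errored samples from the percentiles while counting them in the error rate keeps a burst of fast failures from masking a latency regression.
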

- **Test file**: `tests/load/ttfs-load-test.js`
- **Framework**: k6

**Scenarios:**

- Sustained: 50 RPS for 5 minutes
- Spike: Ramp to 200 RPS
- Soak: 25 RPS for 15 minutes

**Thresholds:**

- Cache-hit P95 ≤ 250ms
- Cold-path P95 ≤ 500ms
- Error rate < 0.1%

## 13) References

- Advisory: `docs/product-advisories/14-Dec-2025 - UX and Time-to-Evidence Technical Reference.md`
- Sprint 1 (Foundation): `docs/implplan/SPRINT_0338_0001_0001_ttfs_foundation.md`
- Sprint 2 (API): `docs/implplan/SPRINT_0339_0001_0001_first_signal_api.md`
- Sprint 3 (UI): `docs/implplan/SPRINT_0340_0001_0001_first_signal_card_ui.md`
- Sprint 4 (Enhancements): `docs/implplan/SPRINT_0341_0001_0001_ttfs_enhancements.md`
- TTE Architecture: `docs/modules/telemetry/architecture.md`
- Telemetry Schema: `docs/schemas/ttfs-event.schema.json`
- Database Schema: `docs/db/schemas/ttfs.sql`
- Grafana Dashboard: `docs/modules/telemetry/operations/dashboards/ttfs-observability.json`
- Alert Rules: `docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml`
- Load Tests: `tests/load/ttfs-load-test.js`