UX and Time-to-Evidence Technical Reference

Source Advisories:

  • 01-Dec-2025 - Tracking UX Health with Time-to-Evidence
  • 12-Dec-2025 - Measure UX Efficiency Through TTFS
  • 13-Dec-2025 - Define a north star metric for TTFS
  • 14-Dec-2025 - Add a dedicated "first_signal" event
  • 04-Dec-2025 - Designing Traceable Evidence in Security UX
  • 05-Dec-2025 - Designing Triage UX That Stays Quiet on Purpose
  • 30-Nov-2025 - UI Micro-Interactions for StellaOps
  • 11-Dec-2025 - Stella DevOps UX Implementation Guide

Last Updated: 2025-12-14


1. PERFORMANCE TARGETS & SLOS

1.1 Time-to-Evidence (TTE)

Definition: TTE = t_first_proof_rendered − t_open_finding

Primary SLO: P95 ≤ 15s (stretch: P99 ≤ 30s)

Guardrail: P50 < 3s

By proof type:

  • Simple proof (SBOM row): P95 ≤ 5s
  • Complex proof (reachability graph): P95 ≤ 15s

Backend budget: 12s backend + 3s UI/render margin = 15s P95

Query performance: O(log n) on indexed columns

1.2 Time-to-First-Signal (TTFS)

Definition: Time from user action/CI start → first meaningful signal rendered/logged

Primary SLO: P50 < 2s, P95 < 5s (all surfaces: UI, CLI, CI)

Warm path: P50 < 700ms, P95 < 2500ms

Cold path: P95 ≤ 4000ms

Component budgets:

  • Frontend: ≤150ms (skeleton + last known state)
  • Edge/API: ≤250ms (signal frame fast path from cache)
  • Core services: ≤500-1500ms (pre-indexed failures, warm summaries)
  • Slow work: async (scan, lattice merge, provenance)

1.3 General UX Performance

  • Interaction response: ≤100ms
  • Animation frame budget: 16ms avg / 50ms P95
  • LCP placeholder: shown immediately
  • Layout shift: <0.05
  • Motion durations: 80/140/200/260/320ms
  • Reduced-motion: 0-80ms clamp

1.4 Cache Performance

  • Cache-hit response: P95 ≤ 250ms
  • Cold response: P95 ≤ 500ms
  • Endpoint error rate: < 0.1% under expected concurrency

2. METRICS DEFINITIONS & FORMULAS

2.1 TTE Metrics

// Core TTE calculation
tte_ms = proof_rendered.timestamp - finding_open.timestamp

// Dimensions
{
  tenant: string,
  finding_id: string,
  proof_kind: 'sbom' | 'reachability' | 'vex',
  source: 'local' | 'remote' | 'cache',
  page: string
}

SQL Rollup (hourly):

SELECT
  proof_kind,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY tte_ms) AS p95_ms
FROM tte_events
WHERE ts >= now() - interval '1 hour'
GROUP BY proof_kind;
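
For client-side checks or unit tests, the same rollup can be approximated outside the database. The sketch below assumes rows have already been loaded into memory. The `TteRow` shape and both function names are illustrative, not part of the schema. It uses linear interpolation between ranks, which is how PostgreSQL's `percentile_cont` behaves.

```typescript
// Hypothetical in-memory row; mirrors the two tte_events columns the rollup uses.
interface TteRow {
  proofKind: string;
  tteMs: number;
}

// Continuous percentile with linear interpolation between neighbouring ranks.
function percentileCont(values: number[], fraction: number): number {
  if (values.length === 0) {
    throw new Error("percentileCont needs at least one value");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = fraction * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

// Group rows by proof kind and compute P95 for each group.
function p95ByProofKind(rows: TteRow[]): Record<string, number> {
  // Prototype-free object so arbitrary kind strings cannot collide with built-ins.
  const groups: Record<string, number[]> = Object.create(null);
  for (const row of rows) {
    const bucket = groups[row.proofKind];
    if (bucket) {
      bucket.push(row.tteMs);
    } else {
      groups[row.proofKind] = [row.tteMs];
    }
  }
  const result: Record<string, number> = Object.create(null);
  Object.keys(groups).forEach((kind) => {
    result[kind] = percentileCont(groups[kind], 0.95);
  });
  return result;
}
```

Comparing each value against the per-proof-type targets in section 1.1 (5000 ms for SBOM rows, 15000 ms for reachability graphs) gives a quick pass/fail signal without a database round trip.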

2.2 TTFS Metrics

// Core TTFS calculation
ttfs_ms = signal_rendered.timestamp - start.timestamp

// Dimensions
{
  surface: 'ui' | 'cli' | 'ci',
  cache_hit: boolean,
  signal_source: 'snapshot' | 'cold_start' | 'failure_index',
  kind: string,
  repo_size_bucket: string,
  provider: string,
  branch: string,
  run_type: 'PR' | 'main',
  network_state: string
}

2.3 Secondary Metrics

  • Open→Action time: Time from opening run to first user action
  • Bounce rate: Close page within 10s without interaction
  • MTTR proxy: Time from failure to first rerun or fix commit
  • Signal availability rate: % of run views showing first signal within 3s
  • Signal accuracy score: Engineer confirms "helpful vs not" (sampled)
  • Extractor failure rate: Parsing errors / missing mappings / timeouts

2.4 DORA Metrics

  • Deployment Frequency: Deploys per day/week
  • Lead Time for Changes: Commit → deployment completion
  • Change Failure Rate: Failed deployments / total deployments
  • Time to Restore: Incident start → resolution

2.5 Quality Metrics

  • Error budget burn: Minutes over target per day
  • Top regressions: Last 7 days vs prior 7
  • Extraction failure rate: < 1% for sampled runs

3. EVENT SCHEMAS

3.1 TTE Events

finding_open:

{
  event: 'finding_open',
  findingId: string,
  tenantId: string,
  userId: string,
  userRole: 'admin' | 'dev' | 'triager',
  entryPoint: 'list' | 'search' | 'notification' | 'deep_link',
  uiVersion: string,
  buildSha: string,
  t: number // performance.now()
}

proof_rendered:

{
  event: 'proof_rendered',
  findingId: string,
  proofKind: 'sbom' | 'reachability' | 'vex' | 'logs' | 'other',
  source: 'local_cache' | 'backend_api' | '3rd_party',
  proofHeight: number, // pixel offset from top
  t: number // performance.now()
}

3.2 TTFS Events

ttfs_start:

{
  event: 'ttfs_start',
  runId: string,
  surface: 'ui' | 'cli' | 'ci',
  provider: string,
  repo: string,
  branch: string,
  runType: 'PR' | 'main',
  device: string,
  release: string,
  networkState: string,
  t: number
}

ttfs_signal_rendered:

{
  event: 'ttfs_signal_rendered',
  runId: string,
  surface: 'ui' | 'cli' | 'ci',
  cacheHit: boolean,
  signalSource: 'snapshot' | 'cold_start' | 'failure_index',
  kind: string,
  t: number
}
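
The two events above can be joined into a `ttfs_ms` sample as sketched below. This is a minimal illustration that assumes both events for a run were emitted on the same surface with the same clock, and that events arrive in emission order. The trimmed interfaces, `TtfsSample`, and `computeTtfs` are hypothetical names introduced for the example.

```typescript
// Trimmed copies of the event shapes above; only the fields this sketch needs.
interface TtfsStart {
  event: "ttfs_start";
  runId: string;
  surface: string;
  t: number;
}
interface TtfsSignalRendered {
  event: "ttfs_signal_rendered";
  runId: string;
  surface: string;
  cacheHit: boolean;
  t: number;
}
type TtfsEvent = TtfsStart | TtfsSignalRendered;

interface TtfsSample {
  runId: string;
  surface: string;
  cacheHit: boolean;
  ttfsMs: number;
}

// Pair each run's earliest start with its first rendered signal.
function computeTtfs(events: TtfsEvent[]): TtfsSample[] {
  const starts: Record<string, TtfsStart> = {};
  const matched: Record<string, boolean> = {};
  const samples: TtfsSample[] = [];
  for (const e of events) {
    const key = e.surface + ":" + e.runId;
    if (e.event === "ttfs_start") {
      if (starts[key] === undefined) starts[key] = e; // keep the earliest start
    } else {
      const start = starts[key];
      // Ignore signals without a start, and any signal after the first one.
      if (start !== undefined && !matched[key]) {
        matched[key] = true;
        samples.push({
          runId: e.runId,
          surface: e.surface,
          cacheHit: e.cacheHit,
          ttfsMs: e.t - start.t,
        });
      }
    }
  }
  return samples;
}
```

Only the first rendered signal per run counts, which matches the definition of TTFS as the time to the first meaningful signal.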

3.3 FirstSignal Event Contract

interface FirstSignal {
  version: '1.0',
  signalId: string,
  jobId: string,
  timestamp: string, // ISO-8601
  kind: 'queued' | 'started' | 'phase' | 'blocked' | 'failed' | 'succeeded' | 'canceled' | 'unavailable',
  phase: 'resolve' | 'fetch' | 'restore' | 'analyze' | 'policy' | 'report' | 'unknown',
  scope: {
    type: 'repo' | 'image' | 'artifact',
    id: string
  },
  summary: string,
  etaSeconds?: number,
  lastKnownOutcome?: {
    signatureId: string,
    errorCode: string,
    token: string,
    excerpt: string,
    confidence: 'low' | 'medium' | 'high',
    firstSeenAt: string,
    hitCount: number
  },
  nextActions?: Array<{
    type: 'open_logs' | 'open_job' | 'docs' | 'retry' | 'cli_command',
    label: string,
    target: string
  }>,
  diagnostics: {
    cacheHit: boolean,
    source: 'snapshot' | 'failure_index' | 'cold_start',
    correlationId: string
  }
}

3.4 UI Telemetry Schema

ui.micro.* events:

{
  version: string,
  tenant: string,
  surface: string,
  component: string,
  action: string,
  latency_ms: number,
  outcome: string,
  reduced_motion: boolean,
  offline_mode: boolean,
  error_code?: string
}

Schema location: docs/modules/ui/telemetry/ui-micro.schema.json

4. API CONTRACTS

4.1 First Signal Endpoint

GET /api/runs/{runId}/first-signal

Headers:

  • If-None-Match: W/"..." (supported)

Response:

{
  "runId": "123",
  "firstSignal": {
    "type": "stage_failed",
    "stage": "build",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "at": "2025-12-11T09:22:31Z",
    "artifact": {
      "kind": "log",
      "range": { "start": 1880, "end": 1896 }
    }
  },
  "summaryEtag": "W/\"a1b2c3\""
}

Status codes:

  • 200: Full first signal object
  • 304: Not modified
  • 404: Run not found
  • 204: Run exists but signal not available yet

Response headers:

  • ETag
  • Cache-Control
  • X-Correlation-Id
  • Cache-Status: hit|miss|bypass
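
A client might map the status codes above onto view states roughly as follows. This is a sketch under stated assumptions: the state names, `FirstSignalState`, and `interpretFirstSignal` are hypothetical, and the transport layer (which would send `If-None-Match` and read `ETag`) is left out so the logic stays testable in isolation.

```typescript
// Illustrative view states for the first-signal card.
type FirstSignalState<T> =
  | { state: "ready"; signal: T; etag: string | undefined }
  | { state: "pending" }
  | { state: "not_found" }
  | { state: "error"; status: number };

// Copy kept from an earlier 200 response, together with its ETag.
interface CachedSignal<T> {
  signal: T;
  etag: string | undefined;
}

// Map an HTTP status from the first-signal endpoint onto a view state.
function interpretFirstSignal<T>(
  status: number,
  body: T | undefined,
  etag: string | undefined,
  cached: CachedSignal<T> | undefined
): FirstSignalState<T> {
  switch (status) {
    case 200:
      return body !== undefined
        ? { state: "ready", signal: body, etag: etag }
        : { state: "error", status: status };
    case 304:
      // Not modified: reuse the copy that supplied the If-None-Match value.
      return cached !== undefined
        ? { state: "ready", signal: cached.signal, etag: cached.etag }
        : { state: "error", status: status };
    case 204:
      // Run exists but no signal yet: keep the skeleton and retry later.
      return { state: "pending" };
    case 404:
      return { state: "not_found" };
    default:
      return { state: "error", status: status };
  }
}
```

Treating a 304 without a cached copy as an error is a deliberate choice here: it indicates the client sent a validator for data it no longer holds.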

4.2 Summary Endpoint

GET /api/runs/{runId}/summary

Returns: Status, first failing stage/job, timestamps, blocking policies, artifact counts

4.3 SSE Events Endpoint

GET /api/runs/{runId}/events (Server-Sent Events)

Event payloads:

  • status (kind+phase+message)
  • hint (token+errorCode+confidence)
  • policy (blocked + policyId)
  • complete (terminal)
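
One way to apply these payloads on the client is a pure reducer, sketched below. The field names follow the parenthesised hints above, but the exact wire format is an assumption, as are the `RunEvent`, `CardState`, and `applyRunEvent` names.

```typescript
// Payload shapes inferred from the list above; the real wire format may differ.
type RunEvent =
  | { type: "status"; kind: string; phase: string; message: string }
  | { type: "hint"; token: string; errorCode: string; confidence: "low" | "medium" | "high" }
  | { type: "policy"; blocked: boolean; policyId: string }
  | { type: "complete" };

interface CardState {
  kind: string;
  phase: string;
  message: string;
  hint: { token: string; errorCode: string; confidence: string } | null;
  blockedByPolicy: string | null;
  terminal: boolean;
}

const initialCardState: CardState = {
  kind: "queued",
  phase: "unknown",
  message: "",
  hint: null,
  blockedByPolicy: null,
  terminal: false,
};

// Apply one event and return the next state without mutating the input.
function applyRunEvent(state: CardState, event: RunEvent): CardState {
  if (state.terminal) return state; // nothing changes after `complete`
  switch (event.type) {
    case "status":
      return { ...state, kind: event.kind, phase: event.phase, message: event.message };
    case "hint":
      return {
        ...state,
        hint: { token: event.token, errorCode: event.errorCode, confidence: event.confidence },
      };
    case "policy":
      return { ...state, blockedByPolicy: event.blocked ? event.policyId : null };
    case "complete":
      return { ...state, terminal: true };
  }
}
```

Because the reducer is pure, the same function can serve both the SSE stream and a polling fallback: each poll result is converted into events and folded into the state the same way.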

5. FRONTEND PATTERNS & COMPONENT SPECIFICATIONS

5.1 UI Contract (Evidence First)

Above the fold:

  • Always show compact Proof panel first (not hidden behind tabs)
  • Skeletons over spinners: Reserve space; render partial proof as ready
  • Plain text copy affordance: "Copy SBOM line / path" button next to proof
  • Defer non-proof widgets: CVSS badges, remediation prose, charts load after proof
  • Empty-state truth: "No proof available yet" + loader for that proof type only

5.2 Progressive Rendering Pattern

Immediate render:

  1. Title, status badge, pipeline metadata (run id, commit, branch)
  2. Skeleton for details area

First signal fetch:

  3. Render FirstSignalCard immediately when available
  4. Fire telemetry event when card is in DOM and visible

Lazy-load:

  5. Stage graph
  6. Full logs viewer
  7. Artifacts list
  8. Security findings
  9. Trends, flaky tests, etc.

5.3 Component Specifications

FirstSignalCard Component

  • Standalone, minimal dependencies
  • Shows: summary + at least one next action button (Open job/logs)
  • Updates in-place on deltas from SSE
  • Falls back to polling when SSE fails

EvidencePanel Component

interface EvidencePanel {
  tabs: ['SBOM', 'Reachability', 'VEX', 'Logs', 'Other'],
  firstProofType: ProofKind,
  copyEnabled: boolean,
  emptyStateMessage?: string
}

ProofSpine Component

  • Displays: graphRevisionId, bundle hashes (SBOM/VEX/proof), receipt digest, and Rekor details (when present)
  • Copy affordances: copy graphRevisionId, proofBundleId, and receipt digest in one click
  • Verification status: Verified | Unverified | Failed verification | Expired/Outdated
  • "Verify locally" copy button with exact commands

5.4 Prefetch Strategy

From runs list view:

  • Use IntersectionObserver to prefetch summaries/first signals for items in viewport
  • Store results in in-memory cache (Map<runId, FirstSignal>)
  • Respect ETag to avoid redundant payloads

6. TELEMETRY REQUIREMENTS

6.1 Client-Side Telemetry

Frontend events:

// On route enter
metrics.emit('finding_open', { findingId, t: performance.now() });

// When first proof node/line hits DOM
metrics.emit('proof_rendered', { findingId, proofKind, t: performance.now() });

Sampling:

  • Staging: 100%
  • Production: ≥25% of sessions (ideal: 100%)

Clock handling:

  • Use performance.now() for TTE (monotonic within tab)
  • Don't mix backend clocks into TTE calculation
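
Sampling by session (rather than by event) keeps each sampled session's `finding_open` and `proof_rendered` events together, so TTE can still be computed. The sketch below shows one possible way to make that decision deterministic. The use of an FNV-1a hash and the function names are assumptions for illustration, not a documented StellaOps mechanism.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units. Used only to spread session ids
// evenly across [0, 1); it is not a security primitive.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619, written as shifts to stay within 32 bits.
    hash = (hash + ((hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24))) >>> 0;
  }
  return hash >>> 0;
}

// Decide once per session whether telemetry is emitted.
// `rate` is the sampled fraction, e.g. 0.25 for production or 1 for staging.
function shouldSampleSession(sessionId: string, rate: number): boolean {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  return fnv1a32(sessionId) / 0x100000000 < rate;
}
```

The same session id always yields the same decision, so a page reload does not flip a session in or out of the sample.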

6.2 Backend Telemetry

Endpoint metrics:

  • signal_endpoint_latency_ms
  • signal_payload_bytes
  • signal_error_rate

Server-side timing logs (debug-level):

  • Cache read time
  • DB read time
  • Cold path time

Tracing:

  • Correlation ID propagated in:
    • API response header
    • Worker logs
    • Events

6.3 Dashboard Requirements

Core widgets:

  1. TTE distribution (P50/P90/P95/P99) per day, split by proof_kind
  2. TTE by page/surface (list→detail, deep links, bookmarks)
  3. TTE by user segment (new vs power users, roles)
  4. Error budget: "Minutes over SLO per day"
  5. Correlation: TTE vs session length, TTE vs "clicked ignore/snooze"

Operational panels:

  • Update granularity: Real-time or ≤15 min
  • Retention: ≥90 days
  • Breakdowns: backend_region, build_version

TTFS dashboards:

  • By surface (ui/cli/ci)
  • Cache hit rate
  • Endpoint latency percentiles
  • Repo size bucket
  • Kind/phase

Alerts:

  • Page when p95(ttfs_ms) > 5000 for 5 mins
  • Page when signal_endpoint_error_rate > 1%
  • Alert when P95 TTE > 15s for 15 minutes
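
Each alert above has the form "value over threshold for N minutes". A minimal evaluation of that rule, assuming one aggregated value per minute with the newest last, could look like the sketch below. The function name and the input shape are hypothetical.

```typescript
// True when every one of the most recent `windowMinutes` values exceeds the threshold.
function sustainedBreach(
  perMinuteValues: number[],
  threshold: number,
  windowMinutes: number
): boolean {
  // Not enough data to cover the window: do not fire.
  if (windowMinutes <= 0 || perMinuteValues.length < windowMinutes) return false;
  const recent = perMinuteValues.slice(-windowMinutes);
  return recent.every((value) => value > threshold);
}
```

Under these assumptions the TTFS page would be `sustainedBreach(ttfsP95Ms, 5000, 5)` and the TTE alert `sustainedBreach(tteP95Ms, 15000, 15)`. A single minute back under the threshold resets the condition, which avoids paging on brief spikes.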

7. DATABASE SCHEMAS

7.1 TTE Events Table

CREATE TABLE tte_events (
  id SERIAL PRIMARY KEY,
  ts TIMESTAMPTZ NOT NULL DEFAULT now(),
  tenant TEXT NOT NULL,
  finding_id TEXT NOT NULL,
  proof_kind TEXT NOT NULL,
  source TEXT NOT NULL,
  tte_ms INT NOT NULL,
  page TEXT,
  user_role TEXT
);

CREATE INDEX ON tte_events (ts DESC);
CREATE INDEX ON tte_events (proof_kind, ts DESC);

7.2 First Signal Snapshots

CREATE TABLE first_signal_snapshots (
  job_id TEXT PRIMARY KEY,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  kind TEXT NOT NULL,
  phase TEXT NOT NULL,
  summary TEXT NOT NULL,
  eta_seconds INT NULL,
  payload_json JSONB NOT NULL
);

CREATE INDEX ON first_signal_snapshots (updated_at DESC);

7.3 Failure Signatures

CREATE TABLE failure_signatures (
  signature_id TEXT PRIMARY KEY,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  scope_type TEXT NOT NULL,
  scope_id TEXT NOT NULL,
  toolchain_hash TEXT NOT NULL,
  error_code TEXT NULL,
  token TEXT NOT NULL,
  excerpt TEXT NULL,
  confidence TEXT NOT NULL,
  first_seen_at TIMESTAMPTZ NOT NULL,
  last_seen_at TIMESTAMPTZ NOT NULL,
  hit_count INT NOT NULL DEFAULT 1
);

CREATE INDEX ON failure_signatures (scope_type, scope_id, toolchain_hash);
CREATE INDEX ON failure_signatures (token);

8. MOTION & ANIMATION TOKENS

8.1 Duration Tokens

| Token       | Value | Use Case                        |
|-------------|-------|---------------------------------|
| duration-xs | 80ms  | Quick hover, focus              |
| duration-sm | 140ms | Button press, small transitions |
| duration-md | 200ms | Modal open/close, panel slide   |
| duration-lg | 260ms | Page transitions                |
| duration-xl | 320ms | Complex animations              |

Reduced-motion override: Clamp all to 0-80ms
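
The duration tokens and the reduced-motion clamp can be expressed together, as in the sketch below. The token values come from this section. The object layout and the `resolveDuration` helper are illustrative and may not match the actual contents of the motion token files.

```typescript
// Duration tokens from this section, in milliseconds.
const durationTokens = {
  "duration-xs": 80,
  "duration-sm": 140,
  "duration-md": 200,
  "duration-lg": 260,
  "duration-xl": 320,
} as const;

type DurationToken = keyof typeof durationTokens;

// Upper bound of the 0-80ms reduced-motion clamp.
const REDUCED_MOTION_MAX_MS = 80;

// Resolve a token to milliseconds, clamping when the user prefers reduced motion.
function resolveDuration(token: DurationToken, reducedMotion: boolean): number {
  const base = durationTokens[token];
  return reducedMotion ? Math.min(base, REDUCED_MOTION_MAX_MS) : base;
}
```

In a browser, the `reducedMotion` flag would typically come from the `prefers-reduced-motion: reduce` media query.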

8.2 Easing Tokens

  • standard: Default transition
  • decel: Element entering (start fast, slow down)
  • accel: Element exiting (slow start, speed up)
  • emphasized: Important state changes

8.3 Distance Scales

  • XS: 4px
  • SM: 8px
  • MD: 16px
  • LG: 24px
  • XL: 32px

Location: src/Web/StellaOps.Web/src/styles/tokens/motion.{ts,scss}

9. ACCESSIBILITY REQUIREMENTS

9.1 WCAG 2.1 AA Compliance

  • Focus order: logical and consistent
  • Keyboard: all interactive elements accessible
  • Contrast:
    • Text: ≥ 4.5:1
    • UI elements: ≥ 3:1
  • Reduced motion: honored via prefers-reduced-motion
  • Status messaging: aria-live=polite for updates

9.2 Reduced-Motion Rules

When prefers-reduced-motion: reduce:

  • Durations clamp to 0-80ms
  • Disable parallax/auto-animations
  • Focus/hover states remain visible
  • No animated GIF/Lottie autoplay

9.3 Screen Reader Support

  • Undo window: 8s with keyboard focus and aria-live=polite
  • Loading states: Announce state changes
  • Error messages: Informative, not generic

10. EVIDENCE & PROOF SPECIFICATIONS

10.1 Evidence Bundle Minimum Requirements

Component presence:

  • SBOM fragment (SPDX/CycloneDX) with component identity and provenance
  • Signed attestation for SBOM artifact

Vulnerability match:

  • Matching rule details (CPE/purl/range) + scanner identity/version
  • Signed vulnerability report attestation

Reachable vulnerability:

  • Call path: entrypoint → frames → vulnerable symbol
  • Hash/digest of call graph slice (tamper-evident)
  • Tool info + limitations (reflection/dynamic dispatch uncertainty)

Not affected via VEX:

  • VEX statement (OpenVEX/CSAF) + signer
  • Justification for not_affected
  • Align to CISA minimum requirements

Gate decision:

  • Input digests (SBOM digest, scan attestation digests, VEX doc digests)
  • Policy version + rule ID
  • Deterministic decision hash over (policy + input digests)

10.2 Evidence Object Structure

interface Evidence {
  sbom_snippet_attestation: DSSEEnvelope,
  reachability_proof: {
    entrypoint: string,
    frames: CallFrame[],
    file_hashes: string[],
    graph_digest: string
  },
  attestation_chain: DSSESummary[],
  transparency_receipt: {
    logIndex: number,
    uuid: string,
    inclusionProof: string,
    checkpoint: string
  }
}

10.3 Proof Panel Requirements

Four artifacts:

  1. SBOM snippet (signed): DSSE attestation, verify with cosign
  2. Call-stack slice: Entrypoint → vulnerable symbol, status pill (Reachable, Potentially reachable, Unreachable)
  3. Attestation chain: DSSE envelope summary, verification status, "Verify locally" command
  4. Transparency receipt: Rekor inclusion proof, "Verify inclusion" command

One-click export:

  • "Export Evidence (.tar.gz)" bundling: SBOM slice, call-stack JSON, DSSE attestation, Rekor proof JSON

11. CONFIGURATION & FEATURE FLAGS

11.1 TTFS Feature Flags

ttfs:
  first_signal_enabled: true      # Default ON in staging
  cache_enabled: true
  failure_index_enabled: true
  sse_enabled: true
  policy_preeval_enabled: true

11.2 Cache Configuration

cache:
  backend: valkey | postgres | none  # TTFS_CACHE_BACKEND
  ttl_seconds: 86400                  # TTFS_CACHE_TTL_SECONDS
  key_pattern: "signal:job:{jobId}"

11.3 Air-Gapped Profile

  • Skip Valkey; use Postgres-only
  • Use first_signal_snapshots table
  • NOTIFY/LISTEN for streaming updates

12. TESTING REQUIREMENTS

12.1 Acceptance Criteria (TTE)

  • First paint shows real proof snippet (not summary)
  • "Copy proof" button works within 1 click
  • TTE P95 in staging ≤ 10s; in prod ≤ 15s
  • If proof missing, explicit empty-state + retry path
  • Telemetry sampled ≥ 50% of sessions (or 100% for internal)

12.2 Acceptance Tests (TTFS)

  • Run with early fail → first signal < 1s, shows exact command + exit code
  • Run with policy gate fail → rule name + fix hint visible first
  • Offline/slow network → cached summary still renders actionable hint

12.3 Determinism Requirements

  • Freeze timers to 2025-12-04T12:00:00Z in stories/e2e
  • Seed RNG with 0x5EED2025 unless scenario-specific
  • All fixtures stored under tests/fixtures/micro/
  • No network calls; offline assets bundled
  • Playwright runs with --disable-animations and reduced-motion emulation
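
The frozen clock and seeded RNG can be provided by small framework-independent helpers like the ones sketched below. The seed and timestamp come from the list above. The choice of xorshift32 as the generator and the helper names are assumptions, so the project's actual test utilities may differ.

```typescript
// Values quoted in the determinism requirements.
const DEFAULT_SEED = 0x5eed2025;
const FROZEN_NOW_MS = Date.parse("2025-12-04T12:00:00Z");

// Fixed clock to inject wherever a test would otherwise read the current time.
function frozenNow(): number {
  return FROZEN_NOW_MS;
}

// xorshift32: a small deterministic PRNG returning values in [0, 1).
function createSeededRandom(seed: number = DEFAULT_SEED): () => number {
  let state = seed >>> 0;
  if (state === 0) state = DEFAULT_SEED; // xorshift32 must not start at zero
  return () => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0; // back to an unsigned 32-bit value
    return state / 0x100000000;
  };
}
```

Two generators created with the same seed yield the same sequence, so fixtures built from them are reproducible across runs and machines.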

12.4 Load Tests

/jobs/{id}/signal:

  • Cache-hit P95 ≤ 250ms
  • Cold path P95 ≤ 500ms
  • Error rate < 0.1% under expected concurrency

13. REDACTION & SECURITY

13.1 Excerpt Redaction Rules

  • Strip: bearer tokens, API keys, access tokens, private URLs
  • Cap excerpt length: 240 chars
  • Normalize whitespace
  • Never include excerpts in telemetry attributes
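
A redaction pass following these rules might look like the sketch below. The regular expressions are illustrative assumptions only: the real forbidden-pattern list, and what counts as a private URL, would need security review. Here matched secrets are replaced inline with "[redacted]", and the length cap is applied last so truncation cannot cut a secret in half and leave part of it unredacted.

```typescript
const MAX_EXCERPT_CHARS = 240;

// Illustrative patterns; not an exhaustive or reviewed list.
const FORBIDDEN_PATTERNS: RegExp[] = [
  // Bearer tokens in headers or log lines.
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/gi,
  // key=value or key: value style secrets.
  /\b(api[_-]?key|access[_-]?token|token)\s*[=:]\s*\S+/gi,
  // Private URLs; the .internal and .local suffixes are an assumption.
  /https?:\/\/[^\s\/]*(\.internal|\.local)[^\s]*/gi,
];

function redactExcerpt(raw: string): string {
  let text = raw;
  for (const pattern of FORBIDDEN_PATTERNS) {
    text = text.replace(pattern, "[redacted]");
  }
  // Normalize whitespace, then cap the length.
  text = text.replace(/\s+/g, " ").trim();
  return text.length > MAX_EXCERPT_CHARS ? text.slice(0, MAX_EXCERPT_CHARS) : text;
}
```

A message such as "401 Unauthorized: token expired" passes through unchanged, because the word "token" is not followed by a value assignment.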

13.2 Tenant Isolation

Cache keys include tenant boundary:

tenant:{tenantId}:signal:job:{jobId}

Failure signatures looked up within same tenant only.

13.3 Secret Scanning

Runtime guardrails:

  • If excerpt contains forbidden patterns → replace with "[redacted]"
  • Security review sign-off required for snapshot + signature + telemetry

14. LOCALIZATION

14.1 Micro-Copy Requirements

  • Keys and ICU messages for micro-interaction copy
  • Defaults: EN
  • Fallbacks present
  • No hard-coded strings in components
  • i18n extraction shows zero TODO keys

14.2 Snapshot Verification

Verify translated skeleton/error/undo copy in snapshots.

15. DELIVERABLES MAP

| Category | Location | Description |
|----------|----------|-------------|
| Motion tokens | src/Web/StellaOps.Web/src/styles/tokens/motion.{ts,scss} | Duration, easing, distance scales + reduced-motion overrides |
| Storybook stories | apps/storybook/src/stories/micro/* | Slow, error, offline, reduced-motion, undo flows |
| Playwright suite | tests/e2e/micro-interactions.spec.ts | MI2/MI3/MI4/MI8 coverage |
| Telemetry schema | docs/modules/ui/telemetry/ui-micro.schema.json | Event schema + validators |
| Component map | docs/modules/ui/micro-interactions-map.md | Components → interaction type → token usage |
| Fixtures | tests/fixtures/micro/ | Deterministic test fixtures |

Document Version: 1.0

Target Platform: .NET 10, PostgreSQL ≥16, Angular v17