UX and Time-to-Evidence Technical Reference

Source Advisories:

01-Dec-2025 - Tracking UX Health with Time‑to‑Evidence
12-Dec-2025 - Measure UX Efficiency Through TTFS
13-Dec-2025 - Define a north star metric for TTFS
14-Dec-2025 - Add a dedicated "first_signal" event
04-Dec-2025 - Designing Traceable Evidence in Security UX
05-Dec-2025 - Designing Triage UX That Stays Quiet on Purpose
30-Nov-2025 - UI Micro-Interactions for StellaOps
11-Dec-2025 - Stella DevOps UX Implementation Guide

Last Updated: 2025-12-14

1. PERFORMANCE TARGETS & SLOS

1.1 Time-to-Evidence (TTE)

Definition: TTE = t_first_proof_rendered − t_open_finding

Primary SLO: P95 ≤ 15s (stretch: P99 ≤ 30s)

Guardrail: P50 < 3s

By proof type:

Simple proof (SBOM row): P95 ≤ 5s
Complex proof (reachability graph): P95 ≤ 15s

Backend budget: 12s backend + 3s UI/render margin = 15s P95

Query performance: O(log n) on indexed columns
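
The targets above can be checked against a batch of TTE samples. The following is a minimal sketch, not the StellaOps implementation: `percentileCont`, `checkTteSlo`, and `TTE_TARGETS_MS` are hypothetical names, and the percentile uses linear interpolation, which is how PostgreSQL's percentile_cont is defined.

```typescript
// Linear-interpolated percentile, the same definition PostgreSQL uses for percentile_cont.
function percentileCont(values: number[], fraction: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = fraction * (sorted.length - 1);
  const lowerIndex = Math.floor(rank);
  const upperIndex = Math.ceil(rank);
  const lower = sorted[lowerIndex] ?? 0;
  const upper = sorted[upperIndex] ?? lower;
  return lower + (upper - lower) * (rank - lowerIndex);
}

// Targets from section 1.1, in milliseconds.
const TTE_TARGETS_MS = { guardrailP50: 3_000, primaryP95: 15_000, stretchP99: 30_000 };

interface TteSloReport {
  p50Ms: number;
  p95Ms: number;
  p99Ms: number;
  guardrailOk: boolean; // P50 < 3s
  primaryOk: boolean; // P95 <= 15s
  stretchOk: boolean; // P99 <= 30s
}

function checkTteSlo(samplesMs: number[]): TteSloReport {
  const p50Ms = percentileCont(samplesMs, 0.5);
  const p95Ms = percentileCont(samplesMs, 0.95);
  const p99Ms = percentileCont(samplesMs, 0.99);
  return {
    p50Ms,
    p95Ms,
    p99Ms,
    guardrailOk: p50Ms < TTE_TARGETS_MS.guardrailP50,
    primaryOk: p95Ms <= TTE_TARGETS_MS.primaryP95,
    stretchOk: p99Ms <= TTE_TARGETS_MS.stretchP99,
  };
}
```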

1.2 Time-to-First-Signal (TTFS)

Definition: Time from user action/CI start → first meaningful signal rendered/logged

Primary SLO: P50 < 2s, P95 < 5s (all surfaces: UI, CLI, CI)

Warm path: P50 < 700ms, P95 < 2500ms

Cold path: P95 ≤ 4000ms

Component budgets:

Frontend: ≤150ms (skeleton + last known state)
Edge/API: ≤250ms (signal frame fast path from cache)
Core services: ≤500–1500ms (pre-indexed failures, warm summaries)
Slow work: async (scan, lattice merge, provenance)
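
As a worked check of the component budgets, the sketch below sums the synchronous stages and compares the total with the warm-path P95. It assumes the stages run sequentially so their budgets add up, which the source does not state; the identifiers are illustrative.

```typescript
// Synchronous TTFS budgets from section 1.2, in milliseconds.
// Assumption: stages run one after another, so the budgets are additive.
const ttfsBudgetMs = {
  frontend: { best: 150, worst: 150 },
  edgeApi: { best: 250, worst: 250 },
  coreServices: { best: 500, worst: 1500 },
};

function totalTtfsBudget(
  budget: Record<string, { best: number; worst: number }>,
): { best: number; worst: number } {
  let best = 0;
  let worst = 0;
  for (const stage of Object.values(budget)) {
    best += stage.best;
    worst += stage.worst;
  }
  return { best, worst };
}

const WARM_PATH_P95_MS = 2500;
const ttfsTotalMs = totalTtfsBudget(ttfsBudgetMs); // 900 best case, 1900 worst case
const warmHeadroomMs = WARM_PATH_P95_MS - ttfsTotalMs.worst; // 600
```

Under that assumption the worst-case total of 1900ms leaves 600ms of headroom below the warm-path P95 of 2500ms.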

1.3 General UX Performance

Interaction response: ≤100ms
Animation frame budget: 16ms avg / 50ms P95
LCP placeholder: shown immediately
Layout shift: <0.05
Motion durations: 80/140/200/260/320ms
Reduced-motion: 0-80ms clamp

1.4 Cache Performance

Cache-hit response: P95 ≤ 250ms
Cold response: P95 ≤ 500ms
Endpoint error rate: < 0.1% under expected concurrency

2. METRICS DEFINITIONS & FORMULAS

2.1 TTE Metrics

// Core TTE calculation (both values are the `t` field of the events in section 3.1)
tte_ms = proof_rendered.t - finding_open.t

// Dimensions
{
  tenant: string,
  finding_id: string,
  proof_kind: 'sbom' | 'reachability' | 'vex',
  source: 'local' | 'remote' | 'cache',
  page: string
}

SQL Rollup (hourly):

SELECT
  proof_kind,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY tte_ms) AS p95_ms
FROM tte_events
WHERE ts >= now() - interval '1 hour'
GROUP BY proof_kind;

2.2 TTFS Metrics

// Core TTFS calculation (both values are the `t` field of the events in section 3.2)
ttfs_ms = ttfs_signal_rendered.t - ttfs_start.t

// Dimensions
{
  surface: 'ui' | 'cli' | 'ci',
  cache_hit: boolean,
  signal_source: 'snapshot' | 'cold_start' | 'failure_index',
  kind: string,
  repo_size_bucket: string,
  provider: string,
  branch: string,
  run_type: 'PR' | 'main',
  network_state: string
}

2.3 Secondary Metrics

Open→Action time: Time from opening run to first user action
Bounce rate: Share of views where the page is closed within 10s without interaction
MTTR proxy: Time from failure to first rerun or fix commit
Signal availability rate: % of run views showing first signal within 3s
Signal accuracy score: Engineer confirms "helpful vs not" (sampled)
Extractor failure rate: Parsing errors / missing mappings / timeouts
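
Two of these secondary metrics reduce to simple ratios. The sketch below assumes a hypothetical per-view record (`RunViewSample`) that is not defined in the source; only the 3s and 10s thresholds come from the list above.

```typescript
interface RunViewSample {
  firstSignalMs: number | null; // null when no signal was rendered before the view closed
  closedAfterMs: number; // time from open to page close
  interacted: boolean; // any user action before close
}

// Fraction of run views that showed a first signal within the threshold (default 3s).
function signalAvailabilityRate(views: RunViewSample[], thresholdMs = 3_000): number {
  if (views.length === 0) return 0;
  const hits = views.filter((v) => v.firstSignalMs !== null && v.firstSignalMs <= thresholdMs);
  return hits.length / views.length;
}

// Fraction of run views closed within the window (default 10s) without any interaction.
function bounceRate(views: RunViewSample[], windowMs = 10_000): number {
  if (views.length === 0) return 0;
  const bounces = views.filter((v) => !v.interacted && v.closedAfterMs <= windowMs);
  return bounces.length / views.length;
}
```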

2.4 DORA Metrics

Deployment Frequency: Deploys per day/week
Lead Time for Changes: Commit → deployment completion
Change Failure Rate: Failed deployments / total deployments
Time to Restore: Incident start → resolution

2.5 Quality Metrics

Error budget burn: Minutes over target per day
Top regressions: Last 7 days vs prior 7 days
Extraction failure rate: < 1% for sampled runs
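
Error budget burn can be derived from a per-minute P95 series. This is a sketch under the assumption that one P95 value is recorded per minute; the function names are hypothetical.

```typescript
// Counts the minutes in which the per-minute P95 exceeded the target.
// Assumption: the input holds one P95 value per minute of the day (up to 1440 entries).
function errorBudgetBurnMinutes(perMinuteP95Ms: number[], targetMs: number): number {
  return perMinuteP95Ms.filter((p95) => p95 > targetMs).length;
}

// Difference in total burn between the last 7 days and the prior 7 days.
// A positive result means the most recent week was worse.
function burnDeltaMinutes(last7Days: number[], prior7Days: number[]): number {
  const sum = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);
  return sum(last7Days) - sum(prior7Days);
}
```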

3. EVENT SCHEMAS

3.1 TTE Events

finding_open:

{
  event: 'finding_open',
  findingId: string,
  tenantId: string,
  userId: string,
  userRole: 'admin' | 'dev' | 'triager',
  entryPoint: 'list' | 'search' | 'notification' | 'deep_link',
  uiVersion: string,
  buildSha: string,
  t: number // performance.now()
}

proof_rendered:

{
  event: 'proof_rendered',
  findingId: string,
  proofKind: 'sbom' | 'reachability' | 'vex' | 'logs' | 'other',
  source: 'local_cache' | 'backend_api' | '3rd_party',
  proofHeight: number, // pixel offset from top
  t: number // performance.now()
}
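
A TTE value can be derived by pairing the two events on findingId. The sketch below assumes the events come from a single tab (so both `t` values share one performance.now() origin) and arrive in order; `computeTteByFinding` is a hypothetical helper, and only the fields it needs are modeled.

```typescript
interface FindingOpenEvent {
  event: "finding_open";
  findingId: string;
  t: number;
}

interface ProofRenderedEvent {
  event: "proof_rendered";
  findingId: string;
  t: number;
}

type TtePairEvent = FindingOpenEvent | ProofRenderedEvent;

// Pairs the first finding_open with the first proof_rendered for each finding.
// Proof events without a preceding open are ignored.
function computeTteByFinding(events: TtePairEvent[]): Map<string, number> {
  const openedAt = new Map<string, number>();
  const tteMs = new Map<string, number>();
  for (const e of events) {
    if (e.event === "finding_open") {
      if (!openedAt.has(e.findingId)) openedAt.set(e.findingId, e.t);
    } else {
      const start = openedAt.get(e.findingId);
      if (start !== undefined && !tteMs.has(e.findingId)) {
        tteMs.set(e.findingId, e.t - start);
      }
    }
  }
  return tteMs;
}
```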

3.2 TTFS Events

ttfs_start:

{
  event: 'ttfs_start',
  runId: string,
  surface: 'ui' | 'cli' | 'ci',
  provider: string,
  repo: string,
  branch: string,
  runType: 'PR' | 'main',
  device: string,
  release: string,
  networkState: string,
  t: number
}

ttfs_signal_rendered:

{
  event: 'ttfs_signal_rendered',
  runId: string,
  surface: 'ui' | 'cli' | 'ci',
  cacheHit: boolean,
  signalSource: 'snapshot' | 'cold_start' | 'failure_index',
  kind: string,
  t: number
}

3.3 FirstSignal Event Contract

interface FirstSignal {
  version: '1.0',
  signalId: string,
  jobId: string,
  timestamp: string, // ISO-8601
  kind: 'queued' | 'started' | 'phase' | 'blocked' | 'failed' | 'succeeded' | 'canceled' | 'unavailable',
  phase: 'resolve' | 'fetch' | 'restore' | 'analyze' | 'policy' | 'report' | 'unknown',
  scope: {
    type: 'repo' | 'image' | 'artifact',
    id: string
  },
  summary: string,
  etaSeconds?: number,
  lastKnownOutcome?: {
    signatureId: string,
    errorCode: string,
    token: string,
    excerpt: string,
    confidence: 'low' | 'medium' | 'high',
    firstSeenAt: string,
    hitCount: number
  },
  nextActions?: Array<{
    type: 'open_logs' | 'open_job' | 'docs' | 'retry' | 'cli_command',
    label: string,
    target: string
  }>,
  diagnostics: {
    cacheHit: boolean,
    source: 'snapshot' | 'failure_index' | 'cold_start',
    correlationId: string
  }
}
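
Consumers may want a cheap runtime check before rendering a payload. The following sketch validates only the required fields of the contract above; `looksLikeFirstSignal` is a hypothetical helper, and the sample values are made up for illustration.

```typescript
const SIGNAL_KINDS = ["queued", "started", "phase", "blocked", "failed", "succeeded", "canceled", "unavailable"];
const SIGNAL_PHASES = ["resolve", "fetch", "restore", "analyze", "policy", "report", "unknown"];
const SCOPE_TYPES = ["repo", "image", "artifact"];
const SIGNAL_SOURCES = ["snapshot", "failure_index", "cold_start"];

function isPlainRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isOneOf(v: unknown, allowed: string[]): boolean {
  return typeof v === "string" && allowed.includes(v);
}

// Checks required fields only; optional fields (etaSeconds, lastKnownOutcome, nextActions) are not validated.
function looksLikeFirstSignal(v: unknown): boolean {
  if (!isPlainRecord(v)) return false;
  const timestamp = v["timestamp"];
  const scope = v["scope"];
  const diagnostics = v["diagnostics"];
  return (
    v["version"] === "1.0" &&
    typeof v["signalId"] === "string" &&
    typeof v["jobId"] === "string" &&
    typeof timestamp === "string" &&
    !Number.isNaN(Date.parse(timestamp)) &&
    isOneOf(v["kind"], SIGNAL_KINDS) &&
    isOneOf(v["phase"], SIGNAL_PHASES) &&
    typeof v["summary"] === "string" &&
    isPlainRecord(scope) &&
    isOneOf(scope["type"], SCOPE_TYPES) &&
    typeof scope["id"] === "string" &&
    isPlainRecord(diagnostics) &&
    typeof diagnostics["cacheHit"] === "boolean" &&
    isOneOf(diagnostics["source"], SIGNAL_SOURCES) &&
    typeof diagnostics["correlationId"] === "string"
  );
}

// Illustrative payload; all values are invented for the example.
const sampleFirstSignal = {
  version: "1.0",
  signalId: "sig-001",
  jobId: "job-42",
  timestamp: "2025-12-14T10:00:00Z",
  kind: "failed",
  phase: "restore",
  scope: { type: "repo", id: "example/repo" },
  summary: "Restore failed: token expired",
  diagnostics: { cacheHit: true, source: "failure_index", correlationId: "corr-123" },
};
```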

3.4 UI Telemetry Schema

ui.micro.* events:

{
  version: string,
  tenant: string,
  surface: string,
  component: string,
  action: string,
  latency_ms: number,
  outcome: string,
  reduced_motion: boolean,
  offline_mode: boolean,
  error_code?: string
}

Schema location: docs/modules/ui/telemetry/ui-micro.schema.json

4. API CONTRACTS

4.1 First Signal Endpoint

GET /api/runs/{runId}/first-signal

Headers:

If-None-Match: W/"..." (supported)

Response:

{
  "runId": "123",
  "firstSignal": {
    "type": "stage_failed",
    "stage": "build",
    "step": "dotnet restore",
    "message": "401 Unauthorized: token expired",
    "at": "2025-12-11T09:22:31Z",
    "artifact": {
      "kind": "log",
      "range": { "start": 1880, "end": 1896 }
    }
  },
  "summaryEtag": "W/\"a1b2c3\""
}

Status codes:

200: Full first signal object
304: Not modified
404: Run not found
204: Run exists but signal not available yet

Response headers:

ETag
Cache-Control
X-Correlation-Id
Cache-Status: hit|miss|bypass
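
A client has to build a conditional request and map the four status codes to UI states. The sketch below keeps both steps as pure functions so they are easy to test; the function names and the outcome labels are assumptions, not part of the API contract.

```typescript
type FirstSignalOutcome = "fresh" | "not_modified" | "pending" | "not_found";

// Builds the request for the first-signal endpoint, adding If-None-Match when an ETag is known.
function buildFirstSignalRequest(
  runId: string,
  knownEtag?: string,
): { url: string; headers: Record<string, string> } {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (knownEtag !== undefined) headers["If-None-Match"] = knownEtag;
  return { url: `/api/runs/${encodeURIComponent(runId)}/first-signal`, headers };
}

// Maps the documented status codes to a client-side outcome.
function interpretFirstSignalStatus(status: number): FirstSignalOutcome {
  switch (status) {
    case 200:
      return "fresh"; // full first signal object in the body
    case 304:
      return "not_modified"; // keep showing the cached signal
    case 204:
      return "pending"; // run exists, signal not available yet
    case 404:
      return "not_found";
    default:
      throw new Error(`unexpected status ${status}`);
  }
}
```

With the standard Fetch API, the request object can be passed as `fetch(req.url, { headers: req.headers })` and the response's `status` handed to `interpretFirstSignalStatus`.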

4.2 Summary Endpoint

GET /api/runs/{runId}/summary

Returns: Status, first failing stage/job, timestamps, blocking policies, artifact counts

4.3 SSE Events Endpoint

GET /api/runs/{runId}/events (Server-Sent Events)

Event payloads:

status (kind+phase+message)
hint (token+errorCode+confidence)
policy (blocked + policyId)
complete (terminal)
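
The four event types can be folded into a small view state. The sketch below parses the standard SSE wire format (`event:` and `data:` lines, frames separated by a blank line) and applies each frame; it assumes the payloads are JSON, which the source does not specify, and the state shape is hypothetical.

```typescript
interface SseFrame {
  event: string;
  data: string;
}

// Parses complete SSE frames from a text chunk (frames are separated by a blank line).
function parseSseFrames(chunk: string): SseFrame[] {
  const parsed: SseFrame[] = [];
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // default event type in the SSE format
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        eventName = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) parsed.push({ event: eventName, data: dataLines.join("\n") });
  }
  return parsed;
}

interface RunStreamState {
  statusPayload: unknown;
  hintPayload: unknown;
  policyPayload: unknown;
  complete: boolean;
}

const initialRunStreamState: RunStreamState = {
  statusPayload: null,
  hintPayload: null,
  policyPayload: null,
  complete: false,
};

// Applies one frame; unknown event types leave the state unchanged.
function applySseFrame(state: RunStreamState, frame: SseFrame): RunStreamState {
  switch (frame.event) {
    case "status":
      return { ...state, statusPayload: JSON.parse(frame.data) };
    case "hint":
      return { ...state, hintPayload: JSON.parse(frame.data) };
    case "policy":
      return { ...state, policyPayload: JSON.parse(frame.data) };
    case "complete":
      return { ...state, complete: true };
    default:
      return state;
  }
}
```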

5. FRONTEND PATTERNS & COMPONENT SPECIFICATIONS

5.1 UI Contract (Evidence First)

Above the fold:

Always show compact Proof panel first (not hidden behind tabs)
Skeletons over spinners: Reserve space; render partial proof as ready
Plain text copy affordance: "Copy SBOM line / path" button next to proof
Defer non-proof widgets: CVSS badges, remediation prose, charts load after proof
Empty-state truth: "No proof available yet" + loader for that proof type only

5.2 Progressive Rendering Pattern

Immediate render:

1. Title, status badge, pipeline metadata (run id, commit, branch)
2. Skeleton for details area

First signal fetch:

3. Render FirstSignalCard immediately when available
4. Fire telemetry event when card is in DOM and visible

Lazy-load:

5. Stage graph
6. Full logs viewer
7. Artifacts list
8. Security findings
9. Trends, flaky tests, etc.

5.3 Component Specifications

FirstSignalCard Component

Standalone, minimal dependencies
Shows: summary + at least one next action button (Open job/logs)
Updates in-place on deltas from SSE
Falls back to polling when SSE fails

EvidencePanel Component

interface EvidencePanel {
  tabs: ['SBOM', 'Reachability', 'VEX', 'Logs', 'Other'],
  firstProofType: ProofKind,
  copyEnabled: boolean,
  emptyStateMessage?: string
}

ProofSpine Component

Displays: graphRevisionId, bundle hashes (SBOM/VEX/proof), receipt digest, and Rekor details (when present)
Copy affordances: copy graphRevisionId, proofBundleId, and receipt digest in one click
Verification status: Verified | Unverified | Failed verification | Expired/Outdated
"Verify locally" copy button with exact commands

5.4 Prefetch Strategy

From runs list view:

Use IntersectionObserver to prefetch summaries/first signals for items in viewport
Store results in in-memory cache (Map<runId, FirstSignal>)
Respect ETag to avoid redundant payloads
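
The cache side of this strategy can be sketched as a small class that tracks ETags and in-flight requests. The class and method names are hypothetical, and the network call is left to the caller so the sketch stays free of browser APIs.

```typescript
class FirstSignalPrefetchCache<T> {
  private entries = new Map<string, { etag: string; value: T }>();
  private inFlight = new Set<string>();

  // Starts a prefetch. Returns the request headers to send,
  // or null when a request for this run is already in flight.
  begin(runId: string): Record<string, string> | null {
    if (this.inFlight.has(runId)) return null;
    this.inFlight.add(runId);
    const cached = this.entries.get(runId);
    return cached ? { "If-None-Match": cached.etag } : {};
  }

  // Records the outcome: a 200 replaces the entry, anything else (e.g. 304) keeps it.
  complete(runId: string, status: number, etag?: string, value?: T): void {
    this.inFlight.delete(runId);
    if (status === 200 && etag !== undefined && value !== undefined) {
      this.entries.set(runId, { etag, value });
    }
  }

  get(runId: string): T | undefined {
    return this.entries.get(runId)?.value;
  }
}
```

In the runs list, an IntersectionObserver callback would call `begin(runId)` for each entry whose `isIntersecting` flag is true and skip the fetch when it returns null.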

6. TELEMETRY REQUIREMENTS

6.1 Client-Side Telemetry

Frontend events:

// On route enter
metrics.emit('finding_open', { findingId, t: performance.now() });

// When first proof node/line hits DOM
metrics.emit('proof_rendered', { findingId, proofKind, t: performance.now() });

Sampling:

Staging: 100%
Production: ≥25% of sessions (ideal: 100%)

Clock handling:

Use performance.now() for TTE (monotonic within tab)
Don't mix backend clocks into TTE calculation
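
Sampling by session is easiest to keep stable when the decision is a pure function of the session ID. The sketch below is one possible approach, not the documented one: it hashes the ID with 32-bit FNV-1a (applied to UTF-16 code units rather than bytes) and compares the result with the sampling rate. `isSessionSampled` is a hypothetical name.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// Deterministic per-session decision: the same session ID always gets the same answer.
function isSessionSampled(sessionId: string, rate: number): boolean {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  return fnv1a32(sessionId) / 0x100000000 < rate;
}
```

Because the decision depends only on the session ID, both finding_open and proof_rendered for a session are either kept or dropped together, so TTE pairs are never split.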

6.2 Backend Telemetry

Endpoint metrics:

signal_endpoint_latency_ms
signal_payload_bytes
signal_error_rate

Server-side timing logs (debug-level):

Cache read time
DB read time
Cold path time

Tracing:

Correlation ID propagated in:
- API response header
- Worker logs
- Events

6.3 Dashboard Requirements

Core widgets:

TTE distribution (P50/P90/P95/P99) per day, split by proof_kind
TTE by page/surface (list→detail, deep links, bookmarks)
TTE by user segment (new vs power users, roles)
Error budget: "Minutes over SLO per day"
Correlation: TTE vs session length, TTE vs "clicked ignore/snooze"

Operational panels:

Update granularity: Real-time or ≤15 min
Retention: ≥90 days
Breakdowns: backend_region, build_version

TTFS dashboards:

By surface (ui/cli/ci)
Cache hit rate
Endpoint latency percentiles
Repo size bucket
Kind/phase

Alerts:

Page when p95(ttfs_ms) > 5000 for 5 minutes
Page when signal_endpoint_error_rate > 1%
Alert when P95 TTE > 15s for 15 minutes
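
The "for N minutes" condition in these alerts means the threshold must be exceeded continuously, not just once. The sketch below shows one way to evaluate that over a series of metric points; `sustainedBreach` and `MetricPoint` are hypothetical, and a real deployment would normally express this in the alerting system's own rule language.

```typescript
interface MetricPoint {
  atMs: number; // sample time, epoch milliseconds
  value: number;
}

// True when the metric has been above the threshold continuously for at least windowMs,
// measured from the first point of the current breach up to nowMs.
function sustainedBreach(
  points: MetricPoint[],
  threshold: number,
  windowMs: number,
  nowMs: number,
): boolean {
  const ordered = [...points].sort((a, b) => a.atMs - b.atMs);
  let breachStart: number | null = null;
  for (const p of ordered) {
    if (p.atMs > nowMs) break; // ignore samples from the future
    if (p.value > threshold) {
      if (breachStart === null) breachStart = p.atMs;
    } else {
      breachStart = null; // any recovery resets the breach
    }
  }
  return breachStart !== null && nowMs - breachStart >= windowMs;
}
```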

7. DATABASE SCHEMAS

7.1 TTE Events Table

CREATE TABLE tte_events (
  id SERIAL PRIMARY KEY,
  ts TIMESTAMPTZ NOT NULL DEFAULT now(),
  tenant TEXT NOT NULL,
  finding_id TEXT NOT NULL,
  proof_kind TEXT NOT NULL,
  source TEXT NOT NULL,
  tte_ms INT NOT NULL,
  page TEXT,
  user_role TEXT
);

CREATE INDEX ON tte_events (ts DESC);
CREATE INDEX ON tte_events (proof_kind, ts DESC);

7.2 First Signal Snapshots

CREATE TABLE first_signal_snapshots (
  job_id TEXT PRIMARY KEY,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  kind TEXT NOT NULL,
  phase TEXT NOT NULL,
  summary TEXT NOT NULL,
  eta_seconds INT NULL,
  payload_json JSONB NOT NULL
);

CREATE INDEX ON first_signal_snapshots (updated_at DESC);

7.3 Failure Signatures

CREATE TABLE failure_signatures (
  signature_id TEXT PRIMARY KEY,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  scope_type TEXT NOT NULL,
  scope_id TEXT NOT NULL,
  toolchain_hash TEXT NOT NULL,
  error_code TEXT NULL,
  token TEXT NOT NULL,
  excerpt TEXT NULL,
  confidence TEXT NOT NULL,
  first_seen_at TIMESTAMPTZ NOT NULL,
  last_seen_at TIMESTAMPTZ NOT NULL,
  hit_count INT NOT NULL DEFAULT 1
);

CREATE INDEX ON failure_signatures (scope_type, scope_id, toolchain_hash);
CREATE INDEX ON failure_signatures (token);

8. MOTION & ANIMATION TOKENS

8.1 Duration Tokens

Token	Value	Use Case
`duration-xs`	80ms	Quick hover, focus
`duration-sm`	140ms	Button press, small transitions
`duration-md`	200ms	Modal open/close, panel slide
`duration-lg`	260ms	Page transitions
`duration-xl`	320ms	Complex animations

Reduced-motion override: Clamp all to 0-80ms
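
The duration tokens and the reduced-motion clamp can be expressed as a small lookup. This is a sketch of what a TypeScript token module might contain; the export shape of the real motion.ts is not given in the source, so the names below are assumptions.

```typescript
// Duration tokens from section 8.1, in milliseconds.
const durationTokensMs = { xs: 80, sm: 140, md: 200, lg: 260, xl: 320 } as const;
type DurationToken = keyof typeof durationTokensMs;

// Upper bound applied when the user prefers reduced motion.
const REDUCED_MOTION_MAX_MS = 80;

function resolveDurationMs(token: DurationToken, reducedMotion: boolean): number {
  const base = durationTokensMs[token];
  return reducedMotion ? Math.min(base, REDUCED_MOTION_MAX_MS) : base;
}
```

In a browser the `reducedMotion` flag can be read with `window.matchMedia("(prefers-reduced-motion: reduce)").matches`.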

8.2 Easing Tokens

standard: Default transition
decel: Element entering (start fast, slow down)
accel: Element exiting (slow start, speed up)
emphasized: Important state changes

8.3 Distance Scales

XS: 4px
SM: 8px
MD: 16px
LG: 24px
XL: 32px

Location: src/Web/StellaOps.Web/src/styles/tokens/motion.{ts,scss}

9. ACCESSIBILITY REQUIREMENTS

9.1 WCAG 2.1 AA Compliance

Focus order: logical and consistent
Keyboard: all interactive elements accessible
Contrast:
- Text: ≥ 4.5:1
- UI elements: ≥ 3:1
Reduced motion: honored via prefers-reduced-motion
Status messaging: aria-live=polite for updates

9.2 Reduced-Motion Rules

When prefers-reduced-motion: reduce:

Durations clamp to 0-80ms
Disable parallax/auto-animations
Focus/hover states remain visible
No animated GIF/Lottie autoplay

Undo window: 8s with keyboard focus and aria-live=polite
Loading states: Announce state changes
Error messages: Informative, not generic

10. EVIDENCE & PROOF SPECIFICATIONS

10.1 Evidence Bundle Minimum Requirements

Component presence:

SBOM fragment (SPDX/CycloneDX) with component identity and provenance
Signed attestation for SBOM artifact

Vulnerability match:

Matching rule details (CPE/purl/range) + scanner identity/version
Signed vulnerability report attestation

Reachable vulnerability:

Call path: entrypoint → frames → vulnerable symbol
Hash/digest of call graph slice (tamper-evident)
Tool info + limitations (reflection/dynamic dispatch uncertainty)

Not affected via VEX:

VEX statement (OpenVEX/CSAF) + signer
Justification for not_affected
Align to CISA minimum requirements

Gate decision:

Input digests (SBOM digest, scan attestation digests, VEX doc digests)
Policy version + rule ID
Deterministic decision hash over (policy + input digests)
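
A decision hash is deterministic only if the inputs are serialized in a canonical order. The sketch below sorts the digests and hashes a JSON encoding with SHA-256 using Node's crypto module; the encoding, the field names, and the choice of SHA-256 are assumptions, since the source only says the hash covers policy and input digests.

```typescript
import { createHash } from "node:crypto";

interface GateDecisionInputs {
  policyVersion: string;
  ruleId: string;
  inputDigests: string[]; // SBOM, scan attestation, and VEX document digests
}

// Hashes a canonical encoding so that digest order does not change the result.
function gateDecisionHash(inputs: GateDecisionInputs): string {
  const canonical = JSON.stringify({
    policyVersion: inputs.policyVersion,
    ruleId: inputs.ruleId,
    inputDigests: [...inputs.inputDigests].sort(),
  });
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}
```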

10.2 Evidence Object Structure

interface Evidence {
  sbom_snippet_attestation: DSSEEnvelope,
  reachability_proof: {
    entrypoint: string,
    frames: CallFrame[],
    file_hashes: string[],
    graph_digest: string
  },
  attestation_chain: DSSESummary[],
  transparency_receipt: {
    logIndex: number,
    uuid: string,
    inclusionProof: string,
    checkpoint: string
  }
}

10.3 Proof Panel Requirements

Four artifacts:

SBOM snippet (signed): DSSE attestation, verify with cosign
Call-stack slice: Entrypoint → vulnerable symbol, status pill (Reachable, Potentially reachable, Unreachable)
Attestation chain: DSSE envelope summary, verification status, "Verify locally" command
Transparency receipt: Rekor inclusion proof, "Verify inclusion" command

One-click export:

"Export Evidence (.tar.gz)" bundling: SBOM slice, call-stack JSON, DSSE attestation, Rekor proof JSON

11. CONFIGURATION & FEATURE FLAGS

11.1 TTFS Feature Flags

ttfs:
  first_signal_enabled: true      # Default ON in staging
  cache_enabled: true
  failure_index_enabled: true
  sse_enabled: true
  policy_preeval_enabled: true

11.2 Cache Configuration

cache:
  backend: valkey | postgres | none  # TTFS_CACHE_BACKEND
  ttl_seconds: 86400                  # TTFS_CACHE_TTL_SECONDS
  key_pattern: "signal:job:{jobId}"
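
The two environment variables named above can be read into a typed config object. In this sketch the default backend of valkey is an assumption (the source lists the allowed values but no default), and the default TTL mirrors the 86400 shown above; `loadTtfsCacheConfig` is a hypothetical name.

```typescript
type CacheBackend = "valkey" | "postgres" | "none";

interface TtfsCacheConfig {
  backend: CacheBackend;
  ttlSeconds: number;
}

function isCacheBackend(v: string): v is CacheBackend {
  return v === "valkey" || v === "postgres" || v === "none";
}

// Reads TTFS_CACHE_BACKEND and TTFS_CACHE_TTL_SECONDS from an env-like record.
function loadTtfsCacheConfig(env: Record<string, string | undefined>): TtfsCacheConfig {
  const backend = env["TTFS_CACHE_BACKEND"] ?? "valkey"; // assumed default
  if (!isCacheBackend(backend)) {
    throw new Error(`TTFS_CACHE_BACKEND must be valkey, postgres, or none (got "${backend}")`);
  }
  const ttlSeconds = Number(env["TTFS_CACHE_TTL_SECONDS"] ?? "86400");
  if (!Number.isInteger(ttlSeconds) || ttlSeconds <= 0) {
    throw new Error("TTFS_CACHE_TTL_SECONDS must be a positive integer");
  }
  return { backend, ttlSeconds };
}
```

In Node this would be called as `loadTtfsCacheConfig(process.env)`; an air-gapped deployment would set TTFS_CACHE_BACKEND to postgres.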

11.3 Air-Gapped Profile

Skip Valkey; use Postgres-only
Use first_signal_snapshots table
NOTIFY/LISTEN for streaming updates

12. TESTING REQUIREMENTS

12.1 Acceptance Criteria (TTE)

First paint shows real proof snippet (not summary)
"Copy proof" button works within 1 click
TTE P95 in staging ≤ 10s; in prod ≤ 15s
If proof missing, explicit empty-state + retry path
Telemetry sampled ≥ 50% of sessions (or 100% for internal)

12.2 Acceptance Tests (TTFS)

Run with early fail → first signal < 1s, shows exact command + exit code
Run with policy gate fail → rule name + fix hint visible first
Offline/slow network → cached summary still renders actionable hint

12.3 Determinism Requirements

Freeze timers to 2025-12-04T12:00:00Z in stories/e2e
Seed RNG with 0x5EED2025 unless scenario-specific
All fixtures stored under tests/fixtures/micro/
No network calls; offline assets bundled
Playwright runs with --disable-animations and reduced-motion emulation
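
A frozen clock and a seeded generator can be provided to fixtures as plain functions. The source does not say which PRNG the test suite uses; the sketch below substitutes mulberry32, a small public-domain 32-bit generator, purely as an illustration, and the constant names are hypothetical.

```typescript
// Frozen time and default seed from section 12.3.
const FROZEN_NOW_MS = Date.parse("2025-12-04T12:00:00Z");
const DEFAULT_SEED = 0x5eed2025;

// Clock that always returns the frozen instant.
const frozenClock = (): number => FROZEN_NOW_MS;

// mulberry32: returns a function producing floats in [0, 1) from a 32-bit seed.
// Stand-in generator; the actual suite may use a different algorithm.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```

Two generators created with the same seed yield the same sequence, which is what makes fixtures built from them reproducible.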

12.4 Load Tests

/jobs/{id}/signal:

Cache-hit P95 ≤ 250ms
Cold path P95 ≤ 500ms
Error rate < 0.1% under expected concurrency

13. REDACTION & SECURITY

13.1 Excerpt Redaction Rules

Strip: bearer tokens, API keys, access tokens, private URLs
Cap excerpt length: 240 chars
Normalize whitespace
Never include excerpts in telemetry attributes
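
The rules above can be applied in a fixed order: redact, normalize whitespace, then cap the length. The patterns in this sketch are illustrative assumptions only (in particular, treating `.internal` hosts as private URLs); a production deny-list would be broader and security-reviewed.

```typescript
const MAX_EXCERPT_CHARS = 240;
const REDACTED_MARKER = "[redacted]";

// Illustrative patterns, not an exhaustive or reviewed deny-list.
const FORBIDDEN_PATTERNS: RegExp[] = [
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, // bearer tokens
  /\b(?:api[_-]?key|access[_-]?token|token)\s*[=:]\s*\S+/gi, // key=value style secrets
  /https?:\/\/[^\s\/]*\.internal\b\S*/gi, // assumed marker for private URLs
];

function redactExcerpt(raw: string): string {
  let text = raw;
  for (const pattern of FORBIDDEN_PATTERNS) {
    text = text.replace(pattern, REDACTED_MARKER);
  }
  text = text.replace(/\s+/g, " ").trim(); // normalize whitespace
  return text.length > MAX_EXCERPT_CHARS ? text.slice(0, MAX_EXCERPT_CHARS) : text;
}
```

Redaction runs before truncation so that a secret cut in half by the length cap cannot slip past the patterns.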

13.2 Tenant Isolation

Cache keys include tenant boundary:

tenant:{tenantId}:signal:job:{jobId}

Failure signatures looked up within same tenant only.
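
Building the key in one helper keeps the tenant prefix from being forgotten at call sites. In this sketch, rejecting IDs that contain ':' is an added assumption, intended to stop a crafted ID from colliding with another tenant's key; `signalCacheKey` is a hypothetical name.

```typescript
function assertKeyPart(label: string, part: string): void {
  if (part.length === 0 || part.includes(":")) {
    throw new Error(`${label} must be non-empty and must not contain ':'`);
  }
}

// Produces keys of the form tenant:{tenantId}:signal:job:{jobId}.
function signalCacheKey(tenantId: string, jobId: string): string {
  assertKeyPart("tenantId", tenantId);
  assertKeyPart("jobId", jobId);
  return `tenant:${tenantId}:signal:job:${jobId}`;
}
```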

13.3 Secret Scanning

Runtime guardrails:

If excerpt contains forbidden patterns → replace with "[redacted]"
Security review sign-off required for snapshot + signature + telemetry

14. LOCALIZATION

14.1 Micro-Copy Requirements

Keys and ICU messages for micro-interaction copy
Defaults: EN
Fallbacks present
No hard-coded strings in components
i18n extraction shows zero TODO keys

14.2 Snapshot Verification

Verify translated skeleton/error/undo copy in snapshots.

15. DELIVERABLES MAP

Category	Location	Description
Motion tokens	`src/Web/StellaOps.Web/src/styles/tokens/motion.{ts,scss}`	Duration, easing, distance scales + reduced-motion overrides
Storybook stories	`apps/storybook/src/stories/micro/*`	Slow, error, offline, reduced-motion, undo flows
Playwright suite	`tests/e2e/micro-interactions.spec.ts`	MI2/MI3/MI4/MI8 coverage
Telemetry schema	`docs/modules/ui/telemetry/ui-micro.schema.json`	Event schema + validators
Component map	`docs/modules/ui/micro-interactions-map.md`	Components → interaction type → token usage
Fixtures	`tests/fixtures/micro/`	Deterministic test fixtures

Document Version: 1.0
Target Platform: .NET 10, PostgreSQL ≥16, Angular v17
