Add comprehensive security tests for OWASP A02, A05, A07, and A08 categories

- Implemented tests for Cryptographic Failures (A02) to ensure proper handling of sensitive data, secure algorithms, and key management.
- Added tests for Security Misconfiguration (A05) to validate production configurations, security headers, CORS settings, and feature management.
- Developed tests for Authentication Failures (A07) to enforce strong password policies, rate limiting, session management, and MFA support.
- Created tests for Software and Data Integrity Failures (A08) to verify artifact signatures, SBOM integrity, attestation chains, and feed updates.
This commit is contained in:
master
2025-12-16 16:40:19 +02:00
parent 415eff1207
commit 2170a58734
206 changed files with 30547 additions and 534 deletions


@@ -0,0 +1,188 @@
# Evidence Reconciliation
This document describes the evidence reconciliation algorithm implemented in the `StellaOps.AirGap.Importer` module. The algorithm provides deterministic, lattice-based reconciliation of security evidence from air-gapped bundles.
## Overview
Evidence reconciliation is a 5-step pipeline that transforms raw evidence artifacts (SBOMs, attestations, VEX documents) into a unified, content-addressed evidence graph suitable for policy evaluation and audit trails.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Evidence Reconciliation Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Step 1: Artifact Indexing │
│ ├── EvidenceDirectoryDiscovery │
│ ├── ArtifactIndex (digest-keyed) │
│ └── Digest normalization (sha256:...) │
│ │
│ Step 2: Evidence Collection │
│ ├── SbomCollector (CycloneDX, SPDX) │
│ ├── AttestationCollector (DSSE) │
│ └── Integration with DsseVerifier │
│ │
│ Step 3: Normalization │
│ ├── JsonNormalizer (stable sorting) │
│ ├── Timestamp stripping │
│ └── URI lowercase normalization │
│ │
│ Step 4: Lattice Rules │
│ ├── SourcePrecedenceLattice │
│ ├── VEX merge with precedence │
│ └── Conflict resolution │
│ │
│ Step 5: Graph Emission │
│ ├── EvidenceGraph construction │
│ ├── Deterministic serialization │
│ └── SHA-256 manifest generation │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Components
### Step 1: Artifact Indexing
**`ArtifactIndex`** - A digest-keyed index of all artifacts in the evidence bundle.
```csharp
// Key types
public readonly record struct DigestKey(string Algorithm, string Value);
// Normalization
DigestKey.Parse("sha256:abc123...")  // → DigestKey("sha256", "abc123...")
```
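The same normalization rule can be sketched outside C#; `parse_digest` below is a hypothetical Python analogue of `DigestKey.Parse`, assuming digests arrive as `<algorithm>:<hex>` with case-insensitive input:

```python
import re

def parse_digest(raw: str) -> tuple[str, str]:
    """Split 'sha256:<hex>' into (algorithm, value), lowercasing both parts."""
    algorithm, sep, value = raw.partition(":")
    if not sep or not re.fullmatch(r"[0-9a-fA-F]+", value):
        raise ValueError(f"malformed digest: {raw!r}")
    return algorithm.lower(), value.lower()
```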
**`EvidenceDirectoryDiscovery`** - Discovers evidence files from a directory structure.
Expected structure:
```
evidence/
├── sboms/
│ ├── component-a.cdx.json
│ └── component-b.spdx.json
├── attestations/
│ └── artifact.dsse.json
└── vex/
└── vendor-vex.json
```
### Step 2: Evidence Collection
**Parsers:**
- `CycloneDxParser` - Parses CycloneDX 1.4/1.5/1.6 format
- `SpdxParser` - Parses SPDX 2.3 format
- `DsseAttestationParser` - Parses DSSE envelopes
**Collectors:**
- `SbomCollector` - Orchestrates SBOM parsing and indexing
- `AttestationCollector` - Orchestrates attestation parsing and verification
### Step 3: Normalization
**`SbomNormalizer`** applies format-specific normalization:
| Rule | Description |
|------|-------------|
| Stable JSON sorting | Keys sorted alphabetically (ordinal) |
| Timestamp stripping | Removes `created`, `modified`, `timestamp` fields |
| URI normalization | Lowercases scheme, host, normalizes paths |
| Whitespace normalization | Consistent formatting |
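A minimal Python sketch of these rules (the function name and volatile-field list are illustrative assumptions, not the `SbomNormalizer` API):

```python
from urllib.parse import urlsplit, urlunsplit

VOLATILE_FIELDS = {"created", "modified", "timestamp"}  # stripped per the table above

def normalize(node):
    """Recursively sort keys, drop volatile fields, lowercase URI scheme/host."""
    if isinstance(node, dict):
        return {k: normalize(v) for k, v in sorted(node.items())
                if k not in VOLATILE_FIELDS}
    if isinstance(node, list):
        return [normalize(v) for v in node]
    if isinstance(node, str) and "://" in node:
        parts = urlsplit(node)
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path, parts.query, parts.fragment))
    return node
```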
### Step 4: Lattice Rules
**`SourcePrecedenceLattice`** implements a bounded lattice for VEX source authority:
```
Vendor (top)
Maintainer
ThirdParty
Unknown (bottom)
```
**Lattice Properties (verified by property-based tests):**
- **Commutativity**: `Join(a, b) = Join(b, a)`
- **Associativity**: `Join(Join(a, b), c) = Join(a, Join(b, c))`
- **Idempotence**: `Join(a, a) = a`
- **Absorption**: `Join(a, Meet(a, b)) = a`
**Conflict Resolution Order:**
1. Higher precedence source wins
2. More recent timestamp wins (when same precedence)
3. Status priority: NotAffected > Fixed > UnderInvestigation > Affected > Unknown
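The precedence join and the resolution order above can be sketched in Python (the `Source` ranks and the tuple-shaped `merge` are illustrative assumptions, not the `SourcePrecedenceLattice` API):

```python
from enum import IntEnum

class Source(IntEnum):          # bottom → top, mirroring the lattice diagram
    UNKNOWN = 0
    THIRD_PARTY = 1
    MAINTAINER = 2
    VENDOR = 3

# Lower index = lower priority in the tie-break.
STATUS_PRIORITY = ["Unknown", "Affected", "UnderInvestigation", "Fixed", "NotAffected"]

def join(a: Source, b: Source) -> Source:
    return max(a, b)            # least upper bound in a total order

def merge(claim_a, claim_b):
    """claim = (source, timestamp, status); apply the resolution order above."""
    key = lambda c: (c[0], c[1], STATUS_PRIORITY.index(c[2]))
    return max(claim_a, claim_b, key=key)
```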
### Step 5: Graph Emission
**`EvidenceGraph`** - A content-addressed graph of reconciled evidence:
```csharp
public sealed record EvidenceGraph
{
public required string Version { get; init; }
public required string DigestAlgorithm { get; init; }
public required string RootDigest { get; init; }
public required IReadOnlyList<EvidenceNode> Nodes { get; init; }
public required IReadOnlyList<EvidenceEdge> Edges { get; init; }
public required DateTimeOffset GeneratedAt { get; init; }
}
```
**Determinism guarantees:**
- Nodes sorted by digest (ordinal)
- Edges sorted by (source, target, type)
- SHA-256 manifest includes content hash
- Reproducible across runs with same inputs
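The guarantees above amount to "sort, then serialize canonically, then hash"; a hypothetical Python sketch (names are illustrative, not the `EvidenceGraph` API):

```python
import hashlib
import json

def graph_hash(nodes, edges):
    """nodes: digest strings; edges: (source, target, type) triples.
    Sorting before serialization makes the hash input-order independent."""
    canonical = json.dumps(
        {"edges": sorted(edges), "nodes": sorted(nodes)},
        separators=(",", ":"), sort_keys=True)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```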
## Integration
### CLI Usage
```bash
# Verify offline evidence bundle
stellaops verify offline \
--evidence-dir /evidence \
--artifact sha256:def456... \
--policy verify-policy.yaml
```
### API
```csharp
// Reconcile evidence
var reconciler = new EvidenceReconciler(options);
var graph = await reconciler.ReconcileAsync(evidenceDir, cancellationToken);
// Verify determinism
var hash1 = graph.ComputeHash();
var graph2 = await reconciler.ReconcileAsync(evidenceDir, cancellationToken);
var hash2 = graph2.ComputeHash();
Debug.Assert(hash1 == hash2); // Always true
```
## Testing
### Golden-File Tests
Test fixtures in `tests/AirGap/StellaOps.AirGap.Importer.Tests/Reconciliation/Fixtures/`:
- `cyclonedx-sample.json` - CycloneDX 1.5 sample
- `spdx-sample.json` - SPDX 2.3 sample
- `dsse-attestation-sample.json` - DSSE envelope sample
### Property-Based Tests
`SourcePrecedenceLatticePropertyTests` verifies:
- Lattice algebraic properties (commutativity, associativity, idempotence, absorption)
- Ordering properties (antisymmetry, transitivity, reflexivity)
- Bound properties (join is LUB, meet is GLB)
- Merge determinism
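Because the source lattice has only four elements, the algebraic laws can also be checked exhaustively; an illustrative Python version of such a check (not the actual test suite):

```python
from itertools import product

# Unknown=0 … Vendor=3; join/meet on a totally ordered carrier.
sources = range(4)
join, meet = max, min

def lattice_laws_hold() -> bool:
    for a, b, c in product(sources, repeat=3):
        if join(a, b) != join(b, a):                    # commutativity
            return False
        if join(join(a, b), c) != join(a, join(b, c)):  # associativity
            return False
        if join(a, a) != a:                             # idempotence
            return False
        if join(a, meet(a, b)) != a:                    # absorption
            return False
    return True
```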
## Related Documents
- [Air-Gap Module Architecture](./architecture.md) *(pending)*
- [DSSE Verification](../../adr/dsse-verification.md) *(if exists)*
- [Offline Kit Import Flow](./exporter-cli-coordination.md)


@@ -45,23 +45,23 @@ Trust boundary: **Only the Signer** is allowed to call submission endpoints; enf
- `StellaOps.BuildProvenance@1`
- `StellaOps.SBOMAttestation@1`
- `StellaOps.ScanResults@1`
- `StellaOps.PolicyEvaluation@1`
- `StellaOps.VEXAttestation@1`
- `StellaOps.RiskProfileEvidence@1`
Each predicate embeds subject digests, issuer metadata, policy context, materials, and optional transparency hints. Unsupported predicates return `422 predicate_unsupported`.
> **Golden fixtures:** Deterministic JSON statements for each predicate live in `src/Attestor/StellaOps.Attestor.Types/samples`. They are kept stable by the `StellaOps.Attestor.Types.Tests` project so downstream docs and contracts can rely on them without drifting.
### Envelope & signature model
- DSSE envelopes canonicalised (stable JSON ordering) prior to hashing.
- Signature modes: keyless (Fulcio cert chain), keyful (KMS/HSM), hardware (FIDO2/WebAuthn). Multiple signatures allowed.
- Rekor entry stores bundle hash, certificate chain, and optional witness endorsements.
- Archive CAS retains original envelope plus metadata for offline verification.
- Envelope serializer emits **compact** (canonical, minified) and **expanded** (annotated, indented) JSON variants off the same canonical byte stream so hashing stays deterministic while humans get context.
- Payload handling supports **optional compression** (`gzip`, `brotli`) with compression metadata recorded in the expanded view and digesting always performed over the uncompressed bytes.
- Expanded envelopes surface **detached payload references** (URI, digest, media type, size) so large artifacts can live in CAS/object storage while the canonical payload remains embedded for verification.
- Payload previews auto-render JSON or UTF-8 text in the expanded output to simplify triage in air-gapped and offline review flows.
### Verification pipeline overview
1. Fetch envelope (from request, cache, or storage) and validate DSSE structure.
@@ -70,6 +70,33 @@ Each predicate embeds subject digests, issuer metadata, policy context, material
4. Validate Merkle proof against checkpoint; optionally verify witness endorsement.
5. Return cached verification bundle including policy verdict and timestamps.
### Rekor Inclusion Proof Verification (SPRINT_3000_0001_0001)
The Attestor implements RFC 6962-compliant Merkle inclusion proof verification for Rekor transparency log entries:
**Components:**
- `MerkleProofVerifier` — Verifies Merkle audit paths per RFC 6962 Section 2.1.1
- `CheckpointSignatureVerifier` — Parses and verifies Rekor checkpoint signatures (ECDSA/Ed25519)
- `RekorVerificationOptions` — Configuration for public keys, offline mode, and checkpoint caching
**Verification Flow:**
1. Parse checkpoint body (origin, tree size, root hash)
2. Verify checkpoint signature against Rekor public key
3. Compute leaf hash from canonicalized entry
4. Walk Merkle path from leaf to root using RFC 6962 interior node hashing
5. Compare computed root with checkpoint root hash (constant-time)
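The flow above can be sketched in Python using the RFC 6962 hashing prefixes (`verify_inclusion` illustrates the algorithm only, not the `MerkleProofVerifier` API; production code should also compare digests in constant time):

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()          # leaf prefix 0x00

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # interior prefix 0x01

def verify_inclusion(index, tree_size, leaf, path, root) -> bool:
    """Walk the audit path from leaf to root (RFC 9162 §2.1.3.2)."""
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False
        if fn % 2 == 1 or fn == sn:
            r = node_hash(p, r)
            if fn % 2 == 0:                 # skip levels where we were a right edge
                while fn % 2 == 0 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```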
**Offline Mode:**
- Bundled checkpoints can be used in air-gapped environments
- `EnableOfflineMode` and `OfflineCheckpointBundlePath` configuration options
- `AllowOfflineWithoutSignature` for fully disconnected scenarios (reduced security)
**Metrics:**
- `attestor.rekor_inclusion_verify_total` — Verification attempts by result
- `attestor.rekor_checkpoint_verify_total` — Checkpoint signature verifications
- `attestor.rekor_offline_verify_total` — Offline mode verifications
- `attestor.rekor_checkpoint_cache_hits/misses` — Checkpoint cache performance
### UI & CLI touchpoints
- Console: Evidence browser, verification report, chain-of-custody graph, issuer/key management, attestation workbench, bulk verification views.
- CLI: `stella attest sign|verify|list|fetch|key` with offline verification and export bundle support.
@@ -127,6 +154,72 @@ Indexes:
---
## 2.1) Content-Addressed Identifier Formats
The ProofChain library (`StellaOps.Attestor.ProofChain`) defines canonical content-addressed identifiers for all proof chain components. These IDs ensure determinism, tamper-evidence, and reproducibility.
### Identifier Types
| ID Type | Format | Source | Example |
|---------|--------|--------|---------|
| **ArtifactID** | `sha256:<64-hex>` | Container manifest or binary hash | `sha256:a1b2c3d4e5f6...` |
| **SBOMEntryID** | `<sbomDigest>:<purl>[@<version>]` | SBOM hash + component PURL | `sha256:91f2ab3c:pkg:npm/lodash@4.17.21` |
| **EvidenceID** | `sha256:<hash>` | Canonical evidence JSON | `sha256:e7f8a9b0c1d2...` |
| **ReasoningID** | `sha256:<hash>` | Canonical reasoning JSON | `sha256:f0e1d2c3b4a5...` |
| **VEXVerdictID** | `sha256:<hash>` | Canonical VEX verdict JSON | `sha256:d4c5b6a7e8f9...` |
| **ProofBundleID** | `sha256:<merkle_root>` | Merkle root of bundle components | `sha256:1a2b3c4d5e6f...` |
| **GraphRevisionID** | `grv_sha256:<hash>` | Merkle root of graph state | `grv_sha256:9f8e7d6c5b4a...` |
### Canonicalization (RFC 8785)
All JSON-based IDs use RFC 8785 (JCS) canonicalization:
- UTF-8 encoding
- Lexicographically sorted keys
- No whitespace (minified)
- No volatile fields (timestamps, random values excluded)
**Implementation:** `StellaOps.Attestor.ProofChain.Json.Rfc8785JsonCanonicalizer`
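An approximate Python sketch of this canonicalization (illustrative only: `json.dumps` with sorted keys and no whitespace covers the rules listed above, but full RFC 8785 additionally prescribes ECMAScript number formatting):

```python
import hashlib
import json

def content_id(obj: dict, volatile=("timestamp", "nonce")) -> str:
    """Drop volatile fields, sort keys, minify, hash the UTF-8 bytes."""
    stable = {k: v for k, v in obj.items() if k not in volatile}
    canonical = json.dumps(stable, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```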
### Merkle Tree Construction
ProofBundleID and GraphRevisionID use deterministic binary Merkle trees:
- SHA-256 hash function
- Lexicographically sorted leaf inputs
- Standard binary tree construction (pair-wise hashing)
- Odd leaves promoted to next level
**Implementation:** `StellaOps.Attestor.ProofChain.Merkle.DeterministicMerkleTreeBuilder`
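A hypothetical Python sketch of this construction (not the `DeterministicMerkleTreeBuilder` API; string leaves assumed for brevity):

```python
import hashlib

def merkle_root(leaves) -> str:
    """Sorted leaves, pair-wise SHA-256; an odd node is promoted unchanged."""
    level = [hashlib.sha256(l.encode("utf-8")).digest() for l in sorted(leaves)]
    if not level:
        return "sha256:" + hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])        # odd node promoted to the next level
        level = nxt
    return "sha256:" + level[0].hex()
```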
### ID Generation Interface
```csharp
// Core interface for ID generation
public interface IContentAddressedIdGenerator
{
EvidenceId GenerateEvidenceId(EvidencePredicate predicate);
ReasoningId GenerateReasoningId(ReasoningPredicate predicate);
VexVerdictId GenerateVexVerdictId(VexPredicate predicate);
ProofBundleId GenerateProofBundleId(SbomEntryId sbom, EvidenceId[] evidence,
ReasoningId reasoning, VexVerdictId verdict);
GraphRevisionId GenerateGraphRevisionId(GraphState state);
}
```
### Predicate Types
The ProofChain library defines DSSE predicates for each attestation type:
| Predicate | Type URI | Purpose |
|-----------|----------|---------|
| `EvidencePredicate` | `stellaops.org/evidence/v1` | Scan evidence (findings, reachability) |
| `ReasoningPredicate` | `stellaops.org/reasoning/v1` | Exploitability reasoning |
| `VexPredicate` | `stellaops.org/vex-verdict/v1` | VEX status determination |
| `ProofSpinePredicate` | `stellaops.org/proof-spine/v1` | Complete proof bundle |
**Reference:** `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/`
---
## 3) Input contract (from Signer)
**Attestor accepts only** DSSE envelopes that satisfy all of:
@@ -157,53 +250,53 @@ Indexes:
## 4) APIs
### 4.1 Signing
`POST /api/v1/attestations:sign` *(mTLS + OpTok required)*
* **Purpose**: Deterministically wrap StellaOps payloads in DSSE envelopes before Rekor submission. Reuses the submission rate limiter and honours caller tenancy/audience scopes.
* **Body**:
```json
{
"keyId": "signing-key-id",
"payloadType": "application/vnd.in-toto+json",
"payload": "<base64 payload>",
"mode": "keyless|keyful|kms",
"certificateChain": ["-----BEGIN CERTIFICATE-----..."],
"artifact": {
"sha256": "<subject sha256>",
"kind": "sbom|report|vex-export",
"imageDigest": "sha256:...",
"subjectUri": "oci://..."
},
"logPreference": "primary|mirror|both",
"archive": true
}
```
* **Behaviour**:
* Resolve the signing key from `attestor.signing.keys[]` (includes algorithm, provider, and optional KMS version).
* Compute DSSE preauthentication encoding, sign with the resolved provider (default EC, BouncyCastle Ed25519, or FileKMS ES256), and add static + request certificate chains.
* Canonicalise the resulting bundle, derive `bundleSha256`, and mirror the request meta shape used by `/api/v1/rekor/entries`.
* Emit `attestor.sign_total{result,algorithm,provider}` and `attestor.sign_latency_seconds{algorithm,provider}` metrics and append an audit row (`action=sign`).
* **Response 200**:
```json
{
"bundle": { "dsse": { "payloadType": "...", "payload": "...", "signatures": [{ "keyid": "signing-key-id", "sig": "..." }] }, "certificateChain": ["..."], "mode": "kms" },
"meta": { "artifact": { "sha256": "...", "kind": "sbom" }, "bundleSha256": "...", "logPreference": "primary", "archive": true },
"key": { "keyId": "signing-key-id", "algorithm": "ES256", "mode": "kms", "provider": "kms", "signedAt": "2025-11-01T12:34:56Z" }
}
```
* **Errors**: `400 key_not_found`, `400 payload_missing|payload_invalid_base64|artifact_sha_missing`, `400 mode_not_allowed`, `403 client_certificate_required`, `401 invalid_token`, `500 signing_failed`.
### 4.2 Submission
`POST /api/v1/rekor/entries` *(mTLS + OpTok required)*
* **Body**: as above.
* **Behavior**:
* Verify caller (mTLS + OpTok).
@@ -226,16 +319,16 @@ Indexes:
"status": "included"
}
```
* **Errors**: `401 invalid_token`, `403 not_signer|chain_untrusted`, `409 duplicate_bundle` (with existing `uuid`), `502 rekor_unavailable`, `504 proof_timeout`.
### 4.3 Proof retrieval
`GET /api/v1/rekor/entries/{uuid}`
* Returns `entries` row (refreshes proof from Rekor if stale/missing).
* Accepts `?refresh=true` to force backend query.
### 4.4 Verification (third-party or internal)
`POST /api/v1/rekor/verify`
@@ -250,28 +343,28 @@ Indexes:
1. **Bundle signature** → cert chain to Fulcio/KMS roots configured.
2. **Inclusion proof** → recompute leaf hash; verify Merkle path against checkpoint root.
3. Optionally verify **checkpoint** against local trust anchors (if Rekor signs checkpoints).
4. Confirm **subject.digest** matches caller-provided hash (when given).
5. Fetch **transparency witness** statement when enabled; cache results and downgrade status to WARN when endorsements are missing or mismatched.
* **Response**:
```json
{ "ok": true, "uuid": "…", "index": 123, "logURL": "…", "checkedAt": "…" }
```
### 4.5 Bulk verification
`POST /api/v1/rekor/verify:bulk` enqueues a verification job containing up to `quotas.bulk.maxItemsPerJob` items. Each item mirrors the single verification payload (uuid | artifactSha256 | subject+envelopeId, optional policyVersion/refreshProof). The handler persists a MongoDB job document (`bulk_jobs` collection) and returns `202 Accepted` with a job descriptor and polling URL.
`GET /api/v1/rekor/verify:bulk/{jobId}` returns progress and per-item results (subject/uuid, status, issues, cached verification report if available). Jobs are tenant- and subject-scoped; only the initiating principal can read their progress.
**Worker path:** `BulkVerificationWorker` claims queued jobs (`status=queued → running`), executes items sequentially through the cached verification service, updates progress counters, and records metrics:
- `attestor.bulk_jobs_total{status}` completed/failed jobs
- `attestor.bulk_job_duration_seconds{status}` job runtime
- `attestor.bulk_items_total{status}` per-item outcomes (`succeeded`, `verification_failed`, `exception`)
The worker honours `bulkVerification.itemDelayMilliseconds` for throttling and reschedules persistence conflicts with optimistic version checks. Results hydrate the verification cache; failed items record the error reason without aborting the overall job.
---
@@ -303,10 +396,10 @@ The worker honours `bulkVerification.itemDelayMilliseconds` for throttling and r
* `subject.digest.sha256` values must be present and wellformed (hex).
* **No public submission** path. **Never** accept bundles from untrusted clients.
* **Client certificate allowlists**: optional `security.mtls.allowedSubjects` / `allowedThumbprints` tighten peer identity checks beyond CA pinning.
* **Rate limits**: token-bucket per caller derived from `quotas.perCaller` (QPS/burst) returns `429` + `Retry-After` when exceeded.
* **Scope enforcement**: API separates `attestor.write`, `attestor.verify`, and `attestor.read` policies; verification/list endpoints accept read or verify scopes while submission endpoints remain write-only.
* **Request hygiene**: JSON content-type is mandatory (415 returned otherwise); DSSE payloads are capped (default 2MiB), certificate chains limited to six entries, and signatures to six per envelope to mitigate parsing abuse.
* **Redaction**: Attestor never logs secret material; DSSE payloads **should** be public by design (SBOMs/reports). If customers require redaction, enforce policy at Signer (predicate minimization) **before** Attestor.
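The per-caller rate limit described above can be sketched as a token bucket (illustrative Python; the real limits come from `quotas.perCaller`, and a rejected call maps to `429` + `Retry-After`):

```python
import time

class TokenBucket:
    """Per-caller limiter: refill at `qps` tokens/second, capped at `burst`."""
    def __init__(self, qps: float, burst: int):
        self.qps, self.burst = qps, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, never exceeding the burst cap.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # caller should respond 429 with Retry-After
```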
---
@@ -329,32 +422,32 @@ The worker honours `bulkVerification.itemDelayMilliseconds` for throttling and r
## 8) Observability & audit
**Metrics** (Prometheus):
* `attestor.sign_total{result,algorithm,provider}`
* `attestor.sign_latency_seconds{algorithm,provider}`
* `attestor.submit_total{result,backend}`
* `attestor.submit_latency_seconds{backend}`
* `attestor.proof_fetch_total{subject,issuer,policy,result,attestor.log.backend}`
* `attestor.verify_total{subject,issuer,policy,result}`
* `attestor.verify_latency_seconds{subject,issuer,policy,result}`
* `attestor.dedupe_hits_total`
* `attestor.errors_total{type}`
SLO guardrails:
* `attestor.verify_latency_seconds` P95 ≤2s per policy.
* `attestor.verify_total{result="failed"}` ≤1% of `attestor.verify_total` over 30min rolling windows.
**Correlation**:
* HTTP callers may supply `X-Correlation-Id`; Attestor will echo the header and push `CorrelationId` into the log scope for cross-service tracing.
**Tracing**:
* Spans: `attestor.sign`, `validate`, `rekor.submit`, `rekor.poll`, `persist`, `archive`, `attestor.verify`, `attestor.verify.refresh_proof`.
**Audit**:
* Immutable `audit` rows (ts, caller, action, hashes, uuid, index, backend, result, latency).
@@ -365,45 +458,45 @@ SLO guardrails:
```yaml
attestor:
listen: "https://0.0.0.0:8444"
security:
mtls:
caBundle: /etc/ssl/signer-ca.pem
requireClientCert: true
authority:
issuer: "https://authority.internal"
jwksUrl: "https://authority.internal/jwks"
requireSenderConstraint: "dpop" # or "mtls"
signerIdentity:
mode: ["keyless","kms"]
fulcioRoots: ["/etc/fulcio/root.pem"]
allowedSANs: ["urn:stellaops:signer"]
kmsKeys: ["kms://cluster-kms/stellaops-signer"]
submissionLimits:
maxPayloadBytes: 2097152
maxCertificateChainEntries: 6
maxSignatures: 6
signing:
preferredProviders: ["kms","bouncycastle.ed25519","default"]
kms:
enabled: true
rootPath: "/var/lib/stellaops/kms"
password: "${ATTESTOR_KMS_PASSWORD}"
keys:
- keyId: "kms-primary"
algorithm: ES256
mode: kms
provider: "kms"
providerKeyId: "kms-primary"
kmsVersionId: "v1"
- keyId: "ed25519-offline"
algorithm: Ed25519
mode: keyful
provider: "bouncycastle.ed25519"
materialFormat: base64
materialPath: "/etc/stellaops/keys/ed25519.key"
certificateChain:
- "-----BEGIN CERTIFICATE-----...-----END CERTIFICATE-----"
rekor:
primary:
url: "https://rekor-v2.internal"
proofTimeoutMs: 15000
@@ -422,20 +515,20 @@ attestor:
objectLock: "governance"
redis:
url: "redis://redis:6379/2"
quotas:
perCaller:
qps: 50
burst: 100
```
**Notes:**
* `signing.preferredProviders` defines the resolution order when multiple providers support the requested algorithm. Omit to fall back to registration order.
* File-backed KMS (`signing.kms`) is required when at least one key uses `mode: kms`; the password should be injected via secret store or environment.
* For keyful providers, supply inline `material` or `materialPath` plus `materialFormat` (`pem` (default), `base64`, or `hex`). KMS keys ignore these fields and require `kmsVersionId`.
* `certificateChain` entries are appended to returned bundles so offline verifiers do not need to dereference external stores.
---
## 10) End-to-end sequences
@@ -477,11 +570,11 @@ sequenceDiagram
---
## 11) Failure modes & responses
| Condition | Return | Details |
| ------------------------------------- | ----------------------- | --------------------------------------------------------- |
| mTLS/OpTok invalid | `401 invalid_token` | Include `WWW-Authenticate` DPoP challenge when applicable |
## 11) Failure modes & responses
| Condition | Return | Details | | |
| ------------------------------------- | ----------------------- | --------------------------------------------------------- | -------- | ------------ |
| mTLS/OpTok invalid | `401 invalid_token` | Include `WWW-Authenticate` DPoP challenge when applicable | | |
| Bundle not signed by trusted identity | `403 chain_untrusted` | DSSE accepted only from Signer identities | | |
| Duplicate bundle | `409 duplicate_bundle` | Return existing `uuid` (idempotent) | | |
| Rekor unreachable/timeout | `502 rekor_unavailable` | Retry with backoff; surface `Retry-After` | | |
@@ -529,14 +622,14 @@ sequenceDiagram
* **Duallog** write (primary + mirror) and **crosslog proof** packaging.
* **Cloud endorsement**: send `{uuid, artifactSha256}` to StellaOps cloud; store returned endorsement id for marketing/chainofcustody.
* **Checkpoint pinning**: periodically pin latest Rekor checkpoints to an external audit store for independent monitoring.
---
## 16) Observability (stub)
- Runbook + dashboard placeholder for offline import: `operations/observability.md`, `operations/dashboards/attestor-observability.json`.
- Metrics to surface: signing latency p95/p99, verification failure rate, transparency log submission lag, key rotation age, queue backlog, attestation bundle size histogram.
- Health endpoints: `/health/liveness`, `/health/readiness`, `/status`; verification probe `/api/attestations/verify` once demo bundle is available (see runbook).
- Alert hints: signing latency > 1s p99, verification failure spikes, tlog submission lag >10s, key rotation age over policy threshold, backlog above configured threshold.


@@ -0,0 +1,215 @@
# Proof Spine Assembly Algorithm
> **Sprint:** SPRINT_0501_0004_0001
> **Module:** Attestor / ProofChain
## Overview
The Proof Spine is the cryptographic backbone of StellaOps' proof chain. It aggregates evidence, reasoning, and VEX statements into a single Merkle-rooted bundle that can be verified independently.
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROOF SPINE STRUCTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ SBOMEntryID │ │ EvidenceID[] │ │ ReasoningID │ │ VEXVerdictID │ │
│ │ (leaf 0) │ │ (leaves 1-N) │ │ (leaf N+1) │ │ (leaf N+2) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │ │
│ └─────────────────┴─────────────────┴─────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────┐ │
│ │ MERKLE TREE BUILDER │ │
│ │ - SHA-256 hash function │ │
│ │ - Lexicographic sorting │ │
│ │ - Power-of-2 padding │ │
│ └───────────────┬───────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────┐ │
│ │ ProofBundleID (Root) │ │
│ │ sha256:<64-hex-chars> │ │
│ └───────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Algorithm Specification
### Input
| Parameter | Type | Description |
|-----------|------|-------------|
| `sbomEntryId` | string | Content-addressed ID of the SBOM entry |
| `evidenceIds` | string[] | Array of evidence statement IDs |
| `reasoningId` | string | ID of the reasoning/policy match statement |
| `vexVerdictId` | string | ID of the VEX verdict statement |
### Output
| Parameter | Type | Description |
|-----------|------|-------------|
| `proofBundleId` | string | Merkle root in format `sha256:<64-hex>` |
### Pseudocode
```
FUNCTION BuildProofBundleMerkle(sbomEntryId, evidenceIds[], reasoningId, vexVerdictId):
// Step 1: Prepare leaves in deterministic order
leaves = []
leaves.append(SHA256(UTF8.GetBytes(sbomEntryId)))
// Step 2: Sort evidence IDs lexicographically
sortedEvidenceIds = evidenceIds.Sort(StringComparer.Ordinal)
FOR EACH evidenceId IN sortedEvidenceIds:
leaves.append(SHA256(UTF8.GetBytes(evidenceId)))
leaves.append(SHA256(UTF8.GetBytes(reasoningId)))
leaves.append(SHA256(UTF8.GetBytes(vexVerdictId)))
// Step 3: Pad to power of 2 (duplicate last leaf)
WHILE NOT IsPowerOfTwo(leaves.Length):
leaves.append(leaves[leaves.Length - 1])
// Step 4: Build tree bottom-up
currentLevel = leaves
WHILE currentLevel.Length > 1:
nextLevel = []
FOR i = 0 TO currentLevel.Length - 1 STEP 2:
left = currentLevel[i]
right = currentLevel[i + 1]
parent = SHA256(left || right) // Concatenate then hash
nextLevel.append(parent)
currentLevel = nextLevel
// Step 5: Return root as formatted ID
RETURN "sha256:" + HexEncode(currentLevel[0])
```
## Determinism Invariants
| Invariant | Rule | Rationale |
|-----------|------|-----------|
| Evidence Ordering | Lexicographic (byte comparison) | Reproducible across platforms |
| Hash Function | SHA-256 only | No algorithm negotiation |
| Padding | Duplicate last leaf | Not zeros, preserves tree structure |
| Concatenation | Left `\|\|` Right | Consistent ordering |
| String Encoding | UTF-8 | Cross-platform compatibility |
| ID Format | `sha256:<lowercase-hex>` | Canonical representation |
## Example
### Input
```json
{
"sbomEntryId": "sha256:abc123...",
"evidenceIds": [
"sha256:evidence-cve-2024-0001...",
"sha256:evidence-reachability...",
"sha256:evidence-sbom-component..."
],
"reasoningId": "sha256:reasoning-policy...",
"vexVerdictId": "sha256:vex-not-affected..."
}
```
### Processing
1. **Leaf 0**: `SHA256("sha256:abc123...")` → SBOM
2. **Leaf 1**: `SHA256("sha256:evidence-cve-2024-0001...")` → Evidence (sorted first)
3. **Leaf 2**: `SHA256("sha256:evidence-reachability...")` → Evidence
4. **Leaf 3**: `SHA256("sha256:evidence-sbom-component...")` → Evidence
5. **Leaf 4**: `SHA256("sha256:reasoning-policy...")` → Reasoning
6. **Leaf 5**: `SHA256("sha256:vex-not-affected...")` → VEX
7. **Padding**: Duplicate leaf 5 to get 8 leaves (power of 2)
### Tree Structure
```
ROOT
/ \
H1 H2
/ \ / \
H3 H4 H5 H6
/ \ / \ / \ / \
L0 L1 L2 L3 L4 L5 L5 L5 (padded)
```
### Output
The example IDs above are truncated, so the root shown here is illustrative rather than reproducible:
```
sha256:7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
```
## Cross-Platform Verification
### Test Vector
For cross-platform compatibility testing, use this known test vector:
**Input:**
```json
{
"sbomEntryId": "sha256:0000000000000000000000000000000000000000000000000000000000000001",
"evidenceIds": [
"sha256:0000000000000000000000000000000000000000000000000000000000000002",
"sha256:0000000000000000000000000000000000000000000000000000000000000003"
],
"reasoningId": "sha256:0000000000000000000000000000000000000000000000000000000000000004",
"vexVerdictId": "sha256:0000000000000000000000000000000000000000000000000000000000000005"
}
```
All implementations (C#, Go, Rust, TypeScript) must produce the same root hash.
## Verification
To verify a proof bundle:
1. Obtain all constituent statements (SBOM, Evidence, Reasoning, VEX)
2. Extract their content-addressed IDs
3. Re-compute the Merkle root using the algorithm above
4. Compare with the claimed `proofBundleId`
If the roots match, the bundle is valid and all statements are bound to this proof.
## API
### C# Interface
```csharp
public interface IProofSpineAssembler
{
/// <summary>
/// Assembles a proof spine from its constituent statements.
/// </summary>
ProofSpineResult Assemble(ProofSpineInput input);
}
public record ProofSpineInput
{
public required string SbomEntryId { get; init; }
public required IReadOnlyList<string> EvidenceIds { get; init; }
public required string ReasoningId { get; init; }
public required string VexVerdictId { get; init; }
}
public record ProofSpineResult
{
public required string ProofBundleId { get; init; }
public required byte[] MerkleRoot { get; init; }
public required IReadOnlyList<byte[]> LeafHashes { get; init; }
}
```
## Related Documentation
- [Proof and Evidence Chain Technical Reference](../product-advisories/14-Dec-2025%20-%20Proof%20and%20Evidence%20Chain%20Technical%20Reference.md) - §2.4, §4.2, §9
- [Content-Addressed IDs](./content-addressed-ids.md)
- [DSSE Predicates](./dsse-predicates.md)


@@ -0,0 +1,159 @@
# TTFS (Time to First Signal) Alert Rules
# Reference: SPRINT_0341_0001_0001 Task T10
# These alerts monitor SLOs for the TTFS experience
groups:
- name: ttfs-slo
interval: 30s
rules:
# Primary SLO: P95 latency must be under 5 seconds
- alert: TtfsP95High
expr: |
histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface)) > 5
for: 5m
labels:
severity: page
component: ttfs
slo: ttfs-latency
annotations:
summary: "TTFS P95 latency exceeds 5s for {{ $labels.surface }}"
description: "Time to First Signal P95 is {{ $value | humanizeDuration }} for surface {{ $labels.surface }}. This breaches the TTFS SLO."
runbook: "docs/runbooks/ttfs-latency-high.md"
dashboard: "https://grafana.stellaops.local/d/ttfs-overview"
# Cache performance: Hit rate should be above 70%
- alert: TtfsCacheHitRateLow
expr: |
sum(rate(ttfs_cache_hit_total[5m])) / sum(rate(ttfs_signal_total[5m])) < 0.7
for: 10m
labels:
severity: warning
component: ttfs
annotations:
summary: "TTFS cache hit rate below 70%"
description: "Cache hit rate is {{ $value | humanizePercentage }}. Low cache hit rates increase TTFS latency."
runbook: "docs/runbooks/ttfs-cache-performance.md"
# Error rate: Should be under 1%
- alert: TtfsErrorRateHigh
expr: |
sum(rate(ttfs_error_total[5m])) / sum(rate(ttfs_signal_total[5m])) > 0.01
for: 5m
labels:
severity: warning
component: ttfs
annotations:
summary: "TTFS error rate exceeds 1%"
description: "Error rate is {{ $value | humanizePercentage }}. Check logs for FirstSignalService errors."
runbook: "docs/runbooks/ttfs-error-investigation.md"
# SLO breach counter: Too many breaches in a short window
- alert: TtfsSloBreach
expr: |
sum(increase(ttfs_slo_breach_total[5m])) > 10
for: 1m
labels:
severity: page
component: ttfs
slo: ttfs-breach-rate
annotations:
summary: "TTFS SLO breach rate high"
description: "{{ $value }} SLO breaches in last 5 minutes. Immediate investigation required."
runbook: "docs/runbooks/ttfs-slo-breach.md"
# Endpoint latency: HTTP endpoint should respond within 500ms
- alert: FirstSignalEndpointLatencyHigh
expr: |
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{route=~"/api/v1/orchestrator/runs/.*/first-signal"}[5m])) by (le)) > 0.5
for: 5m
labels:
severity: warning
component: ttfs
annotations:
summary: "First signal endpoint P95 latency > 500ms"
description: "The /first-signal API endpoint P95 is {{ $value | humanizeDuration }}. This is the API-level latency only."
runbook: "docs/runbooks/first-signal-api-slow.md"
- name: ttfs-availability
interval: 1m
rules:
# Availability: First signal endpoint should be available
- alert: FirstSignalEndpointDown
expr: |
up{job="orchestrator"} == 0
for: 2m
labels:
severity: critical
component: ttfs
annotations:
summary: "Orchestrator (First Signal provider) is down"
description: "The Orchestrator service is not responding. First Signal functionality is unavailable."
runbook: "docs/runbooks/orchestrator-down.md"
# No signals being generated
- alert: TtfsNoSignals
expr: |
sum(rate(ttfs_signal_total[10m])) == 0
for: 15m
labels:
severity: warning
component: ttfs
annotations:
summary: "No TTFS signals generated in 15 minutes"
description: "No First Signal events have been recorded. This could indicate no active runs or a metric collection issue."
- name: ttfs-ux
interval: 1m
rules:
# UX: High bounce rate indicates poor experience
- alert: TtfsBounceRateHigh
expr: |
sum(rate(ttfs_bounce_total[5m])) / sum(rate(ttfs_page_view_total[5m])) > 0.5
for: 30m
labels:
severity: warning
component: ttfs
area: ux
annotations:
summary: "TTFS page bounce rate exceeds 50%"
description: "More than 50% of users are leaving the run page within 10 seconds. This may indicate poor First Signal experience."
# UX: Long open-to-action time
- alert: TtfsOpenToActionSlow
expr: |
histogram_quantile(0.75, sum(rate(ttfs_open_to_action_seconds_bucket[15m])) by (le)) > 30
for: 1h
labels:
severity: info
component: ttfs
area: ux
annotations:
summary: "75% of users take >30s to first action"
description: "Users are taking a long time to act on First Signal. Consider UX improvements."
- name: ttfs-failure-signatures
interval: 30s
rules:
# New failure pattern emerging
- alert: TtfsNewFailurePatternHigh
expr: |
sum(rate(ttfs_failure_signature_new_total[5m])) > 1
for: 10m
labels:
severity: warning
component: ttfs
annotations:
summary: "High rate of new failure signatures"
description: "New failure patterns are being detected at {{ $value }}/s. This may indicate a new class of errors."
# Failure signature confidence upgrades
- alert: TtfsFailureSignatureConfidenceUpgrade
expr: |
sum(increase(ttfs_failure_signature_confidence_upgrade_total[1h])) > 5
for: 5m
labels:
severity: info
component: ttfs
annotations:
summary: "Multiple failure signatures upgraded to high confidence"
description: "{{ $value }} failure signatures have been upgraded to high confidence in the last hour."


@@ -0,0 +1,552 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "datasource",
"uid": "grafana"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"description": "Time to First Signal (TTFS) observability dashboard for StellaOps",
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"title": "TTFS P50/P95/P99 by Surface",
"type": "timeseries",
"gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
"id": 1,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "histogram_quantile(0.50, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface))",
"legendFormat": "P50 - {{surface}}",
"refId": "A"
},
{
"expr": "histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface))",
"legendFormat": "P95 - {{surface}}",
"refId": "B"
},
{
"expr": "histogram_quantile(0.99, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface))",
"legendFormat": "P99 - {{surface}}",
"refId": "C"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{ "value": null, "color": "green" },
{ "value": 2, "color": "yellow" },
{ "value": 5, "color": "red" }
]
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10,
"showPoints": "auto"
}
},
"overrides": []
},
"options": {
"legend": {
"displayMode": "table",
"placement": "bottom",
"calcs": ["mean", "max", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
}
},
{
"title": "Cache Hit Rate",
"type": "stat",
"gridPos": { "x": 12, "y": 0, "w": 6, "h": 4 },
"id": 2,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum(rate(ttfs_cache_hit_total[5m])) / sum(rate(ttfs_signal_total[5m]))",
"legendFormat": "Hit Rate",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"thresholds": {
"mode": "absolute",
"steps": [
{ "value": null, "color": "red" },
{ "value": 0.7, "color": "yellow" },
{ "value": 0.9, "color": "green" }
]
},
"mappings": []
},
"overrides": []
},
"options": {
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"],
"fields": ""
},
"orientation": "auto",
"textMode": "auto",
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto"
}
},
{
"title": "SLO Breaches (P95 > 5s)",
"type": "stat",
"gridPos": { "x": 18, "y": 0, "w": 6, "h": 4 },
"id": 3,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum(increase(ttfs_slo_breach_total[1h]))",
"legendFormat": "Breaches (1h)",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{ "value": null, "color": "green" },
{ "value": 1, "color": "yellow" },
{ "value": 10, "color": "red" }
]
},
"mappings": []
},
"overrides": []
},
"options": {
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"],
"fields": ""
},
"orientation": "auto",
"textMode": "auto",
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto"
}
},
{
"title": "Signal Source Distribution",
"type": "piechart",
"gridPos": { "x": 12, "y": 4, "w": 6, "h": 4 },
"id": 4,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum by (signal_source) (rate(ttfs_signal_total[1h]))",
"legendFormat": "{{signal_source}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"mappings": []
},
"overrides": []
},
"options": {
"legend": {
"displayMode": "list",
"placement": "right"
},
"pieType": "pie",
"tooltip": {
"mode": "single"
}
}
},
{
"title": "Failure Signature Matches",
"type": "stat",
"gridPos": { "x": 18, "y": 4, "w": 6, "h": 4 },
"id": 5,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum(rate(ttfs_failure_signature_match_total[5m]))",
"legendFormat": "Matches/s",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "reqps",
"thresholds": {
"mode": "absolute",
"steps": [
{ "value": null, "color": "blue" }
]
}
},
"overrides": []
}
},
{
"title": "Signals by Kind",
"type": "timeseries",
"gridPos": { "x": 0, "y": 8, "w": 12, "h": 6 },
"id": 6,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum by (kind) (rate(ttfs_signal_total[5m]))",
"legendFormat": "{{kind}}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "reqps",
"custom": {
"lineWidth": 1,
"fillOpacity": 20,
"stacking": {
"mode": "normal",
"group": "A"
}
}
},
"overrides": []
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
}
}
},
{
"title": "Error Rate",
"type": "timeseries",
"gridPos": { "x": 12, "y": 8, "w": 12, "h": 6 },
"id": 7,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum(rate(ttfs_error_total[5m])) / sum(rate(ttfs_signal_total[5m]))",
"legendFormat": "Error Rate",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"max": 0.1,
"thresholds": {
"mode": "absolute",
"steps": [
{ "value": null, "color": "green" },
{ "value": 0.01, "color": "yellow" },
{ "value": 0.05, "color": "red" }
]
},
"custom": {
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
}
}
},
{
"title": "TTFS Latency Heatmap",
"type": "heatmap",
"gridPos": { "x": 0, "y": 14, "w": 12, "h": 8 },
"id": 8,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum(increase(ttfs_latency_seconds_bucket[1m])) by (le)",
"legendFormat": "{{le}}",
"format": "heatmap",
"refId": "A"
}
],
"options": {
"calculate": false,
"yAxis": {
"axisPlacement": "left",
"unit": "s"
},
"color": {
"scheme": "Spectral",
"mode": "scheme"
},
"cellGap": 1
}
},
{
"title": "First Signal Endpoint Latency",
"type": "timeseries",
"gridPos": { "x": 12, "y": 14, "w": 12, "h": 8 },
"id": 9,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket{route=~\"/api/v1/orchestrator/runs/.*/first-signal\"}[5m])) by (le))",
"legendFormat": "P50",
"refId": "A"
},
{
"expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{route=~\"/api/v1/orchestrator/runs/.*/first-signal\"}[5m])) by (le))",
"legendFormat": "P95",
"refId": "B"
},
{
"expr": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{route=~\"/api/v1/orchestrator/runs/.*/first-signal\"}[5m])) by (le))",
"legendFormat": "P99",
"refId": "C"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"thresholds": {
"mode": "absolute",
"steps": [
{ "value": null, "color": "green" },
{ "value": 0.3, "color": "yellow" },
{ "value": 0.5, "color": "red" }
]
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
}
},
"overrides": []
}
},
{
"title": "Open→Action Time Distribution",
"type": "histogram",
"gridPos": { "x": 0, "y": 22, "w": 8, "h": 6 },
"id": 10,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum(increase(ttfs_open_to_action_seconds_bucket[5m])) by (le)",
"legendFormat": "{{le}}",
"format": "heatmap",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "s"
}
}
},
{
"title": "Bounce Rate (< 10s)",
"type": "stat",
"gridPos": { "x": 8, "y": 22, "w": 4, "h": 6 },
"id": 11,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "sum(rate(ttfs_bounce_total[5m])) / sum(rate(ttfs_page_view_total[5m]))",
"legendFormat": "Bounce Rate",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"thresholds": {
"mode": "absolute",
"steps": [
{ "value": null, "color": "green" },
{ "value": 0.3, "color": "yellow" },
{ "value": 0.5, "color": "red" }
]
}
}
}
},
{
"title": "Top Failure Signatures",
"type": "table",
"gridPos": { "x": 12, "y": 22, "w": 12, "h": 6 },
"id": 12,
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"targets": [
{
"expr": "topk(10, sum by (error_token, error_code) (ttfs_failure_signature_hit_total))",
"legendFormat": "{{error_token}} ({{error_code}})",
"format": "table",
"instant": true,
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"custom": {
"align": "auto"
}
},
"overrides": [
{
"matcher": { "id": "byName", "options": "Value" },
"properties": [
{ "id": "displayName", "value": "Hit Count" }
]
}
]
},
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true
},
"renameByName": {
"error_token": "Token",
"error_code": "Code"
}
}
}
]
}
],
"refresh": "30s",
"schemaVersion": 38,
"style": "dark",
"tags": ["ttfs", "ux", "slo", "stellaops"],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Prometheus",
"value": "prometheus"
},
"hide": 0,
"includeAll": false,
"label": "Datasource",
"multi": false,
"name": "datasource",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"definition": "label_values(ttfs_latency_seconds_bucket, surface)",
"hide": 0,
"includeAll": true,
"label": "Surface",
"multi": true,
"name": "surface",
"options": [],
"query": {
"query": "label_values(ttfs_latency_seconds_bucket, surface)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {},
"timezone": "utc",
"title": "TTFS - Time to First Signal",
"uid": "ttfs-overview",
"version": 1,
"weekStart": ""
}


@@ -361,7 +361,61 @@ export const TTFS_FIXTURES = {
};
```
## 12) Observability
### 12.1 Grafana Dashboard
The TTFS observability dashboard provides real-time visibility into signal latency, cache performance, and SLO compliance.
- **Dashboard file**: `docs/modules/telemetry/operations/dashboards/ttfs-observability.json`
- **UID**: `ttfs-overview`
**Key panels:**
- TTFS P50/P95/P99 by Surface (timeseries)
- Cache Hit Rate (stat)
- SLO Breaches (stat with threshold coloring)
- Signal Source Distribution (piechart)
- Signals by Kind (stacked timeseries)
- Error Rate (timeseries)
- TTFS Latency Heatmap
- Top Failure Signatures (table)
### 12.2 Alert Rules
TTFS alerts are defined in `docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml`.
**Critical alerts:**
| Alert | Threshold | For |
|-------|-----------|-----|
| `TtfsP95High` | P95 > 5s | 5m |
| `TtfsSloBreach` | >10 breaches in 5m | 1m |
| `FirstSignalEndpointDown` | Orchestrator unavailable | 2m |
**Warning alerts:**
| Alert | Threshold | For |
|-------|-----------|-----|
| `TtfsCacheHitRateLow` | <70% | 10m |
| `TtfsErrorRateHigh` | >1% | 5m |
| `FirstSignalEndpointLatencyHigh` | P95 > 500ms | 5m |
### 12.3 Load Testing
Load tests validate TTFS performance under realistic conditions.
- **Test file**: `tests/load/ttfs-load-test.js`
- **Framework**: k6
**Scenarios:**
- Sustained: 50 RPS for 5 minutes
- Spike: Ramp to 200 RPS
- Soak: 25 RPS for 15 minutes
**Thresholds:**
- Cache-hit P95 ≤ 250ms
- Cold-path P95 ≤ 500ms
- Error rate < 0.1%
## 13) References
- Advisory: `docs/product-advisories/14-Dec-2025 - UX and Time-to-Evidence Technical Reference.md`
- Sprint 1 (Foundation): `docs/implplan/SPRINT_0338_0001_0001_ttfs_foundation.md`
@@ -371,3 +425,6 @@ export const TTFS_FIXTURES = {
- TTE Architecture: `docs/modules/telemetry/architecture.md`
- Telemetry Schema: `docs/schemas/ttfs-event.schema.json`
- Database Schema: `docs/db/schemas/ttfs.sql`
- Grafana Dashboard: `docs/modules/telemetry/operations/dashboards/ttfs-observability.json`
- Alert Rules: `docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml`
- Load Tests: `tests/load/ttfs-load-test.js`