Add comprehensive security tests for OWASP A02, A05, A07, and A08 categories
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Export Center CI / export-ci (push) Has been cancelled
Findings Ledger CI / build-test (push) Has been cancelled
Findings Ledger CI / migration-validation (push) Has been cancelled
Findings Ledger CI / generate-manifest (push) Has been cancelled
Manifest Integrity / Validate Schema Integrity (push) Has been cancelled
Lighthouse CI / Lighthouse Audit (push) Has been cancelled
Lighthouse CI / Axe Accessibility Audit (push) Has been cancelled
Manifest Integrity / Validate Contract Documents (push) Has been cancelled
Manifest Integrity / Validate Pack Fixtures (push) Has been cancelled
Manifest Integrity / Audit SHA256SUMS Files (push) Has been cancelled
Manifest Integrity / Verify Merkle Roots (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
- Implemented tests for Cryptographic Failures (A02) to ensure proper handling of sensitive data, secure algorithms, and key management.
- Added tests for Security Misconfiguration (A05) to validate production configurations, security headers, CORS settings, and feature management.
- Developed tests for Authentication Failures (A07) to enforce strong password policies, rate limiting, session management, and MFA support.
- Created tests for Software and Data Integrity Failures (A08) to verify artifact signatures, SBOM integrity, attestation chains, and feed updates.
188
docs/modules/airgap/evidence-reconciliation.md
Normal file
@@ -0,0 +1,188 @@
# Evidence Reconciliation

This document describes the evidence reconciliation algorithm implemented in the `StellaOps.AirGap.Importer` module. The algorithm provides deterministic, lattice-based reconciliation of security evidence from air-gapped bundles.

## Overview

Evidence reconciliation is a 5-step pipeline that transforms raw evidence artifacts (SBOMs, attestations, VEX documents) into a unified, content-addressed evidence graph suitable for policy evaluation and audit trails.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                Evidence Reconciliation Pipeline                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Step 1: Artifact Indexing                                      │
│  ├── EvidenceDirectoryDiscovery                                 │
│  ├── ArtifactIndex (digest-keyed)                               │
│  └── Digest normalization (sha256:...)                          │
│                                                                 │
│  Step 2: Evidence Collection                                    │
│  ├── SbomCollector (CycloneDX, SPDX)                            │
│  ├── AttestationCollector (DSSE)                                │
│  └── Integration with DsseVerifier                              │
│                                                                 │
│  Step 3: Normalization                                          │
│  ├── JsonNormalizer (stable sorting)                            │
│  ├── Timestamp stripping                                        │
│  └── URI lowercase normalization                                │
│                                                                 │
│  Step 4: Lattice Rules                                          │
│  ├── SourcePrecedenceLattice                                    │
│  ├── VEX merge with precedence                                  │
│  └── Conflict resolution                                        │
│                                                                 │
│  Step 5: Graph Emission                                         │
│  ├── EvidenceGraph construction                                 │
│  ├── Deterministic serialization                                │
│  └── SHA-256 manifest generation                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Components

### Step 1: Artifact Indexing

**`ArtifactIndex`** - A digest-keyed index of all artifacts in the evidence bundle.

```csharp
// Key types
public readonly record struct DigestKey(string Algorithm, string Value);

// Normalization:
// DigestKey.Parse("sha256:abc123...") yields DigestKey("sha256", "abc123...")
```

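The digest-keyed lookup can be illustrated with a short language-neutral sketch. This is Python rather than the module's C#, and it is not the `DigestKey.Parse` implementation: the function name `parse_digest` and the choice to lowercase both the algorithm and the hex value are assumptions made for illustration.

```python
from typing import NamedTuple


class DigestKey(NamedTuple):
    """Hashable (algorithm, value) pair, usable as a dictionary key."""
    algorithm: str
    value: str


def parse_digest(text: str) -> DigestKey:
    """Split '<algorithm>:<hex>' into a normalized key.

    Assumption: normalization means trimming and lowercasing; the real
    module may apply different or additional rules.
    """
    algorithm, sep, value = text.strip().partition(":")
    if not sep or not algorithm or not value:
        raise ValueError(f"expected '<algorithm>:<hex>', got {text!r}")
    return DigestKey(algorithm.lower(), value.lower())


# A digest-keyed index: differently cased spellings resolve to one entry.
index = {parse_digest("SHA256:ABC123"): "sboms/component-a.cdx.json"}
```

Because the key is normalized before it is stored, a later lookup with `parse_digest("sha256:abc123")` finds the same artifact.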
**`EvidenceDirectoryDiscovery`** - Discovers evidence files from a directory structure.

Expected structure:
```
evidence/
├── sboms/
│   ├── component-a.cdx.json
│   └── component-b.spdx.json
├── attestations/
│   └── artifact.dsse.json
└── vex/
    └── vendor-vex.json
```

### Step 2: Evidence Collection

**Parsers:**
- `CycloneDxParser` - Parses CycloneDX 1.4/1.5/1.6 format
- `SpdxParser` - Parses SPDX 2.3 format
- `DsseAttestationParser` - Parses DSSE envelopes

**Collectors:**
- `SbomCollector` - Orchestrates SBOM parsing and indexing
- `AttestationCollector` - Orchestrates attestation parsing and verification

### Step 3: Normalization

**`SbomNormalizer`** applies format-specific normalization:

| Rule | Description |
|------|-------------|
| Stable JSON sorting | Keys sorted alphabetically (ordinal) |
| Timestamp stripping | Removes `created`, `modified`, `timestamp` fields |
| URI normalization | Lowercases scheme, host, normalizes paths |
| Whitespace normalization | Consistent formatting |

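The rules in the table can be sketched in a few lines of Python. This is an illustration only, not `SbomNormalizer`: the function names are hypothetical, path normalization is left out, and lowercasing the whole authority component (which may include user info) is a simplification.

```python
import json
from urllib.parse import urlsplit, urlunsplit

# Field names taken from the "Timestamp stripping" rule above.
VOLATILE_FIELDS = {"created", "modified", "timestamp"}


def normalize_uri(uri: str) -> str:
    """Lowercase the scheme and host; leave path, query and fragment as-is."""
    parts = urlsplit(uri)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(),
         parts.path, parts.query, parts.fragment)
    )


def strip_volatile(node):
    """Recursively drop volatile fields from nested dicts and lists."""
    if isinstance(node, dict):
        return {k: strip_volatile(v) for k, v in node.items()
                if k not in VOLATILE_FIELDS}
    if isinstance(node, list):
        return [strip_volatile(v) for v in node]
    return node


def normalize(document: dict) -> str:
    """Emit minified JSON with keys sorted by code point."""
    return json.dumps(strip_volatile(document), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)
```

Two documents that differ only in key order or in the stripped fields produce the same string, which is what makes downstream hashing stable.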
### Step 4: Lattice Rules

**`SourcePrecedenceLattice`** implements a bounded lattice for VEX source authority:

```
Vendor (top)
    ↑
Maintainer
    ↑
ThirdParty
    ↑
Unknown (bottom)
```

**Lattice Properties (verified by property-based tests):**
- **Commutativity**: `Join(a, b) = Join(b, a)`
- **Associativity**: `Join(Join(a, b), c) = Join(a, Join(b, c))`
- **Idempotence**: `Join(a, a) = a`
- **Absorption**: `Join(a, Meet(a, b)) = a`

**Conflict Resolution Order:**
1. Higher precedence source wins
2. More recent timestamp wins (when same precedence)
3. Status priority: NotAffected > Fixed > UnderInvestigation > Affected > Unknown

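Since the four sources form a chain, join and meet reduce to max and min, and the conflict resolution order becomes a lexicographic comparison. The Python sketch below illustrates that reading; the `Statement` type, its integer `timestamp` field, and the `resolve` function are hypothetical and are not the module's API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Source(IntEnum):
    """VEX source authority, ordered from bottom (0) to top (3)."""
    UNKNOWN = 0
    THIRD_PARTY = 1
    MAINTAINER = 2
    VENDOR = 3


def join(a: Source, b: Source) -> Source:
    """Least upper bound; on a chain this is the higher source."""
    return max(a, b)


def meet(a: Source, b: Source) -> Source:
    """Greatest lower bound; on a chain this is the lower source."""
    return min(a, b)


# Higher number = higher status priority, per the resolution order above.
STATUS_PRIORITY = {
    "NotAffected": 4, "Fixed": 3, "UnderInvestigation": 2,
    "Affected": 1, "Unknown": 0,
}


@dataclass(frozen=True)
class Statement:
    source: Source
    timestamp: int  # hypothetical representation: seconds since epoch
    status: str


def resolve(a: Statement, b: Statement) -> Statement:
    """Pick a winner: source precedence, then recency, then status priority."""
    def rank(s: Statement):
        return (s.source, s.timestamp, STATUS_PRIORITY[s.status])
    return a if rank(a) >= rank(b) else b
```

With this ordering, an older vendor statement still beats a newer third-party one, because source precedence is compared before the timestamp.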
### Step 5: Graph Emission

**`EvidenceGraph`** - A content-addressed graph of reconciled evidence:

```csharp
public sealed record EvidenceGraph
{
    public required string Version { get; init; }
    public required string DigestAlgorithm { get; init; }
    public required string RootDigest { get; init; }
    public required IReadOnlyList<EvidenceNode> Nodes { get; init; }
    public required IReadOnlyList<EvidenceEdge> Edges { get; init; }
    public required DateTimeOffset GeneratedAt { get; init; }
}
```

**Determinism guarantees:**
- Nodes sorted by digest (ordinal)
- Edges sorted by (source, target, type)
- SHA-256 manifest includes content hash
- Reproducible across runs with same inputs

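The ordering guarantees can be illustrated with a small Python sketch that sorts nodes and edges before hashing. It is an assumption-laden illustration, not `EvidenceGraph.ComputeHash`: the dictionary field names are hypothetical, and the sketch hashes only nodes and edges, since this document does not state how the real hash treats `GeneratedAt`.

```python
import hashlib
import json


def graph_hash(nodes: list, edges: list) -> str:
    """Hash a graph so that input ordering does not affect the result."""
    # Nodes sorted by digest, edges by (source, target, type), as listed above.
    ordered_nodes = sorted(nodes, key=lambda n: n["digest"])
    ordered_edges = sorted(
        edges, key=lambda e: (e["source"], e["target"], e["type"])
    )
    payload = json.dumps(
        {"nodes": ordered_nodes, "edges": ordered_edges},
        sort_keys=True, separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Feeding the same nodes and edges in a different order yields the same digest, which is the property the determinism guarantees describe.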
## Integration

### CLI Usage

```bash
# Verify offline evidence bundle
stellaops verify offline \
  --evidence-dir /evidence \
  --artifact sha256:def456... \
  --policy verify-policy.yaml
```

### API

```csharp
using System.Diagnostics;

// Reconcile evidence
var reconciler = new EvidenceReconciler(options);
var graph = await reconciler.ReconcileAsync(evidenceDir, cancellationToken);

// Verify determinism: reconciling the same inputs twice yields the same hash
var hash1 = graph.ComputeHash();
var graph2 = await reconciler.ReconcileAsync(evidenceDir, cancellationToken);
var hash2 = graph2.ComputeHash();
Debug.Assert(hash1 == hash2); // Always true
```

## Testing

### Golden-File Tests

Test fixtures in `tests/AirGap/StellaOps.AirGap.Importer.Tests/Reconciliation/Fixtures/`:
- `cyclonedx-sample.json` - CycloneDX 1.5 sample
- `spdx-sample.json` - SPDX 2.3 sample
- `dsse-attestation-sample.json` - DSSE envelope sample

### Property-Based Tests

`SourcePrecedenceLatticePropertyTests` verifies:
- Lattice algebraic properties (commutativity, associativity, idempotence, absorption)
- Ordering properties (antisymmetry, transitivity, reflexivity)
- Bound properties (join is LUB, meet is GLB)
- Merge determinism

## Related Documents

- [Air-Gap Module Architecture](./architecture.md) *(pending)*
- [DSSE Verification](../../adr/dsse-verification.md) *(if exists)*
- [Offline Kit Import Flow](./exporter-cli-coordination.md)
@@ -45,23 +45,23 @@ Trust boundary: **Only the Signer** is allowed to call submission endpoints; enf
- `StellaOps.BuildProvenance@1`
- `StellaOps.SBOMAttestation@1`
- `StellaOps.ScanResults@1`
- `StellaOps.PolicyEvaluation@1`
- `StellaOps.VEXAttestation@1`
- `StellaOps.RiskProfileEvidence@1`

Each predicate embeds subject digests, issuer metadata, policy context, materials, and optional transparency hints. Unsupported predicates return `422 predicate_unsupported`.

> **Golden fixtures:** Deterministic JSON statements for each predicate live in `src/Attestor/StellaOps.Attestor.Types/samples`. They are kept stable by the `StellaOps.Attestor.Types.Tests` project so downstream docs and contracts can rely on them without drifting.

### Envelope & signature model
- DSSE envelopes canonicalised (stable JSON ordering) prior to hashing.
- Signature modes: keyless (Fulcio cert chain), keyful (KMS/HSM), hardware (FIDO2/WebAuthn). Multiple signatures allowed.
- Rekor entry stores bundle hash, certificate chain, and optional witness endorsements.
- Archive CAS retains original envelope plus metadata for offline verification.
- Envelope serializer emits **compact** (canonical, minified) and **expanded** (annotated, indented) JSON variants off the same canonical byte stream so hashing stays deterministic while humans get context.
- Payload handling supports **optional compression** (`gzip`, `brotli`) with compression metadata recorded in the expanded view and digesting always performed over the uncompressed bytes.
- Expanded envelopes surface **detached payload references** (URI, digest, media type, size) so large artifacts can live in CAS/object storage while the canonical payload remains embedded for verification.
- Payload previews auto-render JSON or UTF-8 text in the expanded output to simplify triage in air-gapped and offline review flows.

### Verification pipeline overview
1. Fetch envelope (from request, cache, or storage) and validate DSSE structure.
@@ -70,6 +70,33 @@ Each predicate embeds subject digests, issuer metadata, policy context, material
4. Validate Merkle proof against checkpoint; optionally verify witness endorsement.
5. Return cached verification bundle including policy verdict and timestamps.

### Rekor Inclusion Proof Verification (SPRINT_3000_0001_0001)

The Attestor implements RFC 6962-compliant Merkle inclusion proof verification for Rekor transparency log entries:

**Components:**
- `MerkleProofVerifier` - Verifies Merkle audit paths per RFC 6962 Section 2.1.1
- `CheckpointSignatureVerifier` - Parses and verifies Rekor checkpoint signatures (ECDSA/Ed25519)
- `RekorVerificationOptions` - Configuration for public keys, offline mode, and checkpoint caching

**Verification Flow:**
1. Parse checkpoint body (origin, tree size, root hash)
2. Verify checkpoint signature against Rekor public key
3. Compute leaf hash from canonicalized entry
4. Walk Merkle path from leaf to root using RFC 6962 interior node hashing
5. Compare computed root with checkpoint root hash (constant-time)

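Steps 3 to 5 of the flow above can be sketched in Python as follows. This is not the `MerkleProofVerifier` code; it follows the standard inclusion-proof walk for RFC 6962 trees, with `0x00` and `0x01` prefixes for leaf and interior hashes, and it deliberately omits checkpoint parsing and signature verification (steps 1 and 2). All function names are illustrative.

```python
import hashlib
import hmac


def leaf_hash(entry: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the entry."""
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256 over 0x01 || left || right."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: list, root: bytes) -> bool:
    """Walk the audit path from a leaf hash up to the expected root."""
    if not 0 <= index < tree_size:
        return False
    fn, sn = index, tree_size - 1
    current = leaf
    for sibling in proof:
        if sn == 0:
            return False  # proof is longer than the tree is deep
        if (fn & 1) or fn == sn:
            # Current node is a right child (or the last node on its level).
            current = node_hash(sibling, current)
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            current = node_hash(current, sibling)
        fn >>= 1
        sn >>= 1
    # Constant-time comparison, matching step 5 of the flow.
    return sn == 0 and hmac.compare_digest(current, root)
```

A caller would pass `leaf_hash(canonical_entry)` as `leaf`, together with the index, tree size, and root hash taken from an already verified checkpoint.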
**Offline Mode:**
- Bundled checkpoints can be used in air-gapped environments
- `EnableOfflineMode` and `OfflineCheckpointBundlePath` configuration options
- `AllowOfflineWithoutSignature` for fully disconnected scenarios (reduced security)

**Metrics:**
- `attestor.rekor_inclusion_verify_total` - Verification attempts by result
- `attestor.rekor_checkpoint_verify_total` - Checkpoint signature verifications
- `attestor.rekor_offline_verify_total` - Offline mode verifications
- `attestor.rekor_checkpoint_cache_hits/misses` - Checkpoint cache performance

### UI & CLI touchpoints
- Console: Evidence browser, verification report, chain-of-custody graph, issuer/key management, attestation workbench, bulk verification views.
- CLI: `stella attest sign|verify|list|fetch|key` with offline verification and export bundle support.
@@ -127,6 +154,72 @@ Indexes:

---

## 2.1) Content-Addressed Identifier Formats

The ProofChain library (`StellaOps.Attestor.ProofChain`) defines canonical content-addressed identifiers for all proof chain components. These IDs ensure determinism, tamper-evidence, and reproducibility.

### Identifier Types

| ID Type | Format | Source | Example |
|---------|--------|--------|---------|
| **ArtifactID** | `sha256:<64-hex>` | Container manifest or binary hash | `sha256:a1b2c3d4e5f6...` |
| **SBOMEntryID** | `<sbomDigest>:<purl>[@<version>]` | SBOM hash + component PURL | `sha256:91f2ab3c:pkg:npm/lodash@4.17.21` |
| **EvidenceID** | `sha256:<hash>` | Canonical evidence JSON | `sha256:e7f8a9b0c1d2...` |
| **ReasoningID** | `sha256:<hash>` | Canonical reasoning JSON | `sha256:f0e1d2c3b4a5...` |
| **VEXVerdictID** | `sha256:<hash>` | Canonical VEX verdict JSON | `sha256:d4c5b6a7e8f9...` |
| **ProofBundleID** | `sha256:<merkle_root>` | Merkle root of bundle components | `sha256:1a2b3c4d5e6f...` |
| **GraphRevisionID** | `grv_sha256:<hash>` | Merkle root of graph state | `grv_sha256:9f8e7d6c5b4a...` |

### Canonicalization (RFC 8785)

All JSON-based IDs use RFC 8785 (JCS) canonicalization:
- UTF-8 encoding
- Lexicographically sorted keys
- No whitespace (minified)
- No volatile fields (timestamps, random values excluded)

**Implementation:** `StellaOps.Attestor.ProofChain.Json.Rfc8785JsonCanonicalizer`

### Merkle Tree Construction

ProofBundleID and GraphRevisionID use deterministic binary Merkle trees:
- SHA-256 hash function
- Lexicographically sorted leaf inputs
- Standard binary tree construction (pair-wise hashing)
- Odd leaves promoted to next level

**Implementation:** `StellaOps.Attestor.ProofChain.Merkle.DeterministicMerkleTreeBuilder`

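The four construction rules can be sketched in Python as shown below. This is not the `DeterministicMerkleTreeBuilder` implementation. Two details are assumptions because the list above does not specify them: each leaf input is hashed once before pairing, and no domain-separation prefix is applied to leaf or interior hashes.

```python
import hashlib


def merkle_root(leaves: list) -> bytes:
    """Compute a deterministic Merkle root over byte-string leaf inputs."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Sort leaf inputs lexicographically so input order is irrelevant.
    level = [hashlib.sha256(leaf).digest() for leaf in sorted(leaves)]
    while len(level) > 1:
        next_level = []
        # Hash adjacent pairs left to right.
        for i in range(0, len(level) - 1, 2):
            next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        # An unpaired last node is promoted unchanged to the next level.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

An identifier in the table's format would then be `"sha256:" + merkle_root(parts).hex()`, with the component IDs encoded as the leaf inputs.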
### ID Generation Interface

```csharp
// Core interface for ID generation
public interface IContentAddressedIdGenerator
{
    EvidenceId GenerateEvidenceId(EvidencePredicate predicate);
    ReasoningId GenerateReasoningId(ReasoningPredicate predicate);
    VexVerdictId GenerateVexVerdictId(VexPredicate predicate);
    ProofBundleId GenerateProofBundleId(SbomEntryId sbom, EvidenceId[] evidence,
        ReasoningId reasoning, VexVerdictId verdict);
    GraphRevisionId GenerateGraphRevisionId(GraphState state);
}
```

### Predicate Types

The ProofChain library defines DSSE predicates for each attestation type:

| Predicate | Type URI | Purpose |
|-----------|----------|---------|
| `EvidencePredicate` | `stellaops.org/evidence/v1` | Scan evidence (findings, reachability) |
| `ReasoningPredicate` | `stellaops.org/reasoning/v1` | Exploitability reasoning |
| `VexPredicate` | `stellaops.org/vex-verdict/v1` | VEX status determination |
| `ProofSpinePredicate` | `stellaops.org/proof-spine/v1` | Complete proof bundle |

**Reference:** `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/`

---

## 3) Input contract (from Signer)

**Attestor accepts only** DSSE envelopes that satisfy all of:
@@ -157,53 +250,53 @@ Indexes:

## 4) APIs

### 4.1 Signing

`POST /api/v1/attestations:sign` *(mTLS + OpTok required)*

* **Purpose**: Deterministically wrap Stella Ops payloads in DSSE envelopes before Rekor submission. Reuses the submission rate limiter and honours caller tenancy/audience scopes.
* **Body**:

  ```json
  {
    "keyId": "signing-key-id",
    "payloadType": "application/vnd.in-toto+json",
    "payload": "<base64 payload>",
    "mode": "keyless|keyful|kms",
    "certificateChain": ["-----BEGIN CERTIFICATE-----..."],
    "artifact": {
      "sha256": "<subject sha256>",
      "kind": "sbom|report|vex-export",
      "imageDigest": "sha256:...",
      "subjectUri": "oci://..."
    },
    "logPreference": "primary|mirror|both",
    "archive": true
  }
  ```

* **Behaviour**:
  * Resolve the signing key from `attestor.signing.keys[]` (includes algorithm, provider, and optional KMS version).
  * Compute DSSE pre‑authentication encoding, sign with the resolved provider (default EC, BouncyCastle Ed25519, or File‑KMS ES256), and add static + request certificate chains.
  * Canonicalise the resulting bundle, derive `bundleSha256`, and mirror the request meta shape used by `/api/v1/rekor/entries`.
  * Emit `attestor.sign_total{result,algorithm,provider}` and `attestor.sign_latency_seconds{algorithm,provider}` metrics and append an audit row (`action=sign`).
* **Response 200**:

  ```json
  {
    "bundle": { "dsse": { "payloadType": "...", "payload": "...", "signatures": [{ "keyid": "signing-key-id", "sig": "..." }] }, "certificateChain": ["..."], "mode": "kms" },
    "meta": { "artifact": { "sha256": "...", "kind": "sbom" }, "bundleSha256": "...", "logPreference": "primary", "archive": true },
    "key": { "keyId": "signing-key-id", "algorithm": "ES256", "mode": "kms", "provider": "kms", "signedAt": "2025-11-01T12:34:56Z" }
  }
  ```

* **Errors**: `400 key_not_found`, `400 payload_missing|payload_invalid_base64|artifact_sha_missing`, `400 mode_not_allowed`, `403 client_certificate_required`, `401 invalid_token`, `500 signing_failed`.

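The pre-authentication encoding (PAE) mentioned in the signing behaviour is defined by the DSSE specification as `"DSSEv1" SP LEN(type) SP type SP LEN(body) SP body`, where `LEN` is the ASCII decimal byte length. A minimal Python sketch of that encoding follows; the function name is illustrative, and the signing step itself (key resolution, provider selection) is out of scope here.

```python
import base64


def pae(payload_type: str, payload: bytes) -> bytes:
    """Build the DSSE pre-authentication encoding that gets signed."""
    type_bytes = payload_type.encode("utf-8")
    # Lengths are byte counts written as ASCII decimal, separated by spaces.
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload
    )


def pae_from_request(payload_type: str, payload_b64: str) -> bytes:
    """Decode the base64 `payload` request field, then encode it with PAE."""
    return pae(payload_type, base64.b64decode(payload_b64, validate=True))
```

The signature covers the PAE bytes rather than the raw payload, so the payload type is bound into what is signed and cannot be swapped after the fact.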
### 4.2 Submission

`POST /api/v1/rekor/entries` *(mTLS + OpTok required)*

* **Body**: as above.
* **Behaviour**:

  * Verify caller (mTLS + OpTok).
@@ -226,16 +319,16 @@ Indexes:
    "status": "included"
  }
  ```
* **Errors**: `401 invalid_token`, `403 not_signer|chain_untrusted`, `409 duplicate_bundle` (with existing `uuid`), `502 rekor_unavailable`, `504 proof_timeout`.

### 4.3 Proof retrieval

`GET /api/v1/rekor/entries/{uuid}`

* Returns `entries` row (refreshes proof from Rekor if stale/missing).
* Accepts `?refresh=true` to force backend query.

### 4.4 Verification (third‑party or internal)

`POST /api/v1/rekor/verify`

@@ -250,28 +343,28 @@ Indexes:
  1. **Bundle signature** → cert chain to Fulcio/KMS roots configured.
  2. **Inclusion proof** → recompute leaf hash; verify Merkle path against checkpoint root.
  3. Optionally verify **checkpoint** against local trust anchors (if Rekor signs checkpoints).
  4. Confirm **subject.digest** matches caller‑provided hash (when given).
  5. Fetch **transparency witness** statement when enabled; cache results and downgrade status to WARN when endorsements are missing or mismatched.

* **Response**:

  ```json
  { "ok": true, "uuid": "…", "index": 123, "logURL": "…", "checkedAt": "…" }
  ```

### 4.5 Bulk verification

`POST /api/v1/rekor/verify:bulk` enqueues a verification job containing up to `quotas.bulk.maxItemsPerJob` items. Each item mirrors the single verification payload (uuid | artifactSha256 | subject+envelopeId, optional policyVersion/refreshProof). The handler persists a MongoDB job document (`bulk_jobs` collection) and returns `202 Accepted` with a job descriptor and polling URL.

`GET /api/v1/rekor/verify:bulk/{jobId}` returns progress and per-item results (subject/uuid, status, issues, cached verification report if available). Jobs are tenant- and subject-scoped; only the initiating principal can read their progress.

**Worker path:** `BulkVerificationWorker` claims queued jobs (`status=queued → running`), executes items sequentially through the cached verification service, updates progress counters, and records metrics:

- `attestor.bulk_jobs_total{status}` – completed/failed jobs
- `attestor.bulk_job_duration_seconds{status}` – job runtime
- `attestor.bulk_items_total{status}` – per-item outcomes (`succeeded`, `verification_failed`, `exception`)

The worker honours `bulkVerification.itemDelayMilliseconds` for throttling and reschedules persistence conflicts with optimistic version checks. Results hydrate the verification cache; failed items record the error reason without aborting the overall job.

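The per-item behaviour of the bulk worker (sequential execution, three outcome labels, and failures that do not abort the job) can be sketched as a small Python loop. This is a simplified illustration, not `BulkVerificationWorker`: job claiming, persistence, and optimistic version checks are omitted, and the `verify` callback and `item_delay_seconds` parameter are hypothetical stand-ins.

```python
import time
from collections import Counter


def run_bulk_job(items, verify, item_delay_seconds: float = 0.0):
    """Verify items one by one and tally outcomes without aborting on errors."""
    results = []
    outcomes = Counter()
    for item in items:
        error = None
        try:
            status = "succeeded" if verify(item) else "verification_failed"
        except Exception as exc:  # record the reason, keep processing
            status, error = "exception", str(exc)
        outcomes[status] += 1
        results.append({"item": item, "status": status, "error": error})
        if item_delay_seconds > 0:
            time.sleep(item_delay_seconds)  # throttle between items
    return results, outcomes
```

The `outcomes` counter corresponds to what a metric such as `attestor.bulk_items_total{status}` would report per label.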
---

@@ -303,10 +396,10 @@ The worker honours `bulkVerification.itemDelayMilliseconds` for throttling and r
* `subject.digest.sha256` values must be present and well‑formed (hex).
* **No public submission** path. **Never** accept bundles from untrusted clients.
* **Client certificate allowlists**: optional `security.mtls.allowedSubjects` / `allowedThumbprints` tighten peer identity checks beyond CA pinning.
* **Rate limits**: token-bucket per caller derived from `quotas.perCaller` (QPS/burst) returns `429` + `Retry-After` when exceeded.
* **Scope enforcement**: API separates `attestor.write`, `attestor.verify`, and `attestor.read` policies; verification/list endpoints accept read or verify scopes while submission endpoints remain write-only.
* **Request hygiene**: JSON content-type is mandatory (415 returned otherwise); DSSE payloads are capped (default 2 MiB), certificate chains limited to six entries, and signatures to six per envelope to mitigate parsing abuse.
* **Redaction**: Attestor never logs secret material; DSSE payloads **should** be public by design (SBOMs/reports). If customers require redaction, enforce policy at Signer (predicate minimization) **before** Attestor.
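The per-caller token bucket named in the rate-limit rule can be sketched in Python as below. This is a generic token bucket under stated assumptions, not the Attestor's limiter: the class and method names are hypothetical, one request costs one token, and the returned delay is what a handler could place in a `Retry-After` header alongside `429`.

```python
import time


class TokenBucket:
    """Token bucket with a refill rate (QPS) and a burst capacity."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so bursts are allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self):
        """Return (allowed, retry_after_seconds) for one request."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        # Time until one full token is available again.
        return False, (1.0 - self.tokens) / self.rate
```

Keeping one bucket per caller identity (for example in a dictionary keyed by the authenticated subject) gives the per-caller isolation the rule describes.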

---

@@ -329,32 +422,32 @@ The worker honours `bulkVerification.itemDelayMilliseconds` for throttling and r

## 8) Observability & audit

**Metrics** (Prometheus):

* `attestor.sign_total{result,algorithm,provider}`
* `attestor.sign_latency_seconds{algorithm,provider}`
* `attestor.submit_total{result,backend}`
* `attestor.submit_latency_seconds{backend}`
* `attestor.proof_fetch_total{subject,issuer,policy,result,attestor.log.backend}`
* `attestor.verify_total{subject,issuer,policy,result}`
* `attestor.verify_latency_seconds{subject,issuer,policy,result}`
* `attestor.dedupe_hits_total`
* `attestor.errors_total{type}`

SLO guardrails:

* `attestor.verify_latency_seconds` P95 ≤ 2 s per policy.
* `attestor.verify_total{result="failed"}` ≤ 1 % of `attestor.verify_total` over 30 min rolling windows.

**Correlation**:

* HTTP callers may supply `X-Correlation-Id`; Attestor will echo the header and push `CorrelationId` into the log scope for cross-service tracing.

**Tracing**:

* Spans: `attestor.sign`, `validate`, `rekor.submit`, `rekor.poll`, `persist`, `archive`, `attestor.verify`, `attestor.verify.refresh_proof`.

**Audit**:

* Immutable `audit` rows (ts, caller, action, hashes, uuid, index, backend, result, latency).

@@ -365,45 +458,45 @@ SLO guardrails:

```yaml
attestor:
  listen: "https://0.0.0.0:8444"
  security:
    mtls:
      caBundle: /etc/ssl/signer-ca.pem
      requireClientCert: true
    authority:
      issuer: "https://authority.internal"
      jwksUrl: "https://authority.internal/jwks"
      requireSenderConstraint: "dpop"   # or "mtls"
    signerIdentity:
      mode: ["keyless","kms"]
      fulcioRoots: ["/etc/fulcio/root.pem"]
      allowedSANs: ["urn:stellaops:signer"]
      kmsKeys: ["kms://cluster-kms/stellaops-signer"]
    submissionLimits:
      maxPayloadBytes: 2097152
      maxCertificateChainEntries: 6
      maxSignatures: 6
  signing:
    preferredProviders: ["kms","bouncycastle.ed25519","default"]
    kms:
      enabled: true
      rootPath: "/var/lib/stellaops/kms"
      password: "${ATTESTOR_KMS_PASSWORD}"
    keys:
      - keyId: "kms-primary"
        algorithm: ES256
        mode: kms
        provider: "kms"
        providerKeyId: "kms-primary"
        kmsVersionId: "v1"
      - keyId: "ed25519-offline"
        algorithm: Ed25519
        mode: keyful
        provider: "bouncycastle.ed25519"
        materialFormat: base64
        materialPath: "/etc/stellaops/keys/ed25519.key"
        certificateChain:
          - "-----BEGIN CERTIFICATE-----...-----END CERTIFICATE-----"
  rekor:
    primary:
      url: "https://rekor-v2.internal"
      proofTimeoutMs: 15000
@@ -422,20 +515,20 @@ attestor:
    objectLock: "governance"
  redis:
    url: "redis://redis:6379/2"
  quotas:
    perCaller:
      qps: 50
      burst: 100
```

**Notes:**

* `signing.preferredProviders` defines the resolution order when multiple providers support the requested algorithm. Omit to fall back to registration order.
* File-backed KMS (`signing.kms`) is required when at least one key uses `mode: kms`; the password should be injected via secret store or environment.
* For keyful providers, supply inline `material` or `materialPath` plus `materialFormat` (`pem` (default), `base64`, or `hex`). KMS keys ignore these fields and require `kmsVersionId`.
* `certificateChain` entries are appended to returned bundles so offline verifiers do not need to dereference external stores.

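
The provider resolution rule in the first note can be sketched as follows. This is an assumption-laden illustration rather than the Signer/Attestor code: `resolve_provider` and the shape of the `registered` mapping are hypothetical, and falling back to registration order when no preferred provider matches is an assumption (the note only covers the case where the list is omitted).

```python
from typing import Mapping, Optional, Sequence, Set


def resolve_provider(algorithm: str,
                     registered: Mapping[str, Set[str]],
                     preferred: Optional[Sequence[str]] = None) -> Optional[str]:
    """Pick the provider that should handle `algorithm`.

    `registered` maps provider name -> supported algorithms, in registration
    order (Python dicts preserve insertion order).
    """
    candidates = [name for name, algs in registered.items() if algorithm in algs]
    # Walk the configured preference list first.
    for name in preferred or ():
        if name in candidates:
            return name
    # No preference configured or matched: fall back to registration order.
    return candidates[0] if candidates else None
```

With the provider names from the sample configuration, an `Ed25519` request would resolve to `bouncycastle.ed25519` when the preference list is present, and to whichever capable provider registered first when it is omitted.
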

---


## 10) End‑to‑end sequences

@@ -477,11 +570,11 @@ sequenceDiagram

---

## 11) Failure modes & responses

| Condition                             | Return                  | Details                                                   |          |              |
| ------------------------------------- | ----------------------- | --------------------------------------------------------- | -------- | ------------ |
| mTLS/OpTok invalid                    | `401 invalid_token`     | Include `WWW-Authenticate` DPoP challenge when applicable |          |              |
| Bundle not signed by trusted identity | `403 chain_untrusted`   | DSSE accepted only from Signer identities                 |          |              |
| Duplicate bundle                      | `409 duplicate_bundle`  | Return existing `uuid` (idempotent)                       |          |              |
| Rekor unreachable/timeout             | `502 rekor_unavailable` | Retry with backoff; surface `Retry-After`                 |          |              |
@@ -529,14 +622,14 @@ sequenceDiagram

* **Dual‑log** write (primary + mirror) and **cross‑log proof** packaging.
* **Cloud endorsement**: send `{uuid, artifactSha256}` to Stella Ops cloud; store returned endorsement id for marketing/chain‑of‑custody.
* **Checkpoint pinning**: periodically pin latest Rekor checkpoints to an external audit store for independent monitoring.

---

## 16) Observability (stub)

- Runbook + dashboard placeholder for offline import: `operations/observability.md`, `operations/dashboards/attestor-observability.json`.
- Metrics to surface: signing latency p95/p99, verification failure rate, transparency log submission lag, key rotation age, queue backlog, attestation bundle size histogram.
- Health endpoints: `/health/liveness`, `/health/readiness`, `/status`; verification probe `/api/attestations/verify` once demo bundle is available (see runbook).
- Alert hints: signing latency > 1s p99, verification failure spikes, tlog submission lag >10s, key rotation age over policy threshold, backlog above configured threshold.

215
docs/modules/attestor/proof-spine-algorithm.md
Normal file
@@ -0,0 +1,215 @@
# Proof Spine Assembly Algorithm

> **Sprint:** SPRINT_0501_0004_0001
> **Module:** Attestor / ProofChain

## Overview

The Proof Spine is the cryptographic backbone of StellaOps' proof chain. It aggregates evidence, reasoning, and VEX statements into a single Merkle-rooted bundle that can be verified independently.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            PROOF SPINE STRUCTURE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ SBOMEntryID  │  │ EvidenceID[] │  │ ReasoningID  │  │ VEXVerdictID │     │
│  │   (leaf 0)   │  │ (leaves 1-N) │  │  (leaf N+1)  │  │  (leaf N+2)  │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                 │                 │                 │             │
│         └─────────────────┴─────────────────┴─────────────────┘             │
│                                    │                                        │
│                                    ▼                                        │
│                    ┌───────────────────────────────┐                        │
│                    │      MERKLE TREE BUILDER      │                        │
│                    │  - SHA-256 hash function      │                        │
│                    │  - Lexicographic sorting      │                        │
│                    │  - Power-of-2 padding         │                        │
│                    └───────────────┬───────────────┘                        │
│                                    │                                        │
│                                    ▼                                        │
│                    ┌───────────────────────────────┐                        │
│                    │     ProofBundleID (Root)      │                        │
│                    │     sha256:<64-hex-chars>     │                        │
│                    └───────────────────────────────┘                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Algorithm Specification

### Input

| Parameter | Type | Description |
|-----------|------|-------------|
| `sbomEntryId` | string | Content-addressed ID of the SBOM entry |
| `evidenceIds` | string[] | Array of evidence statement IDs |
| `reasoningId` | string | ID of the reasoning/policy match statement |
| `vexVerdictId` | string | ID of the VEX verdict statement |

### Output

| Parameter | Type | Description |
|-----------|------|-------------|
| `proofBundleId` | string | Merkle root in format `sha256:<64-hex>` |

### Pseudocode

```
FUNCTION BuildProofBundleMerkle(sbomEntryId, evidenceIds[], reasoningId, vexVerdictId):

    // Step 1: Prepare leaves in deterministic order
    leaves = []
    leaves.append(SHA256(UTF8.GetBytes(sbomEntryId)))

    // Step 2: Sort evidence IDs lexicographically
    sortedEvidenceIds = evidenceIds.Sort(StringComparer.Ordinal)
    FOR EACH evidenceId IN sortedEvidenceIds:
        leaves.append(SHA256(UTF8.GetBytes(evidenceId)))

    leaves.append(SHA256(UTF8.GetBytes(reasoningId)))
    leaves.append(SHA256(UTF8.GetBytes(vexVerdictId)))

    // Step 3: Pad to power of 2 (duplicate last leaf)
    WHILE NOT IsPowerOfTwo(leaves.Length):
        leaves.append(leaves[leaves.Length - 1])

    // Step 4: Build tree bottom-up
    currentLevel = leaves
    WHILE currentLevel.Length > 1:
        nextLevel = []
        // i visits each left child; the last pair starts at Length - 2
        FOR i = 0 TO currentLevel.Length - 2 STEP 2:
            left = currentLevel[i]
            right = currentLevel[i + 1]
            parent = SHA256(left || right)  // Concatenate then hash
            nextLevel.append(parent)
        currentLevel = nextLevel

    // Step 5: Return root as formatted ID
    RETURN "sha256:" + HexEncode(currentLevel[0])
```

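
The pseudocode translates almost line for line into a short runnable sketch. The version below is a hedged Python rendering, not one of the project's implementations, and the function name is a hypothetical snake_case spelling. The pseudocode sorts with an ordinal comparer while the sketch sorts by UTF-8 bytes; for ASCII IDs of the form `sha256:<hex>` the two orderings agree.

```python
import hashlib
from typing import Sequence


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_proof_bundle_merkle(sbom_entry_id: str,
                              evidence_ids: Sequence[str],
                              reasoning_id: str,
                              vex_verdict_id: str) -> str:
    # Steps 1-2: leaves in fixed order, evidence sorted by UTF-8 bytes.
    ordered = sorted(evidence_ids, key=lambda s: s.encode("utf-8"))
    ids = [sbom_entry_id, *ordered, reasoning_id, vex_verdict_id]
    level = [_sha256(i.encode("utf-8")) for i in ids]

    # Step 3: pad to a power of two by repeating the last leaf.
    while len(level) & (len(level) - 1):
        level.append(level[-1])

    # Step 4: hash adjacent pairs until a single node remains.
    while len(level) > 1:
        level = [_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]

    # Step 5: lowercase hex with the sha256: prefix.
    return "sha256:" + level[0].hex()
```

Because the evidence IDs are sorted inside the function, callers can pass them in any order and still obtain the same root, which is the property the determinism invariants below rely on.
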
## Determinism Invariants

| Invariant | Rule | Rationale |
|-----------|------|-----------|
| Evidence Ordering | Lexicographic (byte comparison) | Reproducible across platforms |
| Hash Function | SHA-256 only | No algorithm negotiation |
| Padding | Duplicate last leaf | Not zeros, preserves tree structure |
| Concatenation | Left `\|\|` Right | Consistent ordering |
| String Encoding | UTF-8 | Cross-platform compatibility |
| ID Format | `sha256:<lowercase-hex>` | Canonical representation |

## Example

### Input

```json
{
  "sbomEntryId": "sha256:abc123...",
  "evidenceIds": [
    "sha256:evidence-cve-2024-0001...",
    "sha256:evidence-reachability...",
    "sha256:evidence-sbom-component..."
  ],
  "reasoningId": "sha256:reasoning-policy...",
  "vexVerdictId": "sha256:vex-not-affected..."
}
```

### Processing

1. **Leaf 0**: `SHA256("sha256:abc123...")` → SBOM
2. **Leaf 1**: `SHA256("sha256:evidence-cve-2024-0001...")` → Evidence (sorted first)
3. **Leaf 2**: `SHA256("sha256:evidence-reachability...")` → Evidence
4. **Leaf 3**: `SHA256("sha256:evidence-sbom-component...")` → Evidence
5. **Leaf 4**: `SHA256("sha256:reasoning-policy...")` → Reasoning
6. **Leaf 5**: `SHA256("sha256:vex-not-affected...")` → VEX
7. **Padding**: Duplicate leaf 5 twice to reach 8 leaves (the next power of 2)

### Tree Structure

```
             ROOT
          /        \
      H1              H2
     /  \            /  \
  H3      H4      H5      H6
 /  \    /  \    /  \    /  \
L0  L1  L2  L3  L4  L5  L5  L5  (padded)
```

### Output

```
sha256:7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
```

## Cross-Platform Verification

### Test Vector

For cross-platform compatibility testing, use this known test vector:

**Input:**
```json
{
  "sbomEntryId": "sha256:0000000000000000000000000000000000000000000000000000000000000001",
  "evidenceIds": [
    "sha256:0000000000000000000000000000000000000000000000000000000000000002",
    "sha256:0000000000000000000000000000000000000000000000000000000000000003"
  ],
  "reasoningId": "sha256:0000000000000000000000000000000000000000000000000000000000000004",
  "vexVerdictId": "sha256:0000000000000000000000000000000000000000000000000000000000000005"
}
```

All implementations (C#, Go, Rust, TypeScript) must produce the same root hash.

## Verification

To verify a proof bundle:

1. Obtain all constituent statements (SBOM, Evidence, Reasoning, VEX)
2. Extract their content-addressed IDs
3. Recompute the Merkle root using the algorithm above
4. Compare with the claimed `proofBundleId`

If the roots match, the bundle is valid and all statements are bound to this proof.

## API

### C# Interface

```csharp
public interface IProofSpineAssembler
{
    /// <summary>
    /// Assembles a proof spine from its constituent statements.
    /// </summary>
    ProofSpineResult Assemble(ProofSpineInput input);
}

public record ProofSpineInput
{
    public required string SbomEntryId { get; init; }
    public required IReadOnlyList<string> EvidenceIds { get; init; }
    public required string ReasoningId { get; init; }
    public required string VexVerdictId { get; init; }
}

public record ProofSpineResult
{
    public required string ProofBundleId { get; init; }
    public required byte[] MerkleRoot { get; init; }
    public required IReadOnlyList<byte[]> LeafHashes { get; init; }
}
```

## Related Documentation

- [Proof and Evidence Chain Technical Reference](../product-advisories/14-Dec-2025%20-%20Proof%20and%20Evidence%20Chain%20Technical%20Reference.md) - §2.4, §4.2, §9
- [Content-Addressed IDs](./content-addressed-ids.md)
- [DSSE Predicates](./dsse-predicates.md)
159
docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml
Normal file
@@ -0,0 +1,159 @@
# TTFS (Time to First Signal) Alert Rules
# Reference: SPRINT_0341_0001_0001 Task T10
# These alerts monitor SLOs for the TTFS experience

groups:
  - name: ttfs-slo
    interval: 30s
    rules:
      # Primary SLO: P95 latency must be under 5 seconds
      - alert: TtfsP95High
        expr: |
          histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface)) > 5
        for: 5m
        labels:
          severity: page
          component: ttfs
          slo: ttfs-latency
        annotations:
          summary: "TTFS P95 latency exceeds 5s for {{ $labels.surface }}"
          description: "Time to First Signal P95 is {{ $value | humanizeDuration }} for surface {{ $labels.surface }}. This breaches the TTFS SLO."
          runbook: "docs/runbooks/ttfs-latency-high.md"
          dashboard: "https://grafana.stellaops.local/d/ttfs-overview"

      # Cache performance: Hit rate should be above 70%
      - alert: TtfsCacheHitRateLow
        expr: |
          sum(rate(ttfs_cache_hit_total[5m])) / sum(rate(ttfs_signal_total[5m])) < 0.7
        for: 10m
        labels:
          severity: warning
          component: ttfs
        annotations:
          summary: "TTFS cache hit rate below 70%"
          description: "Cache hit rate is {{ $value | humanizePercentage }}. Low cache hit rates increase TTFS latency."
          runbook: "docs/runbooks/ttfs-cache-performance.md"

      # Error rate: Should be under 1%
      - alert: TtfsErrorRateHigh
        expr: |
          sum(rate(ttfs_error_total[5m])) / sum(rate(ttfs_signal_total[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
          component: ttfs
        annotations:
          summary: "TTFS error rate exceeds 1%"
          description: "Error rate is {{ $value | humanizePercentage }}. Check logs for FirstSignalService errors."
          runbook: "docs/runbooks/ttfs-error-investigation.md"

      # SLO breach counter: Too many breaches in a short window
      - alert: TtfsSloBreach
        expr: |
          sum(increase(ttfs_slo_breach_total[5m])) > 10
        for: 1m
        labels:
          severity: page
          component: ttfs
          slo: ttfs-breach-rate
        annotations:
          summary: "TTFS SLO breach rate high"
          description: "{{ $value }} SLO breaches in last 5 minutes. Immediate investigation required."
          runbook: "docs/runbooks/ttfs-slo-breach.md"

      # Endpoint latency: HTTP endpoint should respond within 500ms
      - alert: FirstSignalEndpointLatencyHigh
        expr: |
          histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{route=~"/api/v1/orchestrator/runs/.*/first-signal"}[5m])) by (le)) > 0.5
        for: 5m
        labels:
          severity: warning
          component: ttfs
        annotations:
          summary: "First signal endpoint P95 latency > 500ms"
          description: "The /first-signal API endpoint P95 is {{ $value | humanizeDuration }}. This is the API-level latency only."
          runbook: "docs/runbooks/first-signal-api-slow.md"

  - name: ttfs-availability
    interval: 1m
    rules:
      # Availability: First signal endpoint should be available
      - alert: FirstSignalEndpointDown
        expr: |
          up{job="orchestrator"} == 0
        for: 2m
        labels:
          severity: critical
          component: ttfs
        annotations:
          summary: "Orchestrator (First Signal provider) is down"
          description: "The Orchestrator service is not responding. First Signal functionality is unavailable."
          runbook: "docs/runbooks/orchestrator-down.md"

      # No signals being generated
      - alert: TtfsNoSignals
        expr: |
          sum(rate(ttfs_signal_total[10m])) == 0
        for: 15m
        labels:
          severity: warning
          component: ttfs
        annotations:
          summary: "No TTFS signals generated in 15 minutes"
          description: "No First Signal events have been recorded. This could indicate no active runs or a metric collection issue."

  - name: ttfs-ux
    interval: 1m
    rules:
      # UX: High bounce rate indicates poor experience
      - alert: TtfsBounceRateHigh
        expr: |
          sum(rate(ttfs_bounce_total[5m])) / sum(rate(ttfs_page_view_total[5m])) > 0.5
        for: 30m
        labels:
          severity: warning
          component: ttfs
          area: ux
        annotations:
          summary: "TTFS page bounce rate exceeds 50%"
          description: "More than 50% of users are leaving the run page within 10 seconds. This may indicate poor First Signal experience."

      # UX: Long open-to-action time
      - alert: TtfsOpenToActionSlow
        expr: |
          histogram_quantile(0.75, sum(rate(ttfs_open_to_action_seconds_bucket[15m])) by (le)) > 30
        for: 1h
        labels:
          severity: info
          component: ttfs
          area: ux
        annotations:
          summary: "75% of users take >30s to first action"
          description: "Users are taking a long time to act on First Signal. Consider UX improvements."

  - name: ttfs-failure-signatures
    interval: 30s
    rules:
      # New failure pattern emerging
      - alert: TtfsNewFailurePatternHigh
        expr: |
          sum(rate(ttfs_failure_signature_new_total[5m])) > 1
        for: 10m
        labels:
          severity: warning
          component: ttfs
        annotations:
          summary: "High rate of new failure signatures"
          description: "New failure patterns are being detected at {{ $value }}/s. This may indicate a new class of errors."

      # Failure signature confidence upgrades
      - alert: TtfsFailureSignatureConfidenceUpgrade
        expr: |
          sum(increase(ttfs_failure_signature_confidence_upgrade_total[1h])) > 5
        for: 5m
        labels:
          severity: info
          component: ttfs
        annotations:
          summary: "Multiple failure signatures upgraded to high confidence"
          description: "{{ $value }} failure signatures have been upgraded to high confidence in the last hour."
@@ -0,0 +1,552 @@
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "datasource",
          "uid": "grafana"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "description": "Time to First Signal (TTFS) observability dashboard for StellaOps",
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "title": "TTFS P50/P95/P99 by Surface",
      "type": "timeseries",
      "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
      "id": 1,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "histogram_quantile(0.50, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface))",
          "legendFormat": "P50 - {{surface}}",
          "refId": "A"
        },
        {
          "expr": "histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface))",
          "legendFormat": "P95 - {{surface}}",
          "refId": "B"
        },
        {
          "expr": "histogram_quantile(0.99, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le, surface))",
          "legendFormat": "P99 - {{surface}}",
          "refId": "C"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "s",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": null, "color": "green" },
              { "value": 2, "color": "yellow" },
              { "value": 5, "color": "red" }
            ]
          },
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 10,
            "showPoints": "auto"
          }
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "bottom",
          "calcs": ["mean", "max", "lastNotNull"]
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      }
    },
    {
      "title": "Cache Hit Rate",
      "type": "stat",
      "gridPos": { "x": 12, "y": 0, "w": 6, "h": 4 },
      "id": 2,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum(rate(ttfs_cache_hit_total[5m])) / sum(rate(ttfs_signal_total[5m]))",
          "legendFormat": "Hit Rate",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percentunit",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": null, "color": "red" },
              { "value": 0.7, "color": "yellow" },
              { "value": 0.9, "color": "green" }
            ]
          },
          "mappings": []
        },
        "overrides": []
      },
      "options": {
        "reduceOptions": {
          "values": false,
          "calcs": ["lastNotNull"],
          "fields": ""
        },
        "orientation": "auto",
        "textMode": "auto",
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto"
      }
    },
    {
      "title": "SLO Breaches (P95 > 5s)",
      "type": "stat",
      "gridPos": { "x": 18, "y": 0, "w": 6, "h": 4 },
      "id": 3,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum(increase(ttfs_slo_breach_total[1h]))",
          "legendFormat": "Breaches (1h)",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": null, "color": "green" },
              { "value": 1, "color": "yellow" },
              { "value": 10, "color": "red" }
            ]
          },
          "mappings": []
        },
        "overrides": []
      },
      "options": {
        "reduceOptions": {
          "values": false,
          "calcs": ["lastNotNull"],
          "fields": ""
        },
        "orientation": "auto",
        "textMode": "auto",
        "colorMode": "background",
        "graphMode": "none",
        "justifyMode": "auto"
      }
    },
    {
      "title": "Signal Source Distribution",
      "type": "piechart",
      "gridPos": { "x": 12, "y": 4, "w": 6, "h": 4 },
      "id": 4,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum by (signal_source) (rate(ttfs_signal_total[1h]))",
          "legendFormat": "{{signal_source}}",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "mappings": []
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "list",
          "placement": "right"
        },
        "pieType": "pie",
        "tooltip": {
          "mode": "single"
        }
      }
    },
    {
      "title": "Failure Signature Matches",
      "type": "stat",
      "gridPos": { "x": 18, "y": 4, "w": 6, "h": 4 },
      "id": 5,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum(rate(ttfs_failure_signature_match_total[5m]))",
          "legendFormat": "Matches/s",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "reqps",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": null, "color": "blue" }
            ]
          }
        },
        "overrides": []
      }
    },
    {
      "title": "Signals by Kind",
      "type": "timeseries",
      "gridPos": { "x": 0, "y": 8, "w": 12, "h": 6 },
      "id": 6,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum by (kind) (rate(ttfs_signal_total[5m]))",
          "legendFormat": "{{kind}}",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "reqps",
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 20,
            "stacking": {
              "mode": "normal",
              "group": "A"
            }
          }
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "list",
          "placement": "bottom"
        }
      }
    },
    {
      "title": "Error Rate",
      "type": "timeseries",
      "gridPos": { "x": 12, "y": 8, "w": 12, "h": 6 },
      "id": 7,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum(rate(ttfs_error_total[5m])) / sum(rate(ttfs_signal_total[5m]))",
          "legendFormat": "Error Rate",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percentunit",
          "max": 0.1,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": null, "color": "green" },
              { "value": 0.01, "color": "yellow" },
              { "value": 0.05, "color": "red" }
            ]
          },
          "custom": {
            "lineWidth": 2,
            "fillOpacity": 10
          }
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "list",
          "placement": "bottom"
        }
      }
    },
    {
      "title": "TTFS Latency Heatmap",
      "type": "heatmap",
      "gridPos": { "x": 0, "y": 14, "w": 12, "h": 8 },
      "id": 8,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum(increase(ttfs_latency_seconds_bucket[1m])) by (le)",
          "legendFormat": "{{le}}",
          "format": "heatmap",
          "refId": "A"
        }
      ],
      "options": {
        "calculate": false,
        "yAxis": {
          "axisPlacement": "left",
          "unit": "s"
        },
        "color": {
          "scheme": "Spectral",
          "mode": "scheme"
        },
        "cellGap": 1
      }
    },
    {
      "title": "First Signal Endpoint Latency",
      "type": "timeseries",
      "gridPos": { "x": 12, "y": 14, "w": 12, "h": 8 },
      "id": 9,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket{route=~\"/api/v1/orchestrator/runs/.*/first-signal\"}[5m])) by (le))",
          "legendFormat": "P50",
          "refId": "A"
        },
        {
          "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{route=~\"/api/v1/orchestrator/runs/.*/first-signal\"}[5m])) by (le))",
          "legendFormat": "P95",
          "refId": "B"
        },
        {
          "expr": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{route=~\"/api/v1/orchestrator/runs/.*/first-signal\"}[5m])) by (le))",
          "legendFormat": "P99",
          "refId": "C"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "s",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": null, "color": "green" },
              { "value": 0.3, "color": "yellow" },
              { "value": 0.5, "color": "red" }
            ]
          },
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 10
          }
        },
        "overrides": []
      }
    },
    {
      "title": "Open→Action Time Distribution",
      "type": "histogram",
      "gridPos": { "x": 0, "y": 22, "w": 8, "h": 6 },
      "id": 10,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum(increase(ttfs_open_to_action_seconds_bucket[5m])) by (le)",
          "legendFormat": "{{le}}",
          "format": "heatmap",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "s"
        }
      }
    },
    {
      "title": "Bounce Rate (< 10s)",
      "type": "stat",
      "gridPos": { "x": 8, "y": 22, "w": 4, "h": 6 },
      "id": 11,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "sum(rate(ttfs_bounce_total[5m])) / sum(rate(ttfs_page_view_total[5m]))",
          "legendFormat": "Bounce Rate",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percentunit",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "value": null, "color": "green" },
              { "value": 0.3, "color": "yellow" },
              { "value": 0.5, "color": "red" }
            ]
          }
        }
      }
    },
    {
      "title": "Top Failure Signatures",
      "type": "table",
      "gridPos": { "x": 12, "y": 22, "w": 12, "h": 6 },
      "id": 12,
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "targets": [
        {
          "expr": "topk(10, sum by (error_token, error_code) (ttfs_failure_signature_hit_total))",
          "legendFormat": "{{error_token}} ({{error_code}})",
          "format": "table",
          "instant": true,
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "custom": {
            "align": "auto"
          }
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "Value" },
            "properties": [
              { "id": "displayName", "value": "Hit Count" }
            ]
          }
        ]
      },
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": {
              "Time": true
            },
            "renameByName": {
              "error_token": "Token",
              "error_code": "Code"
            }
          }
        }
      ]
    }
  ],
  "refresh": "30s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["ttfs", "ux", "slo", "stellaops"],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "Prometheus",
          "value": "prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "label": "Datasource",
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "allValue": ".*",
        "current": {
          "selected": true,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(ttfs_latency_seconds_bucket, surface)",
        "hide": 0,
        "includeAll": true,
        "label": "Surface",
        "multi": true,
        "name": "surface",
        "options": [],
        "query": {
          "query": "label_values(ttfs_latency_seconds_bucket, surface)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "utc",
  "title": "TTFS - Time to First Signal",
  "uid": "ttfs-overview",
  "version": 1,
  "weekStart": ""
}
@@ -361,7 +361,61 @@ export const TTFS_FIXTURES = {
};
```

## 12) References
## 12) Observability

### 12.1 Grafana Dashboard

The TTFS observability dashboard provides real-time visibility into signal latency, cache performance, and SLO compliance.

- **Dashboard file**: `docs/modules/telemetry/operations/dashboards/ttfs-observability.json`
- **UID**: `ttfs-overview`

**Key panels:**
- TTFS P50/P95/P99 by Surface (timeseries)
- Cache Hit Rate (stat)
- SLO Breaches (stat with threshold coloring)
- Signal Source Distribution (piechart)
- Signals by Kind (stacked timeseries)
- Error Rate (timeseries)
- TTFS Latency Heatmap
- Top Failure Signatures (table)

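
The percentile panels are built on PromQL's `histogram_quantile()` over `_bucket` series. As a rough illustration of how a P50 or P95 value falls out of cumulative bucket counts, the sketch below mirrors the linear interpolation that function performs. The function name `histogramQuantile` and the sample bucket data are hypothetical and not part of this repository; the sketch assumes positive bucket bounds and a quantile in (0, 1].

```javascript
// Estimate a quantile from cumulative histogram buckets, mirroring the
// linear interpolation used by PromQL's histogram_quantile().
// Each bucket is { le: upperBound, count: cumulativeCount }.
// Hypothetical helper for illustration only.
function histogramQuantile(q, buckets) {
  const sorted = [...buckets].sort((a, b) =>
    a.le < b.le ? -1 : a.le > b.le ? 1 : 0
  );
  // A usable histogram needs at least one finite bucket plus the +Inf bucket.
  if (sorted.length < 2 || sorted[sorted.length - 1].le !== Infinity) {
    return NaN;
  }
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const idx = sorted.findIndex((b) => b.count >= rank);

  // Quantile falls in the +Inf bucket: report the highest finite bound.
  if (idx === sorted.length - 1) return sorted[sorted.length - 2].le;

  // The lowest bucket is assumed to start at 0.
  const lowerBound = idx === 0 ? 0 : sorted[idx - 1].le;
  const lowerCount = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - lowerCount;

  // Interpolate linearly inside the bucket that contains the rank.
  return (
    lowerBound +
    (sorted[idx].le - lowerBound) * ((rank - lowerCount) / inBucket)
  );
}

// Made-up sample: 100 observations, bounds in seconds.
const sampleBuckets = [
  { le: 0.1, count: 10 },
  { le: 0.5, count: 60 },
  { le: 1, count: 90 },
  { le: Infinity, count: 100 },
];

console.log(histogramQuantile(0.5, sampleBuckets)); // approximately 0.42
```

Because the result is interpolated within a bucket, the accuracy of the P95 and P99 panels depends on how finely the bucket bounds are spaced around the SLO thresholds.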
### 12.2 Alert Rules

TTFS alerts are defined in `docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml`.

**Critical alerts:**
| Alert | Threshold | For |
|-------|-----------|-----|
| `TtfsP95High` | P95 > 5s | 5m |
| `TtfsSloBreach` | >10 breaches in 5m | 1m |
| `FirstSignalEndpointDown` | Orchestrator unavailable | 2m |

**Warning alerts:**
| Alert | Threshold | For |
|-------|-----------|-----|
| `TtfsCacheHitRateLow` | <70% | 10m |
| `TtfsErrorRateHigh` | >1% | 5m |
| `FirstSignalEndpointLatencyHigh` | P95 > 500ms | 5m |

### 12.3 Load Testing

Load tests validate TTFS performance under realistic conditions.

- **Test file**: `tests/load/ttfs-load-test.js`
- **Framework**: k6

**Scenarios:**
- Sustained: 50 RPS for 5 minutes
- Spike: Ramp to 200 RPS
- Soak: 25 RPS for 15 minutes

**Thresholds:**
- Cache-hit P95 ≤ 250ms
- Cold-path P95 ≤ 500ms
- Error rate < 0.1%

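
The scenarios and thresholds above could map onto a k6 `options` object roughly as follows. This is a sketch under stated assumptions, not the contents of `tests/load/ttfs-load-test.js`: the scenario names, VU pool sizes, start offsets, spike ramp shape, and the `cache` request tag are all hypothetical. In an actual k6 script the object would be exported (`export const options = ...`) and the requests would need to carry the assumed `cache` tag for the tagged thresholds to apply.

```javascript
// Hypothetical k6 options mirroring the documented scenarios and thresholds.
// Only the rates, durations, and threshold values come from the document;
// everything else is an assumption made for illustration.
const options = {
  scenarios: {
    sustained: {
      executor: 'constant-arrival-rate',
      rate: 50, // 50 iterations per timeUnit, i.e. 50 RPS
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 50, // assumed pool size
      maxVUs: 200, // assumed
    },
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 50, // assumed starting rate
      timeUnit: '1s',
      preAllocatedVUs: 100, // assumed
      maxVUs: 400, // assumed
      stages: [
        { target: 200, duration: '30s' }, // ramp to 200 RPS (ramp time assumed)
        { target: 50, duration: '30s' }, // assumed ramp down
      ],
      startTime: '5m', // assumed: run after the sustained scenario
    },
    soak: {
      executor: 'constant-arrival-rate',
      rate: 25,
      timeUnit: '1s',
      duration: '15m',
      preAllocatedVUs: 25, // assumed
      maxVUs: 100, // assumed
      startTime: '6m', // assumed: run after the spike scenario
    },
  },
  thresholds: {
    // http_req_duration is reported in milliseconds.
    'http_req_duration{cache:hit}': ['p(95)<=250'],
    'http_req_duration{cache:miss}': ['p(95)<=500'],
    // 0.1% expressed as a rate.
    http_req_failed: ['rate<0.001'],
  },
};
```

Arrival-rate executors are used here because the scenarios are stated in requests per second rather than in concurrent users; a VU-based executor would not hold the request rate steady if latency changed.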
## 13) References

- Advisory: `docs/product-advisories/14-Dec-2025 - UX and Time-to-Evidence Technical Reference.md`
- Sprint 1 (Foundation): `docs/implplan/SPRINT_0338_0001_0001_ttfs_foundation.md`
@@ -371,3 +425,6 @@ export const TTFS_FIXTURES = {
- TTE Architecture: `docs/modules/telemetry/architecture.md`
- Telemetry Schema: `docs/schemas/ttfs-event.schema.json`
- Database Schema: `docs/db/schemas/ttfs.sql`
- Grafana Dashboard: `docs/modules/telemetry/operations/dashboards/ttfs-observability.json`
- Alert Rules: `docs/modules/telemetry/operations/alerts/ttfs-alerts.yaml`
- Load Tests: `tests/load/ttfs-load-test.js`