docs/modules/attestor/proof-chain-specification.md
# Proof and Evidence Chain Technical Specification

**Version**: 1.0
**Status**: Implementation Ready
**Source**: `docs/product-advisories/14-Dec-2025 - Proof and Evidence Chain Technical Reference.md`
**Last Updated**: 2025-12-14

---

## Executive Summary

This specification defines the implementation of a cryptographically verifiable proof chain. The chain links SBOM components to VEX verdicts through signed evidence and reasoning statements. It gives complete traceability from scan results to policy decisions, and its outputs are deterministic and auditable.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                            PROOF CHAIN ARCHITECTURE                             │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│      ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐      │
│      │   Scanner   │──►│  Evidence   │──►│  Reasoning  │──►│     VEX     │      │
│      │    SBOM     │   │ Statements  │   │ Statements  │   │  Verdicts   │      │
│      │ (CycloneDX) │   │             │   │             │   │             │      │
│      └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘      │
│             │                 │                 │                 │             │
│             ▼                 ▼                 ▼                 ▼             │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                           CONTENT-ADDRESSED IDs                           │  │
│  │   SBOMEntryID | EvidenceID | ReasoningID | VEXVerdictID | ProofBundleID   │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                        │                                        │
│                                        ▼                                        │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                                PROOF SPINE                                │  │
│  │  Merkle aggregation:                                                      │  │
│  │  merkle_root(SBOMEntryID, EvidenceID[], ReasoningID, VEXVerdictID)        │  │
│  │    = ProofBundleID                                                        │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                        │                                        │
│                                        ▼                                        │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                              DSSE ENVELOPES                               │  │
│  │  - In-toto Statement/v1 format                                            │  │
│  │  - Signed by role-specific keys                                           │  │
│  │  - Predicate types: evidence.stella/v1, reasoning.stella/v1,              │  │
│  │                     cdx-vex.stella/v1, proofspine.stella/v1               │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                        │                                        │
│                                        ▼                                        │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                               TRUST ANCHORS                               │  │
│  │  - Per-dependency anchor (PURL pattern matching)                          │  │
│  │  - Allowed key IDs for verification                                       │  │
│  │  - Key rotation with historical validity                                  │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                        │                                        │
│                                        ▼                                        │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                          REKOR TRANSPARENCY LOG                           │  │
│  │  - Inclusion proofs for all DSSE envelopes                                │  │
│  │  - Checkpoint verification                                                │  │
│  │  - Offline verification support                                           │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘
```

## Content-Addressed Identifier System

### ID Formats

| ID Type | Format | Example |
|---------|--------|---------|
| ArtifactID | `sha256:<64-hex>` | `sha256:a1b2c3d4e5f6...` |
| SBOMEntryID | `<sbomDigest>:<purl>[@<version>]` | `sha256:91f2ab3c:pkg:npm/lodash@4.17.21` |
| EvidenceID | `sha256:<hash(canonical_json)>` | `sha256:7b8c9d0e...` |
| ReasoningID | `sha256:<hash(canonical_json)>` | `sha256:4a5b6c7d...` |
| VEXVerdictID | `sha256:<hash(canonical_json)>` | `sha256:1f2e3d4c...` |
| ProofBundleID | `sha256:<merkle_root>` | `sha256:9e8d7c6b...` |
| GraphRevisionID | `grv_sha256:<hash>` | `grv_sha256:5a4b3c2d...` |
| TrustAnchorID | `UUID v4` | `550e8400-e29b-41d4-a716-446655440000` |

### Canonicalization Rules

1. **UTF-8 encoding** for all strings
2. **Sorted keys** (lexicographic, case-sensitive)
3. **No insignificant whitespace**
4. **No trailing commas**
5. **Numbers in shortest form**
6. **Deterministic array ordering** (by semantic key: bom-ref, purl)
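
These rules can be approximated with Python's standard library. The sketch below is not the reference implementation, and it rests on several assumptions:

- `json.dumps` with sorted keys and compact separators covers rules 1 to 4. Python orders keys by Unicode code point.
- Rule 5 relies on `json` emitting integers as-is and floats in shortest round-trip form.
- Rule 6 is handled by a hypothetical `_sort_arrays` helper that orders arrays of objects by `bom-ref`, then `purl`.
- An ID is hashed over its body with the body's own ID field (for example `evidenceId`) removed, because a body cannot contain its own hash. The specification does not state this.

```python
import hashlib
import json
from typing import Any

SEMANTIC_KEYS = ("bom-ref", "purl")  # rule 6 ordering keys, in priority order


def _sort_arrays(value: Any) -> Any:
    """Recursively order arrays of objects by bom-ref, then purl.

    Hypothetical helper: arrays whose elements lack both keys keep their order.
    """
    if isinstance(value, dict):
        return {key: _sort_arrays(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [_sort_arrays(item) for item in value]
        if items and all(isinstance(i, dict) and any(k in i for k in SEMANTIC_KEYS) for i in items):
            items.sort(key=lambda i: tuple(str(i.get(k, "")) for k in SEMANTIC_KEYS))
        return items
    return value


def canonical_json(value: Any) -> bytes:
    """UTF-8, sorted keys, no insignificant whitespace; NaN/Infinity are refused."""
    text = json.dumps(
        _sort_arrays(value),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return text.encode("utf-8")


def content_id(body: dict, id_field: str) -> str:
    """sha256:<hex> over the canonical body with its own ID field removed (assumption)."""
    without_id = {key: item for key, item in body.items() if key != id_field}
    return "sha256:" + hashlib.sha256(canonical_json(without_id)).hexdigest()
```

Key order and whitespace in the input therefore never change the ID. Only the content does.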

## DSSE Predicate Types

### 1. Evidence Statement (`evidence.stella/v1`)

```json
{
  "predicateType": "evidence.stella/v1",
  "predicate": {
    "source": "scanner/feed name",
    "sourceVersion": "tool version",
    "collectionTime": "2025-12-14T00:00:00Z",
    "sbomEntryId": "<SBOMEntryID>",
    "vulnerabilityId": "CVE-XXXX-YYYY",
    "rawFinding": "<pointer or data>",
    "evidenceId": "<EvidenceID>"
  }
}
```

**Signer**: Scanner/Ingestor key

### 2. Reasoning Statement (`reasoning.stella/v1`)

```json
{
  "predicateType": "reasoning.stella/v1",
  "predicate": {
    "sbomEntryId": "<SBOMEntryID>",
    "evidenceIds": ["<EvidenceID>", ...],
    "policyVersion": "v2.3.1",
    "inputs": {
      "currentEvaluationTime": "2025-12-14T00:00:00Z",
      "severityThresholds": {...},
      "latticeRules": {...}
    },
    "intermediateFindings": {...},
    "reasoningId": "<ReasoningID>"
  }
}
```

**Signer**: Policy/Authority key

### 3. VEX Verdict Statement (`cdx-vex.stella/v1`)

```json
{
  "predicateType": "cdx-vex.stella/v1",
  "predicate": {
    "sbomEntryId": "<SBOMEntryID>",
    "vulnerabilityId": "CVE-XXXX-YYYY",
    "status": "not_affected|affected|fixed|under_investigation",
    "justification": "vulnerable_code_not_in_execute_path",
    "policyVersion": "v2.3.1",
    "reasoningId": "<ReasoningID>",
    "vexVerdictId": "<VEXVerdictID>"
  }
}
```

**Signer**: VEXer/Vendor key

### 4. Proof Spine Statement (`proofspine.stella/v1`)

```json
{
  "predicateType": "proofspine.stella/v1",
  "predicate": {
    "sbomEntryId": "<SBOMEntryID>",
    "evidenceIds": ["<ID1>", "<ID2>"],
    "reasoningId": "<ID>",
    "vexVerdictId": "<ID>",
    "policyVersion": "v2.3.1",
    "proofBundleId": "<ProofBundleID>"
  }
}
```

**Signer**: Authority key

### 5. Verdict Receipt Statement (`verdict.stella/v1`)

```json
{
  "predicateType": "verdict.stella/v1",
  "predicate": {
    "graphRevisionId": "<GraphRevisionID>",
    "findingKey": {"sbomEntryId": "<ID>", "vulnerabilityId": "CVE-..."},
    "rule": {"id": "POLICY-RULE-123", "version": "v2.3.1"},
    "decision": {"status": "block|warn|pass", "reason": "..."},
    "inputs": {"sbomDigest": "sha256:...", "feedsDigest": "sha256:...", "policyDigest": "sha256:..."},
    "outputs": {"proofBundleId": "<ID>", "reasoningId": "<ID>", "vexVerdictId": "<ID>"},
    "createdAt": "2025-12-14T00:00:00Z"
  }
}
```

**Signer**: Authority key

### 6. SBOM Linkage Statement (`sbom-linkage/v1`)

```json
{
  "predicateType": "https://stella-ops.org/predicates/sbom-linkage/v1",
  "predicate": {
    "sbom": {"id": "<sbomId>", "format": "CycloneDX", "specVersion": "1.6", ...},
    "generator": {"name": "StellaOps.Sbomer", "version": "x.y.z"},
    "generatedAt": "2025-12-14T00:00:00Z",
    "incompleteSubjects": [],
    "tags": {"tenantId": "...", "projectId": "...", "pipelineRunId": "..."}
  }
}
```

**Signer**: Generator key

## Database Schema

### Tables

| Table | Purpose |
|-------|---------|
| `proofchain.sbom_entries` | SBOM component entries with content-addressed IDs |
| `proofchain.dsse_envelopes` | Signed DSSE envelopes by predicate type |
| `proofchain.spines` | Proof spine aggregations linking evidence to verdicts |
| `proofchain.trust_anchors` | Trust anchor configurations for verification |
| `proofchain.rekor_entries` | Rekor transparency log entries |
| `proofchain.key_history` | Key lifecycle history for rotation |
| `proofchain.key_audit_log` | Audit log for key operations |

## API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/proofs/{entry}/spine` | POST | Create proof spine |
| `/proofs/{entry}/receipt` | GET | Get verification receipt |
| `/proofs/{entry}/vex` | GET | Get VEX document |
| `/anchors/{anchor}` | GET/PUT | Trust anchor management |
| `/anchors/{anchor}/keys` | GET/POST | Key management |
| `/anchors/{anchor}/keys/{keyid}/revoke` | POST | Key revocation |
| `/verify` | POST | Artifact verification |
| `/keys/rotation-warnings` | GET | Rotation warnings |

## CLI Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success: no policy violations |
| 1 | Policy violation detected |
| 2 | System/scanner error |

## Verification Pipeline

The verification algorithm has 13 steps:

1. Resolve SBOMEntryID → TrustAnchorID
2. Fetch the spine and the trust anchor
3. Verify the spine DSSE signature against TrustAnchor.allowedKeyids
4. Verify the VEX DSSE signature
5. Verify the reasoning DSSE signature
6. Verify the evidence DSSE signatures
7. Recompute EvidenceIDs from the stored canonical evidence
8. Recompute ReasoningID from the stored reasoning statement
9. Recompute VEXVerdictID from the VEX body
10. Recompute ProofBundleID (Merkle root) from the IDs computed in steps 7-9
11. Compare all computed IDs to the stored IDs
12. If using Rekor: verify the log inclusion proof
13. Emit the receipt
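
Steps 3 to 13 amount to checking signatures, recomputing every ID, and comparing each one with the value the spine stored. The sketch below receives the ID functions and the signature and inclusion checks as callables, so it assumes nothing about the hashing scheme or the signature algorithm. The `Receipt` shape and the failure labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass
class Receipt:
    """Hypothetical receipt: overall outcome plus the names of failed checks."""
    verified: bool
    failures: list[str] = field(default_factory=list)


def verify_spine(
    stored: Mapping[str, Any],
    evidence_bodies: Sequence[dict],
    reasoning_body: dict,
    vex_body: dict,
    content_id: Callable[[dict], str],
    bundle_id: Callable[[str, Sequence[str], str, str], str],
    signatures_ok: Callable[[], bool],
    inclusion_ok: Optional[Callable[[], bool]] = None,
) -> Receipt:
    failures: list[str] = []

    # Steps 3-6: DSSE signatures (spine, VEX, reasoning, evidence), delegated to the caller.
    if not signatures_ok():
        failures.append("dsse_signatures")

    # Steps 7-10: recompute every content-addressed ID from the stored bodies.
    evidence_ids = [content_id(body) for body in evidence_bodies]
    reasoning_id = content_id(reasoning_body)
    vex_verdict_id = content_id(vex_body)
    bundle = bundle_id(stored["sbomEntryId"], evidence_ids, reasoning_id, vex_verdict_id)

    # Step 11: compare recomputed IDs with the IDs recorded in the spine.
    if sorted(evidence_ids) != sorted(stored["evidenceIds"]):
        failures.append("evidenceIds")
    if reasoning_id != stored["reasoningId"]:
        failures.append("reasoningId")
    if vex_verdict_id != stored["vexVerdictId"]:
        failures.append("vexVerdictId")
    if bundle != stored["proofBundleId"]:
        failures.append("proofBundleId")

    # Step 12: optional Rekor inclusion check.
    if inclusion_ok is not None and not inclusion_ok():
        failures.append("rekor_inclusion")

    # Step 13: emit the receipt.
    return Receipt(verified=not failures, failures=failures)
```

Every check runs even after a failure, so the receipt lists all mismatches rather than only the first.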

## Key Rotation

### Process

1. Add the new key to TrustAnchor.allowedKeyids
2. Transition period: both keys are valid
3. Optionally revoke the old key (it moves to revokedKeys)
4. Publish the key material via the attestation feed

### Invariants

- Never mutate old DSSE envelopes
- Revoked keys remain valid for proofs signed before revocation
- All key changes are audited
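
The second invariant means key validity is judged at signing time, not at verification time. The sketch below assumes a hypothetical key-history record with `added_at` and `revoked_at` timestamps. It also assumes the signing time comes from a trusted source, such as a Rekor integrated time, rather than from the signer's own claim.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical key-history entry (field names are illustrative)."""
    key_id: str
    added_at: datetime
    revoked_at: Optional[datetime] = None


def key_valid_at(record: KeyRecord, signed_at: datetime) -> bool:
    """True if the key was active when the proof was signed.

    Revocation only affects signatures made at or after `revoked_at`,
    so proofs signed earlier keep verifying.
    """
    if signed_at < record.added_at:
        return False
    return record.revoked_at is None or signed_at < record.revoked_at


def accept_signature(history: Mapping[str, KeyRecord], key_id: str, signed_at: datetime) -> bool:
    """Unknown key IDs are rejected: trust comes only from the anchor's key history."""
    record = history.get(key_id)
    return record is not None and key_valid_at(record, signed_at)
```

During the transition period both keys pass this check. Revoking the old key only closes its window for signatures made after the revocation.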

## Implementation Sprints

| Sprint | Focus | Status |
|--------|-------|--------|
| 0501.2 | Content-Addressed IDs & Core Records | TODO |
| 0501.3 | New DSSE Predicate Types | TODO |
| 0501.4 | Proof Spine Assembly | TODO |
| 0501.5 | API Surface & Verification Pipeline | TODO |
| 0501.6 | Database Schema Implementation | TODO |
| 0501.7 | CLI Integration & Exit Codes | TODO |
| 0501.8 | Key Rotation & Trust Anchors | TODO |

## Related Documents

- [Master Sprint Plan](../../implplan/SPRINT_0501_0001_0001_proof_evidence_chain_master.md)
- [Content-Addressed IDs Sprint](../../implplan/SPRINT_0501_0002_0001_proof_chain_content_addressed_ids.md)
- [DSSE Predicates Sprint](../../implplan/SPRINT_0501_0003_0001_proof_chain_dsse_predicates.md)
- [Proof Spine Assembly Sprint](../../implplan/SPRINT_0501_0004_0001_proof_chain_spine_assembly.md)
- [API Surface Sprint](../../implplan/SPRINT_0501_0005_0001_proof_chain_api_surface.md)
- [Database Schema Sprint](../../implplan/SPRINT_0501_0006_0001_proof_chain_database_schema.md)
- [CLI Integration Sprint](../../implplan/SPRINT_0501_0007_0001_proof_chain_cli_integration.md)
- [Key Rotation Sprint](../../implplan/SPRINT_0501_0008_0001_proof_chain_key_rotation.md)
- [Attestor Architecture](./architecture.md)
- [Signer Architecture](../signer/architecture.md)
- [Database Specification](../../db/SPECIFICATION.md)

## Cryptographic Profiles

| Profile | Algorithm | Use Case |
|---------|-----------|----------|
| default | SHA256-ED25519 | General purpose |
| fips | SHA256-ECDSA-P256 | FIPS 140-2/3 compliance |
| gost | GOST R 34.10-2012 | Russian regulatory |
| sm | SM2/SM3 | Chinese standards |
| pqc | SHA256-DILITHIUM3 | Post-quantum |

## Determinism Constraints

### Non-Negotiable Invariants

1. **Immutability**: DSSE envelopes are append-only
2. **Determinism**: Same inputs → same outputs
3. **Traceability**: Every verdict traceable to evidence
4. **Least Trust**: Explicit trust via TrustAnchors only
5. **Backward Compatibility**: New code verifies old proofs

### Temporal Handling

- UTC ISO-8601 only
- No local time
- Timestamps only when semantically required
- Derivation from content preferred over wall-clock time
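
A small helper can enforce the first two rules: it refuses naive datetimes and always emits UTC with a `Z` suffix. The function name is hypothetical. Truncating to second precision is an assumption based on the timestamps in the predicate examples above.

```python
from datetime import datetime, timezone


def utc_timestamp(moment: datetime) -> str:
    """Format an aware datetime as UTC ISO-8601 with a 'Z' suffix (fractional seconds dropped).

    Naive datetimes are rejected instead of being silently read as local time.
    """
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("naive datetime: attach an explicit UTC offset")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```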

---

**Document Version**: 1.0
**Target Platform**: .NET 10, PostgreSQL ≥16, Angular v17

docs/modules/attestor/rekor-verification-design.md
# Rekor Verification Technical Design

**Document ID**: DOCS-ATTEST-REKOR-001
**Version**: 1.0
**Last Updated**: 2025-12-14
**Status**: Draft

---

## 1. OVERVIEW

This document provides the comprehensive technical design for Rekor transparency log verification in StellaOps. It covers three key capabilities:

1. **Merkle Proof Verification** - Cryptographic verification of inclusion proofs
2. **Durable Retry Queue** - Reliable submission with failure recovery
3. **Time Skew Validation** - Replay protection via timestamp validation

### Related Sprints

| Sprint | Priority | Description |
|--------|----------|-------------|
| SPRINT_3000_0001_0001 | P0 | Merkle Proof Verification |
| SPRINT_3000_0001_0002 | P1 | Rekor Retry Queue & Metrics |
| SPRINT_3000_0001_0003 | P2 | Time Skew Validation |

---

## 2. ARCHITECTURE CONTEXT

### Current State

```
┌─────────────────────────────────────────────────────────────────────┐
│                           Attestor Module                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────┐    ┌─────────────────────┐                 │
│  │ AttestorSubmission  │───►│ IRekorClient        │                 │
│  │ Service             │    │ (HttpRekorClient)   │                 │
│  └──────────┬──────────┘    └──────────┬──────────┘                 │
│             │                          │                            │
│             ▼                          ▼                            │
│  ┌─────────────────────┐    ┌─────────────────────┐                 │
│  │ IAttestorEntry      │    │ Rekor API           │                 │
│  │ Repository          │    │ (External)          │                 │
│  └─────────────────────┘    └─────────────────────┘                 │
│                                                                     │
│  Current Limitations:                                               │
│  ✗ Stores proofs but doesn't verify them cryptographically          │
│  ✗ Failed submissions are lost (no retry)                           │
│  ✗ No integrated_time validation                                    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### Target State

```
┌─────────────────────────────────────────────────────────────────────┐
│                     Attestor Module (Enhanced)                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────┐    ┌─────────────────────┐                 │
│  │ AttestorSubmission  │───►│ IRekorClient        │                 │
│  │ Service             │    │ + VerifyInclusion   │◄──┐             │
│  └──────────┬──────────┘    └──────────┬──────────┘   │             │
│             │                          │              │             │
│             │ (on failure)             │              │             │
│             ▼                          ▼              │             │
│  ┌─────────────────────┐    ┌─────────────────────┐   │             │
│  │ IRekorSubmission    │    │ MerkleProofVerifier │   │             │
│  │ Queue (PostgreSQL)  │    │ CheckpointVerifier  │   │             │
│  └──────────┬──────────┘    └─────────────────────┘   │             │
│             │                                         │             │
│             ▼                                         │             │
│  ┌─────────────────────┐    ┌─────────────────────┐   │             │
│  │ RekorRetryWorker    │───►│ Rekor API           │───┘             │
│  │ (Background)        │    │ (External)          │                 │
│  └─────────────────────┘    └─────────────────────┘                 │
│                                                                     │
│  ┌─────────────────────┐    ┌─────────────────────┐                 │
│  │ AttestorVerification│───►│ ITimeSkewValidator  │                 │
│  │ Service             │    │ (integrated_time)   │                 │
│  └─────────────────────┘    └─────────────────────┘                 │
│                                                                     │
│  Enhancements:                                                      │
│  ✓ Cryptographic Merkle proof verification                          │
│  ✓ Durable retry queue with exponential backoff                     │
│  ✓ Time skew detection and alerting                                 │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 3. COMPONENT DESIGN

### 3.1 Merkle Proof Verification

#### 3.1.1 Algorithm

Rekor uses the RFC 6962 (Certificate Transparency) Merkle tree structure:

```
                Root Hash
               /         \
             /             \
        Hash(0,1)       Hash(2,3)
          /   \           /   \
       H(0)    H(1)    H(2)    H(3)
        │       │       │       │
      Leaf0   Leaf1   Leaf2   Leaf3
```

**Leaf Hash Computation:**

```
leafHash = SHA256(0x00 || RFC6962_Entry)
```

**Interior Node Computation:**

```
interiorHash = SHA256(0x01 || leftChild || rightChild)
```

**Inclusion Proof Verification:**

Given:

- `leafIndex`: Position of the leaf in the tree
- `treeSize`: Total number of leaves
- `proofPath[]`: Sibling hashes along the path to the root
- `expectedRoot`: Root hash from the checkpoint

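```python
import hashlib


def verify_inclusion(leaf_hash: bytes, leaf_index: int, tree_size: int,
                     proof_path: list, expected_root: bytes) -> bool:
    """RFC 6962 inclusion proof check, following RFC 9162 section 2.1.3.2.

    Handles trees whose size is not a power of two: the last node on the
    right edge of a level may have no sibling and is promoted unchanged.
    """
    if not 0 <= leaf_index < tree_size:
        return False

    fn = leaf_index        # index of the current node within its level
    sn = tree_size - 1     # index of the last node within that level
    current_hash = leaf_hash

    for sibling_hash in proof_path:
        if sn == 0:
            return False   # proof is longer than the path to the root
        if fn % 2 == 1 or fn == sn:
            # Current node is a right child (or the right-edge node): sibling goes on the left
            current_hash = hashlib.sha256(b"\x01" + sibling_hash + current_hash).digest()
            # Skip levels where the right-edge node has no sibling
            while fn % 2 == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            # Current node is a left child: sibling goes on the right
            current_hash = hashlib.sha256(b"\x01" + current_hash + sibling_hash).digest()
        fn >>= 1
        sn >>= 1

    return sn == 0 and current_hash == expected_root
```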
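
Test fixtures for the verifier are more trustworthy when they come from an independent reference. Below is a minimal sketch of the RFC 6962 §2.1 `MTH` (tree hash) and `PATH` (audit path) definitions, for building golden trees of any size. The function names are illustrative.

```python
import hashlib
from typing import List, Sequence


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split_point(n: int) -> int:
    """Largest power of two strictly less than n (requires n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def tree_root(entries: Sequence[bytes]) -> bytes:
    """RFC 6962 MTH(D[n])."""
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = _split_point(len(entries))
    return node_hash(tree_root(entries[:k]), tree_root(entries[k:]))


def inclusion_path(index: int, entries: Sequence[bytes]) -> List[bytes]:
    """RFC 6962 PATH(m, D[n]): sibling hashes ordered from the leaf up to the root."""
    if not 0 <= index < len(entries):
        raise IndexError(index)
    if len(entries) == 1:
        return []
    k = _split_point(len(entries))
    if index < k:
        return inclusion_path(index, entries[:k]) + [tree_root(entries[k:])]
    return inclusion_path(index - k, entries[k:]) + [tree_root(entries[:k])]
```

An RFC 6962/RFC 9162-conformant verifier should accept every triple `(index, inclusion_path(index, entries), tree_root(entries))` this produces. Trees whose size is not a power of two are the interesting case. For leaf 4 of a 5-leaf tree, the only sibling is the root of leaves 0 to 3. That sibling must be hashed on the left even though index 4 is even.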

#### 3.1.2 Checkpoint Verification

Rekor checkpoints are signed using the log's private key:

```
Checkpoint Format:
─────────────────
rekor.sigstore.dev - 1234567
<tree_size>
<root_hash_base64>

— rekor.sigstore.dev <signature>
```

Verification steps:

1. Parse checkpoint text format
2. Extract signature and public key hint
3. Verify Ed25519/ECDSA signature over checkpoint body
4. Extract root hash and tree size
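
Steps 1, 2 and 4 can be sketched in Python for the text format shown. The sketch assumes the following:

- The layout follows the signed-note convention: body lines, one blank line, then signature lines that start with an em dash and a space.
- The signed message is the body including its final newline.
- Each signature decodes to a 4-byte key hint followed by the raw signature.

Step 3, the signature check itself, is left out because the Python standard library has no Ed25519 or ECDSA support. `parse_checkpoint` and `Checkpoint` are hypothetical names.

```python
import base64
from dataclasses import dataclass, field
from typing import List, Tuple

SIGNATURE_PREFIX = "\u2014 "  # em dash followed by a space


@dataclass
class Checkpoint:
    origin: str
    tree_size: int
    root_hash: bytes
    signed_body: bytes  # bytes the log signature covers (assumed: body plus final newline)
    signatures: List[Tuple[str, bytes, bytes]] = field(default_factory=list)  # (name, key hint, sig)


def parse_checkpoint(text: str) -> Checkpoint:
    """Split the note, decode signature lines, and extract tree size and root hash."""
    body, separator, signature_block = text.partition("\n\n")
    if not separator:
        raise ValueError("missing blank line between checkpoint body and signatures")

    lines = body.split("\n")
    if len(lines) < 3:
        raise ValueError("body needs origin, tree size and root hash lines")
    origin, size_line, root_line = lines[0], lines[1], lines[2]  # any further lines: extensions
    if not size_line.isdigit():
        raise ValueError(f"tree size is not a decimal integer: {size_line!r}")
    root_hash = base64.b64decode(root_line, validate=True)
    if len(root_hash) != 32:
        raise ValueError("root hash must be 32 bytes (SHA-256)")

    signatures = []
    for line in signature_block.split("\n"):
        if not line:
            continue
        if not line.startswith(SIGNATURE_PREFIX):
            raise ValueError(f"unexpected signature line: {line!r}")
        name, _, encoded = line[len(SIGNATURE_PREFIX):].rpartition(" ")
        raw = base64.b64decode(encoded, validate=True)
        if not name or len(raw) <= 4:
            raise ValueError("signature line needs a signer name, key hint and signature")
        signatures.append((name, raw[:4], raw[4:]))

    return Checkpoint(origin, int(size_line), root_hash, (body + "\n").encode("utf-8"), signatures)
```

Keeping `signed_body` as the exact byte sequence matters: any normalization of whitespace or line endings before step 3 would make a valid signature fail.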

#### 3.1.3 Implementation Classes

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

/// <summary>
/// RFC 6962 Merkle proof verification (inclusion check as specified in RFC 9162, section 2.1.3.2).
/// </summary>
public static class MerkleProofVerifier
{
    private const byte LeafPrefix = 0x00;
    private const byte NodePrefix = 0x01;

    public static bool VerifyInclusion(
        byte[] leafHash,
        long leafIndex,
        long treeSize,
        IReadOnlyList<byte[]> proofPath,
        byte[] expectedRoot)
    {
        ArgumentNullException.ThrowIfNull(leafHash);
        ArgumentNullException.ThrowIfNull(proofPath);
        ArgumentNullException.ThrowIfNull(expectedRoot);

        if (leafHash.Length != 32 || expectedRoot.Length != 32)
            throw new ArgumentException("Hash must be 32 bytes (SHA-256)");

        if (leafIndex < 0 || leafIndex >= treeSize)
            throw new ArgumentOutOfRangeException(nameof(leafIndex));

        var currentHash = leafHash;
        var fn = leafIndex;     // index of the current node within its level
        var sn = treeSize - 1;  // index of the last node within that level

        foreach (var siblingHash in proofPath)
        {
            if (siblingHash is null || siblingHash.Length != 32)
                throw new ArgumentException("Sibling hash must be 32 bytes");

            if (sn == 0)
                return false; // proof is longer than the path to the root

            if ((fn & 1) == 1 || fn == sn)
            {
                // Right child, or right-edge node: sibling goes on the left
                currentHash = HashInterior(siblingHash, currentHash);

                // Skip levels where the right-edge node has no sibling
                while ((fn & 1) == 0 && fn != 0)
                {
                    fn >>= 1;
                    sn >>= 1;
                }
            }
            else
            {
                // Left child: sibling goes on the right
                currentHash = HashInterior(currentHash, siblingHash);
            }

            fn >>= 1;
            sn >>= 1;
        }

        return sn == 0 && currentHash.AsSpan().SequenceEqual(expectedRoot);
    }

    private static byte[] HashInterior(byte[] left, byte[] right)
    {
        Span<byte> buffer = stackalloc byte[1 + 32 + 32];
        buffer[0] = NodePrefix;
        left.CopyTo(buffer.Slice(1, 32));
        right.CopyTo(buffer.Slice(33, 32));
        return SHA256.HashData(buffer);
    }

    public static byte[] ComputeLeafHash(byte[] entryData)
    {
        ArgumentNullException.ThrowIfNull(entryData);

        // Heap buffer: entries can be large, so stackalloc would risk a stack overflow
        var buffer = new byte[1 + entryData.Length];
        buffer[0] = LeafPrefix;
        entryData.CopyTo(buffer, 1);
        return SHA256.HashData(buffer);
    }
}
```

### 3.2 Durable Retry Queue

#### 3.2.1 State Machine

```
              PENDING
                 │
                 │  Worker picks up
                 ▼
             SUBMITTING
              /      \
     (success)        (failure)
            /          \
           ▼            ▼
       SUBMITTED    RETRYING
                        │
                        │  (after backoff delay)
                        ▼
                    SUBMITTING
                        │
                        │  (max attempts exceeded)
                        ▼
                   DEAD_LETTER
```
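
The state machine can be written as a transition function, which also makes a convenient unit-test target. The event names in this sketch are hypothetical. It assumes `attempt_count` already includes the attempt that just failed. It also assumes `submitted` and `dead_letter` follow the lowercase form that the queue SQL uses for `pending`, `submitting` and `retrying`.

```python
from enum import Enum


class QueueStatus(str, Enum):
    PENDING = "pending"
    SUBMITTING = "submitting"
    SUBMITTED = "submitted"
    RETRYING = "retrying"
    DEAD_LETTER = "dead_letter"


# Transitions that do not depend on the attempt counter (event names are hypothetical).
_TRANSITIONS = {
    (QueueStatus.PENDING, "picked_up"): QueueStatus.SUBMITTING,
    (QueueStatus.RETRYING, "backoff_elapsed"): QueueStatus.SUBMITTING,
    (QueueStatus.SUBMITTING, "success"): QueueStatus.SUBMITTED,
}


def next_status(status: QueueStatus, event: str, attempt_count: int, max_attempts: int) -> QueueStatus:
    """Return the status after `event`; SUBMITTED and DEAD_LETTER are terminal."""
    if status is QueueStatus.SUBMITTING and event == "failure":
        return QueueStatus.DEAD_LETTER if attempt_count >= max_attempts else QueueStatus.RETRYING
    try:
        return _TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"no transition from {status.value!r} on {event!r}") from None
```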

#### 3.2.2 Exponential Backoff

```csharp
public static TimeSpan CalculateBackoff(int attemptCount, RekorQueueOptions options)
{
    var delayMs = options.InitialDelayMs * Math.Pow(options.BackoffMultiplier, attemptCount - 1);
    var cappedDelayMs = Math.Min(delayMs, options.MaxDelayMs);

    // Add jitter (±10%)
    var jitter = Random.Shared.NextDouble() * 0.2 - 0.1;
    var finalDelayMs = cappedDelayMs * (1 + jitter);

    return TimeSpan.FromMilliseconds(finalDelayMs);
}
```

**Default backoff sequence:**

| Attempt | Base Delay | With Jitter |
|---------|------------|-------------|
| 1 | 1s | 0.9s - 1.1s |
| 2 | 2s | 1.8s - 2.2s |
| 3 | 4s | 3.6s - 4.4s |
| 4 | 8s | 7.2s - 8.8s |
| 5 | 16s | 14.4s - 17.6s |
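
The table follows from the queue defaults in the configuration reference: `initial_delay_ms: 1000`, `backoff_multiplier: 2.0` and `max_delay_ms: 60000`. The Python below is a quick arithmetic check that mirrors `CalculateBackoff`; the helper names are hypothetical.

```python
def base_delay_ms(attempt: int, initial_ms: float = 1000.0,
                  multiplier: float = 2.0, max_ms: float = 60000.0) -> float:
    """Delay before jitter for a 1-based attempt number, capped at max_ms."""
    return min(initial_ms * multiplier ** (attempt - 1), max_ms)


def jitter_range_ms(attempt: int, **options: float) -> tuple:
    """(lowest, highest) delay once +/-10% jitter is applied to the capped base."""
    base = base_delay_ms(attempt, **options)
    return (base * 0.9, base * 1.1)


for attempt in range(1, 6):
    low, high = jitter_range_ms(attempt)
    print(f"{attempt}: {base_delay_ms(attempt) / 1000:g}s ({low / 1000:g}s - {high / 1000:g}s)")
```

`CalculateBackoff` applies jitter after the cap. Once the base delay reaches `max_delay_ms` (attempt 7 onward with these defaults, which `max_attempts: 5` never reaches), the actual delay can therefore exceed the cap by up to 10%.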

#### 3.2.3 Queue Table Design

```sql
CREATE TABLE attestor_rekor_queue (
    id               UUID PRIMARY KEY,
    tenant_id        TEXT NOT NULL,
    bundle_sha256    TEXT NOT NULL UNIQUE,  -- Idempotency key
    dsse_payload     BYTEA NOT NULL,
    backend          TEXT NOT NULL,
    status           TEXT NOT NULL,         -- State from 3.2.1, lowercase (e.g. 'pending', 'retrying')
    attempt_count    INT NOT NULL DEFAULT 0,
    max_attempts     INT NOT NULL,
    last_attempt_at  TIMESTAMPTZ,
    last_error       TEXT,
    next_retry_at    TIMESTAMPTZ,
    rekor_uuid       TEXT,                  -- Set on success
    rekor_log_index  BIGINT,                -- Set on success
    created_at       TIMESTAMPTZ NOT NULL,
    updated_at       TIMESTAMPTZ NOT NULL
);

-- Efficient dequeue query
CREATE INDEX idx_rekor_queue_dequeue
    ON attestor_rekor_queue (next_retry_at, status)
    WHERE status IN ('pending', 'retrying');
```

#### 3.2.4 Dequeue Query

```sql
-- Atomic dequeue with row locking
WITH eligible AS (
    SELECT id
    FROM attestor_rekor_queue
    WHERE status IN ('pending', 'retrying')
      AND (next_retry_at IS NULL OR next_retry_at <= NOW())
    ORDER BY next_retry_at NULLS FIRST, created_at
    LIMIT :batch_size
    FOR UPDATE SKIP LOCKED
)
UPDATE attestor_rekor_queue q
SET status = 'submitting',
    updated_at = NOW()
FROM eligible e
WHERE q.id = e.id
RETURNING q.*;
```

### 3.3 Time Skew Validation

#### 3.3.1 Threat Model

Skew is defined as `integrated_time - local time`, matching the validation code in 3.3.3.

| Attack | Description | Detection |
|--------|-------------|-----------|
| Backdated Entry | Attacker inserts an entry with an old timestamp | Large negative skew (`integrated_time` well before local time) |
| Future Timestamp | Attacker post-dates an entry | Positive skew (`integrated_time` after local time) |
| Log Manipulation | Attacker modifies existing entries | Timestamp inconsistency |

#### 3.3.2 Threshold Design

```
Time Skew Detection Zones (skew = integrated_time - local time)
──────────────────────────────────────────────────────────────
│   REJECT   │       WARN       │       OK        │  REJECT  │
│   (past)   │      (past)      │                 │ (future) │
──────────────────────────────────────────────────────────────
◄────────────┼──────────────────┼────────┼────────┼──────────►
            -1h                -5m       0      +60s

Default Thresholds:
• Future tolerance: 60 seconds (beyond = reject)
• Warn threshold: 5 minutes in the past
• Reject threshold: 1 hour in the past
```

#### 3.3.3 Validation Flow

```csharp
public TimeSkewResult Validate(DateTimeOffset integratedTime, DateTimeOffset localTime)
{
    var skew = integratedTime - localTime;

    // Future timestamps are highly suspicious
    if (skew > TimeSpan.Zero)
    {
        if (skew > _options.FutureTolerance)
        {
            return Rejected($"Future timestamp by {skew}");
        }
        return Ok(skew); // Within future tolerance
    }

    // Past timestamps (normal case - Rekor time is in the past)
    var absSkew = skew.Duration();

    if (absSkew >= _options.RejectThreshold)
    {
        return Rejected($"Time skew {absSkew} exceeds reject threshold");
    }

    if (absSkew >= _options.WarnThreshold)
    {
        return Warning($"Time skew {absSkew} exceeds warn threshold");
    }

    return Ok(skew);
}
```

---

## 4. DATA FLOW

### 4.1 Submission Flow (with Queue)

```
┌────────┐    ┌───────────────────┐    ┌──────────┐
│ Client │───►│ SubmissionService │───►│ RekorAPI │
└────────┘    └─────────┬─────────┘    └──────────┘
                        │
          ┌─────────────┴─────────────┐
          │ (success)                 │ (failure)
          ▼                           ▼
┌───────────────────┐       ┌───────────────────┐
│ Store Entry       │       │ Enqueue           │
│ (status=ok)       │       │ (status=pending)  │
└───────────────────┘       └─────────┬─────────┘
                                      │
                                      ▼
                            ┌───────────────────┐    ┌──────────┐
                            │ RetryWorker       │───►│ RekorAPI │
                            │ (background)      │    └────┬─────┘
                            └───────────────────┘         │
                                      ▲                   │
                                      └───────────────────┘
                                           (retry loop)
```

### 4.2 Verification Flow (with Proof Verification)

```
┌────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Client │───►│ VerificationSvc │───►│ EntryRepository │
└────────┘    └────────┬────────┘    └────────┬────────┘
                       │◄─────────────────────┘
                       │ (AttestorEntry)
                       ▼
           ┌───────────────────────┐
           │ 1. TimeSkewValidator  │
           │    (integrated_time)  │
           └───────────┬───────────┘
                       ▼
           ┌───────────────────────┐
           │ 2. MerkleProof        │
           │    Verifier           │
           └───────────┬───────────┘
                       ▼
           ┌───────────────────────┐
           │ 3. Checkpoint         │
           │    Verifier           │
           └───────────┬───────────┘
                       ▼
           ┌───────────────────────┐
           │ 4. Aggregate Result   │
           │    (VerificationRpt)  │
           └───────────────────────┘
```

---

## 5. CONFIGURATION REFERENCE

```yaml
# attestor.yaml

attestor:
  rekor:
    primary:
      url: https://rekor.sigstore.dev
      proof_timeout_ms: 15000
      poll_interval_ms: 250
      max_attempts: 60

    mirror:
      enabled: false
      url: https://rekor-mirror.internal

    verification:
      public_key_path: /etc/stellaops/rekor-pub.pem
      # Or inline:
      # public_key_base64: LS0tLS1CRUdJTi...
      allow_offline_without_signature: false
      max_checkpoint_age_minutes: 60

    queue:
      enabled: true
      max_attempts: 5
      initial_delay_ms: 1000
      max_delay_ms: 60000
      backoff_multiplier: 2.0
      batch_size: 10
      poll_interval_ms: 5000
      dead_letter_retention_days: 30

    time_skew:
      enabled: true
      warn_threshold_seconds: 300     # 5 minutes
      reject_threshold_seconds: 3600  # 1 hour
      future_tolerance_seconds: 60    # 1 minute
      reject_future_timestamps: true
      skip_in_offline_mode: true
```

---

## 6. METRICS REFERENCE

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `attestor.inclusion_verify_total` | Counter | `result` | Inclusion proof verifications |
| `attestor.inclusion_verify_latency_seconds` | Histogram | | Verification latency |
| `attestor.checkpoint_verify_total` | Counter | `result` | Checkpoint signature verifications |
| `attestor.rekor_queue_depth` | Gauge | | Current pending + retrying items |
| `attestor.rekor_retry_attempts_total` | Counter | `backend`, `attempt` | Retry attempts |
| `attestor.rekor_submission_status_total` | Counter | `status`, `backend` | Submission outcomes |
| `attestor.rekor_queue_wait_seconds` | Histogram | | Time in queue before submission |
| `attestor.time_skew_detected_total` | Counter | `severity`, `action` | Time skew detections |
| `attestor.time_skew_seconds` | Histogram | | Observed skew distribution |

---

## 7. ERROR HANDLING

### 7.1 Error Taxonomy

| Error Code | Description | Retry? | Action |
|------------|-------------|--------|--------|
| `rekor_unavailable` | Rekor API not reachable | Yes | Queue for retry |
| `rekor_conflict` | Duplicate entry (409) | No | Retrieve existing entry |
| `rekor_rate_limited` | Rate limit exceeded (429) | Yes | Backoff and retry |
| `rekor_internal_error` | Server error (5xx) | Yes | Queue for retry |
| `proof_invalid` | Merkle proof verification failed | No | Reject, log alert |
| `checkpoint_signature_invalid` | Checkpoint signature failed | No | Reject, log alert |
| `time_skew_rejected` | Time skew exceeds threshold | No | Reject, log warning |
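
The submission rows of this table can be driven from the HTTP outcome of a Rekor call. The sketch below makes two assumptions. First, `None` stands for a transport-level failure where no response arrived at all. Second, status codes the table does not list should surface as errors rather than be guessed at. `classify_submission` is a hypothetical name. The verification-side codes (`proof_invalid`, `checkpoint_signature_invalid`, `time_skew_rejected`) are never retried and do not come from HTTP statuses, so they are out of scope here.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    error_code: str
    retry: bool


def classify_submission(http_status: Optional[int]) -> Optional[Classification]:
    """Map a Rekor submission outcome onto the error taxonomy.

    None means no HTTP response at all (connection refused, timeout).
    Returns None for 2xx; raises for statuses the table does not cover.
    """
    if http_status is None:
        return Classification("rekor_unavailable", retry=True)
    if 200 <= http_status < 300:
        return None
    if http_status == 409:
        # Duplicate entry: fetch the existing entry instead of resubmitting.
        return Classification("rekor_conflict", retry=False)
    if http_status == 429:
        return Classification("rekor_rate_limited", retry=True)
    if 500 <= http_status < 600:
        return Classification("rekor_internal_error", retry=True)
    raise ValueError(f"HTTP {http_status} is not covered by the error taxonomy")
```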
|
||||
|
||||
### 7.2 Structured Logging
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2025-12-14T10:30:00Z",
|
||||
"level": "Warning",
|
||||
"message": "Rekor submission failed, queuing for retry",
|
||||
"error_code": "rekor_unavailable",
|
||||
"bundle_sha256": "abc123...",
|
||||
"backend": "primary",
|
||||
"attempt_count": 1,
|
||||
"next_retry_at": "2025-12-14T10:30:02Z",
|
||||
"error_message": "Connection refused"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. SECURITY CONSIDERATIONS
|
||||
|
||||
### 8.1 Key Management
|
||||
|
||||
- Rekor public key must be distributed out-of-band
|
||||
- Support key rotation via versioned key configuration
|
||||
- Store keys in secure location (not in code/config)
|
||||
|
### 8.2 Trust Model

```
Trust Hierarchy
───────────────

┌─────────────┐
│ Rekor Root  │
│ Public Key  │
└──────┬──────┘
       │ signs
       ▼
┌─────────────┐
│ Checkpoint  │
│ (root hash) │
└──────┬──────┘
       │ commits to
       ▼
┌─────────────┐
│ Merkle Tree │
│  (entries)  │
└──────┬──────┘
       │ includes
       ▼
┌─────────────┐
│ Attestation │
│   (DSSE)    │
└─────────────┘
```
|
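The hierarchy is checked bottom-up: an inclusion proof shows that the attestation's leaf hash belongs to the tree whose root the checkpoint signs. The sketch below follows the inclusion-proof verification algorithm of RFC 9162 (section 2.1.3.2), which uses the same tree layout as RFC 6962. Hashing is injected through a hypothetical `TreeHasher` interface so the example has no dependencies; a real verifier plugs in SHA-256 with the `0x00` leaf and `0x01` node prefixes. `merkleRoot` and `auditPath` implement the RFC 6962 definitions and exist only to build test vectors. Index arithmetic uses 32-bit bitwise operators, so the sketch assumes tree sizes below 2^31.

```typescript
// Hash operations are injected; for RFC 6962 they are
//   leaf(d)    = SHA-256(0x00 || d)
//   node(l, r) = SHA-256(0x01 || l || r)
interface TreeHasher<H> {
  leaf(data: string): H;
  node(left: H, right: H): H;
  equal(a: H, b: H): boolean;
}

// Inclusion proof verification (RFC 9162, section 2.1.3.2).
function verifyInclusion<H>(
  hasher: TreeHasher<H>,
  leafIndex: number,
  treeSize: number,
  leafHash: H,
  proof: H[],
  rootHash: H,
): boolean {
  if (leafIndex >= treeSize) return false;
  let fn = leafIndex;
  let sn = treeSize - 1;
  let r = leafHash;
  for (const p of proof) {
    if (sn === 0) return false; // proof is longer than the tree is tall
    if ((fn & 1) === 1 || fn === sn) {
      r = hasher.node(p, r); // sibling is on the left
      // A right-edge node has no sibling on some levels; skip those levels.
      while ((fn & 1) === 0 && fn !== 0) {
        fn >>= 1;
        sn >>= 1;
      }
    } else {
      r = hasher.node(r, p); // sibling is on the right
    }
    fn >>= 1;
    sn >>= 1;
  }
  return sn === 0 && hasher.equal(r, rootHash);
}

// RFC 6962 Merkle Tree Hash and audit path, used here to build test vectors.
function splitPoint(n: number): number {
  let k = 1;
  while (k * 2 < n) k *= 2;
  return k; // largest power of two strictly less than n (n >= 2)
}

function merkleRoot<H>(hasher: TreeHasher<H>, leaves: string[]): H {
  if (leaves.length === 1) return hasher.leaf(leaves[0]);
  const k = splitPoint(leaves.length);
  return hasher.node(merkleRoot(hasher, leaves.slice(0, k)), merkleRoot(hasher, leaves.slice(k)));
}

function auditPath<H>(hasher: TreeHasher<H>, m: number, leaves: string[]): H[] {
  if (leaves.length === 1) return [];
  const k = splitPoint(leaves.length);
  return m < k
    ? auditPath(hasher, m, leaves.slice(0, k)).concat([merkleRoot(hasher, leaves.slice(k))])
    : auditPath(hasher, m - k, leaves.slice(k)).concat([merkleRoot(hasher, leaves.slice(0, k))]);
}

// Toy hasher that builds readable strings instead of digests.
const toyHasher: TreeHasher<string> = {
  leaf: (d) => `L(${d})`,
  node: (l, r) => `N(${l},${r})`,
  equal: (a, b) => a === b,
};
```

Because the string-building hasher makes every intermediate node visible, it is a convenient way to debug proof ordering before switching to real digests.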
### 8.3 Offline Security

In air-gapped environments:

- The checkpoint must be pre-distributed with the offline bundle
- Proof verification still works (no network access is needed)
- Time skew validation should be skipped or use a reference time shipped in the offline bundle
|
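A minimal sketch of how the online and the two offline options can share one code path. It assumes that skew means the entry's integrated time lies ahead of the reference clock by more than a threshold; the `SkewReference` modes, `checkTimeSkew`, and the threshold parameter are hypothetical. Returning `skipped` instead of `ok` lets callers record that the check did not run rather than silently passing it.

```typescript
// Where the reference clock comes from in each deployment mode.
type SkewReference =
  | { mode: "online"; now: Date } // local clock
  | { mode: "offline-bundle"; referenceTime: Date } // time shipped with the offline bundle
  | { mode: "offline-skip" }; // air-gapped, no trusted reference

type SkewResult =
  | { status: "ok" | "skipped" }
  | { status: "rejected"; skewSeconds: number };

// Hypothetical check: reject entries whose integrated time lies more than
// `maxSkewSeconds` ahead of the reference clock.
function checkTimeSkew(integratedTime: Date, ref: SkewReference, maxSkewSeconds: number): SkewResult {
  if (ref.mode === "offline-skip") return { status: "skipped" };
  const reference = ref.mode === "online" ? ref.now : ref.referenceTime;
  const skewSeconds = (integratedTime.getTime() - reference.getTime()) / 1000;
  return skewSeconds > maxSkewSeconds ? { status: "rejected", skewSeconds } : { status: "ok" };
}
```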
---

## 9. TESTING STRATEGY

### 9.1 Test Categories

| Category | Coverage Target | Tools |
|----------|-----------------|-------|
| Unit | >90% | xUnit, Moq |
| Integration | >80% | Testcontainers (PostgreSQL) |
| Contract | All public APIs | Snapshot testing |
| Performance | Latency P99 | BenchmarkDotNet |
|
### 9.2 Golden Fixtures

Obtain fixtures from the public Sigstore Rekor instance:

```bash
# Get a real Rekor entry for testing
rekor-cli get --uuid 24296fb24b8ad77a... --format json > fixtures/rekor-entry-valid.json

# Get the log info, which includes the current checkpoint (signed tree head)
curl https://rekor.sigstore.dev/api/v1/log > fixtures/rekor-checkpoint.json

# Get the log's public key
curl https://rekor.sigstore.dev/api/v1/log/publicKey > fixtures/rekor-pubkey.pem
```
|
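The checkpoint itself is a signed note in the transparency-dev checkpoint format: an origin line, the decimal tree size, the base64 root hash, optional extension lines, a blank line, and one or more signature lines that start with the em dash character (U+2014). The sketch below parses that text, which in Rekor's log info response is carried in the `signedTreeHead` field. It only checks structure and does not verify signatures; `parseCheckpoint` and the `Checkpoint` shape are hypothetical names, and tree sizes above 2^53 would lose precision as JavaScript numbers.

```typescript
interface CheckpointSignature {
  keyName: string;
  signatureBase64: string;
}

interface Checkpoint {
  origin: string;
  treeSize: number;
  rootHashBase64: string;
  extensions: string[];
  signatures: CheckpointSignature[];
}

// Structural parser for a checkpoint signed note (no signature verification).
function parseCheckpoint(note: string): Checkpoint {
  const split = note.indexOf("\n\n");
  if (split < 0) throw new Error("missing blank line between body and signatures");
  const bodyLines = note.slice(0, split).split("\n");
  if (bodyLines.length < 3) throw new Error("body needs origin, tree size and root hash");
  const [origin, sizeText, rootHashBase64, ...extensions] = bodyLines;
  if (!/^(0|[1-9][0-9]*)$/.test(sizeText)) throw new Error(`invalid tree size: ${sizeText}`);
  // A SHA-256 root hash is 32 bytes: 43 base64 characters plus one "=".
  if (!/^[A-Za-z0-9+\/]{43}=$/.test(rootHashBase64)) throw new Error("root hash is not base64 of 32 bytes");
  const signatures = note
    .slice(split + 2)
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line): CheckpointSignature => {
      // Signature lines look like "\u2014 <key name> <base64 signature>".
      const m = /^\u2014 (\S+) (\S+)$/.exec(line);
      if (!m) throw new Error(`malformed signature line: ${line}`);
      return { keyName: m[1], signatureBase64: m[2] };
    });
  if (signatures.length === 0) throw new Error("checkpoint carries no signatures");
  return { origin, treeSize: Number(sizeText), rootHashBase64, extensions, signatures };
}
```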
---

## 10. MIGRATION GUIDE

### 10.1 Database Migrations

Run in order:

1. `00X_rekor_submission_queue.sql` - Queue table
2. Update the `AttestorEntry` schema if it is stored in PostgreSQL
|
### 10.2 Configuration Migration

```yaml
# Before (existing)
attestor:
  rekor:
    primary:
      url: https://rekor.sigstore.dev

# After (add new sections)
attestor:
  rekor:
    primary:
      url: https://rekor.sigstore.dev
    verification:
      public_key_path: /etc/stellaops/rekor-pub.pem
    queue:
      enabled: true
    time_skew:
      enabled: true
```
|
### 10.3 Rollback Plan

- The queue table can be dropped if it is not needed
- All new features are configurable and can be disabled
- No breaking changes to existing API contracts

---
|
## 11. REFERENCES

- [RFC 6962: Certificate Transparency](https://datatracker.ietf.org/doc/html/rfc6962)
- [Sigstore Rekor](https://github.com/sigstore/rekor)
- [Transparency.dev Checkpoint Format](https://github.com/transparency-dev/formats)
- [Advisory: Rekor Integration Technical Reference](../../../product-advisories/14-Dec-2025%20-%20Rekor%20Integration%20Technical%20Reference.md)
373
docs/modules/telemetry/ttfs-architecture.md
Normal file
@@ -0,0 +1,373 @@
# Time-to-First-Signal (TTFS) Architecture

> Derived from Product Advisory (14-Dec-2025): UX and Time-to-Evidence Technical Reference; details the TTFS subsystem for providing immediate feedback on run/job status.

## 1) Overview

Time-to-First-Signal (TTFS) measures the latency from a user action (opening a run, starting a scan, invoking the CLI) to the first meaningful signal being displayed or logged. This architecture ensures users receive immediate feedback regardless of how long the job itself takes to complete.

### 1.1 Design Goals

- **Instant Feedback:** P50 < 2s, P95 < 5s across all surfaces (UI, CLI, CI)
- **Graceful Degradation:** Skeleton → Cached Signal → Live Data progression
- **Offline-First:** Full functionality in air-gapped environments using PostgreSQL NOTIFY/LISTEN
- **Predictive Context:** Provide "last known outcome" and ETA estimates for in-progress jobs
|
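To check the P50/P95 goals against raw measurements (for example in a benchmark or an end-to-end test), the sketch below uses the nearest-rank percentile definition. That is a simple choice which can differ slightly from the bucket interpolation Prometheus applies to histograms. `percentile` and `meetsTtfsGoals` are hypothetical helpers; the 2 s and 5 s thresholds come from the design goals.

```typescript
// Design-goal thresholds from section 1.1, in milliseconds.
const TTFS_P50_TARGET_MS = 2_000;
const TTFS_P95_TARGET_MS = 5_000;

// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function meetsTtfsGoals(samplesMs: number[]): { p50Ms: number; p95Ms: number; ok: boolean } {
  const p50Ms = percentile(samplesMs, 50);
  const p95Ms = percentile(samplesMs, 95);
  return { p50Ms, p95Ms, ok: p50Ms < TTFS_P50_TARGET_MS && p95Ms < TTFS_P95_TARGET_MS };
}
```

For 20 samples of 100 ms, 200 ms, ..., 2000 ms this reports a P50 of 1000 ms and a P95 of 1900 ms, so both goals are met.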
### 1.2 Signal Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              TTFS Signal Flow                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  User Action            API Layer            Cache Layer   Data Layer       │
│  ───────────            ─────────            ───────────   ──────────       │
│                                                                             │
│  [Route Enter] ──┬──► /first-signal ───────► Valkey/Redis ─────┐            │
│  [CLI Start] ────┤          │                     │            │            │
│  [CI Job] ───────┘          │                     │            ▼            │
│                             │                     │     ┌──────────────┐    │
│                             ▼                     │     │  PostgreSQL  │    │
│                       ┌────────────┐              │     │ first_signal │    │
│                       │    ETag    │◄─────────────┤     │  _snapshots  │    │
│                       │ Validation │              │     └──────────────┘    │
│                       └─────┬──────┘              │                         │
│                             │                     │                         │
│                             ▼                     ▼                         │
│                      ┌──────────────────────────────┐                       │
│                      │ Response Assembly            │                       │
│                      │ • kind (status indicator)    │                       │
│                      │ • phase (current stage)      │                       │
│                      │ • summary (human text)       │                       │
│                      │ • eta_seconds (estimate)     │                       │
│                      │ • last_known_outcome         │                       │
│                      │ • next_actions               │                       │
│                      └──────────────┬───────────────┘                       │
│                                     │                                       │
│                                     ▼                                       │
│                      ┌──────────────────────────────┐                       │
│                      │     SSE / Polling Client     │                       │
│                      └──────────────────────────────┘                       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
|
## 2) Component Budgets

The 5-second P95 budget is allocated across components:

| Component | P50 Budget | P95 Budget | Notes |
|-----------|------------|------------|-------|
| Frontend (skeleton + hydration) | 100ms | 150ms | Network-independent |
| Edge API (auth + routing) | 150ms | 250ms | JWT validation, rate limiting |
| Core Services (lookup + assembly) | 700ms | 1,500ms | Cache hit vs cold path |
| SSE/WebSocket establishment | — | 300ms | Fallback to polling if exceeded |
| **Total (warm path)** | **700ms** | **2,500ms** | Cache hit scenario |
| **Total (cold path)** | **1,200ms** | **4,000ms** | Cache miss, compute required |

Percentiles do not add linearly, so the total rows are end-to-end targets rather than sums of the component rows.
|
## 3) Signal Kinds

The `kind` field indicates the current signal state:

| Kind | Description | Typical Duration | Icon |
|------|-------------|------------------|------|
| `queued` | Job waiting in queue | 0-30s | Queue |
| `started` | Job has begun execution | — | Play |
| `phase` | Job in specific phase | Varies | Progress |
| `blocked` | Waiting on dependency/policy | — | Pause |
| `failed` | Job has failed | — | Error |
| `succeeded` | Job completed successfully | — | Check |
| `canceled` | Job was canceled | — | Cancel |
| `unavailable` | Signal cannot be determined | — | Unknown |
|
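On the client, the table maps onto a string union plus an exhaustive lookup. A sketch; `SignalKind` and `SIGNAL_ICONS` mirror the table, while treating `succeeded`, `failed`, and `canceled` as terminal (no further signals expected) is an assumption inferred from their descriptions, and `isTerminal` is a hypothetical helper.

```typescript
// Signal kinds from the table above.
type SignalKind =
  | "queued" | "started" | "phase" | "blocked"
  | "failed" | "succeeded" | "canceled" | "unavailable";

// Icon names mirror the "Icon" column; Record<> makes the mapping exhaustive,
// so adding a kind without an icon is a compile error.
const SIGNAL_ICONS: Record<SignalKind, string> = {
  queued: "Queue",
  started: "Play",
  phase: "Progress",
  blocked: "Pause",
  failed: "Error",
  succeeded: "Check",
  canceled: "Cancel",
  unavailable: "Unknown",
};

// Assumed terminal kinds: after these, a client can stop listening.
const TERMINAL_KINDS: ReadonlyArray<SignalKind> = ["failed", "succeeded", "canceled"];

function isTerminal(kind: SignalKind): boolean {
  return TERMINAL_KINDS.indexOf(kind) >= 0;
}
```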
## 4) Signal Phases

The `phase` field indicates the current execution phase:

| Phase | Description | SLO Target |
|-------|-------------|------------|
| `resolve` | Dependency/artifact resolution | P95 < 30s |
| `fetch` | Data retrieval (registry, advisories) | P95 < 45s |
| `restore` | Cache/snapshot restoration | P95 < 10s |
| `analyze` | Analysis execution (scan, policy) | P95 < 120s |
| `policy` | Policy evaluation | P95 < 15s |
| `report` | Report generation/upload | P95 < 30s |
| `unknown` | Phase cannot be determined | — |
|
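The phase SLOs can likewise be encoded as data so that dashboards and tests share one source of truth. A sketch; `PHASE_P95_TARGET_SECONDS` and `isPhaseSloBreached` are hypothetical names, and an observed value equal to the target counts as a breach because the targets are strict (`<`).

```typescript
type SignalPhase = "resolve" | "fetch" | "restore" | "analyze" | "policy" | "report" | "unknown";

// P95 targets from the table, in seconds; `unknown` has no target.
const PHASE_P95_TARGET_SECONDS: Record<SignalPhase, number | null> = {
  resolve: 30,
  fetch: 45,
  restore: 10,
  analyze: 120,
  policy: 15,
  report: 30,
  unknown: null,
};

// True when an observed P95 duration violates the phase's "P95 < target" SLO.
function isPhaseSloBreached(phase: SignalPhase, observedP95Seconds: number): boolean {
  const target = PHASE_P95_TARGET_SECONDS[phase];
  return target !== null && observedP95Seconds >= target;
}
```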
## 5) API Contracts

### 5.1 First Signal Endpoint

```http
GET /api/v1/orchestrator/jobs/{jobId}/first-signal
Accept: application/json
If-None-Match: "{etag}"

200 OK
ETag: "job-{id}-{updated_at.unix_ms}"
Cache-Control: private, max-age=1, stale-while-revalidate=5
X-Signal-Source: snapshot | cold_start | failure_index

{
  "kind": "started",
  "phase": "analyze",
  "summary": "Scanning image layers (47%)",
  "eta_seconds": 38,
  "last_known_outcome": {
    "status": "succeeded",
    "finished_at": "2025-12-13T10:15:00Z",
    "findings_count": 12
  },
  "next_actions": [
    {"label": "View previous run", "href": "/runs/abc-123"}
  ],
  "diagnostics": {
    "queue_position": null,
    "worker_id": "worker-7"
  }
}

304 Not Modified (if ETag matches)
```
|
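The ETag contract can be sketched as follows. `firstSignalEtag` follows the documented `job-{id}-{updated_at.unix_ms}` format; the matching logic applies the weak comparison that HTTP (RFC 9110) prescribes for `If-None-Match`, including tag lists and `*`. All function names are hypothetical.

```typescript
// ETag format from the contract: "job-{id}-{updated_at.unix_ms}", quoted.
function firstSignalEtag(jobId: string, updatedAt: Date): string {
  return `"job-${jobId}-${updatedAt.getTime()}"`;
}

// If-None-Match uses weak comparison: a "W/" prefix is ignored, the header
// may list several tags, and "*" matches any current representation.
// Splitting on "," is safe here because these tags never contain commas.
function ifNoneMatchHits(ifNoneMatch: string | undefined, currentEtag: string): boolean {
  if (ifNoneMatch === undefined) return false;
  const opaque = (tag: string): string => tag.trim().replace(/^W\//, "");
  const current = opaque(currentEtag);
  return ifNoneMatch
    .split(",")
    .some((tag) => tag.trim() === "*" || opaque(tag) === current);
}

// Decide the status code for GET .../first-signal.
function conditionalStatus(ifNoneMatch: string | undefined, jobId: string, updatedAt: Date): 200 | 304 {
  return ifNoneMatchHits(ifNoneMatch, firstSignalEtag(jobId, updatedAt)) ? 304 : 200;
}
```

Because the tag changes whenever `updated_at` changes, a client polling with `If-None-Match` receives a body only when the signal has actually moved.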
### 5.2 SSE Stream

```http
GET /api/v1/orchestrator/stream/jobs/{jobId}/first-signal
Accept: text/event-stream

event: signal
data: {"kind":"started","phase":"analyze",...}

event: signal
data: {"kind":"phase","phase":"policy",...}

event: done
data: {"kind":"succeeded",...}
```
|
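Browsers parse this stream with `EventSource`, subscribing to the custom event names through `addEventListener("signal", ...)` and `addEventListener("done", ...)`. Non-browser clients such as a CLI need their own parser. The sketch below handles a complete payload following the WHATWG server-sent events field rules (`event` and `data` fields, `:` comments, blank-line dispatch) and ignores `id` and `retry`; `parseSse` is a hypothetical helper, not an existing StellaOps API.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Minimal parser for a complete text/event-stream payload.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventType = "";
  let dataLines: string[] = [];
  for (const rawLine of text.split(/\r\n|\r|\n/)) {
    if (rawLine === "") {
      // Blank line: dispatch the buffered event, if it carried any data.
      if (dataLines.length > 0) {
        events.push({ event: eventType || "message", data: dataLines.join("\n") });
      }
      eventType = "";
      dataLines = [];
      continue;
    }
    if (rawLine.startsWith(":")) continue; // comment, often used as keep-alive
    const colon = rawLine.indexOf(":");
    const field = colon < 0 ? rawLine : rawLine.slice(0, colon);
    let value = colon < 0 ? "" : rawLine.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is dropped
    if (field === "event") eventType = value;
    else if (field === "data") dataLines.push(value);
  }
  return events;
}
```

A streaming client would apply the same field rules incrementally, keeping any partial last line until the next chunk arrives.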
### 5.3 CLI Integration

```bash
# Job status with immediate signal
stella job status <job-id> --watch

# Output progression:
# [queued] Waiting in queue (position: 3)
# [started] Job started on worker-7
# [phase:analyze] Scanning image layers (47%)
# [succeeded] Completed in 2m 34s
```
|
## 6) Caching Strategy

### 6.1 Cache Tiers

| Tier | Storage | TTL | Use Case |
|------|---------|-----|----------|
| L1 | In-memory (per-instance) | 1s | Hot path, same-instance requests |
| L2 | Valkey/Redis | 5s | Cross-instance, active jobs |
| L3 | PostgreSQL | 24h | Persistent snapshots, air-gap mode |

### 6.2 Cache Keys

```
ttfs:job:{tenant_id}:{job_id}:signal   # Current signal
ttfs:job:{tenant_id}:{job_id}:eta      # ETA prediction
ttfs:run:{tenant_id}:{run_id}:signals  # Run-level aggregation
ttfs:tenant:{tenant_id}:failure_sig    # Failure signatures
```
|
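Read together, the tier table and the key layout suggest a read-through lookup: try L1, then L2, then L3, and compute the signal only on a full miss, back-filling the faster tiers with their TTLs. A sketch under the assumption that all three tiers sit behind one hypothetical `SignalStore` key/value interface keyed by the `ttfs:job:...:signal` pattern (in practice L3 is a PostgreSQL table rather than a key/value store); `MemoryTier` and `lookupSignal` are illustrative names. The returned `source` is the kind of information a `cache_hit` metric label can be derived from.

```typescript
interface SignalStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlMs: number): Promise<void>;
}

// In-memory tier with expiry timestamps; the clock is injected for tests.
class MemoryTier implements SignalStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  constructor(private readonly now: () => number) {}
  async get(key: string): Promise<string | undefined> {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) return undefined;
    return entry.value;
  }
  async set(key: string, value: string, ttlMs: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

const L1_TTL_MS = 1_000;
const L2_TTL_MS = 5_000;
const L3_TTL_MS = 24 * 3_600_000;

function signalKey(tenantId: string, jobId: string): string {
  return `ttfs:job:${tenantId}:${jobId}:signal`;
}

async function lookupSignal(
  tiers: { l1: SignalStore; l2: SignalStore; l3: SignalStore },
  computeCold: () => Promise<string>,
  tenantId: string,
  jobId: string,
): Promise<{ value: string; source: "l1" | "l2" | "l3" | "cold" }> {
  const key = signalKey(tenantId, jobId);
  const fromL1 = await tiers.l1.get(key);
  if (fromL1 !== undefined) return { value: fromL1, source: "l1" };
  const fromL2 = await tiers.l2.get(key);
  if (fromL2 !== undefined) {
    await tiers.l1.set(key, fromL2, L1_TTL_MS);
    return { value: fromL2, source: "l2" };
  }
  const fromL3 = await tiers.l3.get(key);
  const value = fromL3 ?? (await computeCold());
  // Back-fill faster tiers; persist to L3 only when the value was computed.
  await tiers.l2.set(key, value, L2_TTL_MS);
  await tiers.l1.set(key, value, L1_TTL_MS);
  if (fromL3 === undefined) await tiers.l3.set(key, value, L3_TTL_MS);
  return { value, source: fromL3 !== undefined ? "l3" : "cold" };
}
```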
### 6.3 Air-Gap Mode

In air-gapped environments without Valkey/Redis:

1. **PostgreSQL NOTIFY/LISTEN** replaces pub/sub for real-time updates
2. **Polling fallback** uses 2-second intervals
3. The **first_signal_snapshots** table takes over the L2 (cross-instance) cache role
4. All SSE endpoints gracefully degrade to long-polling
|
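Without Valkey/Redis pub/sub, a client can fall back to polling the first-signal endpoint. A sketch of that loop, assuming a hypothetical injected `fetchSignal` transport and `sleep` function (injected so the loop is testable). It sends the last ETag in `If-None-Match`, treats 304 as "no change", and stops on `succeeded`, `failed`, or `canceled`, which are assumed to be the terminal kinds. The 2-second interval comes from the list above.

```typescript
interface PollResponse {
  status: 200 | 304;
  etag?: string;
  body?: { kind: string };
}

// Hypothetical transport: GET .../first-signal with an optional If-None-Match.
type FetchFirstSignal = (ifNoneMatch: string | undefined) => Promise<PollResponse>;

const POLL_INTERVAL_MS = 2_000;
const TERMINAL = ["succeeded", "failed", "canceled"];

// Poll until a terminal kind is observed. onSignal fires only when the server
// returns a changed representation (200), never on 304 Not Modified.
async function pollFirstSignal(
  fetchSignal: FetchFirstSignal,
  onSignal: (kind: string) => void,
  sleep: (ms: number) => Promise<void>,
  maxPolls = 1_000,
): Promise<string> {
  let etag: string | undefined;
  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchSignal(etag);
    if (res.status === 200 && res.body) {
      etag = res.etag;
      onSignal(res.body.kind);
      if (TERMINAL.indexOf(res.body.kind) >= 0) return res.body.kind;
    }
    await sleep(POLL_INTERVAL_MS);
  }
  throw new Error("gave up polling before a terminal signal");
}
```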
## 7) Telemetry & Observability

### 7.1 Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `ttfs_latency_seconds` | Histogram | End-to-end signal latency |
| `ttfs_cache_latency_seconds` | Histogram | Cache lookup time |
| `ttfs_cold_latency_seconds` | Histogram | Cold path computation time |
| `ttfs_signal_total` | Counter | Signals by kind/surface |
| `ttfs_cache_hit_total` | Counter | Cache hits |
| `ttfs_cache_miss_total` | Counter | Cache misses |
| `ttfs_slo_breach_total` | Counter | SLO breaches |
| `ttfs_error_total` | Counter | Errors by type |

### 7.2 Labels

All metrics include the following labels:

- `surface`: `ui` | `cli` | `ci`
- `cache_hit`: `true` | `false`
- `signal_source`: `snapshot` | `cold_start` | `failure_index`
- `kind`: Signal kind enum
- `tenant_id`: Tenant identifier (for multi-tenant deployments)
|
### 7.3 SLO Definitions

```yaml
# Prometheus rule file; the group name is illustrative.
groups:
  - name: ttfs-slo
    rules:
      # Recording rules
      - record: ttfs:slo:p50_target
        expr: 2.0 # seconds

      - record: ttfs:slo:p95_target
        expr: 5.0 # seconds

      # 1 while P95 latency is under the 5s target, 0 otherwise
      - record: ttfs:slo:compliance
        expr: |
          histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le))
          < bool 5.0

      # Alerting rules
      - alert: TtfsSloBreachP95
        expr: histogram_quantile(0.95, sum(rate(ttfs_latency_seconds_bucket[5m])) by (le)) > 5.0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "TTFS P95 exceeds 5s SLO"

      - alert: TtfsHighErrorRate
        expr: rate(ttfs_error_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
```
|
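To sanity-check these rules offline (for example against recorded bucket counts in a unit test), it helps to know how `histogram_quantile` interpolates. The sketch below is a simplified re-implementation of that documented behaviour: locate the bucket holding the q-th observation and interpolate linearly inside it, returning the highest finite bound when the rank falls into the `+Inf` bucket. It assumes positive bucket bounds and classic (non-native) histograms; `Bucket` and `histogramQuantile` are hypothetical names.

```typescript
// One cumulative histogram bucket: number of observations <= upperBound.
interface Bucket {
  upperBound: number; // the `le` label; the last bucket must be +Inf
  cumulativeCount: number;
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length < 2 || buckets[buckets.length - 1].upperBound !== Infinity) return NaN;
  const total = buckets[buckets.length - 1].cumulativeCount;
  if (total === 0) return NaN;
  let rank = q * total;
  // First bucket whose cumulative count reaches the rank.
  let b = 0;
  while (b < buckets.length - 1 && buckets[b].cumulativeCount < rank) b++;
  // Rank falls into the +Inf bucket: report the highest finite bound.
  if (b === buckets.length - 1) return buckets[buckets.length - 2].upperBound;
  const bucketStart = b === 0 ? 0 : buckets[b - 1].upperBound;
  const prevCount = b === 0 ? 0 : buckets[b - 1].cumulativeCount;
  const countInBucket = buckets[b].cumulativeCount - prevCount;
  rank -= prevCount;
  // Linear interpolation within the bucket.
  return bucketStart + (buckets[b].upperBound - bucketStart) * (rank / countInBucket);
}
```

With cumulative counts of 10, 50, 90, 99, and 100 at `le` 0.5, 1, 2, 5, and +Inf seconds, the estimated P95 is 2 + 3 × 5/9 ≈ 3.67 s, so the 5 s P95 target is met.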
## 8) Frontend Integration

### 8.1 Component Hierarchy

```
FirstSignalCard (Smart Component)
├── FirstSignalStore (Signal-based State)
│   ├── SSE subscription
│   ├── Polling fallback
│   └── ETag caching
├── StatusIndicator (Dumb Component)
│   └── kind → icon + color mapping
├── PhaseProgress (Dumb Component)
│   └── phase → progress bar
└── ActionButtons (Dumb Component)
    └── next_actions rendering
```
|
### 8.2 State Machine

```typescript
type FirstSignalLoadState = 'idle' | 'loading' | 'streaming' | 'error' | 'done';

// State transitions:
// idle → loading (initial fetch)
// loading → streaming (SSE connected) | error (fetch failed)
// streaming → done (terminal signal) | error (connection lost)
// error → loading (retry)
```
|
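The transition comments can be enforced with a small lookup table, which also gives the state-machine unit tests something concrete to assert against. A sketch; `TRANSITIONS` and `transition` are hypothetical names, and the table contains only the transitions listed in the comments above.

```typescript
type FirstSignalLoadState = "idle" | "loading" | "streaming" | "error" | "done";

// Allowed transitions, transcribed from the state-machine comments.
const TRANSITIONS: Record<FirstSignalLoadState, ReadonlyArray<FirstSignalLoadState>> = {
  idle: ["loading"],
  loading: ["streaming", "error"],
  streaming: ["done", "error"],
  error: ["loading"],
  done: [],
};

// Returns the next state, or throws if the transition is not allowed.
function transition(from: FirstSignalLoadState, to: FirstSignalLoadState): FirstSignalLoadState {
  if (TRANSITIONS[from].indexOf(to) < 0) {
    throw new Error(`illegal FirstSignal transition ${from} -> ${to}`);
  }
  return to;
}
```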
### 8.3 Animation Tokens

| Token | Value | Usage |
|-------|-------|-------|
| `--motion-duration-quick` | 150ms | Skeleton fade, icon transitions |
| `--motion-duration-normal` | 250ms | Card expansion, phase transitions |
| `--motion-duration-slow` | 400ms | Success/failure celebrations |
| `--motion-easing-standard` | cubic-bezier(0.4, 0, 0.2, 1) | Default easing |
| `--motion-easing-decelerate` | cubic-bezier(0, 0, 0.2, 1) | Entries |
| `--motion-easing-accelerate` | cubic-bezier(0.4, 0, 1, 1) | Exits |
|
## 9) Failure Signatures

Failure signatures enable predictive "last known outcome" by pattern-matching historical failures.

### 9.1 Signature Schema

```json
{
  "signature_hash": "sha256:abc123...",
  "pattern": {
    "phase": "analyze",
    "error_code": "LAYER_EXTRACT_FAILED",
    "image_pattern": "registry.io/.*:v1.*"
  },
  "outcome": {
    "likely_cause": "Registry rate limiting",
    "mttr_p50_seconds": 300,
    "suggested_action": "Wait 5 minutes and retry"
  },
  "confidence": 0.87,
  "sample_count": 42
}
```
|
### 9.2 Usage

When a job enters a known failure pattern:

1. **Match** current job state against `failure_signatures` table
2. **Enrich** signal with `last_known_outcome.likely_cause`
3. **Predict** ETA based on historical MTTR
4. **Suggest** remediation via `next_actions`
|
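Steps 1 to 4 can be sketched as a matcher plus an enrichment step. The field names follow the schema in 9.1. The matching rules are assumptions: exact `phase` and `error_code`, an anchored regular-expression match on `image_pattern`, the highest `confidence` wins, and signatures below a minimum confidence (0.5 by default here) are ignored. The function names are hypothetical as well.

```typescript
interface FailureSignature {
  signature_hash: string;
  pattern: { phase: string; error_code: string; image_pattern: string };
  outcome: { likely_cause: string; mttr_p50_seconds: number; suggested_action: string };
  confidence: number;
  sample_count: number;
}

interface FailingJob {
  phase: string;
  errorCode: string;
  image: string;
}

// Step 1: find the most confident signature matching the job's state.
function matchFailureSignature(
  job: FailingJob,
  signatures: FailureSignature[],
  minConfidence = 0.5,
): FailureSignature | undefined {
  let best: FailureSignature | undefined;
  for (const sig of signatures) {
    if (sig.confidence < minConfidence) continue;
    if (sig.pattern.phase !== job.phase || sig.pattern.error_code !== job.errorCode) continue;
    if (!new RegExp(`^(?:${sig.pattern.image_pattern})$`).test(job.image)) continue;
    if (best === undefined || sig.confidence > best.confidence) best = sig;
  }
  return best;
}

// Steps 2-4: likely cause, MTTR-based ETA, and a suggested next action.
function enrichFromSignature(sig: FailureSignature) {
  return {
    last_known_outcome: { likely_cause: sig.outcome.likely_cause },
    eta_seconds: sig.outcome.mttr_p50_seconds,
    next_actions: [{ label: sig.outcome.suggested_action }],
  };
}
```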
## 10) Database Schema

See `docs/db/schemas/ttfs.sql` for the complete schema definition.

### 10.1 Core Tables

| Table | Purpose |
|-------|---------|
| `scheduler.first_signal_snapshots` | Cached signal state per job |
| `scheduler.ttfs_events` | Telemetry event log |
| `scheduler.failure_signatures` | Historical failure patterns |

### 10.2 Hourly Rollup View

The `scheduler.ttfs_hourly_summary` view provides pre-aggregated metrics for dashboard performance.
|
## 11) Testing Requirements

### 11.1 Unit Tests

- Signal store state machine transitions
- ETag generation and validation
- Cache hit/miss scenarios
- Failure signature matching

### 11.2 Integration Tests

- End-to-end API latency measurement
- SSE connection lifecycle
- Air-gap mode fallback
- Multi-tenant isolation
|
### 11.3 Deterministic Fixtures

```typescript
// tests/fixtures/ttfs/
export const TTFS_FIXTURES = {
  FROZEN_TIMESTAMP: '2025-12-04T12:00:00.000Z',
  DETERMINISTIC_SEED: 0x5EED2025,
  SAMPLE_JOB_ID: '550e8400-e29b-41d4-a716-446655440000',
  SAMPLE_TENANT_ID: 'tenant-test-001'
};
```
|
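Fixtures like these are typically consumed through a frozen clock and a seeded random generator. A sketch, assuming Mulberry32 as the generator (a common small 32-bit PRNG swapped in here; the document does not name one). `createFixtureRandom` and `frozenNow` are hypothetical helpers, and the fixture object is repeated so the example is self-contained.

```typescript
const TTFS_FIXTURES = {
  FROZEN_TIMESTAMP: "2025-12-04T12:00:00.000Z",
  DETERMINISTIC_SEED: 0x5eed2025,
  SAMPLE_JOB_ID: "550e8400-e29b-41d4-a716-446655440000",
  SAMPLE_TENANT_ID: "tenant-test-001",
};

// Mulberry32: returns a generator of floats in [0, 1) that yields the same
// sequence for the same 32-bit seed.
function createFixtureRandom(seed: number = TTFS_FIXTURES.DETERMINISTIC_SEED): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Clock that always reports the frozen fixture time.
function frozenNow(): Date {
  return new Date(TTFS_FIXTURES.FROZEN_TIMESTAMP);
}
```

Injecting `frozenNow` and a generator from `createFixtureRandom` wherever production code would call `Date.now()` or `Math.random()` keeps snapshot and telemetry tests reproducible.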
## 12) References

- Advisory: `docs/product-advisories/14-Dec-2025 - UX and Time-to-Evidence Technical Reference.md`
- Sprint 1 (Foundation): `docs/implplan/SPRINT_0338_0001_0001_ttfs_foundation.md`
- Sprint 2 (API): `docs/implplan/SPRINT_0339_0001_0001_first_signal_api.md`
- Sprint 3 (UI): `docs/implplan/SPRINT_0340_0001_0001_first_signal_card_ui.md`
- Sprint 4 (Enhancements): `docs/implplan/SPRINT_0341_0001_0001_ttfs_enhancements.md`
- TTE Architecture: `docs/modules/telemetry/architecture.md`
- Telemetry Schema: `docs/schemas/ttfs-event.schema.json`
- Database Schema: `docs/db/schemas/ttfs.sql`
@@ -53,8 +53,165 @@ This document maps StellaOps UI components to their interaction types and motion
3. **Playwright check**: Snapshot tests run with `--disable-animations` and reduced-motion emulation
4. **CI gate**: Perf budget failures block merge

---
|
## TTFS Component Interactions

> Per advisory §15: Component-to-interaction mapping for Time-to-First-Signal.

### FirstSignalCard Component

**Location:** `src/app/features/runs/components/first-signal-card/`

| Interaction | Trigger | Duration | Easing | Token |
|-------------|---------|----------|--------|-------|
| Skeleton shimmer | Initial load | 2s loop | linear | Custom keyframe |
| Skeleton → Content | Signal loaded | 200ms | easeEntrance | `durationMd` |
| Badge appear | Kind change | 140ms | easeEntrance | `durationSm` |
| Progress update | Phase change | 140ms | easeStandard | `durationSm` |
| Button hover | Mouse enter | 80ms | easeStandard | `durationXs` |
| Button press | Click | 80ms | easeExit | `durationXs` |
| Error shake | Load error | 260ms | easeStandard | `durationLg` |

**Telemetry Events:**

| Event | Trigger | Payload |
|-------|---------|---------|
| `ttfs_start` | Component mount | `{ runId, surface, t }` |
| `ttfs_signal_rendered` | Signal displayed | `{ runId, surface, cacheHit, source, kind, ttfsMs }` |
| `ui.click.action` | Action button click | `{ runId, actionType, target }` |
| `ui.retry` | Retry button click | `{ runId, errorCode }` |

**Accessibility:**

| Requirement | Implementation |
|-------------|----------------|
| Focus visible | `:focus-visible` outline on buttons |
| Screen reader | `aria-live="polite"` on content area |
| Loading state | `aria-busy="true"` on card |
| Status | `role="status"` on badge |
| Keyboard | Tab navigation, Enter/Space activation |
| Reduced motion | Disable shimmer, use instant transitions |
|
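The reduced-motion row can be honoured centrally by resolving duration tokens through one function. A sketch; the millisecond values are taken from the interaction table (`durationXs` 80ms through `durationLg` 260ms), while `resolveDuration`, `shimmerEnabled`, and the `FIRST_SIGNAL_CARD_MOTION` map are hypothetical names. In the browser, the flag would typically come from `window.matchMedia("(prefers-reduced-motion: reduce)").matches`.

```typescript
// Duration tokens as used in the interaction table (milliseconds).
const DURATION_TOKENS = {
  durationXs: 80,
  durationSm: 140,
  durationMd: 200,
  durationLg: 260,
} as const;

type DurationToken = keyof typeof DURATION_TOKENS;

// FirstSignalCard interactions and the token each one uses.
const FIRST_SIGNAL_CARD_MOTION: Record<string, DurationToken> = {
  skeletonToContent: "durationMd",
  badgeAppear: "durationSm",
  progressUpdate: "durationSm",
  buttonHover: "durationXs",
  buttonPress: "durationXs",
  errorShake: "durationLg",
};

// Reduced motion means "instant transitions": every duration collapses to 0.
function resolveDuration(token: DurationToken, prefersReducedMotion: boolean): number {
  return prefersReducedMotion ? 0 : DURATION_TOKENS[token];
}

// The shimmer is a looping animation, so it is disabled outright.
function shimmerEnabled(prefersReducedMotion: boolean): boolean {
  return !prefersReducedMotion;
}
```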
### EvidencePanel Component

**Location:** `src/app/features/evidence/evidence-panel.component.ts`

| Interaction | Trigger | Duration | Easing | Token |
|-------------|---------|----------|--------|-------|
| Tab switch | Tab click | 200ms | easeStandard | `durationMd` |
| Panel expand | Accordion open | 260ms | easeEntrance | `durationLg` |
| Panel collapse | Accordion close | 200ms | easeExit | `durationMd` |
| Copy success | Copy button | 140ms | easeEntrance | `durationSm` |
| Copy tooltip | After copy | 2s visible | — | — |
| Export progress | Export start | Indeterminate | linear | Spinner |

**Telemetry Events:**

| Event | Trigger | Payload |
|-------|---------|---------|
| `finding_open` | Panel mount | `{ findingId, entryPoint, t }` |
| `proof_rendered` | Proof visible | `{ findingId, proofKind, source, t }` |
| `ui.copy.verify_command` | Copy verify command | `{ type: 'cosign'\|'rekor'\|'bundle' }` |
| `evidence.export` | Export complete | `{ findingId, fileSize }` |

### ProofSpine Component

**Location:** `src/app/features/evidence/components/proof-spine/`

| Interaction | Trigger | Duration | Easing | Token |
|-------------|---------|----------|--------|-------|
| Hash truncation expand | Hover/focus | 140ms | easeStandard | `durationSm` |
| Copy single | Copy button | 80ms | easeExit | `durationXs` |
| Copy all | Copy all button | 140ms | easeEntrance | `durationSm` |
| Status badge | Load | 200ms | easeEntrance | `durationMd` |

---
|
## TTFS Telemetry Event Catalog

### Core Events

| Event | Category | Sampling | Always Sample |
|-------|----------|----------|---------------|
| `ttfs_start` | ttfs | Per config | No |
| `ttfs_signal_rendered` | ttfs | Per config | No |
| `ttfs_timeout` | ttfs | — | Yes |
| `ttfs_error` | ttfs | — | Yes |

### TTE Events

| Event | Category | Sampling | Always Sample |
|-------|----------|----------|---------------|
| `finding_open` | tte | Per config | No |
| `proof_rendered` | tte | Per config | No |

### UX Events

| Event | Category | Sampling | Always Sample |
|-------|----------|----------|---------------|
| `ux.bounce` | ui | — | Yes |
| `ux.non_bounce` | ui | Per config | No |
| `ux.open_to_action` | ui | Per config | No |
| `ui.copy.*` | ui | Per config | No |
| `ui.click.*` | ui | Per config | No |
| `evidence.export` | ui | Per config | No |

### Sampling Configuration

| Environment | TTFS Rate | TTE Rate | UI Rate |
|-------------|-----------|----------|---------|
| Development | 100% | 100% | 100% |
| Staging | 100% | 100% | 100% |
| Production | ≥25% | 50% | 25% |
|
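A sampling decision that follows the event catalog and the rate table can be sketched as below. The rates and the always-sampled events come from the tables. Using exactly 25% for production TTFS (the table's lower bound), routing events to categories by name, and injecting the random value so tests are deterministic are assumptions; `shouldSample` and `categoryOf` are hypothetical names.

```typescript
type Environment = "development" | "staging" | "production";
type Category = "ttfs" | "tte" | "ui";

// Rates from the sampling table; production TTFS uses the 25% floor.
const SAMPLING_RATES: Record<Environment, Record<Category, number>> = {
  development: { ttfs: 1, tte: 1, ui: 1 },
  staging: { ttfs: 1, tte: 1, ui: 1 },
  production: { ttfs: 0.25, tte: 0.5, ui: 0.25 },
};

// Events marked "Always Sample: Yes" in the catalog.
const ALWAYS_SAMPLED = ["ttfs_timeout", "ttfs_error", "ux.bounce"];

// Category lookup by event name, following the catalog tables.
function categoryOf(eventName: string): Category {
  if (eventName.startsWith("ttfs_")) return "ttfs";
  if (eventName === "finding_open" || eventName === "proof_rendered") return "tte";
  return "ui";
}

// `random` is a value in [0, 1), e.g. from Math.random().
function shouldSample(eventName: string, env: Environment, random: number): boolean {
  if (ALWAYS_SAMPLED.indexOf(eventName) >= 0) return true;
  return random < SAMPLING_RATES[env][categoryOf(eventName)];
}
```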
---

## TTFS State Machine

```
FirstSignalCard:
  idle → loading → loaded
                 ↘ unavailable
                 ↘ error → (retry) → loading

EvidencePanel:
  idle → loading → loaded
                 ↘ partial (some proofs failed)
                 ↘ error
```

---
|
## File Locations (TTFS)

| Deliverable | Path |
|-------------|------|
| FirstSignalCard | `src/Web/StellaOps.Web/src/app/features/runs/components/first-signal-card/` |
| FirstSignalStore | `src/Web/StellaOps.Web/src/app/features/runs/stores/first-signal.store.ts` |
| EvidencePanel | `src/Web/StellaOps.Web/src/app/features/evidence/evidence-panel.component.ts` |
| ProofSpine | `src/Web/StellaOps.Web/src/app/features/evidence/components/proof-spine/` |
| TTFS Telemetry | `src/Web/StellaOps.Web/src/app/core/telemetry/ttfs-telemetry.service.ts` |
| Sampling Config | `src/Web/StellaOps.Web/src/app/core/telemetry/telemetry-config.ts` |
| i18n Keys | `src/Web/StellaOps.Web/src/assets/i18n/en.json` |
| Test Fixtures | `tests/fixtures/ttfs/ttfs-fixtures.ts` |
| E2E Tests | `src/Web/StellaOps.Web/tests/e2e/micro-interactions.spec.ts` |

---
|
## Evidence

- Token catalog: `src/Web/StellaOps.Web/src/styles/tokens/_motion.scss`
- TypeScript tokens: `src/Web/StellaOps.Web/src/app/styles/motion-tokens.ts`
- Storybook stories: `src/Web/StellaOps.Web/src/stories/motion-tokens.stories.ts`
- TTFS Architecture: `docs/modules/telemetry/ttfs-architecture.md`
- Advisory: `docs/product-advisories/14-Dec-2025 - UX and Time-to-Evidence Technical Reference.md`

---
|
## References

- Sprint 0338: TTFS Foundation
- Sprint 0339: First Signal API
- Sprint 0340: FirstSignalCard UI
- Sprint 0341: TTFS Enhancements
|