save development progress

This commit is contained in:
StellaOps Bot
2025-12-25 23:09:58 +02:00
parent d71853ad7e
commit aa70af062e
351 changed files with 37683 additions and 150156 deletions


@@ -198,14 +198,38 @@ sequenceDiagram
## Invalidation
> **See also**: [architecture.md](architecture.md#invalidation-mechanisms) for detailed invalidation flow diagrams.
### Automatic Invalidation Triggers
| Trigger | Event | Scope |
|---------|-------|-------|
| Signer Revocation | `SignerRevokedEvent` | All entries with matching `signer_set_hash` |
| Feed Epoch Advance | `FeedEpochAdvancedEvent` | Entries with older `feed_epoch` |
| Policy Update | `PolicyUpdatedEvent` | Entries with matching `policy_hash` |
| TTL Expiry | Background job | Entries past `expires_at` |
| Trigger | Event | Scope | Implementation |
|---------|-------|-------|----------------|
| Signer Revocation | `SignerRevokedEvent` | All entries with matching `signer_set_hash` | `SignerSetInvalidator` |
| Feed Epoch Advance | `FeedEpochAdvancedEvent` | Entries with older `feed_epoch` | `FeedEpochInvalidator` |
| Policy Update | `PolicyUpdatedEvent` | Entries with matching `policy_hash` | `PolicyHashInvalidator` |
| TTL Expiry | Background job | Entries past `expires_at` | `TtlExpirationService` |
### Invalidation Interfaces
```csharp
// Main invalidator interface
public interface IProvcacheInvalidator
{
Task<int> InvalidateAsync(
InvalidationCriteria criteria,
string reason,
string? correlationId = null,
CancellationToken cancellationToken = default);
}
// Revocation ledger for audit trail
public interface IRevocationLedger
{
Task RecordAsync(RevocationEntry entry, CancellationToken ct = default);
Task<IReadOnlyList<RevocationEntry>> GetEntriesSinceAsync(long sinceSeqNo, int limit = 1000, CancellationToken ct = default);
Task<RevocationLedgerStats> GetStatsAsync(CancellationToken ct = default);
}
```
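The ledger contract above can be illustrated with a small in-memory sketch. It is not the project's `InMemoryRevocationLedger`: the `SketchEntry` and `SketchLedger` names and the entry shape are hypothetical, and it assumes that `sinceSeqNo` is exclusive (only entries with a higher sequence number are returned).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var ledger = new SketchLedger();
await ledger.RecordAsync(new SketchEntry("signer", "sha256:aaa", 12));
await ledger.RecordAsync(new SketchEntry("policy", "sha256:bbb", 3));

// A node that last saw sequence number 1 asks for everything it missed.
var missed = await ledger.GetEntriesSinceAsync(sinceSeqNo: 1);
Console.WriteLine($"{missed.Count} entry after seq 1: {missed[0].Entry.RevocationType}");

// Hypothetical entry shape; the real RevocationEntry is not shown in this document.
public sealed record SketchEntry(string RevocationType, string RevokedKey, int EntriesInvalidated);

public sealed class SketchLedger
{
    private readonly object _gate = new();
    private readonly List<(long SeqNo, SketchEntry Entry)> _entries = new();

    // Assigns the next sequence number, mirroring a BIGSERIAL column.
    public Task RecordAsync(SketchEntry entry, CancellationToken ct = default)
    {
        lock (_gate)
        {
            _entries.Add((_entries.Count + 1, entry));
        }
        return Task.CompletedTask;
    }

    // Returns entries with SeqNo greater than sinceSeqNo, oldest first, capped at limit.
    public Task<IReadOnlyList<(long SeqNo, SketchEntry Entry)>> GetEntriesSinceAsync(
        long sinceSeqNo, int limit = 1000, CancellationToken ct = default)
    {
        lock (_gate)
        {
            IReadOnlyList<(long SeqNo, SketchEntry Entry)> page =
                _entries.Where(e => e.SeqNo > sinceSeqNo).Take(limit).ToList();
            return Task.FromResult(page);
        }
    }
}
```

Monotonic sequence numbers are what make replay possible: a node only needs to remember the last number it applied.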
### Manual Invalidation
@@ -227,8 +251,25 @@ POST /v1/provcache/invalidate
}
```
### Revocation Replay
Nodes can replay missed revocation events after restart or network partition:
```csharp
var replayService = services.GetRequiredService<IRevocationReplayService>();
var checkpoint = await replayService.GetCheckpointAsync();
var result = await replayService.ReplayFromAsync(
sinceSeqNo: checkpoint,
new RevocationReplayOptions { BatchSize = 1000 });
// result.EntriesReplayed, result.TotalInvalidations
```
## Air-Gap Integration
> **See also**: [architecture.md](architecture.md#air-gap-exportimport) for bundle format specification and architecture diagrams.
### Export Workflow
```bash
@@ -248,17 +289,56 @@ stella prov export --verikey sha256:abc123 --density strict --sign
# Import and verify Merkle root
stella prov import --input proof.bundle
# Import with lazy chunk fetch
# Import with lazy chunk fetch (connected mode)
stella prov import --input proof-lite.json --lazy-fetch --backend https://api.stellaops.com
# Import with lazy fetch from file directory (sneakernet mode)
stella prov import --input proof-lite.json --lazy-fetch --chunks-dir /mnt/usb/evidence
```
### Density Levels
| Level | Contents | Size | Use Case |
|-------|----------|------|----------|
| `lite` | DecisionDigest + ProofRoot | ~2 KB | Quick verification |
| `standard` | + First N chunks | ~200 KB | Normal audit |
| `strict` | + All chunks | Variable | Full compliance |
| Level | Contents | Size | Use Case | Lazy Fetch Support |
|-------|----------|------|----------|--------------------|
| `lite` | DecisionDigest + ProofRoot + Manifest | ~2 KB | Quick verification | Required |
| `standard` | + First N chunks (~10%) | ~200 KB | Normal audit | Partial (remaining chunks) |
| `strict` | + All chunks | Variable | Full compliance | Not needed |
### Lazy Evidence Fetching
For `lite` and `standard` density exports, missing chunks can be fetched on demand:
```csharp
// HTTP fetcher (connected mode)
var httpFetcher = new HttpChunkFetcher(
    new Uri("https://api.stellaops.com"), logger);
// File fetcher (air-gapped/sneakernet mode)
var fileFetcher = new FileChunkFetcher(
    basePath: "/mnt/usb/evidence", logger);
// Orchestrate fetch + verify + store
var orchestrator = new LazyFetchOrchestrator(repository, logger);
var result = await orchestrator.FetchAndStoreAsync(
    proofRoot: "sha256:...",
    httpFetcher, // or fileFetcher in sneakernet mode
    new LazyFetchOptions
    {
        VerifyOnFetch = true,
        BatchSize = 100,
        MaxChunks = 1000
    });
```
### Sneakernet Export for Chunked Evidence
```csharp
// Export evidence chunks to file system for transport
await fileFetcher.ExportEvidenceChunksToFilesAsync(
manifest,
chunks,
outputDirectory: "/mnt/usb/evidence");
```
## Configuration
@@ -453,19 +533,30 @@ CREATE TABLE provcache.prov_evidence_chunks (
```sql
CREATE TABLE provcache.prov_revocations (
revocation_id UUID PRIMARY KEY,
revocation_type TEXT NOT NULL,
target_hash TEXT NOT NULL,
reason TEXT,
actor TEXT,
entries_affected BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL
seq_no BIGSERIAL PRIMARY KEY,
revocation_id UUID NOT NULL UNIQUE,
revocation_type VARCHAR(32) NOT NULL, -- signer, feed_epoch, policy, explicit, expiration
revoked_key VARCHAR(512) NOT NULL,
reason VARCHAR(1024),
entries_invalidated INTEGER NOT NULL,
source VARCHAR(128) NOT NULL,
correlation_id VARCHAR(128),
revoked_at TIMESTAMPTZ NOT NULL,
metadata JSONB,
CONSTRAINT chk_revocation_type CHECK (
revocation_type IN ('signer', 'feed_epoch', 'policy', 'explicit', 'expiration')
)
);
CREATE INDEX idx_revocations_type ON provcache.prov_revocations(revocation_type);
CREATE INDEX idx_revocations_key ON provcache.prov_revocations(revoked_key);
CREATE INDEX idx_revocations_time ON provcache.prov_revocations(revoked_at);
```
## Implementation Status
### Completed (Sprint 8200.0001.0001)
### Completed (Sprint 8200.0001.0001 - Core Backend)
| Component | Path | Status |
|-----------|------|--------|
@@ -477,18 +568,39 @@ CREATE TABLE provcache.prov_revocations (
| API Endpoints | `src/__Libraries/StellaOps.Provcache.Api/` | ✅ Done |
| Unit Tests (53) | `src/__Libraries/__Tests/StellaOps.Provcache.Tests/` | ✅ Done |
### Completed (Sprint 8200.0001.0002 - Invalidation & Air-Gap)
| Component | Path | Status |
|-----------|------|--------|
| Invalidation Interfaces | `src/__Libraries/StellaOps.Provcache/Invalidation/` | ✅ Done |
| Repository Invalidation Methods | `IEvidenceChunkRepository.Delete*Async()` | ✅ Done |
| Export Interfaces | `src/__Libraries/StellaOps.Provcache/Export/` | ✅ Done |
| IMinimalProofExporter | `Export/IMinimalProofExporter.cs` | ✅ Done |
| MinimalProofExporter | `Export/MinimalProofExporter.cs` | ✅ Done |
| Lazy Fetch - ILazyEvidenceFetcher | `LazyFetch/ILazyEvidenceFetcher.cs` | ✅ Done |
| Lazy Fetch - HttpChunkFetcher | `LazyFetch/HttpChunkFetcher.cs` | ✅ Done |
| Lazy Fetch - FileChunkFetcher | `LazyFetch/FileChunkFetcher.cs` | ✅ Done |
| Lazy Fetch - LazyFetchOrchestrator | `LazyFetch/LazyFetchOrchestrator.cs` | ✅ Done |
| Revocation - IRevocationLedger | `Revocation/IRevocationLedger.cs` | ✅ Done |
| Revocation - InMemoryRevocationLedger | `Revocation/InMemoryRevocationLedger.cs` | ✅ Done |
| Revocation - RevocationReplayService | `Revocation/RevocationReplayService.cs` | ✅ Done |
| ProvRevocationEntity | `Entities/ProvRevocationEntity.cs` | ✅ Done |
| Unit Tests (124 total) | `src/__Libraries/__Tests/StellaOps.Provcache.Tests/` | ✅ Done |
### Blocked
| Component | Reason |
|-----------|--------|
| Policy Engine Integration | `PolicyEvaluator` is `internal sealed`; requires architectural review to expose injection points for `IProvcacheService` |
| CLI e2e Tests | `AddSimRemoteCryptoProvider` method missing in CLI codebase |
### Pending
| Component | Sprint |
|-----------|--------|
| Signer Revocation Events | 8200.0001.0002 |
| CLI Export/Import | 8200.0001.0002 |
| Authority Event Integration | 8200.0001.0002 (BLOCKED - Authority needs event publishing) |
| Concelier Event Integration | 8200.0001.0002 (BLOCKED - Concelier needs event publishing) |
| PostgresRevocationLedger | Future (requires EF Core integration) |
| UI Badges & Proof Tree | 8200.0001.0003 |
| Grafana Dashboards | 8200.0001.0003 |
@@ -502,6 +614,7 @@ CREATE TABLE provcache.prov_revocations (
## Related Documentation
- **[Provcache Architecture Guide](architecture.md)** - Detailed architecture, invalidation flows, and API reference
- [Policy Engine Architecture](../policy/README.md)
- [TrustLattice Engine](../policy/design/policy-deterministic-evaluator.md)
- [Offline Kit Documentation](../../24_OFFLINE_KIT.md)


@@ -0,0 +1,438 @@
# Provcache Architecture Guide
> Detailed architecture documentation for the Provenance Cache module
## Overview
Provcache provides a caching layer that maximizes "provenance density" — the amount of trustworthy evidence retained per byte. This document covers the internal architecture, invalidation mechanisms, air-gap support, and replay capabilities.
## Table of Contents
1. [Cache Architecture](#cache-architecture)
2. [Invalidation Mechanisms](#invalidation-mechanisms)
3. [Evidence Chunk Storage](#evidence-chunk-storage)
4. [Air-Gap Export/Import](#air-gap-exportimport)
5. [Lazy Evidence Fetching](#lazy-evidence-fetching)
6. [Revocation Ledger](#revocation-ledger)
7. [API Reference](#api-reference)
---
## Cache Architecture
### Storage Layers
```
┌───────────────────────────────────────────────────────────────┐
│ Application Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ VeriKey │───▶│ Provcache │───▶│ Policy Engine │ │
│ │ Builder │ │ Service │ │ (cache miss) │ │
│ └─────────────┘ └─────────────┘ └─────────────────┘ │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│ Caching Layer │
│ ┌─────────────────┐ ┌──────────────────────────┐ │
│ │ Valkey │◀───────▶│ PostgreSQL │ │
│ │ (read-through) │ │ (write-behind queue) │ │
│ │ │ │ │ │
│ │ • Hot cache │ │ • provcache_items │ │
│ │ • Sub-ms reads │ │ • prov_evidence_chunks │ │
│ │ • TTL-based │ │ • prov_revocations │ │
│ └─────────────────┘ └──────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
```
### Key Components
| Component | Purpose |
|-----------|---------|
| `IProvcacheService` | Main service interface for cache operations |
| `IProvcacheStore` | Storage abstraction (Valkey + Postgres) |
| `WriteBehindQueue` | Async persistence to Postgres |
| `IEvidenceChunker` | Splits large evidence into Merkle-verified chunks |
| `IRevocationLedger` | Audit trail for all invalidation events |
---
## Invalidation Mechanisms
Provcache supports multiple invalidation triggers to ensure cache consistency when upstream data changes.
### Automatic Invalidation
#### 1. Signer Revocation
When a signing key is compromised or rotated:
```
┌─────────────┐ SignerRevokedEvent ┌──────────────────┐
│ Authority │ ──────────────────────────▶│ SignerSet │
│ Module │ │ Invalidator │
└─────────────┘ └────────┬─────────┘
DELETE FROM provcache_items
WHERE signer_set_hash = ?
```
**Implementation**: `SignerSetInvalidator` subscribes to `SignerRevokedEvent` and invalidates all entries signed by the revoked key.
#### 2. Feed Epoch Advancement
When vulnerability feeds are updated:
```
┌─────────────┐ FeedEpochAdvancedEvent ┌──────────────────┐
│ Concelier │ ───────────────────────────▶│ FeedEpoch │
│ Module │ │ Invalidator │
└─────────────┘ └────────┬─────────┘
DELETE FROM provcache_items
WHERE feed_epoch < ?
```
**Implementation**: `FeedEpochInvalidator` compares epochs using semantic versioning or ISO timestamps.
#### 3. Policy Updates
When policy bundles change:
```
┌─────────────┐ PolicyUpdatedEvent ┌──────────────────┐
│ Policy │ ───────────────────────────▶│ PolicyHash │
│ Engine │ │ Invalidator │
└─────────────┘ └────────┬─────────┘
DELETE FROM provcache_items
WHERE policy_hash = ?
```
### Invalidation Recording
All invalidation events are recorded in the revocation ledger for audit and replay:
```csharp
public interface IProvcacheInvalidator
{
Task<int> InvalidateAsync(
InvalidationCriteria criteria,
string reason,
string? correlationId = null,
CancellationToken cancellationToken = default);
}
```
The ledger entry includes:
- Revocation type (signer, feed_epoch, policy, explicit, expiration)
- The revoked key
- Number of entries invalidated
- Timestamp and correlation ID for tracing
---
## Evidence Chunk Storage
Large evidence (SBOMs, VEX documents, call graphs) is stored in fixed-size chunks with Merkle tree verification.
### Chunking Process
```
┌─────────────────────────────────────────────────────────────────┐
│ Original Evidence │
│ [ 2.3 MB SPDX SBOM JSON ] │
└─────────────────────────────────────────────────────────────────┘
▼ IEvidenceChunker.ChunkAsync()
┌─────────────────────────────────────────────────────────────────┐
│ Chunk 0 (64KB) │ Chunk 1 (64KB) │ ... │ Chunk N (partial) │
│ hash: abc123 │ hash: def456 │ │ hash: xyz789 │
└─────────────────────────────────────────────────────────────────┘
▼ Merkle tree construction
┌─────────────────────────────────────────────────────────────────┐
│ Proof Root │
│ sha256:merkle_root_of_all_chunks │
└─────────────────────────────────────────────────────────────────┘
```
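The chunking and Merkle steps in the diagram can be sketched as follows. This is a minimal illustration, not the project's `IEvidenceChunker`: the helper names `Split` and `MerkleRoot` are hypothetical, and the pairing rule (an unpaired node is promoted unchanged to the next level) is an assumption, since this document does not specify how odd levels are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: cut the evidence into fixed-size chunks; the last one may be partial.
static List<byte[]> Split(byte[] evidence, int chunkSize)
{
    var chunks = new List<byte[]>();
    for (int offset = 0; offset < evidence.Length; offset += chunkSize)
    {
        int length = Math.Min(chunkSize, evidence.Length - offset);
        chunks.Add(evidence.AsSpan(offset, length).ToArray());
    }
    return chunks;
}

// Assumed pairing rule: hash adjacent pairs; an unpaired node moves up unchanged.
static byte[] MerkleRoot(IReadOnlyList<byte[]> leafHashes)
{
    if (leafHashes.Count == 0) return SHA256.HashData(Array.Empty<byte>());
    var level = leafHashes.ToList();
    while (level.Count > 1)
    {
        var next = new List<byte[]>();
        for (int i = 0; i < level.Count; i += 2)
        {
            next.Add(i + 1 < level.Count
                ? SHA256.HashData(level[i].Concat(level[i + 1]).ToArray())
                : level[i]);
        }
        level = next;
    }
    return level[0];
}

var evidence = new byte[150_000];            // stand-in for an SBOM document
new Random(42).NextBytes(evidence);

var chunks = Split(evidence, 65_536);        // 64 KB, matching the diagram
var leaves = chunks.Select(c => SHA256.HashData(c)).ToList();
string proofRoot = "sha256:" + Convert.ToHexString(MerkleRoot(leaves)).ToLowerInvariant();

Console.WriteLine($"{chunks.Count} chunks, proof root {proofRoot}");
```

Because the root depends on every chunk hash, a verifier holding only the proof root and a manifest can detect a modified or missing chunk.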
### Database Schema
```sql
CREATE TABLE provcache.prov_evidence_chunks (
chunk_id UUID PRIMARY KEY,
proof_root VARCHAR(128) NOT NULL,
chunk_index INTEGER NOT NULL,
chunk_hash VARCHAR(128) NOT NULL,
blob BYTEA NOT NULL,
blob_size INTEGER NOT NULL,
content_type VARCHAR(64) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
CONSTRAINT uk_proof_chunk UNIQUE (proof_root, chunk_index)
);
CREATE INDEX idx_evidence_proof_root ON provcache.prov_evidence_chunks(proof_root);
```
### Paging API
Evidence can be retrieved in pages to manage memory:
```http
GET /api/v1/proofs/{proofRoot}?page=0&pageSize=10
```
Response includes chunk metadata without blob data, allowing clients to fetch specific chunks on demand.
---
## Air-Gap Export/Import
Provcache supports air-gapped environments through minimal proof bundles.
### Bundle Format (v1)
```json
{
"version": "v1",
"exportedAt": "2025-01-15T10:30:00Z",
"density": "standard",
"digest": {
"veriKey": "sha256:...",
"verdictHash": "sha256:...",
"proofRoot": "sha256:...",
"trustScore": 85
},
"manifest": {
"proofRoot": "sha256:...",
"totalChunks": 42,
"totalSize": 2752512,
"chunks": [...]
},
"chunks": [
{
"index": 0,
"data": "base64...",
"hash": "sha256:..."
}
],
"signature": {
"algorithm": "ECDSA-P256",
"signature": "base64...",
"signedAt": "2025-01-15T10:30:01Z"
}
}
```
### Density Levels
| Level | Contents | Typical Size | Use Case |
|-------|----------|--------------|----------|
| **Lite** | Digest + ProofRoot + Manifest | ~2 KB | Quick verification, requires lazy fetch for full evidence |
| **Standard** | + First 10% of chunks | ~200 KB | Normal audits, balance of size vs completeness |
| **Strict** | + All chunks | Variable | Full compliance, no network needed |
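As a rough illustration of the table, the number of chunks embedded per density level could be computed as below. The `ChunksToInclude` helper and the round-up rule for the 10% share are assumptions made for this sketch; the actual exporter may round differently.

```csharp
using System;

// Hypothetical helper: how many chunks a bundle embeds for a given density level.
static int ChunksToInclude(string density, int totalChunks) => density switch
{
    "lite" => 0,                                              // manifest only
    "standard" => (int)Math.Ceiling(totalChunks * 0.10),      // first ~10%, rounded up (assumed)
    "strict" => totalChunks,                                  // everything
    _ => throw new ArgumentOutOfRangeException(nameof(density), density, "Unknown density level")
};

foreach (var level in new[] { "lite", "standard", "strict" })
{
    int included = ChunksToInclude(level, totalChunks: 42);
    Console.WriteLine($"{level}: {included} embedded, {42 - included} left for lazy fetch");
}
```

With the 42-chunk manifest from the bundle example, a standard bundle would embed 5 chunks under this rounding rule and leave 37 for lazy fetching.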
### Export Example
```csharp
var exporter = serviceProvider.GetRequiredService<IMinimalProofExporter>();
// Lite export (manifest only)
var liteBundle = await exporter.ExportAsync(
veriKey: "sha256:abc123",
new MinimalProofExportOptions { Density = ProofDensity.Lite });
// Signed strict export
var strictBundle = await exporter.ExportAsync(
veriKey: "sha256:abc123",
new MinimalProofExportOptions
{
Density = ProofDensity.Strict,
SignBundle = true,
Signer = signerInstance
});
```
### Import and Verification
```csharp
var result = await exporter.ImportAsync(bundle);
if (result.DigestVerified && result.ChunksVerified)
{
// Bundle is authentic
await provcache.UpsertAsync(result.Entry);
}
```
---
## Lazy Evidence Fetching
For lite and standard bundles, missing chunks can be fetched on demand from an HTTP backend (connected mode) or a file directory (sneakernet mode).
### Fetcher Architecture
```
┌────────────────────┐
│ ILazyEvidenceFetcher│
└─────────┬──────────┘
┌─────┴─────┐
│ │
▼ ▼
┌─────────┐ ┌──────────┐
│ HTTP │ │ File │
│ Fetcher │ │ Fetcher │
└─────────┘ └──────────┘
```
### HTTP Fetcher (Connected Mode)
```csharp
var fetcher = new HttpChunkFetcher(
new Uri("https://api.stellaops.com"),
logger);
var orchestrator = new LazyFetchOrchestrator(repository, logger);
var result = await orchestrator.FetchAndStoreAsync(
proofRoot: "sha256:...",
fetcher,
new LazyFetchOptions
{
VerifyOnFetch = true,
BatchSize = 100
});
```
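What `VerifyOnFetch = true` protects against can be sketched without any network access. The example below is an assumption about the general technique (compare each fetched chunk with the hash listed in the manifest before storing it), not the actual `LazyFetchOrchestrator` logic; `HashOf`, the manifest shape, and the simulated fetch result are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper producing the "sha256:<hex>" form used in this document.
static string HashOf(byte[] data) =>
    "sha256:" + Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();

var chunkA = new byte[] { 1, 2, 3 };
var chunkB = new byte[] { 4, 5, 6 };

// Hypothetical manifest: chunk index -> expected hash.
var manifest = new Dictionary<int, string> { [0] = HashOf(chunkA), [1] = HashOf(chunkB) };

// Simulated fetch result in which chunk 1 was corrupted in transit.
var fetched = new Dictionary<int, byte[]> { [0] = chunkA, [1] = new byte[] { 4, 5, 7 } };

var stored = new List<int>();
var rejected = new List<int>();
foreach (var (index, data) in fetched.OrderBy(kv => kv.Key))
{
    // Verify before storing so a bad chunk never reaches the repository.
    if (manifest.TryGetValue(index, out var expected) && expected == HashOf(data))
        stored.Add(index);
    else
        rejected.Add(index);
}

Console.WriteLine($"stored: [{string.Join(", ", stored)}], rejected: [{string.Join(", ", rejected)}]");
```

The same check applies whether the bytes arrive over HTTP or from a USB drive, which is why verification sits with the orchestrator rather than with an individual fetcher in this sketch.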
### File Fetcher (Sneakernet Mode)
For fully air-gapped environments:
1. Export full evidence to USB drive
2. Transport to isolated network
3. Import using file fetcher
```csharp
var fetcher = new FileChunkFetcher(
basePath: "/mnt/usb/evidence",
logger);
var result = await orchestrator.FetchAndStoreAsync(proofRoot, fetcher);
```
---
## Revocation Ledger
The revocation ledger provides a complete audit trail of all invalidation events.
### Schema
```sql
CREATE TABLE provcache.prov_revocations (
seq_no BIGSERIAL PRIMARY KEY,
revocation_id UUID NOT NULL,
revocation_type VARCHAR(32) NOT NULL,
revoked_key VARCHAR(512) NOT NULL,
reason VARCHAR(1024),
entries_invalidated INTEGER NOT NULL,
source VARCHAR(128) NOT NULL,
correlation_id VARCHAR(128),
revoked_at TIMESTAMPTZ NOT NULL,
metadata JSONB
);
```
### Replay for Catch-Up
After node restart or network partition, nodes can replay missed revocations:
```csharp
var replayService = serviceProvider.GetRequiredService<IRevocationReplayService>();
// Get last checkpoint
var checkpoint = await replayService.GetCheckpointAsync();
// Replay from checkpoint
var result = await replayService.ReplayFromAsync(
sinceSeqNo: checkpoint,
new RevocationReplayOptions
{
BatchSize = 1000,
SaveCheckpointPerBatch = true
});
Console.WriteLine($"Replayed {result.EntriesReplayed} revocations, {result.TotalInvalidations} entries invalidated");
```
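The catch-up behaviour can be pictured as a simple batched loop. The sketch below is an assumption about how such a replay might work, not the `RevocationReplayService` implementation: the ledger is a hypothetical in-memory list, and the checkpoint is advanced after every batch in the spirit of `SaveCheckpointPerBatch = true`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ledger contents: (sequence number, cache entries invalidated by that revocation).
var ledger = Enumerable.Range(1, 2500)
    .Select(i => (SeqNo: (long)i, Invalidated: i % 3))
    .ToList();

long checkpoint = 500;       // last sequence number this node applied before going offline
const int batchSize = 1000;
int entriesReplayed = 0, totalInvalidations = 0;

while (true)
{
    var batch = ledger.Where(e => e.SeqNo > checkpoint).Take(batchSize).ToList();
    if (batch.Count == 0) break;

    foreach (var entry in batch)
    {
        // A real node would invalidate the matching cache entries here.
        entriesReplayed++;
        totalInvalidations += entry.Invalidated;
    }

    // Advancing the checkpoint per batch limits rework after a crash to one batch.
    checkpoint = batch[^1].SeqNo;
}

Console.WriteLine($"Replayed {entriesReplayed} revocations up to seq {checkpoint}");
```

Replaying the same batch twice is harmless as long as invalidation is idempotent, so the checkpoint only needs to be written after a batch has been applied.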
### Statistics
```csharp
var ledger = serviceProvider.GetRequiredService<IRevocationLedger>();
var stats = await ledger.GetStatsAsync();
// stats.TotalEntries - total revocation events
// stats.EntriesByType - breakdown by type (signer, feed_epoch, etc.)
// stats.TotalEntriesInvalidated - sum of all invalidated cache entries
```
---
## API Reference
### Evidence Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/proofs/{proofRoot}` | GET | Get paged evidence chunks |
| `/api/v1/proofs/{proofRoot}/manifest` | GET | Get chunk manifest |
| `/api/v1/proofs/{proofRoot}/chunks/{index}` | GET | Get specific chunk |
| `/api/v1/proofs/{proofRoot}/verify` | POST | Verify Merkle proof |
### Invalidation Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/provcache/invalidate` | POST | Manual invalidation |
| `/api/v1/provcache/revocations` | GET | List revocation history |
| `/api/v1/provcache/stats` | GET | Cache statistics |
### CLI Commands
```bash
# Export commands
stella prov export --verikey <key> --density <lite|standard|strict> [--output <file>] [--sign]
# Import commands
stella prov import <file> [--lazy-fetch] [--backend <url>] [--chunks-dir <path>]
# Verify commands
stella prov verify <file> [--signer-cert <cert>]
```
---
## Configuration
Key settings in `appsettings.json`:
```json
{
"Provcache": {
"ChunkSize": 65536,
"MaxChunksPerEntry": 1000,
"DefaultTtl": "24:00:00",
"EnableWriteBehind": true,
"WriteBehindFlushInterval": "00:00:05"
}
}
```
See [README.md](README.md) for full configuration reference.


@@ -0,0 +1,419 @@
# Provcache Metrics and Alerting Guide
This document describes the Prometheus metrics exposed by the Provcache layer and recommended alerting configurations.
## Overview
Provcache emits metrics for monitoring cache performance, hit rates, latency, and invalidation patterns. These metrics enable operators to:
- Track cache effectiveness
- Identify performance degradation
- Detect anomalous invalidation patterns
- Plan capacity for cache infrastructure
## Prometheus Metrics
### Request Counters
#### `provcache_requests_total`
Total number of cache requests.
| Label | Values | Description |
|-------|--------|-------------|
| `source` | `valkey`, `postgres` | Cache tier that handled the request |
| `result` | `hit`, `miss`, `expired` | Request outcome |
```promql
# Total requests per minute
rate(provcache_requests_total[1m])
# Hit rate percentage
sum(rate(provcache_requests_total{result="hit"}[5m])) /
sum(rate(provcache_requests_total[5m])) * 100
```
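This document does not say which library emits these metrics. As one possible approach, the sketch below uses the built-in `System.Diagnostics.Metrics` API to record a counter with the `source` and `result` labels described above. The meter name is hypothetical, and the in-process `MeterListener` only stands in for a real Prometheus exporter so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Meter and instrument names are assumptions chosen to mirror the metric above.
using var meter = new Meter("StellaOps.Provcache.Sketch");
var requests = meter.CreateCounter<long>("provcache_requests_total");

// Collect measurements in-process so the sketch runs without an exporter.
var observed = new Dictionary<string, long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter == meter) l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    string key = "";
    foreach (var tag in tags) key += $"{tag.Key}={tag.Value};";
    observed[key] = observed.GetValueOrDefault(key) + value;
});
listener.Start();

// One increment per cache request, labelled with tier and outcome.
void Record(string source, string result) =>
    requests.Add(1,
        new KeyValuePair<string, object?>("source", source),
        new KeyValuePair<string, object?>("result", result));

Record("valkey", "hit");
Record("valkey", "hit");
Record("postgres", "miss");

foreach (var (labels, count) in observed) Console.WriteLine($"{labels} {count}");
```

Keeping the label set small (two tiers, three outcomes) keeps series cardinality low, which matters for the `rate()` queries shown above.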
#### `provcache_hits_total`
Total cache hits (subset of requests with `result="hit"`).
| Label | Values | Description |
|-------|--------|-------------|
| `source` | `valkey`, `postgres` | Cache tier |
```promql
# Share of all hits served by Valkey (percent)
sum(rate(provcache_hits_total{source="valkey"}[5m])) /
sum(rate(provcache_hits_total[5m])) * 100
```
#### `provcache_misses_total`
Total cache misses.
| Label | Values | Description |
|-------|--------|-------------|
| `reason` | `not_found`, `expired`, `invalidated` | Miss reason |
```promql
# Miss rate by reason
sum by (reason) (rate(provcache_misses_total[5m]))
```
### Latency Histogram
#### `provcache_latency_seconds`
Latency distribution for cache operations.
| Label | Values | Description |
|-------|--------|-------------|
| `operation` | `get`, `set`, `invalidate` | Operation type |
| `source` | `valkey`, `postgres` | Cache tier |
Buckets: `0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0`
```promql
# P50 latency for cache gets
histogram_quantile(0.50, rate(provcache_latency_seconds_bucket{operation="get"}[5m]))
# P95 latency
histogram_quantile(0.95, rate(provcache_latency_seconds_bucket{operation="get"}[5m]))
# P99 latency
histogram_quantile(0.99, rate(provcache_latency_seconds_bucket{operation="get"}[5m]))
```
### Gauge Metrics
#### `provcache_items_count`
Current number of items in cache.
| Label | Values | Description |
|-------|--------|-------------|
| `source` | `valkey`, `postgres` | Cache tier |
```promql
# Total cached items
sum(provcache_items_count)
# Items by tier
sum by (source) (provcache_items_count)
```
### Invalidation Metrics
#### `provcache_invalidations_total`
Total invalidation events.
| Label | Values | Description |
|-------|--------|-------------|
| `reason` | `signer_revoked`, `epoch_advanced`, `ttl_expired`, `manual` | Invalidation trigger |
```promql
# Invalidation rate by reason
sum by (reason) (rate(provcache_invalidations_total[5m]))
# Security-related invalidations
sum(rate(provcache_invalidations_total{reason="signer_revoked"}[5m]))
```
### Trust Score Metrics
#### `provcache_trust_score_average`
Gauge showing average trust score across cached decisions.
```promql
# Current average trust score
provcache_trust_score_average
```
#### `provcache_trust_score_bucket`
Histogram of trust score distribution.
Buckets: `20, 40, 60, 80, 100`
```promql
# Percentage of decisions with trust score above 80 (the 80 < score <= 100 bucket)
(sum(rate(provcache_trust_score_bucket{le="100"}[5m])) -
sum(rate(provcache_trust_score_bucket{le="80"}[5m]))) /
sum(rate(provcache_trust_score_bucket{le="100"}[5m])) * 100
```
---
## Grafana Dashboard
A pre-built dashboard is available at `deploy/grafana/dashboards/provcache-overview.json`.
### Panels
| Panel | Type | Description |
|-------|------|-------------|
| Cache Hit Rate | Gauge | Current hit rate percentage |
| Hit Rate Over Time | Time series | Hit rate trend |
| Latency Percentiles | Time series | P50, P95, P99 latency |
| Invalidation Rate | Time series | Invalidations per minute |
| Cache Size | Time series | Item count over time |
| Hits by Source | Pie chart | Valkey vs Postgres distribution |
| Entry Size Distribution | Histogram | Size of cached entries |
| Trust Score Distribution | Histogram | Decision trust scores |
### Importing the Dashboard
```bash
# Via Grafana HTTP API
curl -X POST http://grafana:3000/api/dashboards/db \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GRAFANA_API_KEY" \
-d @deploy/grafana/dashboards/provcache-overview.json
# Via Helm (auto-provisioned)
# Dashboard is auto-imported when using StellaOps Helm chart
helm upgrade stellaops ./deploy/helm/stellaops \
--set grafana.dashboards.provcache.enabled=true
```
---
## Alerting Rules
### Recommended Alerts
#### Low Cache Hit Rate
```yaml
alert: ProvcacheLowHitRate
expr: |
  sum(rate(provcache_requests_total{result="hit"}[5m])) /
  sum(rate(provcache_requests_total[5m])) < 0.7
for: 10m
labels:
  severity: warning
annotations:
  summary: "Provcache hit rate below 70%"
  description: "Cache hit rate is {{ $value | humanizePercentage }}. Check for invalidation storms or cold cache."
```
#### Critical Hit Rate Drop
```yaml
alert: ProvcacheCriticalHitRate
expr: |
  sum(rate(provcache_requests_total{result="hit"}[5m])) /
  sum(rate(provcache_requests_total[5m])) < 0.5
for: 5m
labels:
  severity: critical
annotations:
  summary: "Provcache hit rate critically low"
  description: "Cache hit rate is {{ $value | humanizePercentage }}. Immediate investigation required."
```
#### High Latency
```yaml
alert: ProvcacheHighLatency
expr: |
  histogram_quantile(0.95, rate(provcache_latency_seconds_bucket{operation="get"}[5m])) > 0.1
for: 5m
labels:
  severity: warning
annotations:
  summary: "Provcache P95 latency above 100ms"
  description: "P95 get latency is {{ $value | humanizeDuration }}. Check Valkey/Postgres performance."
```
#### Excessive Invalidations
```yaml
alert: ProvcacheInvalidationStorm
expr: |
  sum(rate(provcache_invalidations_total[5m])) > 100
for: 5m
labels:
  severity: warning
annotations:
  summary: "Provcache invalidation rate spike"
  description: "Invalidations at {{ $value }} per second. Check for feed epoch changes or revocations."
```
#### Signer Revocation Spike
```yaml
alert: ProvcacheSignerRevocations
expr: |
  sum(rate(provcache_invalidations_total{reason="signer_revoked"}[5m])) > 10
for: 2m
labels:
  severity: critical
annotations:
  summary: "Signer revocation causing mass invalidation"
  description: "{{ $value }} invalidations/sec due to signer revocation. Security event investigation required."
```
#### Cache Size Approaching Limit
```yaml
alert: ProvcacheSizeHigh
expr: |
  sum(provcache_items_count) > 900000
for: 15m
labels:
  severity: warning
annotations:
  summary: "Provcache size approaching limit"
  description: "Cache has {{ $value }} items. Consider scaling or tuning TTL."
```
#### Low Trust Scores
```yaml
alert: ProvcacheLowTrustScores
expr: |
  provcache_trust_score_average < 60
for: 30m
labels:
  severity: info
annotations:
  summary: "Average trust score below 60"
  description: "Average trust score is {{ $value }}. Review SBOM completeness and VEX coverage."
```
### AlertManager Configuration
```yaml
# alertmanager.yml
route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      # Keep evaluating sibling routes so critical signer alerts also reach the security team
      continue: true
    - match:
        alertname: ProvcacheSignerRevocations
      receiver: 'security-team'
receivers:
  - name: 'default-receiver'
    slack_configs:
      - channel: '#stellaops-alerts'
        send_resolved: true
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: '<pagerduty-key>'
  - name: 'security-team'
    email_configs:
      - to: 'security@example.com'
        send_resolved: true
```
---
## Recording Rules
Pre-compute expensive queries for dashboard performance:
```yaml
# prometheus-rules.yml
groups:
  - name: provcache-recording
    interval: 30s
    rules:
      # Hit rate pre-computed
      - record: provcache:hit_rate:5m
        expr: |
          sum(rate(provcache_requests_total{result="hit"}[5m])) /
          sum(rate(provcache_requests_total[5m]))
      # P95 latency pre-computed
      - record: provcache:latency_p95:5m
        expr: |
          histogram_quantile(0.95, rate(provcache_latency_seconds_bucket{operation="get"}[5m]))
      # Invalidation rate
      - record: provcache:invalidation_rate:5m
        expr: |
          sum(rate(provcache_invalidations_total[5m]))
      # Cache efficiency: hits / (hits + misses)
      - record: provcache:efficiency:5m
        expr: |
          sum(rate(provcache_hits_total[5m])) /
          (sum(rate(provcache_hits_total[5m])) + sum(rate(provcache_misses_total[5m])))
```
---
## Operational Runbook
### Low Hit Rate Investigation
1. **Check invalidation metrics** — Is there an invalidation storm?
```promql
sum by (reason) (rate(provcache_invalidations_total[5m]))
```
2. **Check cache age** — Is the cache newly deployed (cold)?
```promql
sum(provcache_items_count)
```
3. **Check request patterns** — Are there many unique VeriKeys?
```promql
# High cardinality of unique requests suggests insufficient cache sharing
```
4. **Check TTL configuration** — Is TTL too aggressive?
- Review `Provcache:DefaultTtl` setting
- Consider increasing for stable workloads
### High Latency Investigation
1. **Check Valkey health**
```bash
redis-cli -h valkey info stats
```
2. **Check Postgres connections**
```sql
SELECT count(*) FROM pg_stat_activity WHERE datname = 'stellaops';
```
3. **Check entry sizes**
```promql
histogram_quantile(0.95, rate(provcache_entry_size_bytes_bucket[5m]))
```
4. **Check network latency** between services
### Invalidation Storm Response
1. **Identify cause**
```promql
sum by (reason) (increase(provcache_invalidations_total[10m]))
```
2. **If epoch-related**: Expected during feed updates. Monitor duration.
3. **If signer-related**: Security event — escalate to security team.
4. **If manual**: Check audit logs for unauthorized invalidation.
---
## Related Documentation
- [Provcache Module README](../provcache/README.md) — Core concepts
- [Provcache Architecture](../provcache/architecture.md) — Technical details
- [Telemetry Architecture](../telemetry/architecture.md) — Observability patterns
- [Grafana Dashboard Guide](../../deploy/grafana/README.md) — Dashboard management


@@ -0,0 +1,439 @@
# Provcache OCI Attestation Verification Guide
This document describes how to verify Provcache decision attestations attached to OCI container images.
## Overview
StellaOps can attach provenance cache decisions as OCI-attached attestations to container images. These attestations enable:
- **Supply chain verification** — Verify security decisions were made by trusted evaluators
- **Audit trails** — Retrieve the exact decision state at image push time
- **Policy gates** — Admission controllers can verify attestations before deployment
- **Offline verification** — Decisions verifiable without calling StellaOps services
## Attestation Format
### Predicate Type
```
stella.ops/provcache@v1
```
### Predicate Schema
```json
{
"_type": "stella.ops/provcache@v1",
"veriKey": "sha256:abc123...",
"decision": {
"digestVersion": "v1",
"verdictHash": "sha256:def456...",
"proofRoot": "sha256:789abc...",
"trustScore": 85,
"createdAt": "2025-12-24T12:00:00Z",
"expiresAt": "2025-12-25T12:00:00Z"
},
"inputs": {
"sourceDigest": "sha256:image...",
"sbomDigest": "sha256:sbom...",
"policyDigest": "sha256:policy...",
"feedEpoch": "2024-W52"
},
"verdicts": {
"CVE-2024-1234": "mitigated",
"CVE-2024-5678": "affected"
}
}
```
### Field Descriptions
| Field | Type | Description |
|-------|------|-------------|
| `_type` | string | Predicate type URI |
| `veriKey` | string | VeriKey hash identifying this decision context |
| `decision.digestVersion` | string | Decision digest schema version |
| `decision.verdictHash` | string | Hash of all verdicts |
| `decision.proofRoot` | string | Merkle proof root hash |
| `decision.trustScore` | number | Overall trust score (0-100) |
| `decision.createdAt` | string | ISO-8601 creation timestamp |
| `decision.expiresAt` | string | ISO-8601 expiry timestamp |
| `inputs.sourceDigest` | string | Container image digest |
| `inputs.sbomDigest` | string | SBOM document digest |
| `inputs.policyDigest` | string | Policy bundle digest |
| `inputs.feedEpoch` | string | Feed epoch identifier |
| `verdicts` | object | Map of CVE IDs to verdict status |
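A consumer that already holds a verified predicate might apply policy checks to these fields along the following lines. This is a sketch only: the `CheckPredicate` helper, its failure messages, and the order of the checks are assumptions, and signature verification is deliberately out of scope here.

```csharp
using System;
using System.Text.Json;

string predicateJson = """
{
  "_type": "stella.ops/provcache@v1",
  "decision": {
    "trustScore": 85,
    "createdAt": "2025-12-24T12:00:00Z",
    "expiresAt": "2025-12-25T12:00:00Z"
  }
}
""";

// Hypothetical helper: returns null when the predicate passes, otherwise the reason it fails.
static string? CheckPredicate(string json, int minTrustScore, TimeSpan maxAge, DateTimeOffset now)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    if (root.GetProperty("_type").GetString() != "stella.ops/provcache@v1")
        return "unexpected predicate type";

    var decision = root.GetProperty("decision");
    if (decision.GetProperty("trustScore").GetInt32() < minTrustScore)
        return "trust score below minimum";
    if (now >= decision.GetProperty("expiresAt").GetDateTimeOffset())
        return "decision expired";
    if (now - decision.GetProperty("createdAt").GetDateTimeOffset() > maxAge)
        return "decision older than allowed";
    return null;
}

var now = new DateTimeOffset(2025, 12, 24, 18, 0, 0, TimeSpan.Zero);
Console.WriteLine(CheckPredicate(predicateJson, 80, TimeSpan.FromHours(24), now) ?? "ok");
```

Passing the current time as a parameter keeps the check deterministic, which makes it easy to test expiry behaviour.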
---
## Verification with Cosign
### Prerequisites
```bash
# Install cosign
brew install cosign # macOS
# or
go install github.com/sigstore/cosign/v2/cmd/cosign@latest
```
### Basic Verification
```bash
# Verify attestation exists and is signed
cosign verify-attestation \
--type stella.ops/provcache@v1 \
registry.example.com/app:v1.2.3
```
### Verify with Identity Constraints
```bash
# Verify with signer identity (Fulcio)
cosign verify-attestation \
--type stella.ops/provcache@v1 \
--certificate-identity-regexp '.*@stellaops\.example\.com' \
--certificate-oidc-issuer https://auth.stellaops.example.com \
registry.example.com/app:v1.2.3
```
### Verify with Custom Trust Root
```bash
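# Using an enterprise CA: --certificate is the signing (leaf) certificate,
# --certificate-chain is the PEM chain from its issuing CA up to the root
cosign verify-attestation \
--type stella.ops/provcache@v1 \
--certificate /path/to/signing-cert.crt \
--certificate-chain /path/to/ca-chain.crt \
registry.example.com/app:v1.2.3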
```
### Extract Attestation Payload
```bash
# Get raw attestation JSON
cosign verify-attestation \
--type stella.ops/provcache@v1 \
--certificate-identity-regexp '.*@stellaops\.example\.com' \
--certificate-oidc-issuer https://auth.stellaops.example.com \
registry.example.com/app:v1.2.3 | jq -r '.payload' | base64 -d | jq .
```
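
The same decoding step can be done programmatically. The sketch below assumes, as the `jq` pipeline above implies, that cosign prints one JSON envelope per line and that each envelope carries the in-toto statement as a base64 `payload`. The helper name `decode_statements` and the fake envelope are illustrative only.

```python
import base64
import json


def decode_statements(cosign_output: str) -> list:
    """Decode every envelope line into its in-toto statement (a dict)."""
    statements = []
    for line in cosign_output.splitlines():
        if not line.strip():
            continue  # skip blank lines between envelopes
        envelope = json.loads(line)
        statements.append(json.loads(base64.b64decode(envelope["payload"])))
    return statements


# Build a fake, unsigned envelope to exercise the decoder (values are illustrative).
statement = {
    "predicateType": "stella.ops/provcache@v1",
    "predicate": {"decision": {"trustScore": 85}},
}
envelope_line = json.dumps({
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(json.dumps(statement).encode()).decode(),
    "signatures": [],
})

decoded = decode_statements(envelope_line + "\n")
print(decoded[0]["predicate"]["decision"]["trustScore"])  # prints 85
```

Decoding the payload is not verification: only trust statements that came out of a successful `cosign verify-attestation` run.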
---
## Verification with StellaOps CLI
### Verify Attestation
```bash
# Verify using StellaOps CLI
stella verify attestation \
--image registry.example.com/app:v1.2.3 \
--type provcache
# Output:
# ✓ Attestation found: stella.ops/provcache@v1
# ✓ Signature valid (Fulcio)
# ✓ Trust score: 85
# ✓ Decision created: 2025-12-24T12:00:00Z
# ✓ Decision expires: 2025-12-25T12:00:00Z
```
### Verify with Policy Requirements
```bash
# Verify with minimum trust score
stella verify attestation \
--image registry.example.com/app:v1.2.3 \
--type provcache \
--min-trust-score 80
# Verify with freshness requirement
stella verify attestation \
--image registry.example.com/app:v1.2.3 \
--type provcache \
--max-age 24h
```
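
To make the two requirements concrete, here is a minimal Python sketch of the checks that `--min-trust-score` and `--max-age` suggest. It assumes that age is measured from `decision.createdAt`; the CLI's actual semantics are not specified here, and `check_policy` is a hypothetical helper, not a StellaOps API.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_policy(decision: dict, min_trust_score: int, max_age: timedelta,
                 now: datetime) -> list:
    """Return the violated requirements; an empty list means the decision passes."""
    violations = []
    if decision["trustScore"] < min_trust_score:
        violations.append(f"trust score {decision['trustScore']} < {min_trust_score}")
    # Assumption: freshness is measured from the decision's creation time.
    age = now - parse_timestamp(decision["createdAt"])
    if age > max_age:
        violations.append(f"decision is older than {max_age}")
    return violations


decision = {"trustScore": 85, "createdAt": "2025-12-24T12:00:00Z"}
now = datetime(2025, 12, 25, 6, 0, tzinfo=timezone.utc)  # 18 hours after creation

print(check_policy(decision, 80, timedelta(hours=24), now))  # prints []
print(check_policy(decision, 90, timedelta(hours=12), now))  # two violations
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test than a function that reads the clock itself.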
### Extract Decision Details
```bash
# Get full decision details
stella verify attestation \
--image registry.example.com/app:v1.2.3 \
--type provcache \
--output json | jq .
# Get specific fields
stella verify attestation \
--image registry.example.com/app:v1.2.3 \
--type provcache \
--output json | jq '.predicate.verdicts'
```
---
## Kubernetes Admission Control
### Gatekeeper Policy
```yaml
# constraint-template.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: provcacheattestation
spec:
  crd:
    spec:
      names:
        kind: ProvcacheAttestation
      validation:
        openAPIV3Schema:
          type: object
          properties:
            minTrustScore:
              type: integer
              minimum: 0
              maximum: 100
            maxAgeHours:
              type: integer
              minimum: 1
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package provcacheattestation

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          image := container.image
          not has_valid_attestation(image)
          msg := sprintf("Image %v missing valid provcache attestation", [image])
        }

        # get_attestation is a placeholder, not an OPA built-in. It stands in for
        # a lookup that must be supplied, e.g. via a Gatekeeper external data provider.
        has_valid_attestation(image) {
          attestation := get_attestation(image, "stella.ops/provcache@v1")
          attestation.predicate.decision.trustScore >= input.parameters.minTrustScore
          not is_expired(attestation.predicate.decision.expiresAt)
          not is_too_old(attestation.predicate.decision.createdAt)
        }

        is_expired(expiry) {
          time.parse_rfc3339_ns(expiry) < time.now_ns()
        }

        # Enforce maxAgeHours against the decision's creation time.
        is_too_old(created) {
          max_age_ns := input.parameters.maxAgeHours * 3600 * 1000000000
          time.now_ns() - time.parse_rfc3339_ns(created) > max_age_ns
        }
```
```yaml
# constraint.yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: ProvcacheAttestation
metadata:
  name: require-provcache-attestation
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - production
  parameters:
    minTrustScore: 80
    maxAgeHours: 48
```
### Kyverno Policy
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-provcache-attestation
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: check-provcache-attestation
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "*"
          attestations:
            - predicateType: stella.ops/provcache@v1
              conditions:
                - all:
                    - key: "{{ decision.trustScore }}"
                      operator: GreaterThanOrEquals
                      value: 80
                    # expiresAt must be later than the current UTC time
                    - key: "{{ time_after(decision.expiresAt, time_now_utc()) }}"
                      operator: Equals
                      value: true
              attestors:
                - entries:
                    - keyless:
                        issuer: https://auth.stellaops.example.com
                        # wildcard match on the signer identity
                        subject: "*@stellaops.example.com"
```
---
## CI/CD Integration
### GitHub Actions
```yaml
# .github/workflows/verify-attestation.yml
name: Verify Provcache Attestation
on:
  workflow_dispatch:
    inputs:
      image:
        description: 'Image to verify'
        required: true
jobs:
  verify:
    runs-on: ubuntu-latest
    env:
      # Pass the input through the environment instead of interpolating it into the script
      IMAGE: ${{ inputs.image }}
    steps:
      - name: Install cosign
        uses: sigstore/cosign-installer@v3
      - name: Verify attestation
        run: |
          cosign verify-attestation \
            --type stella.ops/provcache@v1 \
            --certificate-identity-regexp '.*@stellaops\.example\.com' \
            --certificate-oidc-issuer https://auth.stellaops.example.com \
            "$IMAGE"
      - name: Check trust score
        run: |
          TRUST_SCORE=$(cosign verify-attestation \
            --type stella.ops/provcache@v1 \
            --certificate-identity-regexp '.*@stellaops\.example\.com' \
            --certificate-oidc-issuer https://auth.stellaops.example.com \
            "$IMAGE" | jq -r '.payload' | base64 -d | jq '.predicate.decision.trustScore')
          if [ "$TRUST_SCORE" -lt 80 ]; then
            echo "Trust score $TRUST_SCORE is below threshold (80)"
            exit 1
          fi
```
### GitLab CI
```yaml
# .gitlab-ci.yml
verify-attestation:
  stage: verify
  image: gcr.io/projectsigstore/cosign:latest
  script:
    - cosign verify-attestation
      --type stella.ops/provcache@v1
      --certificate-identity-regexp '.*@stellaops\.example\.com'
      --certificate-oidc-issuer https://auth.stellaops.example.com
      ${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}
  rules:
    - if: $CI_COMMIT_TAG
```
---
## Troubleshooting
### No Attestation Found
```bash
# List all attestations on image
cosign tree registry.example.com/app:v1.2.3
# Check if attestation was pushed
crane manifest registry.example.com/app:sha256-<digest>.att
```
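
The `.att` tag queried above follows cosign's tag-based naming scheme, in which the `:` of the image digest is replaced with `-` and `.att` is appended. The sketch below derives that reference from a repository and digest. The helper names are hypothetical, and registries that store attestations through the OCI referrers API may not expose this tag at all.

```python
def attestation_tag(digest: str) -> str:
    """Turn 'sha256:<hex>' into the tag 'sha256-<hex>.att'."""
    algorithm, sep, hex_digest = digest.partition(":")
    if not sep or not algorithm or not hex_digest:
        raise ValueError(f"not a digest of the form <algorithm>:<hex>: {digest!r}")
    return f"{algorithm}-{hex_digest}.att"


def attestation_ref(repository: str, digest: str) -> str:
    """Full reference to pass to a tool such as `crane manifest`."""
    return f"{repository}:{attestation_tag(digest)}"


# 'abc123' is a shortened, made-up digest used only for illustration.
print(attestation_ref("registry.example.com/app", "sha256:abc123"))
# prints registry.example.com/app:sha256-abc123.att
```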
### Signature Verification Failed
```bash
# Check certificate details
cosign verify-attestation \
--type stella.ops/provcache@v1 \
--output text \
registry.example.com/app:v1.2.3 2>&1 | grep -A5 "Certificate"
# Verify with debug logging
cosign verify-attestation \
--type stella.ops/provcache@v1 \
--verbose \
registry.example.com/app:v1.2.3
```
### Attestation Expired
```bash
# Check expiry timestamp
cosign verify-attestation \
--type stella.ops/provcache@v1 \
--certificate-identity-regexp '.*@stellaops\.example\.com' \
--certificate-oidc-issuer https://auth.stellaops.example.com \
registry.example.com/app:v1.2.3 | \
jq -r '.payload' | base64 -d | jq '.predicate.decision.expiresAt'
```
### Trust Score Below Threshold
```bash
# Check trust score breakdown
stella verify attestation \
--image registry.example.com/app:v1.2.3 \
--type provcache \
--output json | jq '.predicate.decision.trustScore'
# If score is low, check individual components:
# - SBOM completeness
# - VEX coverage
# - Reachability analysis
# - Policy freshness
# - Signer trust
```
---
## Security Considerations
### Key Management
- **Fulcio** — Ephemeral certificates tied to OIDC identity; recommended for public workflows
- **Enterprise CA** — Long-lived certificates for air-gapped environments
- **Self-signed** — Only for development/testing; not recommended for production
### Attestation Integrity
- Attestations are signed at push time
- Signature covers the entire predicate payload
- Modifying any field invalidates the signature
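
To see why a single changed field invalidates the signature, it helps to look at the bytes that are actually signed. The sketch below assumes the attestation is wrapped in a DSSE envelope, as cosign does for in-toto attestations, and computes the DSSE pre-authentication encoding (PAE) of two statements. It only hashes the signed bytes to show that they differ; it does not verify a signature. Serializing with `json.dumps` is a simplification, since a real verifier uses the exact payload bytes from the envelope.

```python
import hashlib
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the byte string a signer signs."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes,
                                    len(payload), payload)


def signed_digest(statement: dict) -> str:
    """SHA-256 over the PAE of a statement (illustrative stand-in for signing)."""
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    return hashlib.sha256(pae("application/vnd.in-toto+json", payload)).hexdigest()


original = {"predicate": {"decision": {"trustScore": 85}}}
tampered = {"predicate": {"decision": {"trustScore": 99}}}

# Any change to the payload changes the signed bytes, so the old signature no longer matches.
print(signed_digest(original) == signed_digest(tampered))  # prints False
```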
### Expiry Handling
- Attestations have `expiresAt` timestamps
- Expired attestations should be rejected by admission controllers
- Consider re-scanning images before deployment to get fresh attestations
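
One way to act on these points is to classify an attestation by how close it is to expiry, so that a pipeline can trigger a re-scan before an admission controller starts rejecting the image. The following is a minimal sketch; the `expiry_status` helper, its status labels, and the six-hour refresh window are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def expiry_status(expires_at: str, now: datetime,
                  refresh_window: timedelta = timedelta(hours=6)) -> str:
    """Classify an attestation as 'expired', 'refresh-soon', or 'valid'."""
    expires = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if now >= expires:
        return "expired"
    if expires - now <= refresh_window:
        return "refresh-soon"  # still valid, but worth re-scanning now
    return "valid"


expires_at = "2025-12-25T12:00:00Z"
for hour in (0, 8, 13):
    now = datetime(2025, 12, 25, hour, 0, tzinfo=timezone.utc)
    print(hour, expiry_status(expires_at, now))
```

With the sample expiry, the three checks at 00:00, 08:00, and 13:00 UTC come out as valid, refresh-soon, and expired respectively.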
### Verdict Reconciliation
- Verdicts in attestation reflect state at push time
- New vulnerabilities discovered after push won't appear
- Use `stella verify attestation --check-freshness` to compare against current feeds
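
Conceptually, reconciliation is a comparison between the verdict map stored in the attestation and the verdicts that current feeds would produce. The sketch below shows that comparison on plain dictionaries. It is not how `--check-freshness` is implemented; the `reconcile` helper and the `CVE-2025-0001` entry are made up for the example.

```python
def reconcile(attested: dict, current: dict) -> dict:
    """Compare attested verdicts with verdicts derived from current feeds."""
    new = sorted(set(current) - set(attested))
    changed = sorted(
        cve for cve in attested
        if cve in current and current[cve] != attested[cve]
    )
    return {"new": new, "changed": changed}


# Verdicts as recorded in the attestation at push time.
attested = {"CVE-2024-1234": "mitigated", "CVE-2024-5678": "affected"}
# Hypothetical verdicts computed against today's feeds.
current = {
    "CVE-2024-1234": "mitigated",
    "CVE-2024-5678": "fixed",
    "CVE-2025-0001": "affected",
}

print(reconcile(attested, current))
# prints {'new': ['CVE-2025-0001'], 'changed': ['CVE-2024-5678']}
```

A non-empty `new` or `changed` list is a signal that the attestation no longer reflects the current state and the image should be re-evaluated.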
---
## Related Documentation
- [Provcache Module README](./README.md) — Core concepts
- [Provcache Metrics and Alerting](./metrics-alerting.md) — Observability
- [Signer Module](../signer/architecture.md) — Signing infrastructure
- [Attestor Module](../attestor/architecture.md) — Attestation generation
- [OCI Artifact Spec](https://github.com/opencontainers/image-spec) — OCI standards
- [In-toto Attestation Spec](https://github.com/in-toto/attestation) — Attestation format
- [Sigstore Documentation](https://docs.sigstore.dev/) — Cosign and Fulcio