Gaps fill up, fixes, ui restructuring
143
docs/runbooks/federated-telemetry-operations.md
Normal file
@@ -0,0 +1,143 @@
# Federated Telemetry Operations Runbook

## Overview

This runbook covers operational procedures for the Stella Ops Federated Telemetry subsystem, including enabling/disabling federation, managing consent, monitoring privacy budgets, and troubleshooting sync failures.

## Prerequisites

- Platform admin access with `platform:federation:manage` scope.
- Access to the Stella Ops Platform Ops UI or API.

## Procedures

### 1. Enable Federation

Federation is enabled by default when `SealedModeEnabled` is `false` in configuration.

**Via configuration:**
```json
{
  "FederatedTelemetry": {
    "SealedModeEnabled": false,
    "SiteId": "your-site-id",
    "KAnonymityThreshold": 5,
    "EpsilonBudget": 1.0
  }
}
```

**Verification:**
```
GET /api/v1/telemetry/federation/status
```
The response should show `enabled: true` and `sealedMode: false`.

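The verification step can be scripted. The following is a minimal sketch, assuming the status endpoint returns a flat JSON object whose `enabled` and `sealedMode` keys are booleans; the helper names and the example base URL are hypothetical and not part of the Stella Ops API.

```python
from typing import Any, Mapping

# Endpoint path taken from the runbook.
STATUS_PATH = "/api/v1/telemetry/federation/status"


def status_url(base_url: str) -> str:
    """Join a base URL (hypothetical, e.g. https://stella.example.internal) with the status path."""
    return base_url.rstrip("/") + STATUS_PATH


def federation_is_active(status: Mapping[str, Any]) -> bool:
    """Return True only when the payload reports enabled=true and sealedMode=false.

    Assumes both fields are JSON booleans; a missing field counts as "not active".
    """
    return status.get("enabled") is True and status.get("sealedMode") is False
```

Feed `federation_is_active` the decoded JSON body of the status response; anything other than `enabled: true` with `sealedMode: false` is reported as inactive.
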
### 2. Disable Federation (Sealed Mode)

Set `SealedModeEnabled` to `true` in configuration and restart the service.

In sealed mode:
- No outbound federation traffic is sent.
- The sync service skips all cycles.
- Local aggregation continues for internal use.

### 3. Grant Consent

Consent must be explicitly granted before any data is shared.

**Via API:**
```
POST /api/v1/telemetry/federation/consent/grant
{
  "grantedBy": "admin@example.com",
  "ttlHours": 720
}
```

**Via UI:**
Navigate to Platform Ops > Federation > Consent Management and click "Grant Consent".

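The grant call can be prepared from a script. This sketch only builds the request with the standard library and does not send it. The base URL, the bearer-token `Authorization` header, and the function name are assumptions; the runbook does not say how the API authenticates. The default `ttl_hours=720` mirrors the example above and corresponds to 30 days.

```python
import json
import urllib.request

# Endpoint path taken from the runbook.
GRANT_PATH = "/api/v1/telemetry/federation/consent/grant"


def build_grant_request(
    base_url: str, token: str, granted_by: str, ttl_hours: int = 720
) -> urllib.request.Request:
    """Build (but do not send) the consent grant request.

    The bearer token header is an assumption about authentication.
    """
    if ttl_hours <= 0:
        raise ValueError("ttl_hours must be positive")
    body = json.dumps({"grantedBy": granted_by, "ttlHours": ttl_hours}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + GRANT_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the payload easy to review before consent is actually granted.
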
### 4. Revoke Consent

**Via API:**
```
POST /api/v1/telemetry/federation/consent/revoke
{
  "revokedBy": "admin@example.com"
}
```

**Via UI:**
Navigate to Platform Ops > Federation > Consent Management and click "Revoke Consent".

After revocation:
- No new bundles will be created.
- Existing bundles remain in the store.
- Federation peers will stop receiving updates.

### 5. Monitor Privacy Budget

**Via API:**
```
GET /api/v1/telemetry/federation/privacy-budget
```

**Via UI:**
Navigate to Platform Ops > Federation > Privacy Budget.

Key metrics:
- `remaining` / `total`: Epsilon budget left versus the total for the current period.
- `exhausted`: If `true`, no aggregation runs until the next reset.
- `queriesThisPeriod`: Number of successful aggregations.
- `suppressedThisPeriod`: Number of aggregations rejected because the budget was exhausted.
- `nextReset`: When the budget will be replenished.

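The metrics above can be reduced to a simple health signal. This is a sketch under the assumption that the endpoint returns the listed fields as top-level JSON keys with numeric `remaining` and `total`; the function names and the 0.8 warning threshold are illustrative choices, not product defaults.

```python
from typing import Any, Dict, Mapping


def budget_summary(budget: Mapping[str, Any]) -> Dict[str, Any]:
    """Derive consumption figures from a privacy-budget payload."""
    total = float(budget["total"])
    remaining = float(budget["remaining"])
    consumed = total - remaining
    # Treat a zero or negative total as fully consumed to avoid dividing by zero.
    fraction = consumed / total if total > 0 else 1.0
    return {
        "consumed": consumed,
        "fraction_consumed": fraction,
        "blocked": bool(budget.get("exhausted", False)),
        "suppressed": int(budget.get("suppressedThisPeriod", 0)),
    }


def needs_attention(budget: Mapping[str, Any], warn_fraction: float = 0.8) -> bool:
    """True when the budget is exhausted or consumption passed the warning threshold."""
    summary = budget_summary(budget)
    return summary["blocked"] or summary["fraction_consumed"] >= warn_fraction
```

A periodic job could call `needs_attention` on each poll and alert before `exhausted` flips to `true`, which is when aggregation stops until `nextReset`.
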
### 6. Manual Aggregation Trigger

**Via API:**
```
POST /api/v1/telemetry/federation/trigger
```

The trigger will fail if:
- Privacy budget is exhausted.
- Consent is not granted.
- Sealed mode is active.

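The three failure conditions lend themselves to a pre-flight check before calling the trigger endpoint. The sketch below takes the three flags as plain booleans; how they are obtained (status, consent, and privacy-budget responses) is left to the caller, and the function name is hypothetical.

```python
from typing import List


def trigger_blockers(
    sealed_mode: bool, consent_granted: bool, budget_exhausted: bool
) -> List[str]:
    """Return the reasons a manual trigger would fail; an empty list means it may proceed."""
    blockers: List[str] = []
    if sealed_mode:
        blockers.append("sealed mode is active")
    if not consent_granted:
        blockers.append("consent is not granted")
    if budget_exhausted:
        blockers.append("privacy budget is exhausted")
    return blockers
```

Returning every blocker rather than stopping at the first one lets an operator fix all outstanding conditions in one pass.
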
### 7. Troubleshooting Sync Failures

**Symptom: No bundles being created**

Check in order:
1. Is federation enabled? (`GET /status` -> `enabled: true`)
2. Is consent granted? (`GET /consent` -> `granted: true`)
3. Is privacy budget available? (`GET /privacy-budget` -> `exhausted: false`)
4. Are there telemetry facts to aggregate? (Check service logs for "No telemetry facts to aggregate")
5. Is egress policy blocking? (Check service logs for "Egress blocked")

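Steps 4 and 5 of the checklist above rely on log inspection, which can be automated with a small scanner. This sketch matches only the two message fragments quoted in the runbook; the surrounding log line format is not specified here, so the scanner uses plain substring matching. The function name and hint texts are illustrative.

```python
from typing import Dict, Iterable

# Message fragments quoted in the runbook, mapped to a short interpretation.
LOG_HINTS = {
    "No telemetry facts to aggregate": "no input data for the aggregation cycle",
    "Egress blocked": "egress policy is blocking outbound federation traffic",
}


def scan_logs(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many log lines contain each known message fragment."""
    counts = {fragment: 0 for fragment in LOG_HINTS}
    for line in lines:
        for fragment in LOG_HINTS:
            if fragment in line:
                counts[fragment] += 1
    return counts
```

A non-zero count for either fragment points to the corresponding checklist step; zero for both suggests the cause lies in steps 1 to 3.
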
**Symptom: Budget exhausting too quickly**

Options:
- Increase `EpsilonBudget` (less privacy, more queries).
- Decrease aggregation frequency (`AggregationInterval`).
- Increase `KAnonymityThreshold` (more suppression, fewer budget-consuming buckets).

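A rough way to compare the first two options is to count how many aggregation cycles fit into one budget reset period. With the defaults listed in the configuration reference (`BudgetResetPeriod` of 24h, `AggregationInterval` of 15m) that is 96 cycles. The runbook does not state how much epsilon a single aggregation spends, so the even split in this sketch is purely a hypothetical model for sizing, not the subsystem's actual accounting.

```python
from datetime import timedelta


def cycles_per_period(reset_period: timedelta, interval: timedelta) -> int:
    """Number of whole aggregation cycles that fit into one budget reset period."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return reset_period // interval  # timedelta // timedelta yields an int


def even_split_epsilon(
    epsilon_budget: float, reset_period: timedelta, interval: timedelta
) -> float:
    """Epsilon available per cycle IF the budget were spread evenly (hypothetical model)."""
    return epsilon_budget / cycles_per_period(reset_period, interval)
```

Under this model, doubling `AggregationInterval` to 30m halves the cycle count to 48 and doubles the per-cycle allowance, without weakening the overall privacy guarantee the way raising `EpsilonBudget` does.
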
**Symptom: All buckets suppressed**

The k-anonymity threshold is too high relative to the data volume. Either:
- Lower `KAnonymityThreshold`.
- Wait for more diverse telemetry data.

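To see why a high threshold suppresses everything on low-volume sites, here is a minimal sketch of k-anonymity filtering. It assumes, following the configuration reference, that a bucket is keyed by CVE and sized by its number of distinct artifacts. The data shapes, the function name, and the sample identifiers are hypothetical; the real aggregator may differ.

```python
from typing import Dict, List, Set, Tuple


def apply_k_anonymity(
    buckets: Dict[str, Set[str]], k: int = 5
) -> Tuple[Dict[str, int], List[str]]:
    """Split CVE buckets into kept (count of distinct artifacts) and suppressed.

    A bucket is kept only if it contains at least k distinct artifacts.
    """
    kept = {cve: len(artifacts) for cve, artifacts in buckets.items() if len(artifacts) >= k}
    suppressed = sorted(cve for cve in buckets if cve not in kept)
    return kept, suppressed


# Illustrative data: bucket names and artifact IDs are made up.
example = {
    "CVE-A": {"img1", "img2", "img3", "img4", "img5"},
    "CVE-B": {"img1", "img2"},
}
kept, suppressed = apply_k_anonymity(example, k=5)
```

With `k=5`, only `CVE-A` survives in the example. If no bucket on a site ever reaches five distinct artifacts, every bucket is suppressed, which is the symptom described above.
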
### 8. Configuration Reference

| Setting | Default | Description |
|---------|---------|-------------|
| `KAnonymityThreshold` | 5 | Minimum distinct artifacts per CVE bucket |
| `EpsilonBudget` | 1.0 | Total differential privacy budget per period |
| `BudgetResetPeriod` | 24h | Budget reset interval |
| `AggregationInterval` | 15m | Background sync cycle interval |
| `SealedModeEnabled` | false | Disable all outbound federation traffic |
| `SiteId` | "default" | This instance's federation mesh identifier |
| `ConsentPredicateType` | stella.ops/federatedConsent@v1 | DSSE predicate for consent proofs |
| `BundlePredicateType` | stella.ops/federatedTelemetry@v1 | DSSE predicate for telemetry bundles |
@@ -23,7 +23,7 @@ stella doctor --tag registry --format json --output registry-report.json
| Registry | Referrers API | Recommendation |
|----------|---------------|----------------|
| ACR, ECR, GCR, Harbor 2.6+, Quay 3.12+, JFrog 7.x+, Zot | Native | Full support |
| GHCR, Docker Hub, registry:2 | Fallback | Supported with automatic fallback |
| GHCR, GitLab, Docker Hub, registry:2 | Fallback | Supported with automatic fallback |

## Common Issues

@@ -24,8 +24,9 @@ This runbook covers diagnosing and resolving OCI referrer discovery issues durin
| ECR | Yes | Yes | Requires proper IAM permissions |
| ACR | Yes | Yes | Full OCI 1.1 support |
| Harbor 2.0+ | Yes | Yes | Full OCI 1.1 support |
| Quay | Partial | Yes | Varies by version |
| Quay | Partial | Yes | Varies by version; admin toggles may control feature |
| JFrog Artifactory | Partial | Yes | Requires OCI layout repository |
| GitLab | No | Yes | Stores `subject` field but no referrers endpoint |

See [Registry Compatibility Matrix](../modules/export-center/registry-compatibility.md) for detailed information.

@@ -169,6 +170,37 @@ curl "https://registry.example.com/v2/repo/referrers/sha256:abc123?artifactType=
2. Verify bundle integrity: `sha256sum bundle.tgz`
3. Check if referrer was intentionally updated upstream

### Issue: Harbor UI shows referrers as "UNKNOWN" artifact type

**Symptoms:**
- Referrer artifacts (cosign signatures, SBOMs) appear as "UNKNOWN" in Harbor UI
- API-level discovery works correctly

**Causes:**
1. Harbor UI mediaType classification lags API capabilities (especially around v2.15+)
2. Custom artifact types not recognized by Harbor's UI layer

**Solutions:**
- This is a Harbor-side UI classification issue; it does **not** affect StellaOps referrer discovery or functionality
- Verify API-level discovery works: `curl -H "Accept: application/vnd.oci.image.index.v1+json" "https://harbor.example.com/v2/repo/referrers/sha256:..."`
- If needed, check Harbor release notes for mediaType classification updates

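To confirm that API-level discovery works regardless of what the Harbor UI displays, the body returned by the referrers endpoint can be summarized by artifact type. This sketch assumes the response is an OCI image index with a `manifests` array of descriptors, as defined by the OCI Distribution specification; the function name is illustrative and the code is not part of StellaOps.

```python
import json
from collections import Counter


def referrer_types(index_json: str) -> Counter:
    """Count referrers per artifactType in an OCI image index response body.

    Falls back to the descriptor's mediaType, then to "unknown", when
    artifactType is absent.
    """
    index = json.loads(index_json)
    manifests = index.get("manifests") or []
    return Counter(
        m.get("artifactType") or m.get("mediaType") or "unknown" for m in manifests
    )
```

If the counter lists the expected signature and SBOM artifact types, the registry API is returning the referrers correctly and the "UNKNOWN" label is purely a UI matter.
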
### Issue: Quay referrers API returns inconsistent results

**Symptoms:**
- Referrer discovery works on Quay.io but not on self-hosted Quay
- Intermittent 404 or empty results from referrers endpoint

**Causes:**
1. OCI Referrers API feature not enabled in self-hosted Quay deployment
2. Quay admin toggles or deployment flags controlling the feature

**Solutions:**
- Verify the OCI Referrers API feature is enabled in Quay's deployment configuration
- Check Quay admin console for referrers-related feature flags
- If feature is disabled, StellaOps automatically uses tag-based fallback; no action required
- Contact Quay administrator to enable the feature if native referrers discovery is preferred

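For context on the tag-based fallback, the OCI Distribution specification defines a referrers tag schema of the form `<alg>-<ref>`, derived from the subject digest. The sketch below computes that tag. Whether StellaOps follows exactly this schema is an assumption; this section says only that a tag-based fallback is used. The function name is illustrative.

```python
def referrers_fallback_tag(digest: str) -> str:
    """Map a digest such as 'sha256:<hex>' to the OCI referrers fallback tag.

    Follows the '<alg>-<ref>' tag schema from the OCI Distribution spec,
    which limits the ref part to 64 characters.
    """
    alg, sep, ref = digest.partition(":")
    if not sep or not alg or not ref:
        raise ValueError(f"not a valid digest: {digest!r}")
    return f"{alg}-{ref[:64]}"
```

When the referrers endpoint returns 404, a client using this schema pulls the manifest at the computed tag in the same repository and reads the referrer list from that index.
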
### Issue: Slow referrer discovery

**Symptoms:**