git.stella-ops.org/docs/runbooks/federated-telemetry-operations.md
2026-02-19 22:10:54 +02:00

# Federated Telemetry Operations Runbook
## Overview
This runbook covers operational procedures for the Stella Ops Federated Telemetry subsystem, including enabling/disabling federation, managing consent, monitoring privacy budgets, and troubleshooting sync failures.
## Prerequisites
- Platform admin access with `platform:federation:manage` scope.
- Access to the Stella Ops Platform Ops UI or API.
## Procedures
### 1. Enable Federation
Federation is enabled whenever `SealedModeEnabled` is `false`, which is the default.
**Via configuration:**
```json
{
  "FederatedTelemetry": {
    "SealedModeEnabled": false,
    "SiteId": "your-site-id",
    "KAnonymityThreshold": 5,
    "EpsilonBudget": 1.0
  }
}
```
**Verification:**
```
GET /api/v1/telemetry/federation/status
```
The response should show `enabled: true` and `sealedMode: false`.
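The verification step can be scripted. The sketch below is a minimal example, assuming the status endpoint returns a top-level JSON object carrying the `enabled` and `sealedMode` fields named above, and assuming bearer-token authentication with the `platform:federation:manage` scope; the base URL and the helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical base URL; substitute your own Platform host.
BASE_URL = "https://stella.example.internal"
STATUS_PATH = "/api/v1/telemetry/federation/status"


def fetch_status(base_url: str, token: str) -> dict:
    """GET the federation status endpoint and return the parsed JSON body.

    Assumption: bearer-token auth; adjust to your deployment's scheme.
    """
    req = urllib.request.Request(
        base_url + STATUS_PATH,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def federation_is_live(status: dict) -> bool:
    """True only when the body reports enabled federation outside sealed mode."""
    return status.get("enabled") is True and status.get("sealedMode") is False
```

Typical use is `federation_is_live(fetch_status(BASE_URL, token))`; keeping the check separate from the HTTP call makes it easy to reuse against a saved response.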
### 2. Disable Federation (Sealed Mode)
Set `SealedModeEnabled` to `true` in the configuration and restart the service.
In sealed mode:
- No outbound federation traffic.
- The sync service skips every cycle.
- Local aggregation continues for internal use.
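If the configuration lives in a plain JSON file shaped like the example in step 1, the flag can be flipped before the restart with a small helper. This is a sketch under the assumption that the file is strict JSON (no comments) with a top-level `FederatedTelemetry` section; the helper name is hypothetical, and the file path is whatever your deployment uses.

```python
import json


def set_sealed_mode(config_text: str, sealed: bool) -> str:
    """Return the config text with FederatedTelemetry.SealedModeEnabled set.

    Assumption: strict JSON input; other keys in the section are preserved.
    """
    config = json.loads(config_text)
    config.setdefault("FederatedTelemetry", {})["SealedModeEnabled"] = sealed
    return json.dumps(config, indent=2)
```

The service still has to be restarted for the new value to take effect.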
### 3. Grant Consent
Consent must be explicitly granted before any data is shared.
**Via API:**
```
POST /api/v1/telemetry/federation/consent/grant
{
  "grantedBy": "admin@example.com",
  "ttlHours": 720
}
```
**Via UI:**
Navigate to Platform Ops > Federation > Consent Management and click "Grant Consent".
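For automation, the grant call can be built with the Python standard library. The sketch below only constructs the request so it can be inspected before sending; the bearer-token header is an assumption about authentication, and `build_grant_request` is a hypothetical helper. A `ttlHours` of 720 corresponds to 30 days.

```python
import json
import urllib.request

CONSENT_GRANT_PATH = "/api/v1/telemetry/federation/consent/grant"


def build_grant_request(
    base_url: str, token: str, granted_by: str, ttl_hours: int = 720
) -> urllib.request.Request:
    """Build (but do not send) the POST that grants federation consent."""
    body = json.dumps({"grantedBy": granted_by, "ttlHours": ttl_hours}).encode("utf-8")
    return urllib.request.Request(
        base_url + CONSENT_GRANT_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

Send it with `urllib.request.urlopen(req, timeout=10)` once the payload looks right.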
### 4. Revoke Consent
**Via API:**
```
POST /api/v1/telemetry/federation/consent/revoke
{
  "revokedBy": "admin@example.com"
}
```
**Via UI:**
Navigate to Platform Ops > Federation > Consent Management and click "Revoke Consent".
After revocation:
- No new bundles will be created.
- Existing bundles remain in the store.
- Federation peers will stop receiving updates.
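One way to reason about grant and revoke together is as a consent record that is active only between its grant time and its TTL expiry, unless revoked earlier. This model is an assumption inferred from the `ttlHours` and revoke fields, not the service's documented semantics, and `ConsentRecord` is a hypothetical type. It only governs new bundle creation; per the list above, bundles already in the store are unaffected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical consent model: grant time plus TTL, optionally revoked."""

    granted_by: str
    granted_at: datetime
    ttl_hours: int
    revoked_at: Optional[datetime] = None

    @property
    def expires_at(self) -> datetime:
        return self.granted_at + timedelta(hours=self.ttl_hours)

    def is_active(self, now: datetime) -> bool:
        """Active if not yet revoked and still inside the TTL window (assumption)."""
        if self.revoked_at is not None and self.revoked_at <= now:
            return False
        return self.granted_at <= now < self.expires_at
```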
### 5. Monitor Privacy Budget
**Via API:**
```
GET /api/v1/telemetry/federation/privacy-budget
```
**Via UI:**
Navigate to Platform Ops > Federation > Privacy Budget.
Key metrics:
- `remaining` / `total`: Epsilon still available out of the total budget for the current period.
- `exhausted`: If `true`, no aggregation runs until the next reset.
- `queriesThisPeriod`: Number of successful aggregations in the current period.
- `suppressedThisPeriod`: Number of aggregations rejected in the current period because the budget could not cover them.
- `nextReset`: When the budget will be replenished.
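To build intuition for how these fields move together, the toy ledger below mirrors them. It assumes each aggregation is charged a fixed epsilon under simple additive (sequential) composition and that the budget refills in full at each `BudgetResetPeriod` boundary; the per-aggregation cost and the `PrivacyBudget` class are hypothetical, not the service's accounting code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PrivacyBudget:
    """Toy epsilon ledger mirroring the privacy-budget response fields."""

    total: float                 # EpsilonBudget
    reset_period: timedelta      # BudgetResetPeriod
    period_start: datetime
    remaining: float = field(init=False)
    queries_this_period: int = field(default=0, init=False)
    suppressed_this_period: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.remaining = self.total

    @property
    def exhausted(self) -> bool:
        return self.remaining <= 0

    @property
    def next_reset(self) -> datetime:
        return self.period_start + self.reset_period

    def try_spend(self, epsilon: float, now: datetime) -> bool:
        """Charge one aggregation; suppress it if the budget cannot cover it."""
        while now >= self.next_reset:  # roll forward into the current period
            self.period_start = self.next_reset
            self.remaining = self.total
            self.queries_this_period = 0
            self.suppressed_this_period = 0
        if epsilon > self.remaining:
            self.suppressed_this_period += 1
            return False
        self.remaining -= epsilon
        self.queries_this_period += 1
        return True
```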
### 6. Manual Aggregation Trigger
**Via API:**
```
POST /api/v1/telemetry/federation/trigger
```
The request fails if:
- Privacy budget is exhausted.
- Consent is not granted.
- Sealed mode is active.
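Before calling the trigger, a script can check the three documented failure conditions against the status, consent, and privacy-budget responses. The field names (`sealedMode`, `granted`, `exhausted`) come from this runbook; treating each response as a flat JSON object is an assumption, and `trigger_blockers` is a hypothetical helper.

```python
def trigger_blockers(status: dict, consent: dict, budget: dict) -> list[str]:
    """Return every documented reason POST /trigger would fail; empty means clear.

    Assumption: responses are flat JSON objects with the fields named above.
    """
    blockers = []
    if status.get("sealedMode") is True:
        blockers.append("sealed mode is active")
    if consent.get("granted") is not True:
        blockers.append("consent is not granted")
    if budget.get("exhausted") is True:
        blockers.append("privacy budget is exhausted")
    return blockers
```

Reporting every blocker at once, rather than stopping at the first, saves a round trip when more than one condition applies.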
### 7. Troubleshooting Sync Failures
**Symptom: No bundles being created**
Check in order (endpoint paths are relative to `/api/v1/telemetry/federation`):
1. Is federation enabled? (`GET /status` -> `enabled: true`)
2. Is consent granted? (`GET /consent` -> `granted: true`)
3. Is privacy budget available? (`GET /privacy-budget` -> `exhausted: false`)
4. Are there telemetry facts to aggregate? (Check service logs for "No telemetry facts to aggregate")
5. Is egress policy blocking? (Check service logs for "Egress blocked")
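The checklist can be run mechanically by feeding it the three API responses and recent service log lines. This sketch assumes the log messages contain the quoted strings verbatim and that plain substring matching is enough; `first_failing_check` is a hypothetical name.

```python
from typing import Iterable, Optional

# Log messages quoted in the checklist; substring matching is an assumption.
LOG_NO_FACTS = "No telemetry facts to aggregate"
LOG_EGRESS_BLOCKED = "Egress blocked"


def first_failing_check(
    status: dict, consent: dict, budget: dict, log_lines: Iterable[str]
) -> Optional[str]:
    """Walk the checklist in order and return the first failing step, or None."""
    lines = list(log_lines)
    if status.get("enabled") is not True:
        return "1: federation is not enabled"
    if consent.get("granted") is not True:
        return "2: consent is not granted"
    if budget.get("exhausted") is True:
        return "3: privacy budget is exhausted"
    if any(LOG_NO_FACTS in line for line in lines):
        return "4: no telemetry facts to aggregate"
    if any(LOG_EGRESS_BLOCKED in line for line in lines):
        return "5: egress policy is blocking"
    return None
```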
**Symptom: Budget exhausting too quickly**
Options:
- Increase `EpsilonBudget` (weaker privacy guarantee, more queries per period).
- Reduce aggregation frequency by increasing `AggregationInterval`.
- Increase `KAnonymityThreshold` (more suppression, fewer budget-consuming buckets).
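A quick calculation helps when tuning. With the defaults (`BudgetResetPeriod` 24h, `AggregationInterval` 15m), 96 background cycles fit in one budget period, so if every cycle spent the same amount, each could use at most 1.0 / 96 ≈ 0.0104 epsilon before the budget ran out. Doubling the interval to 30m halves the cycle count to 48 and doubles that allowance. The equal-spend assumption is a simplification; actual per-cycle cost depends on how many buckets each aggregation releases.

```python
from datetime import timedelta


def cycles_per_period(reset_period: timedelta, interval: timedelta) -> int:
    """Background aggregation cycles that fit in one budget period."""
    return int(reset_period // interval)


def even_epsilon_per_cycle(
    epsilon_budget: float, reset_period: timedelta, interval: timedelta
) -> float:
    """Largest equal per-cycle spend that avoids exhausting the budget (assumption)."""
    return epsilon_budget / cycles_per_period(reset_period, interval)
```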
**Symptom: All buckets suppressed**
The k-anonymity threshold is too high relative to the data volume. Either:
- Lower `KAnonymityThreshold`.
- Wait for more diverse telemetry data.
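The suppression rule behind this symptom can be sketched as follows: group facts by CVE, count distinct artifacts per bucket, and withhold any bucket below `KAnonymityThreshold`. Representing facts as `(cve_id, artifact_id)` pairs is a hypothetical simplification of the real telemetry records.

```python
from collections import defaultdict


def suppress_small_buckets(
    facts: list[tuple[str, str]], k: int
) -> tuple[dict[str, int], list[str]]:
    """Group (cve_id, artifact_id) facts by CVE and withhold buckets below k.

    Returns (released CVE -> distinct artifact count, suppressed CVE ids).
    """
    artifacts_by_cve: dict[str, set[str]] = defaultdict(set)
    for cve_id, artifact_id in facts:
        artifacts_by_cve[cve_id].add(artifact_id)  # duplicates count once

    released: dict[str, int] = {}
    suppressed: list[str] = []
    for cve_id, artifacts in sorted(artifacts_by_cve.items()):
        if len(artifacts) >= k:
            released[cve_id] = len(artifacts)
        else:
            suppressed.append(cve_id)
    return released, suppressed
```

When every CVE has fewer than `k` distinct artifacts, `released` comes back empty, which is the "all buckets suppressed" case.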
### 8. Configuration Reference
| Setting | Default | Description |
|---------|---------|-------------|
| `KAnonymityThreshold` | 5 | Minimum distinct artifacts per CVE bucket |
| `EpsilonBudget` | 1.0 | Total differential privacy budget per period |
| `BudgetResetPeriod` | 24h | Budget reset interval |
| `AggregationInterval` | 15m | Background sync cycle interval |
| `SealedModeEnabled` | false | When `true`, disables all outbound federation traffic |
| `SiteId` | "default" | This instance's federation mesh identifier |
| `ConsentPredicateType` | `stella.ops/federatedConsent@v1` | DSSE predicate type for consent proofs |
| `BundlePredicateType` | `stella.ops/federatedTelemetry@v1` | DSSE predicate type for telemetry bundles |
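For tooling that needs the effective settings, the table can be mirrored as a typed options object. This sketch reads only the scalar keys from a JSON `FederatedTelemetry` section and leaves `BudgetResetPeriod` and `AggregationInterval` at their defaults, because their serialized format is not documented here; `FederatedTelemetryOptions` and `load_options` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FederatedTelemetryOptions:
    """Defaults copied from the configuration reference table."""

    k_anonymity_threshold: int = 5
    epsilon_budget: float = 1.0
    budget_reset_period: timedelta = timedelta(hours=24)
    aggregation_interval: timedelta = timedelta(minutes=15)
    sealed_mode_enabled: bool = False
    site_id: str = "default"
    consent_predicate_type: str = "stella.ops/federatedConsent@v1"
    bundle_predicate_type: str = "stella.ops/federatedTelemetry@v1"


# Scalar config keys only; duration formats are not parsed (assumption).
_SCALAR_KEYS = {
    "KAnonymityThreshold": "k_anonymity_threshold",
    "EpsilonBudget": "epsilon_budget",
    "SealedModeEnabled": "sealed_mode_enabled",
    "SiteId": "site_id",
    "ConsentPredicateType": "consent_predicate_type",
    "BundlePredicateType": "bundle_predicate_type",
}


def load_options(config_json: str) -> FederatedTelemetryOptions:
    """Overlay any scalar keys present in the JSON onto the documented defaults."""
    section = json.loads(config_json).get("FederatedTelemetry", {})
    overrides = {attr: section[key] for key, attr in _SCALAR_KEYS.items() if key in section}
    return FederatedTelemetryOptions(**overrides)
```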