# Federated Telemetry Operations Runbook

## Overview

This runbook covers operational procedures for the Stella Ops Federated Telemetry subsystem, including enabling/disabling federation, managing consent, monitoring privacy budgets, and troubleshooting sync failures.

## Prerequisites

- Platform admin access with the `platform:federation:manage` scope.
- Access to the Stella Ops Platform Ops UI or API.

## Procedures

### 1. Enable Federation

Federation is enabled whenever `SealedModeEnabled` is `false`, which is the default. No data is shared until consent is granted (see step 3).

**Via configuration:**

```json
{
  "FederatedTelemetry": {
    "SealedModeEnabled": false,
    "SiteId": "your-site-id",
    "KAnonymityThreshold": 5,
    "EpsilonBudget": 1.0
  }
}
```

**Verification:**

```http
GET /api/v1/telemetry/federation/status
```

The response should show `enabled: true` and `sealedMode: false`.

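The same check can be scripted for health probes or automation. This is a minimal Python sketch. It assumes the following:

- The status endpoint returns a JSON object with top-level boolean fields `enabled` and `sealedMode`, as the response description above suggests.
- The service is reachable at `BASE_URL`, which is a hypothetical placeholder.

Authentication for the `platform:federation:manage` scope is deliberately left out.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your Platform API address.
BASE_URL = "https://stella-ops.example.internal"
STATUS_PATH = "/api/v1/telemetry/federation/status"


def federation_is_live(status: dict) -> bool:
    """True when the status payload shows federation enabled and not sealed.

    Assumes top-level boolean fields `enabled` and `sealedMode`.
    """
    return status.get("enabled") is True and status.get("sealedMode") is False


def fetch_status(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET the federation status endpoint and decode its JSON body.

    No auth header is set here; add whatever your deployment requires.
    """
    with urllib.request.urlopen(base_url + STATUS_PATH, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Typical use is `federation_is_live(fetch_status())`. Keeping the predicate separate from the HTTP call lets you test it against saved responses.
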
### 2. Disable Federation (Sealed Mode)

Set `SealedModeEnabled` to `true` in configuration and restart the service.

In sealed mode:

- No outbound federation traffic is sent.
- The sync service skips every cycle.
- Local aggregation continues for internal use.

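One way to picture the sealed-mode behaviour above is as a guard between local aggregation and outbound publishing. This is an illustrative Python sketch, not the service's actual code. `run_sync_cycle`, `aggregate_locally`, and `publish` are hypothetical names. The split between a local step and an outbound step is taken from the list above.

```python
from typing import Callable, List


def run_sync_cycle(
    sealed_mode: bool,
    aggregate_locally: Callable[[], List[str]],
    publish: Callable[[List[str]], None],
) -> str:
    """One hypothetical sync cycle.

    Local aggregation always runs; the outbound publish step is skipped
    entirely when sealed mode is on.
    """
    results = aggregate_locally()  # still useful internally in sealed mode
    if sealed_mode:
        return "skipped"  # no outbound federation traffic
    publish(results)
    return "published"
```
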
### 3. Grant Consent

Consent must be granted explicitly before any data is shared.

**Via API:**

```http
POST /api/v1/telemetry/federation/consent/grant

{
  "grantedBy": "admin@example.com",
  "ttlHours": 720
}
```

`ttlHours` is the consent lifetime in hours; the example value of 720 corresponds to 30 days.

**Via UI:**

Navigate to Platform Ops > Federation > Consent Management and click "Grant Consent".

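For scripted consent changes, the request can be built with the Python standard library. This is a sketch under the following assumptions:

- The grant and revoke endpoints share the `/consent/{action}` path shape shown in steps 3 and 4.
- Each endpoint accepts exactly the body fields from those examples.
- `BASE_URL` and `consent_request` are hypothetical.
- Authentication headers are omitted.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your Platform API address.
BASE_URL = "https://stella-ops.example.internal"
CONSENT_PATH = "/api/v1/telemetry/federation/consent"


def consent_request(action: str, actor: str, ttl_hours: int = 720,
                    base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a consent grant or revoke request.

    Body fields follow the runbook examples: grantedBy/ttlHours for a
    grant, revokedBy for a revoke. No auth headers are added.
    """
    if action == "grant":
        body = {"grantedBy": actor, "ttlHours": ttl_hours}
    elif action == "revoke":
        body = {"revokedBy": actor}
    else:
        raise ValueError(f"unknown consent action: {action!r}")
    return urllib.request.Request(
        url=f"{base_url}{CONSENT_PATH}/{action}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once your deployment's auth header is added, send the request with `urllib.request.urlopen(consent_request("grant", "admin@example.com"))`. Building the request separately makes it easy to log or review before it goes out.
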
### 4. Revoke Consent

**Via API:**

```http
POST /api/v1/telemetry/federation/consent/revoke

{
  "revokedBy": "admin@example.com"
}
```

**Via UI:**

Navigate to Platform Ops > Federation > Consent Management and click "Revoke Consent".

After revocation:

- No new bundles are created.
- Existing bundles remain in the store.
- Federation peers stop receiving updates.

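The revocation rules above amount to a gate on new bundle creation that leaves stored bundles alone. This Python sketch models that gate. It assumes that consent also lapses when its `ttlHours` lifetime runs out, which is an inference from the grant request rather than documented behaviour. `ConsentRecord` and `may_create_bundle` are hypothetical names, not the service's model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical local view of consent state (not the real service model)."""

    granted_at: datetime
    ttl_hours: int
    revoked_at: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        # Revocation ends consent immediately.
        if self.revoked_at is not None and now >= self.revoked_at:
            return False
        # Assumption: consent also expires once the TTL has elapsed.
        return now < self.granted_at + timedelta(hours=self.ttl_hours)


def may_create_bundle(consent: Optional[ConsentRecord], now: datetime) -> bool:
    """Only new bundle creation is gated; existing bundles are not touched."""
    return consent is not None and consent.is_active(now)
```
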
### 5. Monitor Privacy Budget

**Via API:**

```http
GET /api/v1/telemetry/federation/privacy-budget
```

**Via UI:**

Navigate to Platform Ops > Federation > Privacy Budget.

Key metrics:

- `remaining` / `total`: epsilon still available in the current period versus the full per-period budget.
- `exhausted`: if `true`, no aggregation runs until the next reset.
- `queriesThisPeriod`: number of successful aggregations in the current period.
- `suppressedThisPeriod`: number of aggregations rejected because the budget was insufficient.
- `nextReset`: when the budget is next replenished.

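To make the metrics above concrete, here is a hypothetical epsilon accountant whose fields mirror them. It is a sketch, not the service's implementation, and it makes two assumptions:

- Each aggregation is charged a caller-supplied epsilon.
- A full reset happens at `nextReset`, after which the next reset is one `BudgetResetPeriod` later.

How the real service prices an aggregation (for example per bucket) is not documented here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PrivacyBudget:
    """Hypothetical epsilon accountant; field names mirror the API metrics."""

    total: float
    reset_period: timedelta
    next_reset: datetime
    remaining: float = field(init=False)
    queries_this_period: int = field(init=False, default=0)
    suppressed_this_period: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.remaining = self.total

    @property
    def exhausted(self) -> bool:
        return self.remaining <= 0.0

    def _maybe_reset(self, now: datetime) -> None:
        # Replenish the full budget once the period boundary has passed.
        while now >= self.next_reset:
            self.next_reset += self.reset_period
            self.remaining = self.total
            self.queries_this_period = 0
            self.suppressed_this_period = 0

    def try_spend(self, epsilon: float, now: datetime) -> bool:
        """Charge one aggregation, or count it as suppressed if it doesn't fit."""
        self._maybe_reset(now)
        if epsilon > self.remaining:
            self.suppressed_this_period += 1
            return False
        self.remaining -= epsilon
        self.queries_this_period += 1
        return True
```

Rejecting a query that would overdraw the budget, rather than allowing a partial spend, keeps the per-period epsilon bound strict.
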
### 6. Manual Aggregation Trigger

**Via API:**

```http
POST /api/v1/telemetry/federation/trigger
```

The trigger fails if any of the following is true:

- The privacy budget is exhausted.
- Consent is not granted.
- Sealed mode is active.

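Automation can check these conditions before calling the trigger, so a refusal comes with a readable reason. This is a minimal Python sketch. Its inputs are plain booleans, which you would read from the status (`sealedMode`), consent (`granted`), and privacy-budget (`exhausted`) responses. `trigger_blockers` is a hypothetical helper, not part of the service.

```python
from typing import List


def trigger_blockers(budget_exhausted: bool, consent_granted: bool,
                     sealed_mode: bool) -> List[str]:
    """Reasons a manual trigger would be refused; an empty list means none.

    Mirrors the three failure conditions listed in the runbook.
    """
    reasons = []
    if sealed_mode:
        reasons.append("sealed mode is active")
    if not consent_granted:
        reasons.append("consent is not granted")
    if budget_exhausted:
        reasons.append("privacy budget is exhausted")
    return reasons
```
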
### 7. Troubleshooting Sync Failures

**Symptom: No bundles are being created**

Check the following in order (paths are relative to `/api/v1/telemetry/federation`):

1. Is federation enabled? (`GET /status` -> `enabled: true`)
2. Is consent granted? (`GET /consent` -> `granted: true`)
3. Is privacy budget available? (`GET /privacy-budget` -> `exhausted: false`)
4. Are there telemetry facts to aggregate? (Check the service logs for "No telemetry facts to aggregate".)
5. Is egress policy blocking traffic? (Check the service logs for "Egress blocked".)

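The ordered checklist can be scripted so that it reports the first check that fails. This Python sketch assumes the following:

- The three API responses have already been fetched and decoded into dicts carrying the fields named above (`enabled`, `granted`, `exhausted`).
- The service logs are available as a list of lines.

The log phrases are matched as substrings because the surrounding log format is not documented. All function names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: Iterable[Check]) -> Optional[str]:
    """Evaluate checks in order; return the first failing description, else None."""
    for description, passed in checks:
        if not passed():
            return description
    return None


def log_contains(lines: List[str], needle: str) -> bool:
    return any(needle in line for line in lines)


def no_bundle_checks(status: dict, consent: dict, budget: dict,
                     log_lines: List[str]) -> List[Check]:
    """The five 'no bundles' checks, in runbook order."""
    return [
        ("federation disabled", lambda: status.get("enabled") is True),
        ("consent not granted", lambda: consent.get("granted") is True),
        ("privacy budget exhausted", lambda: budget.get("exhausted") is False),
        ("no telemetry facts to aggregate",
         lambda: not log_contains(log_lines, "No telemetry facts to aggregate")),
        ("egress blocked", lambda: not log_contains(log_lines, "Egress blocked")),
    ]
```

The checks are lazy callables, so evaluation stops at the first failure. This matches the "check in order" guidance: a later check is not meaningful until the earlier ones pass.
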
**Symptom: Budget is exhausted too quickly**

Options:

- Increase `EpsilonBudget` (weaker privacy guarantee, more queries per period).
- Reduce aggregation frequency by increasing `AggregationInterval`.
- Increase `KAnonymityThreshold` (more suppression, so fewer buckets consume budget).

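To reason about the second option, count how many sync cycles share one budget period. The Python sketch below assumes, purely for illustration, that every cycle spends the same amount of epsilon. The service's real per-cycle cost is not documented here.

```python
from datetime import timedelta


def cycles_per_period(reset_period: timedelta, interval: timedelta) -> int:
    """How many whole sync cycles fit into one budget period."""
    cycles = reset_period // interval  # timedelta // timedelta -> int
    if cycles < 1:
        raise ValueError("AggregationInterval must not exceed BudgetResetPeriod")
    return cycles


def epsilon_per_cycle(budget: float, reset_period: timedelta,
                      interval: timedelta) -> float:
    """Even per-cycle epsilon allowance, assuming uniform spend per cycle."""
    return budget / cycles_per_period(reset_period, interval)
```

The defaults are an `EpsilonBudget` of 1.0, a `BudgetResetPeriod` of 24h, and an `AggregationInterval` of 15m. With these, one period holds 96 cycles, so each cycle could spend about 0.0104 epsilon. Raising the interval to 30m halves the cycle count to 48 and doubles the per-cycle allowance to about 0.0208.
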
**Symptom: All buckets are suppressed**

The k-anonymity threshold is too high relative to the data volume: no CVE bucket reaches `KAnonymityThreshold` distinct artifacts. Either:

- Lower `KAnonymityThreshold`.
- Wait until more diverse telemetry data has accumulated.

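This Python sketch shows the k-anonymity rule from the configuration reference: a CVE bucket is shared only if it contains at least `KAnonymityThreshold` distinct artifacts. It assumes telemetry facts can be reduced to `(cve_id, artifact_id)` pairs. The function name and the placeholder CVE IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def kanon_buckets(facts: Iterable[Tuple[str, str]],
                  k: int) -> Tuple[Dict[str, int], Set[str]]:
    """Group (cve_id, artifact_id) facts into per-CVE buckets.

    Returns (kept buckets -> distinct artifact count, suppressed CVE ids).
    Duplicate facts for the same artifact count once.
    """
    artifacts_by_cve: Dict[str, Set[str]] = defaultdict(set)
    for cve, artifact in facts:
        artifacts_by_cve[cve].add(artifact)
    kept = {cve: len(a) for cve, a in artifacts_by_cve.items() if len(a) >= k}
    suppressed = {cve for cve, a in artifacts_by_cve.items() if len(a) < k}
    return kept, suppressed
```

With only a handful of artifacts per CVE, the default threshold of 5 suppresses every bucket. This is the symptom described above.
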
### 8. Configuration Reference

| Setting | Default | Description |
|---------|---------|-------------|
| `KAnonymityThreshold` | `5` | Minimum number of distinct artifacts a CVE bucket needs to avoid suppression |
| `EpsilonBudget` | `1.0` | Total differential privacy budget (epsilon) per period |
| `BudgetResetPeriod` | `24h` | Interval after which the budget is replenished |
| `AggregationInterval` | `15m` | Background sync cycle interval |
| `SealedModeEnabled` | `false` | When `true`, disables all outbound federation traffic |
| `SiteId` | `"default"` | This instance's identifier in the federation mesh |
| `ConsentPredicateType` | `stella.ops/federatedConsent@v1` | DSSE predicate type for consent proofs |
| `BundlePredicateType` | `stella.ops/federatedTelemetry@v1` | DSSE predicate type for telemetry bundles |

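A pre-deployment check can merge a `FederatedTelemetry` section over the defaults in the table and reject obviously invalid values. This Python sketch makes two assumptions:

- Durations are written in the table's shorthand (`24h`, `15m`). The real service may expect a different duration format.
- The validation rules are illustrative, not the service's own. A threshold must be at least 1 and the budget must be positive.

`load_settings` and `parse_duration` are hypothetical names.

```python
import json
import re
from datetime import timedelta

# Defaults copied from the configuration reference table.
DEFAULTS = {
    "KAnonymityThreshold": 5,
    "EpsilonBudget": 1.0,
    "BudgetResetPeriod": "24h",
    "AggregationInterval": "15m",
    "SealedModeEnabled": False,
    "SiteId": "default",
    "ConsentPredicateType": "stella.ops/federatedConsent@v1",
    "BundlePredicateType": "stella.ops/federatedTelemetry@v1",
}

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse shorthand such as '15m' or '24h' (assumed notation)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def load_settings(config_json: str) -> dict:
    """Merge the FederatedTelemetry section over the documented defaults."""
    section = json.loads(config_json).get("FederatedTelemetry", {})
    settings = {**DEFAULTS, **section}
    for key in ("BudgetResetPeriod", "AggregationInterval"):
        settings[key] = parse_duration(settings[key])
    if settings["KAnonymityThreshold"] < 1:
        raise ValueError("KAnonymityThreshold must be at least 1")
    if settings["EpsilonBudget"] <= 0:
        raise ValueError("EpsilonBudget must be positive")
    return settings
```

Any setting absent from the section falls back to its documented default. The configuration example in step 1 therefore only needs to list the values it changes.
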