161 lines
10 KiB
Markdown
161 lines
10 KiB
Markdown
# Feedser Conflict Resolution Runbook (Sprint 3)
|
||
|
||
This runbook equips Feedser operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint.
|
||
|
||
---
|
||
|
||
## 1. Precedence Model (recap)
|
||
|
||
- **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `feedser:merge:precedence:ranks` to override per source when incident response requires it.
|
||
- **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`.
|
||
- **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`.
|
||
- **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes.
|
||
|
||
---
|
||
|
||
## 2. Telemetry Shipped This Sprint
|
||
|
||
| Instrument | Type | Key Tags | Purpose |
|
||
|------------|------|----------|---------|
|
||
| `feedser.merge.operations` | Counter | `inputs` | Total precedence merges executed. |
|
||
| `feedser.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. |
|
||
| `feedser.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. |
|
||
| `feedser.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. |
|
||
| `feedser.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. |
|
||
|
||
### Structured logs
|
||
|
||
- `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts.
|
||
- `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions.
|
||
- `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios.
|
||
- `Alias collision ...` (no EventId) - emitted when `feedser.merge.identity_conflicts` increments.
|
||
|
||
Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Feedser.Merge`.
|
||
|
||
---
|
||
|
||
## 3. Detection & Alerting
|
||
|
||
1. **Dashboard panels**
|
||
- `feedser.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15 minute window.
|
||
- `feedser.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data.
|
||
- `feedser.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
|
||
- `feedser.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day.
|
||
2. **Log based alerts**
|
||
- `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners.
|
||
- `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained.
|
||
3. **Job health**
|
||
- `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #feedser-ops.
|
||
|
||
### Threshold updates (2025-10-12)
|
||
|
||
- `feedser.merge.conflicts` – Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
|
||
- `feedser.merge.overrides` – Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
|
||
- `feedser.merge.range_overrides` – Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.
|
||
|
||
---
|
||
|
||
## 4. Triage Workflow
|
||
|
||
1. **Confirm job context**
|
||
- `stellaops-cli db merge` (CLI) or `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage.
|
||
2. **Inspect metrics**
|
||
- Correlate spikes in `feedser.merge.conflicts` with `primary_source`/`suppressed_source` tags from `feedser.merge.overrides`.
|
||
3. **Pull structured logs**
|
||
- Example (vector output):
|
||
```
|
||
jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-feedser.log
|
||
```
|
||
4. **Review merge events**
|
||
- `mongosh`:
|
||
```javascript
|
||
use feedser;
|
||
db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
|
||
```
|
||
- Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output.
|
||
5. **Interrogate provenance**
|
||
- `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })`
|
||
- Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen.
|
||
|
||
---
|
||
|
||
## 5. Conflict Classification Matrix
|
||
|
||
| Signal | Likely Cause | Immediate Action |
|
||
|--------|--------------|------------------|
|
||
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
|
||
| `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
|
||
| `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `feedser:merge:precedence:ranks` to break the tie; restart merge job. |
|
||
| Rising `feedser.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. |
|
||
| `feedser.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. |
|
||
|
||
---
|
||
|
||
## 6. Resolution Playbook
|
||
|
||
1. **Connector data fix**
|
||
- Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.).
|
||
- Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected.
|
||
2. **Temporary precedence override**
|
||
- Edit `etc/feedser.yaml`:
|
||
```yaml
|
||
feedser:
|
||
merge:
|
||
precedence:
|
||
ranks:
|
||
osv: 1
|
||
ghsa: 0
|
||
```
|
||
- Restart Feedser workers; confirm tags in `feedser.merge.overrides` show the new ranks.
|
||
- Document the override with expiry in the change log.
|
||
3. **Alias remediation**
|
||
- Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
|
||
- Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive-coordinate with Storage before issuing).
|
||
4. **Escalation**
|
||
- If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs.
|
||
|
||
---
|
||
|
||
## 7. Validation Checklist
|
||
|
||
- [ ] Merge job rerun returns exit code `0`.
|
||
- [ ] `feedser.merge.conflicts` baseline returns to zero after corrective action.
|
||
- [ ] Latest `merge_event` entry shows expected hash delta.
|
||
- [ ] Affected advisory document shows updated `provenance[].decisionReason`.
|
||
- [ ] Ops change log updated with incident summary, config overrides, and rollback plan.
|
||
|
||
---
|
||
|
||
## 8. Reference Material
|
||
|
||
- Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`.
|
||
- Merge engine internals: `src/StellaOps.Feedser.Merge/Services/AdvisoryPrecedenceMerger.cs`.
|
||
- Metrics definitions: `src/StellaOps.Feedser.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`.
|
||
- Storage audit trail: `src/StellaOps.Feedser.Merge/Services/MergeEventWriter.cs`, `src/StellaOps.Feedser.Storage.Mongo/MergeEvents`.
|
||
|
||
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
|
||
|
||
---
|
||
|
||
## 9. Synthetic Regression Fixtures
|
||
|
||
- **Locations** – Canonical conflict snapshots now live at `src/StellaOps.Feedser.Source.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/StellaOps.Feedser.Source.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/StellaOps.Feedser.Source.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
|
||
- **Validation commands** – To regenerate and verify the fixtures offline, run:
|
||
|
||
```bash
|
||
dotnet test src/StellaOps.Feedser.Source.Ghsa.Tests/StellaOps.Feedser.Source.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
|
||
dotnet test src/StellaOps.Feedser.Source.Nvd.Tests/StellaOps.Feedser.Source.Nvd.Tests.csproj --filter NvdConflictFixtureTests
|
||
dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj --filter OsvConflictFixtureTests
|
||
dotnet test src/StellaOps.Feedser.Merge.Tests/StellaOps.Feedser.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
|
||
```
|
||
|
||
- **Expected signals** – The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `feedser.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
|
||
|
||
---
|
||
|
||
## 10. Change Log
|
||
|
||
| Date (UTC) | Change | Notes |
|
||
|------------|--------|-------|
|
||
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Feedser Ops Guild; no additional overrides required. |
|