# Concelier Conflict Resolution Runbook (Sprint 3)

This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine has landed (`AdvisoryPrecedenceMerger`, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md` and the metrics/logging instrumentation delivered this sprint.

---

## 1. Precedence Model (recap)

- **Default ranking:** `GHSA -> NVD -> OSV`, with distro/vendor PSIRTs outranking ecosystem feeds (`AdvisoryPrecedenceDefaults`). Use `concelier:merge:precedence:ranks` to override per source when incident response requires it.
- **Freshness override:** if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps `provenance[].decisionReason = freshness`.
- **Tie-breakers:** when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set `decisionReason = tie-breaker`.
- **Audit trail:** each merged advisory receives a `merge` provenance entry listing the participating sources plus a `merge_event` record with canonical before/after SHA-256 hashes.

---

## 2. Telemetry Shipped This Sprint

| Instrument | Type | Key Tags | Purpose |
|------------|------|----------|---------|
| `concelier.merge.operations` | Counter | `inputs` | Total precedence merges executed. |
| `concelier.merge.overrides` | Counter | `primary_source`, `suppressed_source`, `primary_rank`, `suppressed_rank` | Field-level overrides chosen by precedence. |
| `concelier.merge.range_overrides` | Counter | `advisory_key`, `package_type`, `primary_source`, `suppressed_source`, `primary_range_count`, `suppressed_range_count` | Package range overrides emitted by `AffectedPackagePrecedenceResolver`. |
| `concelier.merge.conflicts` | Counter | `type` (`severity`, `precedence_tie`), `reason` (`mismatch`, `primary_missing`, `equal_rank`) | Conflicts requiring operator review. |
| `concelier.merge.identity_conflicts` | Counter | `scheme`, `alias_value`, `advisory_count` | Alias collisions surfaced by the identity graph. |

### Structured logs

- `AdvisoryOverride` (EventId 1000) - logs merge suppressions with alias/provenance counts.
- `PackageRangeOverride` (EventId 1001) - logs package-level precedence decisions.
- `PrecedenceConflict` (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- `Alias collision ...` (no EventId) - emitted when `concelier.merge.identity_conflicts` increments.

Expect all logs at `Information`. Ensure OTEL exporters include the scope `StellaOps.Concelier.Merge`.

---

## 3. Detection & Alerting

1. **Dashboard panels**
   - `concelier.merge.conflicts` - table grouped by `type/reason`. Alert when > 0 in a 15-minute window.
   - `concelier.merge.range_overrides` - stacked bar by `package_type`. Spikes highlight vendor PSIRT overrides over registry data.
   - `concelier.merge.overrides` with `primary_source|suppressed_source` - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
   - `concelier.merge.identity_conflicts` - single-stat; alert when alias collisions occur more than once per day.
2. **Log-based alerts**
   - `eventId=1002` with `reason="equal_rank"` - indicates precedence table gaps; page merge owners.
   - `eventId=1002` with `reason="mismatch"` - severity disagreement; open connector bug if sustained.
3. **Job health**
   - `stellaops-cli db merge` exit code `1` signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops.

### Threshold updates (2025-10-12)

- `concelier.merge.conflicts` – Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- `concelier.merge.overrides` – Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with `primary_source=osv`, `suppressed_source=ghsa`).
- `concelier.merge.range_overrides` – Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single `package_type=semver` override so ops can spot unexpected spikes.

---

## 4. Triage Workflow

1. **Confirm job context**
   - Run `stellaops-cli db merge` (CLI) or call `POST /jobs/merge:reconcile` (API) to rehydrate the merge job. Use `--verbose` to stream structured logs during triage.
2. **Inspect metrics**
   - Correlate spikes in `concelier.merge.conflicts` with `primary_source`/`suppressed_source` tags from `concelier.merge.overrides`.
3. **Pull structured logs**
   - Example (vector output):
     ```
     jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log
     ```
4. **Review merge events**
   - `mongosh`:
     ```javascript
     use concelier;
     db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
     ```
   - Compare `beforeHash` vs `afterHash` to confirm the merge actually changed canonical output.
5. **Interrogate provenance**
   - `db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })`
   - Check `provenance[].decisionReason` values (`precedence`, `freshness`, `tie-breaker`) to understand why the winning field was chosen.

---

## 5. Conflict Classification Matrix

| Signal | Likely Cause | Immediate Action |
|--------|--------------|------------------|
| `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
| `reason="primary_missing"` | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
| `reason="equal_rank"` | Two feeds share the same precedence rank (custom config or missing entry). | Update `concelier:merge:precedence:ranks` to break the tie; restart merge job. |
| Rising `concelier.merge.range_overrides` for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit `decisionReason="precedence"` and update dashboards to treat registry ranges as fallback. |
| `concelier.merge.identity_conflicts` > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect `Alias collision` log payload; reconcile the alias graph by adjusting connector alias output. |

---

## 6. Resolution Playbook

1. **Connector data fix**
   - Re-run the offending connector stages (`stellaops-cli db fetch --source ghsa --stage map` etc.).
   - Once fixed, rerun merge and verify `decisionReason` reflects `freshness` or `precedence` as expected.
2. **Temporary precedence override**
   - Edit `etc/concelier.yaml`:
     ```yaml
     concelier:
       merge:
         precedence:
           ranks:
             osv: 1
             ghsa: 0
     ```
   - Restart Concelier workers; confirm tags in `concelier.merge.overrides` show the new ranks.
   - Document the override with expiry in the change log.
3. **Alias remediation**
   - Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
   - Flush cached alias graphs if necessary (`db.alias_graph.drop()` is destructive; coordinate with Storage before issuing it).
4. **Escalation**
   - If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and `merge_event` IDs.

---

## 7. Validation Checklist

- [ ] Merge job rerun returns exit code `0`.
- [ ] `concelier.merge.conflicts` baseline returns to zero after corrective action.
- [ ] Latest `merge_event` entry shows expected hash delta.
- [ ] Affected advisory document shows updated `provenance[].decisionReason`.
- [ ] Ops change log updated with incident summary, config overrides, and rollback plan.

---

## 8. Reference Material

- Canonical conflict rules: `src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md`.
- Merge engine internals: `src/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs`.
- Metrics definitions: `src/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs` (identity conflicts) and `AdvisoryPrecedenceMerger`.
- Storage audit trail: `src/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs`, `src/StellaOps.Concelier.Storage.Mongo/MergeEvents`.

Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.

---

## 9. Synthetic Regression Fixtures

- **Locations** – Canonical conflict snapshots now live at `src/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json`, `src/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json`, and `src/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json`.
- **Validation commands** – To regenerate and verify the fixtures offline, run:
  ```bash
  dotnet test src/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
  dotnet test src/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests
  dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests
  dotnet test src/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
  ```
- **Expected signals** – The triple produces one freshness-driven summary override (`primary_source=osv`, `suppressed_source=ghsa`) and one range override for the npm SemVer package while leaving `concelier.merge.conflicts` at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.

---

## 10. Change Log

| Date (UTC) | Change | Notes |
|------------|--------|-------|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. |
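
---

## 11. Illustrative Sketch: Field-Level Resolution

The precedence, freshness, and tie-breaker rules recapped in §1 can be sketched as a small resolver. This is not the `AdvisoryPrecedenceMerger` implementation; it is a minimal Python model that may help when reasoning about `decisionReason` values during triage. Several details are assumptions rather than facts about the merge engine: that a lower rank number means higher precedence, the numeric values in `DEFAULT_RANKS`, the hypothetical `Candidate` shape, whitespace/case normalization, source name standing in for "primary source order", and choosing the newest source when several qualify for the freshness override.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: a lower rank number means higher precedence.
DEFAULT_RANKS = {"ghsa": 0, "nvd": 1, "osv": 2}
FRESHNESS_WINDOW = timedelta(hours=48)
FRESHNESS_SENSITIVE = {"title", "summary", "affected_ranges", "references", "credits"}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-source field value; not a Concelier type."""
    source: str
    value: str
    modified: datetime


def _tie_break_key(c):
    # Assumed normalization: collapse whitespace and lowercase.
    normalized = " ".join(c.value.split()).lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # (1) source order (approximated by name), (2) shortest text, (3) lowest hash.
    return (c.source, len(normalized), digest)


def resolve_field(field, candidates, ranks=None):
    """Return (winning candidate, decision reason) for one advisory field."""
    ranks = ranks or DEFAULT_RANKS
    best_rank = min(ranks[c.source] for c in candidates)
    top = [c for c in candidates if ranks[c.source] == best_rank]
    winner, reason = top[0], "precedence"
    if len(top) > 1:
        # Equal rank: fall back to the tie-breaker chain.
        winner, reason = min(top, key=_tie_break_key), "tie-breaker"

    if field in FRESHNESS_SENSITIVE:
        # A lower-ranked source wins if it is at least 48 hours newer.
        fresher = [
            c for c in candidates
            if ranks[c.source] > best_rank
            and c.modified - winner.modified >= FRESHNESS_WINDOW
        ]
        if fresher:
            return max(fresher, key=lambda c: c.modified), "freshness"
    return winner, reason


if __name__ == "__main__":
    t0 = datetime(2025, 10, 1, tzinfo=timezone.utc)
    ghsa = Candidate("ghsa", "Original summary", t0)
    osv = Candidate("osv", "Revised summary", t0 + timedelta(hours=72))
    winner, reason = resolve_field("summary", [ghsa, osv])
    print(winner.source, reason)  # osv freshness
```

With a GHSA summary and an OSV summary modified 72 hours later, the sketch returns the OSV value with `freshness`, the same shape as the summary override described in §9. For a field outside the freshness-sensitive set (for example `severity`), the same inputs resolve to GHSA with `precedence`.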