Concelier Conflict Resolution Runbook (Sprint 3)
This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine has landed (AdvisoryPrecedenceMerger, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md and the metrics/logging instrumentation delivered this sprint.
1. Precedence Model (recap)
- Default ranking: GHSA -> NVD -> OSV, with distro/vendor PSIRTs outranking ecosystem feeds (AdvisoryPrecedenceDefaults). Use concelier:merge:precedence:ranks to override per source when incident response requires it.
- Freshness override: if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps provenance[].decisionReason = freshness.
- Tie-breakers: when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set decisionReason = tie-breaker.
- Audit trail: each merged advisory receives a merge provenance entry listing the participating sources plus a merge_event record with canonical before/after SHA-256 hashes.
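The recap above can be sketched in a few lines of Python. This is an illustration only, not the AdvisoryPrecedenceMerger implementation: the Candidate type, the resolve_field name, the lower-number-wins rank convention, the snake_case field names, and the source_order parameter are all assumptions made for the example, and the tie-breaker is applied only among sources sharing the best rank.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW = timedelta(hours=48)
# Freshness-sensitive fields named in the recap (spelling here is hypothetical).
FRESHNESS_FIELDS = {"title", "summary", "affected_ranges", "references", "credits"}


@dataclass(frozen=True)
class Candidate:
    source: str
    rank: int            # assumption: a lower number means higher precedence
    value: str
    modified: datetime


def _tie_key(candidate, source_order):
    # (1) primary source order, (2) shortest normalized text, (3) lowest stable hash
    if candidate.source in source_order:
        position = source_order.index(candidate.source)
    else:
        position = len(source_order)
    text = " ".join(candidate.value.split()).lower()
    return (position, len(text), hashlib.sha256(text.encode("utf-8")).hexdigest())


def resolve_field(field, candidates, source_order=()):
    """Return (winning candidate, decisionReason) for one advisory field."""
    best_rank = min(c.rank for c in candidates)
    leaders = [c for c in candidates if c.rank == best_rank]
    if field in FRESHNESS_FIELDS:
        newest_leader = max(c.modified for c in leaders)
        fresher = [
            c for c in candidates
            if c.rank > best_rank and c.modified - newest_leader >= FRESHNESS_WINDOW
        ]
        if fresher:
            return max(fresher, key=lambda c: c.modified), "freshness"
    if len(leaders) == 1:
        return leaders[0], "precedence"
    return min(leaders, key=lambda c: _tie_key(c, source_order)), "tie-breaker"


t0 = datetime(2025, 10, 1, tzinfo=timezone.utc)
ghsa = Candidate("ghsa", 0, "Original summary", t0)
osv = Candidate("osv", 2, "Revised summary", t0 + timedelta(hours=72))
winner, reason = resolve_field("summary", [ghsa, osv])
print(winner.source, reason)  # osv freshness
```

With the same two candidates, a field outside the freshness-sensitive set (for example severity) stays with GHSA and is stamped precedence.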
2. Telemetry Shipped This Sprint
| Instrument | Type | Key Tags | Purpose | 
|---|---|---|---|
| concelier.merge.operations | Counter | inputs | Total precedence merges executed. | 
| concelier.merge.overrides | Counter | primary_source, suppressed_source, primary_rank, suppressed_rank | Field-level overrides chosen by precedence. | 
| concelier.merge.range_overrides | Counter | advisory_key, package_type, primary_source, suppressed_source, primary_range_count, suppressed_range_count | Package range overrides emitted by AffectedPackagePrecedenceResolver. | 
| concelier.merge.conflicts | Counter | type (severity, precedence_tie), reason (mismatch, primary_missing, equal_rank) | Conflicts requiring operator review. | 
| concelier.merge.identity_conflicts | Counter | scheme, alias_value, advisory_count | Alias collisions surfaced by the identity graph. | 
Structured logs
- AdvisoryOverride (EventId 1000) - logs merge suppressions with alias/provenance counts.
- PackageRangeOverride (EventId 1001) - logs package-level precedence decisions.
- PrecedenceConflict (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- Alias collision ... (no EventId) - emitted when concelier.merge.identity_conflicts increments.
Expect all logs at Information. Ensure OTEL exporters include the scope StellaOps.Concelier.Merge.
3. Detection & Alerting
- Dashboard panels
  - concelier.merge.conflicts - table grouped by type/reason. Alert when > 0 in a 15-minute window.
  - concelier.merge.range_overrides - stacked bar by package_type. Spikes highlight vendor PSIRT overrides over registry data.
  - concelier.merge.overrides with primary_source|suppressed_source - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
  - concelier.merge.identity_conflicts - single-stat; alert when alias collisions occur more than once per day.
 
- Log-based alerts
  - eventId=1002 with reason="equal_rank" - indicates precedence table gaps; page merge owners.
  - eventId=1002 with reason="mismatch" - severity disagreement; open connector bug if sustained.
 
- Job health
  - stellaops-cli db merge exit code 1 signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops.
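The job-health hook could be wired up roughly as follows. This is a minimal sketch under stated assumptions: run_merge and the notify callback are hypothetical names, the message format is invented, and the default notifier only writes to stderr; a real deployment would replace it with whatever posts to #concelier-ops.

```python
import subprocess
import sys

MERGE_COMMAND = ("stellaops-cli", "db", "merge")


def _to_stderr(message):
    # Placeholder notifier (assumption); swap in the real channel integration.
    print(message, file=sys.stderr)


def run_merge(command=MERGE_COMMAND, notify=_to_stderr):
    """Run the merge job; on exit code 1, pass the captured logs to notify()."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 1:
        notify(
            "[#concelier-ops] merge reported unresolved conflicts\n"
            f"{result.stdout}{result.stderr}"
        )
    return result.returncode
```

Returning the exit code lets a scheduler or CI wrapper propagate the failure after the notification has been sent.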
 
Threshold updates (2025-10-12)
- concelier.merge.conflicts – Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- concelier.merge.overrides – Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with primary_source=osv, suppressed_source=ghsa).
- concelier.merge.range_overrides – Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single package_type=semver override so ops can spot unexpected spikes.
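For teams that prototype alert rules outside the dashboard tool, the updated thresholds can be expressed as a small windowed check. The sketch below is an assumption-laden illustration: the RULES table, the evaluate function, the action labels, and the (timestamp, increment) sample shape are hypothetical, and "exceeds 10" is encoded as a minimum of 11 because the counters are integers.

```python
from datetime import datetime, timedelta, timezone

# metric -> (window, minimum total in window, action); values from the notes above.
RULES = {
    "concelier.merge.conflicts": (timedelta(minutes=30), 2, "page"),
    "concelier.merge.overrides": (timedelta(minutes=30), 11, "warn"),
    "concelier.merge.range_overrides": (timedelta(minutes=15), 3, "alert"),
}


def evaluate(metric, samples, now):
    """samples: iterable of (timestamp, increment). Return the action or None."""
    window, minimum, action = RULES[metric]
    total = sum(count for ts, count in samples if now - window <= ts <= now)
    return action if total >= minimum else None


now = datetime(2025, 10, 12, 12, 0, tzinfo=timezone.utc)
one_conflict = [(now - timedelta(minutes=5), 1)]
print(evaluate("concelier.merge.conflicts", one_conflict, now))  # None
```

A single conflict returns None here, which matches the note that the first event goes to Slack for manual review rather than paging.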
4. Triage Workflow
- Confirm job context
  - Run stellaops-cli db merge (CLI) or POST /jobs/merge:reconcile (API) to rehydrate the merge job. Use --verbose to stream structured logs during triage.
 
- Inspect metrics
  - Correlate spikes in concelier.merge.conflicts with primary_source/suppressed_source tags from concelier.merge.overrides.
- Pull structured logs
  - Example (vector output):
    jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log
- Review merge events
  - mongosh:
    use concelier
    db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
  - Compare beforeHash vs afterHash to confirm the merge actually changed canonical output.
 
- Interrogate provenance
  - db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })
  - Check provenance[].decisionReason values (precedence, freshness, tie-breaker) to understand why the winning field was chosen.
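Once the merge_event and advisory documents from the last two steps have been exported (for example as JSON), the hash comparison and the decisionReason review can be scripted. The helper names below are hypothetical and the document shapes are inferred only from the field names used in the queries above; treat this as a sketch, not a description of the stored schema.

```python
from collections import Counter


def merge_changed_output(event):
    """True when the merge altered canonical output (beforeHash != afterHash)."""
    return event["beforeHash"] != event["afterHash"]


def decision_reasons(advisory):
    """Tally decisionReason across advisory-level and package-level provenance."""
    reasons = Counter(
        p["decisionReason"]
        for p in advisory.get("provenance", [])
        if "decisionReason" in p
    )
    for package in advisory.get("affectedPackages", []):
        reasons.update(
            p["decisionReason"]
            for p in package.get("provenance", [])
            if "decisionReason" in p
        )
    return reasons


# Hand-written sample data, not real Concelier output.
advisory = {
    "provenance": [{"decisionReason": "freshness"}, {"source": "nvd"}],
    "affectedPackages": [{"provenance": [{"decisionReason": "precedence"}]}],
}
print(dict(decision_reasons(advisory)))  # {'freshness': 1, 'precedence': 1}
```

An unexpected tie-breaker count in the tally is a hint to revisit the precedence table before looking at connector data.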
 
5. Conflict Classification Matrix
| Signal | Likely Cause | Immediate Action | 
|---|---|---|
| reason="mismatch" with type="severity" | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. | 
| reason="primary_missing" | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. | 
| reason="equal_rank" | Two feeds share the same precedence rank (custom config or missing entry). | Update concelier:merge:precedence:ranks to break the tie; restart merge job. | 
| Rising concelier.merge.range_overrides for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit decisionReason="precedence" and update dashboards to treat registry ranges as fallback. | 
| concelier.merge.identity_conflicts > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect Alias collision log payload; reconcile the alias graph by adjusting connector alias output. | 
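For the equal_rank row, a quick pre-flight check on the configured ranks can surface ties before the merge job does. The sketch assumes the ranks section has already been loaded into a plain source-to-rank mapping; find_rank_ties is a hypothetical helper, and it cannot detect the "missing entry" case, where a source has no rank configured at all.

```python
from collections import defaultdict


def find_rank_ties(ranks):
    """Return {rank: [sources]} for every rank shared by more than one source."""
    by_rank = defaultdict(list)
    for source, rank in ranks.items():
        by_rank[rank].append(source)
    return {
        rank: sorted(sources)
        for rank, sources in by_rank.items()
        if len(sources) > 1
    }


print(find_rank_ties({"ghsa": 0, "nvd": 1, "osv": 1}))  # {1: ['nvd', 'osv']}
```

An empty result means every configured source has a distinct rank.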
6. Resolution Playbook
- Connector data fix
  - Re-run the offending connector stages (stellaops-cli db fetch --source ghsa --stage map etc.).
  - Once fixed, rerun merge and verify decisionReason reflects freshness or precedence as expected.
- Temporary precedence override
  - Edit etc/concelier.yaml:
      concelier:
        merge:
          precedence:
            ranks:
              osv: 1
              ghsa: 0
  - Restart Concelier workers; confirm tags in concelier.merge.overrides show the new ranks.
  - Document the override with expiry in the change log.
- Alias remediation
  - Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
  - Flush cached alias graphs if necessary (db.alias_graph.drop() is destructive; coordinate with Storage before issuing).
 
- Escalation
  - If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and merge_event IDs.
7. Validation Checklist
- Merge job rerun returns exit code 0.
- concelier.merge.conflicts baseline returns to zero after corrective action.
- Latest merge_event entry shows expected hash delta.
- Affected advisory document shows updated provenance[].decisionReason.
- Ops change log updated with incident summary, config overrides, and rollback plan.
8. Reference Material
- Canonical conflict rules: src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md.
- Merge engine internals: src/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs.
- Metrics definitions: src/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs (identity conflicts) and AdvisoryPrecedenceMerger.
- Storage audit trail: src/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs, src/StellaOps.Concelier.Storage.Mongo/MergeEvents.
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
9. Synthetic Regression Fixtures
- Locations – Canonical conflict snapshots now live at src/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json, src/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json, and src/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json.
- Validation commands – To regenerate and verify the fixtures offline, run:
dotnet test src/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
- Expected signals – The triple produces one freshness-driven summary override (primary_source=osv, suppressed_source=ghsa) and one range override for the npm SemVer package while leaving concelier.merge.conflicts at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
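When load-testing an alert pipeline against the fixture triple, the expected signals can be checked mechanically. The following is a sketch only: EXPECTED_BASELINE restates the counts above, while baseline_drift and the flat metric-to-count mapping it accepts are assumptions about how the observed counters would be collected.

```python
EXPECTED_BASELINE = {
    "concelier.merge.overrides": 1,        # freshness-driven summary override
    "concelier.merge.range_overrides": 1,  # npm SemVer range override
    "concelier.merge.conflicts": 0,
}


def baseline_drift(observed):
    """Return {metric: (expected, observed)} for counters that deviate."""
    return {
        name: (expected, observed.get(name, 0))
        for name, expected in EXPECTED_BASELINE.items()
        if observed.get(name, 0) != expected
    }


observed = {"concelier.merge.overrides": 1, "concelier.merge.range_overrides": 2}
print(baseline_drift(observed))  # {'concelier.merge.range_overrides': (1, 2)}
```

Metrics missing from the observed mapping are treated as zero, so an absent conflicts counter still matches the baseline.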
10. Change Log
| Date (UTC) | Change | Notes | 
|---|---|---|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. |