Concelier Conflict Resolution Runbook (Sprint 3)
This runbook equips Concelier operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine landed (AdvisoryPrecedenceMerger, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md and the metrics/logging instrumentation delivered this sprint.
1. Precedence Model (recap)
- Default ranking: GHSA -> NVD -> OSV, with distro/vendor PSIRTs outranking ecosystem feeds (AdvisoryPrecedenceDefaults). Use concelier:merge:precedence:ranks to override per source when incident response requires it.
- Freshness override: if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps provenance[].decisionReason = freshness.
- Tie-breakers: when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set decisionReason = tie-breaker.
- Audit trail: each merged advisory receives a merge provenance entry listing the participating sources plus a merge_event record with canonical before/after SHA-256 hashes.
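The rules above can be sketched in a few lines of Python. This is an illustration only, not the AdvisoryPrecedenceMerger implementation. The rank numbers (lower means higher precedence), the `Candidate` type, the `pick_field` helper, and the normalization used for the tie-breakers are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed ranks for illustration: a lower number means higher precedence.
ASSUMED_RANKS = {"ghsa": 0, "nvd": 1, "osv": 2}
FRESHNESS_WINDOW = timedelta(hours=48)


@dataclass(frozen=True)
class Candidate:
    """One source's value for a single freshness-sensitive field (hypothetical type)."""
    source: str
    value: str
    modified: datetime


def normalize(text):
    # Assumed normalization: collapse whitespace and lowercase.
    return " ".join(text.split()).lower()


def stable_hash(text):
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def pick_field(candidates, ranks=None, source_order=None):
    """Return (winner, decision_reason) for one field."""
    ranks = ASSUMED_RANKS if ranks is None else ranks
    source_order = source_order or {}
    best_rank = min(ranks[c.source] for c in candidates)
    top = [c for c in candidates if ranks[c.source] == best_rank]
    if len(top) > 1:
        # Assumption: equal-rank candidates go straight to the tie-breakers:
        # (1) primary source order, (2) shortest normalized text, (3) lowest hash.
        winner = min(
            top,
            key=lambda c: (
                source_order.get(c.source, 0),
                len(normalize(c.value)),
                stable_hash(c.value),
            ),
        )
        reason = "tie-breaker"
    else:
        winner, reason = top[0], "precedence"
    # Freshness override: a lower-ranked source that is >= 48 hours newer wins.
    fresher = [
        c for c in candidates
        if ranks[c.source] > best_rank
        and c.modified - winner.modified >= FRESHNESS_WINDOW
    ]
    if fresher:
        return max(fresher, key=lambda c: c.modified), "freshness"
    return winner, reason
```

With a GHSA summary and an OSV summary that is 72 hours newer, the sketch picks OSV with decisionReason freshness; if the OSV entry is only 24 hours newer, GHSA wins on precedence.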
2. Telemetry Shipped This Sprint
| Instrument | Type | Key Tags | Purpose |
|---|---|---|---|
| concelier.merge.operations | Counter | inputs | Total precedence merges executed. |
| concelier.merge.overrides | Counter | primary_source, suppressed_source, primary_rank, suppressed_rank | Field-level overrides chosen by precedence. |
| concelier.merge.range_overrides | Counter | advisory_key, package_type, primary_source, suppressed_source, primary_range_count, suppressed_range_count | Package range overrides emitted by AffectedPackagePrecedenceResolver. |
| concelier.merge.conflicts | Counter | type (severity, precedence_tie), reason (mismatch, primary_missing, equal_rank) | Conflicts requiring operator review. |
| concelier.merge.identity_conflicts | Counter | scheme, alias_value, advisory_count | Alias collisions surfaced by the identity graph. |
Structured logs
- AdvisoryOverride (EventId 1000) - logs merge suppressions with alias/provenance counts.
- PackageRangeOverride (EventId 1001) - logs package-level precedence decisions.
- PrecedenceConflict (EventId 1002) - logs mismatched severity or equal-rank scenarios.
- Alias collision ... (no EventId) - emitted when concelier.merge.identity_conflicts increments.
Expect all logs at Information. Ensure OTEL exporters include the scope StellaOps.Concelier.Merge.
3. Detection & Alerting
- Dashboard panels
  - concelier.merge.conflicts - table grouped by type/reason. Alert when > 0 in a 15 minute window.
  - concelier.merge.range_overrides - stacked bar by package_type. Spikes highlight vendor PSIRT overrides over registry data.
  - concelier.merge.overrides with primary_source|suppressed_source - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
  - concelier.merge.identity_conflicts - single-stat; alert when alias collisions occur more than once per day.
- Log-based alerts
  - eventId=1002 with reason="equal_rank" - indicates precedence table gaps; page merge owners.
  - eventId=1002 with reason="mismatch" - severity disagreement; open connector bug if sustained.
- Job health
  - stellaops-cli db merge exit code 1 signifies unresolved conflicts. Pipe to automation that captures logs and notifies #concelier-ops.
Threshold updates (2025-10-12)
- concelier.merge.conflicts – Page only when ≥ 2 events fire within 30 minutes; the synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
- concelier.merge.overrides – Raise a warning when the 30-minute sum exceeds 10 (canonical triple yields exactly 1 summary override with primary_source=osv, suppressed_source=ghsa).
- concelier.merge.range_overrides – Maintain the 15-minute alert at ≥ 3 but annotate dashboards that the regression triple emits a single package_type=semver override so ops can spot unexpected spikes.
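As a rough sketch of the concelier.merge.conflicts routing rule (first event to Slack, page at two or more events within 30 minutes), a sliding window over event timestamps is enough. The `ConflictRouter` class and its return values are hypothetical names for this example; in practice the rule would live in the alerting backend rather than in application code.

```python
from collections import deque
from datetime import datetime, timedelta


class ConflictRouter:
    """Sliding-window router for conflict events (hypothetical helper)."""

    def __init__(self, window=timedelta(minutes=30), page_threshold=2):
        self.window = window
        self.page_threshold = page_threshold
        self._events = deque()

    def record(self, timestamp):
        """Record one conflict event and return 'page' or 'slack'.

        Assumes timestamps arrive in non-decreasing order.
        """
        self._events.append(timestamp)
        cutoff = timestamp - self.window
        # Drop events that fell out of the 30-minute window.
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        if len(self._events) >= self.page_threshold:
            return "page"
        return "slack"
```

A single isolated event returns "slack"; a second event ten minutes later returns "page"; an event an hour after that starts a fresh window and returns "slack" again.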
4. Triage Workflow
- Confirm job context
  - Use stellaops-cli db merge (CLI) or POST /jobs/merge:reconcile (API) to rehydrate the merge job. Use --verbose to stream structured logs during triage.
- Inspect metrics
  - Correlate spikes in concelier.merge.conflicts with primary_source/suppressed_source tags from concelier.merge.overrides.
- Pull structured logs
  - Example (vector output):
    jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-concelier.log
- Review merge events
  - mongosh: use concelier; db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
  - Compare beforeHash vs afterHash to confirm the merge actually changed canonical output.
- Interrogate provenance
  - db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })
  - Check provenance[].decisionReason values (precedence, freshness, tie-breaker) to understand why the winning field was chosen.
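Where jq is not available, the same log filter can be approximated in Python. The sketch below assumes one JSON object per line and the field names used by the jq filter above (EventId.Name, State[0].Value, ConflictType, Reason, PrimarySources, SuppressedSources); the `precedence_conflicts` function name is hypothetical, and the real log shape may differ.

```python
import json


def precedence_conflicts(lines):
    """Yield a compact summary for each PrecedenceConflict log record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as banners or stack traces
        if not isinstance(record, dict):
            continue
        event = record.get("EventId")
        if not isinstance(event, dict) or event.get("Name") != "PrecedenceConflict":
            continue
        state = record.get("State")
        first = {}
        if isinstance(state, list) and state and isinstance(state[0], dict):
            first = state[0]
        yield {
            "advisory": first.get("Value"),
            "type": record.get("ConflictType"),
            "reason": record.get("Reason"),
            "primary": record.get("PrimarySources"),
            "suppressed": record.get("SuppressedSources"),
        }
```

Typical usage would be `for hit in precedence_conflicts(open("stellaops-concelier.log")): print(hit)`.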
5. Conflict Classification Matrix
| Signal | Likely Cause | Immediate Action |
|---|---|---|
| reason="mismatch" with type="severity" | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
| reason="primary_missing" | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
| reason="equal_rank" | Two feeds share the same precedence rank (custom config or missing entry). | Update concelier:merge:precedence:ranks to break the tie; restart merge job. |
| Rising concelier.merge.range_overrides for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit decisionReason="precedence" and update dashboards to treat registry ranges as fallback. |
| concelier.merge.identity_conflicts > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect Alias collision log payload; reconcile the alias graph by adjusting connector alias output. |
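For automation that annotates alerts, the log-driven rows of the matrix can be encoded as a small lookup. This is a sketch under the assumption that the alert payload carries the type and reason tags from concelier.merge.conflicts; the `immediate_action` function is a hypothetical helper, and the metric-driven rows (range overrides, identity conflicts) are left out.

```python
def immediate_action(conflict_type, reason):
    """Map conflict tags to the matrix's immediate action, or None if unlisted."""
    if reason == "mismatch" and conflict_type == "severity":
        return (
            "Verify which feed is freshest; if correctness is known, "
            "adjust connector mapping or precedence override."
        )
    if reason == "primary_missing":
        return (
            "Backfill connector data or temporarily allow lower-ranked "
            "source via precedence override."
        )
    if reason == "equal_rank":
        return (
            "Update concelier:merge:precedence:ranks to break the tie; "
            "restart merge job."
        )
    # Combinations not covered by the matrix are left for the operator.
    return None
```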
6. Resolution Playbook
- Connector data fix
  - Re-run the offending connector stages (stellaops-cli db fetch --source ghsa --stage map etc.).
  - Once fixed, rerun merge and verify decisionReason reflects freshness or precedence as expected.
- Temporary precedence override
  - Edit etc/concelier.yaml:

        concelier:
          merge:
            precedence:
              ranks:
                osv: 1
                ghsa: 0

  - Restart Concelier workers; confirm tags in concelier.merge.overrides show the new ranks.
  - Document the override with expiry in the change log.
- Alias remediation
  - Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
  - Flush cached alias graphs if necessary (db.alias_graph.drop() is destructive; coordinate with Storage before issuing).
- Escalation
  - If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and merge_event IDs.
7. Validation Checklist
- Merge job rerun returns exit code 0.
- concelier.merge.conflicts baseline returns to zero after corrective action.
- Latest merge_event entry shows expected hash delta.
- Affected advisory document shows updated provenance[].decisionReason.
- Ops change log updated with incident summary, config overrides, and rollback plan.
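The machine-checkable items of the checklist could be bundled into one helper for a post-fix script. The sketch below is hypothetical: `validation_failures` and its parameters are invented for the example, it assumes the corrective action is expected to change the canonical hash, and the change-log item still needs a human.

```python
def validation_failures(exit_code, conflict_count, before_hash, after_hash,
                        decision_reasons,
                        expected_reasons=("precedence", "freshness")):
    """Return a list of failed checks; an empty list means all checks passed."""
    failures = []
    if exit_code != 0:
        failures.append("merge job exit code is not 0")
    if conflict_count != 0:
        failures.append("concelier.merge.conflicts has not returned to zero")
    # Assumption: the fix should alter canonical output, so the hashes differ.
    if before_hash == after_hash:
        failures.append("latest merge_event shows no hash delta")
    if not any(reason in expected_reasons for reason in decision_reasons):
        failures.append("provenance[].decisionReason not updated as expected")
    return failures
```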
8. Reference Material
- Canonical conflict rules: src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md.
- Merge engine internals: src/StellaOps.Concelier.Merge/Services/AdvisoryPrecedenceMerger.cs.
- Metrics definitions: src/StellaOps.Concelier.Merge/Services/AdvisoryMergeService.cs (identity conflicts) and AdvisoryPrecedenceMerger.
- Storage audit trail: src/StellaOps.Concelier.Merge/Services/MergeEventWriter.cs, src/StellaOps.Concelier.Storage.Mongo/MergeEvents.
Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.
9. Synthetic Regression Fixtures
- Locations – Canonical conflict snapshots now live at src/StellaOps.Concelier.Connector.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json, src/StellaOps.Concelier.Connector.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json, and src/StellaOps.Concelier.Connector.Osv.Tests/Fixtures/conflict-osv.canonical.json.
- Validation commands – To regenerate and verify the fixtures offline, run:
dotnet test src/StellaOps.Concelier.Connector.Ghsa.Tests/StellaOps.Concelier.Connector.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Nvd.Tests/StellaOps.Concelier.Connector.Nvd.Tests.csproj --filter NvdConflictFixtureTests
dotnet test src/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj --filter OsvConflictFixtureTests
dotnet test src/StellaOps.Concelier.Merge.Tests/StellaOps.Concelier.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
- Expected signals – The triple produces one freshness-driven summary override (primary_source=osv, suppressed_source=ghsa) and one range override for the npm SemVer package while leaving concelier.merge.conflicts at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.
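When load-testing alert pipelines, the expected signals can be compared against observed counter values with a small drift check. This is a sketch: the baseline numbers come from the fixture description above, while `baseline_drift` and the shape of the `observed` mapping are assumptions for the example.

```python
# Counter values the synthetic triple is expected to produce.
EXPECTED_BASELINE = {
    "concelier.merge.overrides": 1,
    "concelier.merge.range_overrides": 1,
    "concelier.merge.conflicts": 0,
}


def baseline_drift(observed, expected=None):
    """Return {metric: (expected, observed)} for every metric that deviates.

    Metrics missing from `observed` are treated as 0.
    """
    expected = EXPECTED_BASELINE if expected is None else expected
    drift = {}
    for name, want in expected.items():
        got = observed.get(name, 0)
        if got != want:
            drift[name] = (want, got)
    return drift
```

An empty result means the fixture run matches the baseline; any entry names the counter that moved, with the expected and observed values.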
10. Change Log
| Date (UTC) | Change | Notes |
|---|---|---|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. | Ops sign-off recorded by Concelier Ops Guild; no additional overrides required. |