git.stella-ops.org/docs/ops/feedser-conflict-resolution.md
Vladimir Moushkov ea1106ce7c up
2025-10-15 10:03:56 +03:00


Feedser Conflict Resolution Runbook (Sprint 3)

This runbook equips Feedser operators to detect, triage, and resolve advisory conflicts now that the Sprint 3 merge engine has landed (AdvisoryPrecedenceMerger, merge-event hashing, and telemetry counters). It builds on the canonical rules defined in src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md and on the metrics and logging instrumentation delivered this sprint.


1. Precedence Model (recap)

  • Default ranking: GHSA -> NVD -> OSV, with distro/vendor PSIRTs outranking ecosystem feeds (AdvisoryPrecedenceDefaults). Use feedser:merge:precedence:ranks to override per source when incident response requires it.
  • Freshness override: if a lower-ranked source is >= 48 hours newer for a freshness-sensitive field (title, summary, affected ranges, references, credits), it wins. Every override stamps provenance[].decisionReason = freshness.
  • Tie-breakers: when precedence and freshness tie, the engine falls back to (1) primary source order, (2) shortest normalized text, (3) lowest stable hash. Merge-generated provenance records set decisionReason = tie-breaker.
  • Audit trail: each merged advisory receives a merge provenance entry listing the participating sources plus a merge_event record with canonical before/after SHA-256 hashes.
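
The rules above can be sketched in a few lines of Python. This is an illustration only, not the AdvisoryPrecedenceMerger implementation: the Candidate type, the field keys, and the assumption that a lower rank number means higher precedence are all hypothetical, and the tie-breaker is applied only among equally ranked sources.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta

# Window and field list come from the runbook; the data model is hypothetical.
FRESHNESS_WINDOW = timedelta(hours=48)
FRESHNESS_FIELDS = {"title", "summary", "affected_ranges", "references", "credits"}


@dataclass(frozen=True)
class Candidate:
    source: str
    rank: int            # assumption: lower number = higher precedence
    modified: datetime
    value: str


def tie_key(candidate, source_order):
    """Tie-breaker order: primary source order, shortest normalized text, lowest hash."""
    normalized = " ".join(candidate.value.split()).lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    if candidate.source in source_order:
        position = source_order.index(candidate.source)
    else:
        position = len(source_order)
    return (position, len(normalized), digest)


def resolve_field(field, candidates, source_order):
    """Return (winner, decision_reason) for one advisory field."""
    best_rank = min(c.rank for c in candidates)
    top = [c for c in candidates if c.rank == best_rank]
    if len(top) > 1:
        winner, reason = min(top, key=lambda c: tie_key(c, source_order)), "tie-breaker"
    else:
        winner, reason = top[0], "precedence"

    # A lower-ranked source wins a freshness-sensitive field if it is >= 48 h newer.
    if field in FRESHNESS_FIELDS:
        fresher = [
            c for c in candidates
            if c.rank > winner.rank and c.modified - winner.modified >= FRESHNESS_WINDOW
        ]
        if fresher:
            winner, reason = max(fresher, key=lambda c: c.modified), "freshness"
    return winner, reason
```

Calling resolve_field("summary", ...) with a GHSA candidate and an OSV candidate modified three days later selects the OSV value with reason freshness, while the same pair resolved for a field outside the freshness list keeps the GHSA value with reason precedence.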

2. Telemetry Shipped This Sprint

| Instrument | Type | Key Tags | Purpose |
|---|---|---|---|
| feedser.merge.operations | Counter | inputs | Total precedence merges executed. |
| feedser.merge.overrides | Counter | primary_source, suppressed_source, primary_rank, suppressed_rank | Field-level overrides chosen by precedence. |
| feedser.merge.range_overrides | Counter | advisory_key, package_type, primary_source, suppressed_source, primary_range_count, suppressed_range_count | Package range overrides emitted by AffectedPackagePrecedenceResolver. |
| feedser.merge.conflicts | Counter | type (severity, precedence_tie), reason (mismatch, primary_missing, equal_rank) | Conflicts requiring operator review. |
| feedser.merge.identity_conflicts | Counter | scheme, alias_value, advisory_count | Alias collisions surfaced by the identity graph. |

Structured logs

  • AdvisoryOverride (EventId 1000) - logs merge suppressions with alias/provenance counts.
  • PackageRangeOverride (EventId 1001) - logs package-level precedence decisions.
  • PrecedenceConflict (EventId 1002) - logs mismatched severity or equal-rank scenarios.
  • Alias collision ... (no EventId) - emitted when feedser.merge.identity_conflicts increments.

Expect all of these logs at the Information level. Ensure OTEL exporters include the scope StellaOps.Feedser.Merge.


3. Detection & Alerting

  1. Dashboard panels
    • feedser.merge.conflicts - table grouped by type/reason. Alert when > 0 in a 15-minute window.
    • feedser.merge.range_overrides - stacked bar by package_type. Spikes highlight vendor PSIRT overrides over registry data.
    • feedser.merge.overrides with primary_source|suppressed_source - catches unexpected precedence flips (e.g., OSV overtaking GHSA).
    • feedser.merge.identity_conflicts - single-stat; alert when alias collisions occur more than once per day.
  2. Log-based alerts
    • eventId=1002 with reason="equal_rank" - indicates precedence table gaps; page merge owners.
    • eventId=1002 with reason="mismatch" - severity disagreement; open connector bug if sustained.
  3. Job health
    • stellaops-cli db merge exit code 1 signifies unresolved conflicts. Pipe to automation that captures logs and notifies #feedser-ops.
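
One possible shape for the job-health automation is sketched below. Only the exit-code convention and the #feedser-ops channel come from this runbook; run_merge and the notify callback are hypothetical names, and the actual chat or paging integration is left to the caller.

```python
import subprocess
import sys


def run_merge(command, notify):
    """Run the merge command and report unresolved conflicts.

    `notify` is a hypothetical callback taking channel, exit_code and log.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # Pass stderr through so the scheduler's own job log still shows it.
    sys.stderr.write(result.stderr)
    if result.returncode == 1:
        # Exit code 1 signifies unresolved conflicts.
        notify(
            channel="#feedser-ops",
            exit_code=result.returncode,
            log=result.stdout + result.stderr,
        )
    return result.returncode
```

A scheduler wrapper would call run_merge(["stellaops-cli", "db", "merge"], notify) and propagate the returned exit code.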

Threshold updates (2025-10-12)

  • feedser.merge.conflicts: page only when ≥ 2 events fire within 30 minutes. The synthetic conflict fixture run produces 0 conflicts, so the first event now routes to Slack for manual review instead of paging.
  • feedser.merge.overrides: raise a warning when the 30-minute sum exceeds 10 (the canonical triple yields exactly 1 summary override with primary_source=osv, suppressed_source=ghsa).
  • feedser.merge.range_overrides: maintain the 15-minute alert at ≥ 3, but annotate dashboards to note that the regression triple emits a single package_type=semver override so ops can spot unexpected spikes.
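
To make the updated thresholds concrete, the sketch below evaluates them over raw event timestamps. It assumes a hypothetical input shape (instrument name mapped to a list of event datetimes) and hypothetical action labels; in practice these rules would live in the alerting backend rather than in Python.

```python
from datetime import datetime, timedelta


def count_within(timestamps, now, window):
    """Count events whose timestamp falls inside (now - window, now]."""
    return sum(1 for ts in timestamps if now - window < ts <= now)


def evaluate(events, now):
    """Map counter events to alert actions using the 2025-10-12 thresholds."""
    actions = {}

    conflicts = count_within(
        events.get("feedser.merge.conflicts", []), now, timedelta(minutes=30))
    if conflicts >= 2:
        actions["feedser.merge.conflicts"] = "page"
    elif conflicts == 1:
        actions["feedser.merge.conflicts"] = "slack"   # manual review, no page

    overrides = count_within(
        events.get("feedser.merge.overrides", []), now, timedelta(minutes=30))
    if overrides > 10:
        actions["feedser.merge.overrides"] = "warn"

    ranges = count_within(
        events.get("feedser.merge.range_overrides", []), now, timedelta(minutes=15))
    if ranges >= 3:
        actions["feedser.merge.range_overrides"] = "alert"

    return actions
```

Feeding the function the synthetic fixture baseline (one override, one range override, no conflicts) should yield an empty result, which makes it a convenient smoke test for the alert pipeline.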

4. Triage Workflow

  1. Confirm job context
    • Run stellaops-cli db merge (CLI) or call POST /jobs/merge:reconcile (API) to rehydrate the merge job. Use --verbose to stream structured logs during triage.
  2. Inspect metrics
    • Correlate spikes in feedser.merge.conflicts with primary_source/suppressed_source tags from feedser.merge.overrides.
  3. Pull structured logs
    • Example (vector output):
      jq 'select(.EventId.Name=="PrecedenceConflict") | {advisory: .State[0].Value, type: .ConflictType, reason: .Reason, primary: .PrimarySources, suppressed: .SuppressedSources}' stellaops-feedser.log
      
  4. Review merge events
    • mongosh:
      use feedser;
      db.merge_event.find({ advisoryKey: "CVE-2025-1234" }).sort({ mergedAt: -1 }).limit(5);
      
    • Compare beforeHash vs afterHash to confirm the merge actually changed canonical output.
  5. Interrogate provenance
    • db.advisories.findOne({ advisoryKey: "CVE-2025-1234" }, { title: 1, severity: 1, provenance: 1, "affectedPackages.provenance": 1 })
    • Check provenance[].decisionReason values (precedence, freshness, tie-breaker) to understand why the winning field was chosen.
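
When the log volume is too large to read line by line, a short script can summarize step 3 before the per-advisory steps. The sketch below assumes newline-delimited JSON with the same field names the jq filter above relies on (EventId.Name, ConflictType, Reason); adjust them if your exporter shapes records differently.

```python
import json
from collections import Counter


def summarize_conflicts(lines):
    """Count PrecedenceConflict log records by (type, reason)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines rather than abort triage
        if not isinstance(record, dict):
            continue
        event = record.get("EventId")
        if not isinstance(event, dict) or event.get("Name") != "PrecedenceConflict":
            continue
        counts[(record.get("ConflictType"), record.get("Reason"))] += 1
    return counts
```

Passing an open file handle for stellaops-feedser.log works because the function only iterates over lines; the resulting counts show at a glance whether the spike is dominated by severity mismatches or by equal-rank ties.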

5. Conflict Classification Matrix

| Signal | Likely Cause | Immediate Action |
|---|---|---|
| reason="mismatch" with type="severity" | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
| reason="primary_missing" | Higher-ranked source lacks the field entirely. | Backfill connector data or temporarily allow lower-ranked source via precedence override. |
| reason="equal_rank" | Two feeds share the same precedence rank (custom config or missing entry). | Update feedser:merge:precedence:ranks to break the tie; restart merge job. |
| Rising feedser.merge.range_overrides for a package type | Vendor PSIRT now supplies richer ranges. | Validate connectors emit decisionReason="precedence" and update dashboards to treat registry ranges as fallback. |
| feedser.merge.identity_conflicts > 0 | Alias scheme mapping produced collisions (duplicate CVE <-> advisory pairs). | Inspect Alias collision log payload; reconcile the alias graph by adjusting connector alias output. |
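
For alert automation, the reason-keyed rows of the matrix can be carried as a simple lookup so that notifications include the suggested first step. This is a minimal sketch: the function name and the fallback message are hypothetical, and the two metric-based rows are omitted because they are not keyed by reason.

```python
# Immediate actions for the reason-tagged rows of the matrix.
ACTIONS = {
    "mismatch": "Verify which feed is freshest; adjust connector mapping or precedence override.",
    "primary_missing": "Backfill connector data or temporarily allow the lower-ranked source.",
    "equal_rank": "Update feedser:merge:precedence:ranks to break the tie; restart merge job.",
}


def immediate_action(reason):
    """Return the matrix action for a conflict reason, or a manual-triage note."""
    # The fallback text is an assumption, not part of the matrix.
    return ACTIONS.get(reason, "No matrix entry for this reason; triage manually.")
```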

6. Resolution Playbook

  1. Connector data fix
    • Re-run the offending connector stages (stellaops-cli db fetch --source ghsa --stage map etc.).
    • Once fixed, rerun merge and verify decisionReason reflects freshness or precedence as expected.
  2. Temporary precedence override
    • Edit etc/feedser.yaml:
      feedser:
        merge:
          precedence:
            ranks:
              osv: 1
              ghsa: 0
      
    • Restart Feedser workers; confirm tags in feedser.merge.overrides show the new ranks.
    • Document the override with expiry in the change log.
  3. Alias remediation
    • Update connector mapping rules to weed out duplicate aliases (e.g., skip GHSA aliases that mirror CVE IDs).
    • Flush cached alias graphs if necessary. Note that db.alias_graph.drop() is destructive; coordinate with Storage before issuing it.
  4. Escalation
    • If override metrics spike due to upstream regression, open an incident with Security Guild, referencing merge logs and merge_event IDs.
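
Before restarting workers with a temporary override, it may help to confirm that the edited rank table does not itself introduce an equal_rank conflict. The helper below is a hypothetical pre-flight check that takes the ranks mapping as a plain dictionary (for example, after loading etc/feedser.yaml with a YAML parser of your choice).

```python
from collections import defaultdict


def find_rank_ties(ranks):
    """Return {rank: [sources]} for every rank shared by more than one source."""
    by_rank = defaultdict(list)
    for source, rank in ranks.items():
        by_rank[rank].append(source)
    return {
        rank: sorted(sources)
        for rank, sources in by_rank.items()
        if len(sources) > 1
    }
```

An empty result means every configured source has a distinct rank; any non-empty result names the sources that would tie.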

7. Validation Checklist

  • Merge job rerun returns exit code 0.
  • feedser.merge.conflicts baseline returns to zero after corrective action.
  • Latest merge_event entry shows expected hash delta.
  • Affected advisory document shows updated provenance[].decisionReason.
  • Ops change log updated with incident summary, config overrides, and rollback plan.
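
The machine-checkable items of this checklist can be sketched as a single function. The inputs are assumptions about how the values would be gathered: the merge job exit code, the current conflict count, the latest merge_event document (with the beforeHash and afterHash fields used in §4), and the advisory document. The sketch also assumes that the expected hash delta is a change, which is not true for every corrective action.

```python
KNOWN_REASONS = {"precedence", "freshness", "tie-breaker"}


def validate_resolution(exit_code, conflict_count, merge_event, advisory):
    """Return a list of failed checklist items; an empty list means all passed."""
    failures = []
    if exit_code != 0:
        failures.append("merge job exit code is not 0")
    if conflict_count != 0:
        failures.append("feedser.merge.conflicts has not returned to zero")
    # Assumption: the corrective action was expected to change canonical output.
    if merge_event.get("beforeHash") == merge_event.get("afterHash"):
        failures.append("latest merge_event shows no hash delta")
    reasons = {p.get("decisionReason") for p in advisory.get("provenance", [])}
    if not reasons & KNOWN_REASONS:
        failures.append("advisory provenance has no recognized decisionReason")
    return failures
```

The change-log item stays manual; the function only covers the items that can be read from job output, metrics, and storage.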

8. Reference Material

  • Canonical conflict rules: src/DEDUP_CONFLICTS_RESOLUTION_ALGO.md.
  • Merge engine internals: src/StellaOps.Feedser.Merge/Services/AdvisoryPrecedenceMerger.cs.
  • Metrics definitions: src/StellaOps.Feedser.Merge/Services/AdvisoryMergeService.cs (identity conflicts) and AdvisoryPrecedenceMerger.
  • Storage audit trail: src/StellaOps.Feedser.Merge/Services/MergeEventWriter.cs, src/StellaOps.Feedser.Storage.Mongo/MergeEvents.

Keep this runbook synchronized with future sprint notes and update alert thresholds as baseline volumes change.


9. Synthetic Regression Fixtures

  • Locations: canonical conflict snapshots now live at src/StellaOps.Feedser.Source.Ghsa.Tests/Fixtures/conflict-ghsa.canonical.json, src/StellaOps.Feedser.Source.Nvd.Tests/Nvd/Fixtures/conflict-nvd.canonical.json, and src/StellaOps.Feedser.Source.Osv.Tests/Fixtures/conflict-osv.canonical.json.
  • Validation commands: to regenerate and verify the fixtures offline, run:
      dotnet test src/StellaOps.Feedser.Source.Ghsa.Tests/StellaOps.Feedser.Source.Ghsa.Tests.csproj --filter GhsaConflictFixtureTests
      dotnet test src/StellaOps.Feedser.Source.Nvd.Tests/StellaOps.Feedser.Source.Nvd.Tests.csproj --filter NvdConflictFixtureTests
      dotnet test src/StellaOps.Feedser.Source.Osv.Tests/StellaOps.Feedser.Source.Osv.Tests.csproj --filter OsvConflictFixtureTests
      dotnet test src/StellaOps.Feedser.Merge.Tests/StellaOps.Feedser.Merge.Tests.csproj --filter MergeAsync_AppliesCanonicalRulesAndPersistsDecisions
  • Expected signals: the triple produces one freshness-driven summary override (primary_source=osv, suppressed_source=ghsa) and one range override for the npm SemVer package while leaving feedser.merge.conflicts at zero. Use these values as the baseline when tuning dashboards or load-testing alert pipelines.

10. Change Log

| Date (UTC) | Change | Notes |
|---|---|---|
| 2025-10-16 | Ops review signed off after connector expansion (CCCS, CERT-Bund, KISA, ICS CISA, MSRC) landed. | Alert thresholds from §3 reaffirmed; dashboards updated to watch attachment signals emitted by ICS CISA connector. Ops sign-off recorded by Feedser Ops Guild; no additional overrides required. |