git.stella-ops.org/docs/dev/mongo_indices.md

# Normalized Versions Query Guide

This guide complements the Sprint 1–2 normalized versions rollout. It documents recommended indexes and aggregation patterns for querying `AffectedPackage.normalizedVersions`.

For a field-by-field look at how normalized rules persist in MongoDB (including provenance metadata), see Section 8 of the [Feedser SemVer Merge Playbook](merge_semver_playbook.md).

## 1. Recommended indexes

When `feedser.storage.enableSemVerStyle` is enabled, advisories expose a flattened
`normalizedVersions` array at the document root. Create these indexes in `mongosh`
after the migration completes (adjust collection name if you use a prefix):

```javascript
db.advisories.createIndex(
  {
    "normalizedVersions.packageId": 1,
    "normalizedVersions.scheme": 1,
    "normalizedVersions.type": 1
  },
  { name: "advisory_normalizedVersions_pkg_scheme_type" }
);

db.advisories.createIndex(
  { "normalizedVersions.value": 1 },
  { name: "advisory_normalizedVersions_value", sparse: true }
);
```

- The compound index accelerates `$match` stages that filter by package identifier and rule style without unwinding `affectedPackages`.
- The sparse index keeps storage costs low while supporting pure exact-version lookups (type `exact`).

The storage bootstrapper creates the same indexes automatically when the feature flag is enabled.

## 2. Query patterns

### 2.1 Determine if a specific version is affected

```javascript
db.advisories.aggregate([
  { $match: { "normalizedVersions.packageId": "pkg:npm/lodash" } },
  { $unwind: "$normalizedVersions" },
  { $match: {
      $or: [
        { "normalizedVersions.type": "exact",
          "normalizedVersions.value": "4.17.21" },
        { "normalizedVersions.type": "range",
          "normalizedVersions.min": { $lte: "4.17.21" },
          "normalizedVersions.max": { $gt: "4.17.21" } },
        { "normalizedVersions.type": "gte",
          "normalizedVersions.min": { $lte: "4.17.21" } },
        { "normalizedVersions.type": "lte",
          "normalizedVersions.max": { $gte: "4.17.21" } }
      ]
  }},
  { $project: { advisoryKey: 1, title: 1, "normalizedVersions.packageId": 1 } }
]);
```

Use this pipeline during Sprint 2 staging validation runs. Invoke `explain("executionStats")` to confirm the compound index is selected.

### 2.2 Locate advisories missing normalized rules

```javascript
db.advisories.aggregate([
  { $match: { $or: [
      { "normalizedVersions": { $exists: false } },
      { "normalizedVersions": { $size: 0 } }
    ] } },
  { $project: { advisoryKey: 1, affectedPackages: 1 } }
]);
```

Run this query after backfill jobs to identify gaps that still rely solely on `rangeExpression`.

### 2.3 Deduplicate overlapping rules

```javascript
db.advisories.aggregate([
  { $unwind: "$normalizedVersions" },
  { $group: {
      _id: {
        identifier: "$normalizedVersions.packageId",
        scheme: "$normalizedVersions.scheme",
        type: "$normalizedVersions.type",
        min: "$normalizedVersions.min",
        minInclusive: "$normalizedVersions.minInclusive",
        max: "$normalizedVersions.max",
        maxInclusive: "$normalizedVersions.maxInclusive",
        value: "$normalizedVersions.value"
      },
      advisories: { $addToSet: "$advisoryKey" },
      notes: { $addToSet: "$normalizedVersions.notes" }
  }},
  { $match: { "advisories.1": { $exists: true } } },
  { $sort: { "_id.identifier": 1, "_id.type": 1 } }
]);
```

Use this to confirm the merge dedupe logic keeps only one normalized rule per unique constraint.

## 3. Operational checklist

- [ ] Create the indexes in staging before toggling dual-write in production.
- [ ] Capture explain plans and attach them to the release notes.
- [ ] Notify downstream services that consume advisory snapshots about the new `normalizedVersions` array.
- [ ] Update export fixtures once dedupe verification passes.

Additional background and mapper examples live in [Feedser SemVer Merge Playbook](merge_semver_playbook.md).