Here’s a quick, practical idea to make your version-range modeling cleaner and faster to query.

![A simple diagram showing a Vulnerability doc with an embedded normalizedVersions array next to a pipeline icon labeled “simpler aggregations”.](https://images.unsplash.com/photo-1515879218367-8466d910aaa4?q=80\&w=1470\&auto=format\&fit=crop)

# Rethinking `SemVerRangeBuilder` + MongoDB

**Problem (today):** Version normalization rules live as a nested object (and often as a bespoke structure per source). This can force awkward `$objectToArray`, `$map`, and conditional logic in pipelines when you need to:

* match “is version X affected?”
* flatten ranges for analytics
* de-duplicate across sources

**Proposal:** Store *normalized version rules as an embedded collection (array of small docs)* instead of a single nested object.

## Minimal background

* **SemVer normalization**: converting all source-specific version notations into a single, strict representation (e.g., `>=1.2.3 <2.0.0`, exact pins, wildcards).
* **Embedded collection**: an array of consistently shaped items inside the parent doc, which suits `$unwind`-centric analytics and direct matches.
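To make "normalization" concrete, here is a minimal JavaScript sketch that maps a few notations onto canonical rule objects using the `scheme`/`type`/`min`/`max`/`value` fields proposed in this note. The helper names (`parseVersion`, `makeRange`, `normalizeRange`) are hypothetical, the caret handling follows npm-style semantics, and anything outside the four handled notations throws rather than guessing.

```javascript
// Hypothetical helpers, not part of any existing builder.
// Handles only: "X.Y.Z", "^X.Y.Z", "X.Y.x" / "X.Y.*", and ">=A <B".

// Split a plain MAJOR.MINOR.PATCH string into numbers; reject anything else.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version: ${v}`);
  return m.slice(1).map(Number);
}

// Build a canonical half-open range rule: min inclusive, max exclusive.
function makeRange(min, max) {
  return {
    scheme: "semver",
    type: "range",
    min,
    minInclusive: true,
    max,
    maxInclusive: false,
  };
}

// Returns an array of rules, mirroring the "emit items" idea.
function normalizeRange(raw) {
  const s = raw.trim();
  let m;

  // Wildcard patch: 1.2.x or 1.2.*  ->  >=1.2.0 <1.3.0
  if ((m = /^(\d+)\.(\d+)\.[xX*]$/.exec(s))) {
    const major = Number(m[1]);
    const minor = Number(m[2]);
    return [makeRange(`${major}.${minor}.0`, `${major}.${minor + 1}.0`)];
  }

  // Caret (npm-style): the upper bound bumps the left-most non-zero component.
  if (s.startsWith("^")) {
    const base = s.slice(1);
    const [major, minor, patch] = parseVersion(base);
    let max;
    if (major > 0) max = `${major + 1}.0.0`;
    else if (minor > 0) max = `0.${minor + 1}.0`;
    else max = `0.0.${patch + 1}`;
    return [makeRange(base, max)];
  }

  // Explicit bounds: ">=A <B"
  if ((m = /^>=\s*(\S+)\s+<\s*(\S+)$/.exec(s))) {
    parseVersion(m[1]); // validate both bounds
    parseVersion(m[2]);
    return [makeRange(m[1], m[2])];
  }

  // Fallback: exact pin
  parseVersion(s);
  return [{ scheme: "semver", type: "exact", value: s }];
}
```

For example, `normalizeRange("^1.2.3")` and `normalizeRange(">=1.2.3 <2.0.0")` produce the same rule, which is what makes later de-duplication a plain equality check.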
## Suggested shape

```jsonc
{
  "_id": "VULN-123",
  "packageId": "pkg:npm/lodash",
  "source": "NVD",
  "normalizedVersions": [
    {
      "scheme": "semver",
      "type": "range",                 // "range" | "exact" | "lt" | "lte" | "gt" | "gte"
      "min": "1.2.3",                  // optional
      "minInclusive": true,            // optional
      "max": "2.0.0",                  // optional
      "maxInclusive": false,           // optional
      "notes": "from GHSA GHSA-xxxx"   // traceability
    },
    { "scheme": "semver", "type": "exact", "value": "1.5.0" }
  ],
  "metadata": { "ingestedAt": "2025-10-10T12:00:00Z" }
}
```

### Why this helps

* **Simpler queries**

  * *Is v affected?*

    ```js
    db.vulns.aggregate([
      { $match: { packageId: "pkg:npm/lodash" } },
      { $unwind: "$normalizedVersions" },
      { $match: { $or: [
        { "normalizedVersions.type": "exact", "normalizedVersions.value": "1.5.0" },
        // Caveat: $lte/$gt on strings compare lexicographically ("1.10.0" sorts before "1.5.0"),
        // so this branch is only reliable if min/max are stored in a sortable form.
        // It also assumes an inclusive min, an exclusive max, and that both bounds are present.
        { "normalizedVersions.type": "range",
          "normalizedVersions.min": { $lte: "1.5.0" },
          "normalizedVersions.max": { $gt: "1.5.0" } }
      ] }},
      { $project: { _id: 1 } }
    ])
    ```
  * No `$objectToArray`, fewer `$cond`s.
* **Cheaper storage**

  * Arrays of small, uniformly shaped docs tend to compress well and avoid wide nested structures with many nulls/keys.
* **Easier dedup/merge**

  * `$unwind` → normalize → `$group` by `{scheme,type,min,max,value}` to collapse equivalent rules across sources.

## Builder changes (`SemVerRangeBuilder`)

* **Emit items, not a monolith**: have the builder return `IEnumerable<NormalizedVersionRule>`.
* **Normalize early**: resolve “aliases” (`1.2.x`, `^1.2.3`, distro styles) into canonical `(type,min,max,…)` before persistence.
* **Traceability**: include `notes`/`sourceRef` on each rule so you can re-materialize provenance during audits.
* **Lean projection helper**: when you only need normalized rules (and not the intermediate primitives), prefer `SemVerRangeRuleBuilder.BuildNormalizedRules(rawRange, patchedVersion, provenanceNote)` to skip manual projections.

### C# sketch

```csharp
using System.Collections.Generic;

public record NormalizedVersionRule(
    string Scheme,              // "semver"
    string Type,                // "range" | "exact" | ...
    string? Min = null,
    bool? MinInclusive = null,
    string? Max = null,
    bool? MaxInclusive = null,
    string? Value = null,
    string? Notes = null
);

public static class SemVerRangeBuilder
{
    public static IEnumerable<NormalizedVersionRule> Build(string raw)
    {
        // parse raw (^1.2.3, 1.2.x, <=2.0.0, etc.)
        // yield canonical rules (hard-coded placeholder; real parsing goes here):
        yield return new NormalizedVersionRule(
            Scheme: "semver",
            Type: "range",
            Min: "1.2.3",
            MinInclusive: true,
            Max: "2.0.0",
            MaxInclusive: false,
            Notes: "nvd:ABC-123"
        );
    }
}
```

## Aggregation patterns you unlock

* **Fast “affected version” lookups** via `$unwind` + `$match` (can complement with a computed sort key).
* **Rollups**: count of vulns per `(major,minor)` by mapping each rule into bucketed segments.
* **Cross-source reconciliation**: group identical rules to de-duplicate.

## Indexing tips

* Compound index on `{ packageId: 1, "normalizedVersions.scheme": 1, "normalizedVersions.type": 1 }`.
* If lookups by exact value are common: add a sparse index on `"normalizedVersions.value"`.

## Migration path (safe + incremental)

1. **Dual-write**: keep old nested object while writing the new `normalizedVersions` array.
2. **Backfill** existing docs with a one-time script using your current builder.
3. **Cutover** queries/aggregations to the new path (behind a feature flag).
4. **Clean up** old field after soak.

If you want, I can draft:

* a one-time Mongo backfill script,
* the new EF/Mongo C# POCOs, and
* a test matrix (edge cases: prerelease tags, build metadata, `0.*` semantics, distro-style ranges).
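One note on the "computed sort key" mentioned under aggregation patterns: raw version strings compare lexicographically, so `"1.10.0"` sorts before `"1.5.0"`. A rough JavaScript sketch of one possible key follows. `toSortKey`, `isAffected`, and the padding width are hypothetical; the sketch assumes plain `MAJOR.MINOR.PATCH` input (prerelease tags and build metadata are rejected) and treats a missing inclusivity flag as exclusive.

```javascript
// Hypothetical helper: zero-pad each numeric component so that plain string
// comparison matches version order. WIDTH is an arbitrary assumption.
const WIDTH = 6;

function toSortKey(version) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!m) throw new Error(`unsupported version: ${version}`);
  return m
    .slice(1)
    .map((part) => {
      if (part.length > WIDTH) throw new Error(`component too large: ${part}`);
      return part.padStart(WIDTH, "0");
    })
    .join(".");
}

// In-memory equivalent of the range match: honors the inclusivity flags and
// treats a missing min or max as an open-ended bound.
function isAffected(rule, version) {
  const key = toSortKey(version);
  if (rule.type === "exact") return toSortKey(rule.value) === key;
  if (rule.min != null) {
    const min = toSortKey(rule.min);
    // Inclusive min rejects only key < min; exclusive also rejects key === min.
    if (rule.minInclusive ? key < min : key <= min) return false;
  }
  if (rule.max != null) {
    const max = toSortKey(rule.max);
    // Inclusive max rejects only key > max; exclusive also rejects key === max.
    if (rule.maxInclusive ? key > max : key >= max) return false;
  }
  return true;
}

// Usage: the rule from the suggested shape, checked against a two-digit minor.
const sampleRule = {
  scheme: "semver",
  type: "range",
  min: "1.2.3",
  minInclusive: true,
  max: "2.0.0",
  maxInclusive: false,
};
console.log(isAffected(sampleRule, "1.10.0"));
console.log(isAffected(sampleRule, "2.0.0"));
```

If keys like these were stored next to `min`/`max` at write time (say, as extra fields on each rule), the `$lte`/`$gt` comparisons in the pipeline could target them instead of the raw strings. That storage layout is a suggestion, not something the current shape includes.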