Here’s a quick, practical idea to make your version-range modeling cleaner and faster to query.

# Rethinking `SemVerRangeBuilder` + MongoDB

**Problem (today):** Version normalization rules live as a nested object (and often as a bespoke structure per source). This can force awkward `$objectToArray`, `$map`, and conditional logic in pipelines when you need to:

* match “is version X affected?”
* flatten ranges for analytics
* de-duplicate across sources

**Proposal:** Store *normalized version rules as an embedded collection (array of small docs)* instead of a single nested object.

## Minimal background

* **SemVer normalization**: converting all source-specific version notations into a single, strict representation (e.g., `>=1.2.3 <2.0.0`, exact pins, wildcards).
* **Embedded collection**: an array of consistently shaped items inside the parent doc, which suits `$unwind`-centric analytics and direct matches.

## Suggested shape

```jsonc
{
  "_id": "VULN-123",
  "packageId": "pkg:npm/lodash",
  "source": "NVD",
  "normalizedVersions": [
    {
      "scheme": "semver",
      "type": "range",                 // "range" | "exact" | "lt" | "lte" | "gt" | "gte"
      "min": "1.2.3",                  // optional
      "minInclusive": true,            // optional
      "max": "2.0.0",                  // optional
      "maxInclusive": false,           // optional
      "notes": "from GHSA GHSA-xxxx"   // traceability
    },
    {
      "scheme": "semver",
      "type": "exact",
      "value": "1.5.0"
    }
  ],
  "metadata": { "ingestedAt": "2025-10-10T12:00:00Z" }
}
```

### Why this helps

* **Simpler queries**

  * *Is v affected?*

    ```js
    db.vulns.aggregate([
      { $match: { packageId: "pkg:npm/lodash" } },
      { $unwind: "$normalizedVersions" },
      { $match: {
          $or: [
            { "normalizedVersions.type": "exact", "normalizedVersions.value": "1.5.0" },
            // Caveat: $lte/$gt compare these strings lexicographically
            // ("1.10.0" sorts before "1.5.0"), so this branch is only correct
            // when min/max hold a zero-padded, order-preserving key. It also
            // assumes min-inclusive/max-exclusive instead of reading the flags.
            { "normalizedVersions.type": "range",
              "normalizedVersions.min": { $lte: "1.5.0" },
              "normalizedVersions.max": { $gt:  "1.5.0" } }
          ]
      }},
      { $project: { _id: 1 } }
    ])
    ```
  * No `$objectToArray`, fewer `$cond`s.

* **Cheaper storage**

  * Arrays of small, uniformly shaped docs tend to compress well, and omitting unset optional fields avoids wide nested structures full of null keys. Treat this as a likely side benefit rather than a guarantee.

* **Easier dedup/merge**

  * `$unwind` → normalize → `$group` by `{scheme,type,min,max,value}` (plus the inclusivity flags, if they can differ) to collapse equivalent rules across sources.

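To make the "is v affected?" check concrete outside the database, here is a minimal JavaScript sketch of the comparison the query relies on. It assumes plain `MAJOR.MINOR.PATCH` versions (prerelease tags and build metadata are rejected), treats a missing `minInclusive` as `true` and a missing `maxInclusive` as `false`, and introduces a hypothetical `toSortKey` helper; none of these names or defaults come from the existing codebase.

```javascript
// Parse "MAJOR.MINOR.PATCH" into numbers. Assumption: prerelease tags and
// build metadata are out of scope for this sketch and are rejected.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Numeric, component-wise comparison: negative, zero, or positive.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Hypothetical sort key: zero-padded components, so plain string comparison
// matches version order as long as no component exceeds `width` digits.
function toSortKey(v, width = 5) {
  return parseVersion(v)
    .map((n) => String(n).padStart(width, "0"))
    .join(".");
}

// Evaluate one rule of the suggested shape against a concrete version.
// Assumed defaults: min is inclusive and max is exclusive unless flagged.
function isAffected(version, rule) {
  if (rule.type === "exact") {
    return compareVersions(version, rule.value) === 0;
  }
  if (rule.min != null) {
    const c = compareVersions(version, rule.min);
    if (c < 0 || (c === 0 && rule.minInclusive === false)) return false;
  }
  if (rule.max != null) {
    const c = compareVersions(version, rule.max);
    if (c > 0 || (c === 0 && rule.maxInclusive !== true)) return false;
  }
  return true;
}
```

If the bounds were persisted as sort keys like these (say, in extra fields next to `min`/`max`), the `$lte`/`$gt` string comparison in the pipeline would line up with real version order.
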
## Builder changes (`SemVerRangeBuilder`)

* **Emit items, not a monolith**: have the builder return `IEnumerable<NormalizedVersionRule>`.
* **Normalize early**: resolve “aliases” (`1.2.x`, `^1.2.3`, distro styles) into canonical `(type,min,max,…)` before persistence.
* **Traceability**: include `notes`/`sourceRef` on each rule so you can re-materialize provenance during audits.
* **Lean projection helper**: when you only need normalized rules (and not the intermediate primitives), prefer `SemVerRangeRuleBuilder.BuildNormalizedRules(rawRange, patchedVersion, provenanceNote)` to skip manual projections.

### C# sketch

```csharp
using System.Collections.Generic;

public record NormalizedVersionRule(
    string Scheme,           // "semver"
    string Type,             // "range" | "exact" | ...
    string? Min = null,
    bool? MinInclusive = null,
    string? Max = null,
    bool? MaxInclusive = null,
    string? Value = null,
    string? Notes = null     // provenance, e.g. a source reference
);

public static class SemVerRangeBuilder
{
    public static IEnumerable<NormalizedVersionRule> Build(string raw)
    {
        // TODO: parse raw (^1.2.3, 1.2.x, <=2.0.0, etc.).
        // Placeholder: the rule below is hard-coded to show the output shape;
        // a real implementation derives every field from `raw`.
        yield return new NormalizedVersionRule(
            Scheme: "semver",
            Type: "range",
            Min: "1.2.3",
            MinInclusive: true,
            Max: "2.0.0",
            MaxInclusive: false,
            Notes: "nvd:ABC-123"
        );
    }
}
```

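For the "normalize early" step, a rough JavaScript sketch of alias expansion may help pin down the expected output. It assumes npm-style semantics for caret ranges and `x` wildcards, handles only three-part numeric versions, and uses a hypothetical `expandAlias` function; distro-style ranges and prerelease tags are deliberately left out.

```javascript
// Build a canonical half-open range rule: min inclusive, max exclusive.
function rangeRule(min, max) {
  return {
    scheme: "semver",
    type: "range",
    min,
    minInclusive: true,
    max,
    maxInclusive: false,
  };
}

// Hypothetical helper: expand one raw notation into a canonical rule.
// Assumption: npm-style meaning for ^ and x; anything else is rejected.
function expandAlias(raw) {
  const caret = /^\^(\d+)\.(\d+)\.(\d+)$/.exec(raw);
  if (caret) {
    const [major, minor, patch] = caret.slice(1).map(Number);
    // A caret range allows changes to the right of the left-most
    // non-zero component, so 0.x and 0.0.x versions get narrower ranges.
    let max;
    if (major > 0) max = `${major + 1}.0.0`;
    else if (minor > 0) max = `0.${minor + 1}.0`;
    else max = `0.0.${patch + 1}`;
    return rangeRule(`${major}.${minor}.${patch}`, max);
  }

  const wildcard = /^(\d+)\.(\d+)\.[xX*]$/.exec(raw);
  if (wildcard) {
    const [major, minor] = wildcard.slice(1).map(Number);
    return rangeRule(`${major}.${minor}.0`, `${major}.${minor + 1}.0`);
  }

  if (/^\d+\.\d+\.\d+$/.test(raw)) {
    return { scheme: "semver", type: "exact", value: raw };
  }

  throw new Error(`unsupported notation: ${raw}`);
}
```

Under these assumptions, `^1.2.3` and `1.2.x` both come out as plain `min`/`max` pairs, so nothing downstream needs to know which notation the source used.
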
## Aggregation patterns you unlock

* **Fast “affected version” lookups** via `$unwind + $match` (pair it with a computed, order-preserving sort key so range bounds compare correctly as strings).
* **Rollups**: count of vulns per `(major,minor)` by mapping each rule into bucketed segments.
* **Cross-source reconciliation**: group identical rules to de-duplicate.

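The cross-source reconciliation pattern can be sketched in plain JavaScript to show what the `$group` stage is expected to produce. The `ruleKey` and `dedupeRules` names are hypothetical, and the key includes the inclusivity flags as an assumption, since two rules that differ only in inclusivity are not equivalent.

```javascript
// Canonical grouping key for a rule. `notes` is excluded on purpose:
// provenance differs per source and must not prevent a merge.
function ruleKey(rule) {
  return JSON.stringify([
    rule.scheme,
    rule.type,
    rule.min ?? null,
    rule.minInclusive ?? null,
    rule.max ?? null,
    rule.maxInclusive ?? null,
    rule.value ?? null,
  ]);
}

// Collapse identical rules across documents, remembering which sources
// reported each one (the in-memory analogue of $unwind followed by $group).
function dedupeRules(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const rule of doc.normalizedVersions ?? []) {
      const key = ruleKey(rule);
      if (!groups.has(key)) {
        const { notes, ...canonical } = rule; // drop per-source provenance
        groups.set(key, { rule: canonical, sources: [] });
      }
      const group = groups.get(key);
      if (!group.sources.includes(doc.source)) group.sources.push(doc.source);
    }
  }
  return [...groups.values()];
}
```

Keeping `sources` on each group preserves the audit trail that the per-rule `notes` field would otherwise carry.
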
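## Indexing tips

* Compound index on `{ packageId: 1, "normalizedVersions.scheme": 1, "normalizedVersions.type": 1 }`.
* If lookups by exact value are common: add a sparse index on `"normalizedVersions.value"`.
* Indexes on `normalizedVersions.*` only help filters that run before `$unwind`; to benefit from them, filter with `$elemMatch` in the initial `$match` rather than only after unwinding.
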
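As a starting point, the two indexes above could be declared as data and applied through `createIndex(keys, options)`, which both `mongosh` collections and the Node.js driver expose. The index names and the `applyIndexes` wrapper are assumptions made for this sketch.

```javascript
// Index definitions from the tips above. The `name` values are hypothetical.
const indexSpecs = [
  {
    keys: {
      packageId: 1,
      "normalizedVersions.scheme": 1,
      "normalizedVersions.type": 1,
    },
    options: { name: "pkg_scheme_type" },
  },
  {
    // Sparse: documents with no `value` on any rule are left out of the index.
    keys: { "normalizedVersions.value": 1 },
    options: { name: "exact_value", sparse: true },
  },
];

// Apply every spec to a collection-like object exposing createIndex().
// Returns whatever createIndex returns for each spec (promises with the
// Node.js driver, index names in mongosh).
function applyIndexes(collection) {
  return indexSpecs.map(({ keys, options }) =>
    collection.createIndex(keys, options)
  );
}
```

In `mongosh` this would be called as `applyIndexes(db.vulns)`; with the Node.js driver, wrap the result in `Promise.all`.
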
## Migration path (safe + incremental)

1. **Dual-write**: keep old nested object while writing the new `normalizedVersions` array.
2. **Backfill** existing docs with a one-time script using your current builder.
3. **Cutover** queries/aggregations to the new path (behind a feature flag).
4. **Clean up** old field after a soak period.

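The backfill step could look roughly like the sketch below, which only builds the update operations and leaves execution to the caller. `buildBackfillOps` is a hypothetical name, and `buildRules` stands in for whatever adapter maps your existing nested object to the new array; the operations use the standard `updateOne` shape that `bulkWrite` accepts.

```javascript
// Build bulkWrite operations for documents that still lack the new array.
// `buildRules(doc)` is caller-supplied and returns the normalized rules.
function buildBackfillOps(docs, buildRules) {
  const ops = [];
  for (const doc of docs) {
    // Skip documents already migrated, e.g. by the dual-write path.
    if (Array.isArray(doc.normalizedVersions)) continue;
    ops.push({
      updateOne: {
        // The $exists guard keeps the write idempotent if a dual-write
        // lands between reading the document and applying the update.
        filter: { _id: doc._id, normalizedVersions: { $exists: false } },
        update: { $set: { normalizedVersions: buildRules(doc) } },
      },
    });
  }
  return ops;
}
```

The resulting array can be passed to `db.vulns.bulkWrite(ops)` in batches; the old field is left untouched, so step 4 remains a separate cleanup.
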
If you want, I can draft:

* a one-time Mongo backfill script,
* the new EF/Mongo C# POCOs, and
* a test matrix (edge cases: prerelease tags, build metadata, `0.*` semantics, distro-style ranges).