Here’s a quick, practical idea to make your version-range modeling cleaner and faster to query.
Rethinking SemVerRangeBuilder + MongoDB
Problem (today): Version normalization rules live as a nested object (and often as a bespoke structure per source). This can force awkward $objectToArray, $map, and conditional logic in pipelines when you need to:
- match “is version X affected?”
- flatten ranges for analytics
- de-duplicate across sources
Proposal: Store normalized version rules as an embedded collection (array of small docs) instead of a single nested object.
Minimal background
- SemVer normalization: converting all source-specific version notations into a single, strict representation (e.g., >=1.2.3 <2.0.0, exact pins, wildcards).
- Embedded collection: an array of consistently shaped items inside the parent doc, well suited to $unwind-centric analytics and direct matches.
Suggested shape
{
  "_id": "VULN-123",
  "packageId": "pkg:npm/lodash",
  "source": "NVD",
  "normalizedVersions": [
    {
      "scheme": "semver",
      "type": "range",                 // "range" | "exact" | "lt" | "lte" | "gt" | "gte"
      "min": "1.2.3",                  // optional
      "minInclusive": true,            // optional
      "max": "2.0.0",                  // optional
      "maxInclusive": false,           // optional
      "notes": "from GHSA GHSA-xxxx"   // traceability
    },
    {
      "scheme": "semver",
      "type": "exact",
      "value": "1.5.0"
    }
  ],
  "metadata": { "ingestedAt": "2025-10-10T12:00:00Z" }
}
Why this helps
- Simpler queries
  - Is v affected?
    db.vulns.aggregate([
      { $match: { packageId: "pkg:npm/lodash" } },
      { $unwind: "$normalizedVersions" },
      { $match: { $or: [
        { "normalizedVersions.type": "exact", "normalizedVersions.value": "1.5.0" },
        // Caution: $lte/$gt on version strings compare lexicographically ("1.10.0" sorts before "1.9.0"),
        // so in practice compare a stored sortable key. This branch also assumes an inclusive min and an
        // exclusive max rather than reading minInclusive/maxInclusive.
        { "normalizedVersions.type": "range", "normalizedVersions.min": { $lte: "1.5.0" }, "normalizedVersions.max": { $gt: "1.5.0" } }
      ] } },
      { $project: { _id: 1 } }
    ])
  - No $objectToArray, fewer $conds.
- Cheaper storage
  - Arrays of tiny docs compress well and avoid wide nested structures with many nulls/keys.
- Easier dedup/merge
  - $unwind → normalize → $group by {scheme, type, min, max, value} to collapse equivalent rules across sources.
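One caveat on the lookup above: MongoDB compares strings lexicographically by default, so a range check on raw version strings misorders values such as "1.10.0" and "1.9.0". The following is a minimal in-memory sketch of the same check, assuming a zero-padded sort key. The names `sortKey` and `isAffected` are hypothetical, prerelease tags and build metadata are ignored, and the defaults for missing inclusivity flags (inclusive min, exclusive max) are assumptions rather than part of the proposal.

```javascript
// Hypothetical helper: turn "1.10.0" into "00001.00010.00000" so that plain
// string comparison agrees with numeric ordering of major.minor.patch.
// Assumption: input is already normalized; prerelease/build suffixes are dropped.
function sortKey(version) {
  const core = version.split(/[-+]/)[0];
  const parts = core.split(".").map(Number);
  while (parts.length < 3) parts.push(0);
  return parts.map((n) => String(n).padStart(5, "0")).join(".");
}

// Returns true when `version` satisfies at least one rule in the proposed shape.
// Any rule that is not "exact" is treated as a pair of optional bounds.
function isAffected(rules, version) {
  const v = sortKey(version);
  return rules.some((r) => {
    if (r.type === "exact") return sortKey(r.value) === v;
    const aboveMin =
      r.min == null ||
      (r.minInclusive !== false ? sortKey(r.min) <= v : sortKey(r.min) < v);
    const belowMax =
      r.max == null ||
      (r.maxInclusive === true ? v <= sortKey(r.max) : v < sortKey(r.max));
    return aboveMin && belowMax;
  });
}
```

If a key like this were persisted next to min, max, and value at write time, the range branch of the query could compare those keys with $lte and $gt directly.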
Builder changes (SemVerRangeBuilder)
- Emit items, not a monolith: have the builder return IEnumerable<NormalizedVersionRule>.
- Normalize early: resolve “aliases” (1.2.x, ^1.2.3, distro styles) into canonical (type, min, max, …) before persistence.
- Traceability: include notes/sourceRef on each rule so you can re-materialize provenance during audits.
- Lean projection helper: when you only need normalized rules (and not the intermediate primitives), prefer SemVerRangeRuleBuilder.BuildNormalizedRules(rawRange, patchedVersion, provenanceNote) to skip manual projections.
C# sketch
using System.Collections.Generic;

public record NormalizedVersionRule(
    string Scheme,               // "semver"
    string Type,                 // "range" | "exact" | ...
    string? Min = null,
    bool? MinInclusive = null,
    string? Max = null,
    bool? MaxInclusive = null,
    string? Value = null,
    string? Notes = null
);

public static class SemVerRangeBuilder
{
    public static IEnumerable<NormalizedVersionRule> Build(string raw)
    {
        // Placeholder: a real implementation would parse raw (^1.2.3, 1.2.x, <=2.0.0, etc.)
        // and yield one canonical rule per parsed range. The values below are hard-coded.
        yield return new NormalizedVersionRule(
            Scheme: "semver",
            Type: "range",
            Min: "1.2.3",
            MinInclusive: true,
            Max: "2.0.0",
            MaxInclusive: false,
            Notes: "nvd:ABC-123"
        );
    }
}
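The parsing step that the C# sketch leaves as a comment could look roughly like the following, written in JavaScript to match the shell examples. It assumes npm-style semantics for caret and x-ranges, including the special handling of 0.x versions, and covers only four notations. The function name `normalizeAlias` is hypothetical; comparator strings such as <=2.0.0 and distro-style ranges are deliberately left out.

```javascript
// Sketch: expand a few npm-style aliases into canonical rule objects.
// Handled (assumption): "^X.Y.Z", "X.Y.x", "X.x", and exact "X.Y.Z".
function normalizeAlias(raw, notes) {
  const range = (min, max) => ({
    scheme: "semver",
    type: "range",
    min,
    minInclusive: true,
    max,
    maxInclusive: false,
    notes,
  });

  let m;
  if ((m = /^\^(\d+)\.(\d+)\.(\d+)$/.exec(raw))) {
    const [major, minor, patch] = m.slice(1).map(Number);
    const min = `${major}.${minor}.${patch}`;
    // Caret allows changes that keep the left-most non-zero component fixed.
    if (major > 0) return range(min, `${major + 1}.0.0`);
    if (minor > 0) return range(min, `0.${minor + 1}.0`);
    return range(min, `0.0.${patch + 1}`);
  }
  if ((m = /^(\d+)\.(\d+)\.[xX*]$/.exec(raw))) {
    const [major, minor] = m.slice(1).map(Number);
    return range(`${major}.${minor}.0`, `${major}.${minor + 1}.0`);
  }
  if ((m = /^(\d+)\.[xX*]$/.exec(raw))) {
    const major = Number(m[1]);
    return range(`${major}.0.0`, `${major + 1}.0.0`);
  }
  if (/^\d+\.\d+\.\d+$/.test(raw)) {
    return { scheme: "semver", type: "exact", value: raw, notes };
  }
  throw new Error(`unsupported range notation: ${raw}`);
}
```

Throwing on unrecognized input, rather than guessing, keeps malformed source data out of the persisted rules; a production builder might instead record the raw string for later review.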
Aggregation patterns you unlock
- Fast “affected version” lookups via $unwind + $match (can complement with a computed sort key).
- Rollups: count of vulns per (major, minor) by mapping each rule into bucketed segments.
- Cross-source reconciliation: group identical rules to de-duplicate.
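The cross-source reconciliation can be prototyped in memory before committing to a pipeline. The sketch below mirrors the $unwind then $group idea: it flattens every rule and groups identical ones, collecting the sources that reported each. The function name `dedupeRules` is hypothetical. The grouping key also includes packageId and the inclusivity flags, on the assumption that rules differing only in bound inclusivity should not be merged.

```javascript
// Sketch: collapse identical rules across source documents.
// Input docs follow the proposed shape ({ packageId, source, normalizedVersions }).
function dedupeRules(docs) {
  const byKey = new Map();
  for (const doc of docs) {
    for (const r of doc.normalizedVersions ?? []) {
      // Build the rule with a fixed property order so the JSON key is stable;
      // notes are excluded because provenance differs per source.
      const rule = {
        scheme: r.scheme,
        type: r.type,
        min: r.min ?? null,
        minInclusive: r.minInclusive ?? null,
        max: r.max ?? null,
        maxInclusive: r.maxInclusive ?? null,
        value: r.value ?? null,
      };
      const key = JSON.stringify([doc.packageId, rule]);
      if (!byKey.has(key)) {
        byKey.set(key, { packageId: doc.packageId, rule, sources: [] });
      }
      const entry = byKey.get(key);
      if (!entry.sources.includes(doc.source)) entry.sources.push(doc.source);
    }
  }
  return [...byKey.values()];
}
```

Each returned entry keeps the list of contributing sources, which preserves some of the traceability that per-rule notes provide.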
Indexing tips
- Compound index on { packageId: 1, "normalizedVersions.scheme": 1, "normalizedVersions.type": 1 }.
- If lookups by exact value are common: add a sparse index on "normalizedVersions.value".
Migration path (safe + incremental)
- Dual-write: keep old nested object while writing the new normalizedVersions array.
- Backfill existing docs with a one-time script using your current builder.
- Cutover queries/aggregations to the new path (behind a feature flag).
- Clean up old field after soak.
If you want, I can draft:
- a one-time Mongo backfill script,
- the new EF/Mongo C# POCOs, and
- a test matrix (edge cases: prerelease tags, build metadata, 0.* semantics, distro-style ranges).
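As a starting point for the prerelease and build-metadata rows of such a test matrix, here is a sketch of version precedence as defined by SemVer 2.0.0. The function names `parseSemVer` and `compareSemVer` are hypothetical, and the code assumes well-formed input (three numeric core components, no leading zeros).

```javascript
// Split "1.0.0-beta.2+build.7" into numeric core parts and prerelease identifiers.
function parseSemVer(version) {
  const withoutBuild = version.split("+")[0]; // build metadata never affects ordering
  const dash = withoutBuild.indexOf("-");
  const core = dash === -1 ? withoutBuild : withoutBuild.slice(0, dash);
  const pre = dash === -1 ? [] : withoutBuild.slice(dash + 1).split(".");
  return { core: core.split(".").map(Number), pre };
}

// Returns -1, 0, or 1 following SemVer 2.0.0 precedence rules.
function compareSemVer(a, b) {
  const x = parseSemVer(a);
  const y = parseSemVer(b);
  for (let i = 0; i < 3; i++) {
    if (x.core[i] !== y.core[i]) return x.core[i] < y.core[i] ? -1 : 1;
  }
  if (x.pre.length === 0 || y.pre.length === 0) {
    // A release outranks any prerelease of the same core version.
    return (x.pre.length === 0 ? 1 : 0) - (y.pre.length === 0 ? 1 : 0);
  }
  const n = Math.min(x.pre.length, y.pre.length);
  for (let i = 0; i < n; i++) {
    const p = x.pre[i];
    const q = y.pre[i];
    if (p === q) continue;
    const pNum = /^\d+$/.test(p);
    const qNum = /^\d+$/.test(q);
    if (pNum && qNum) return Number(p) < Number(q) ? -1 : 1;
    if (pNum !== qNum) return pNum ? -1 : 1; // numeric identifiers rank below alphanumeric ones
    return p < q ? -1 : 1; // alphanumeric identifiers compare in ASCII order
  }
  // All shared identifiers are equal: the longer prerelease list ranks higher.
  return Math.sign(x.pre.length - y.pre.length);
}
```

The ordering chain given in the SemVer specification (1.0.0-alpha through 1.0.0) makes a convenient first test case, since it exercises every branch above.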