# Node Analyzer (npm/Yarn/pnpm)

This document captures the Node language analyzer’s deterministic behavior guarantees and safety constraints (what it emits, what it refuses to emit, and how it stays bounded/offline).

## Component identity & precedence

### Installed vs declared-only

- The analyzer always emits **on-disk inventory** first (workspace member manifests + installed `node_modules`/PNPM/Yarn PnP cache packages).
- It then emits **declared-only** components for lockfile / manifest declarations that are **not backed by on-disk inventory**:
  - If a declared entry has a **concrete resolved version** from a lockfile, it emits a versioned `pkg:npm/...@` PURL.
  - If the version is **non-concrete** (ranges/tags/git/file/workspace/link/path), it emits an **explicit-key** component (`purl=null`, `version=null`).

### Identity safety (PURL vs explicit-key)

- Concrete PURLs are emitted only when the analyzer can prove a **concrete version** from local evidence (installed `package.json` or a lockfile-resolved entry).
- Declared-only/non-concrete dependencies use `LanguageExplicitKey` (see `docs/modules/scanner/language-analyzers-contract.md`).

### Lock metadata lookup precedence

When attaching lock metadata to an installed package:

1) `package-lock.json` path match (`packages[""]`),
2) `(name, version)` match (Yarn/pnpm multi-version support),
3) fallback to name-only (last-wins) for legacy locks.

## Lockfile parsing guarantees (offline)

### `package-lock.json` (npm)

- Supports v3+ `packages{}` layout and legacy `dependencies{}` traversal.
- Correctly extracts nested names from `node_modules/.../node_modules/...` paths (including scoped packages).

### `yarn.lock` (Yarn v1 + Berry v2/v3)

- Supports both Yarn v1 (`resolved "https://..."`) and Berry fields (`resolution:`, `checksum:`).
- If `integrity` is absent but `checksum` is present, the analyzer records integrity-like evidence as `checksum:`.
- Ignores the `__metadata` section.

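
The `yarn.lock` rules above (prefer `integrity`, fall back to `checksum`, ignore `__metadata`) can be sketched roughly as follows. This is a minimal illustration in Python, not the analyzer's implementation: the function names and the dict-shaped entries are hypothetical, and the text after the `checksum:` prefix is assumed here to be the raw checksum value from the lockfile.

```python
from typing import Optional


def integrity_evidence(entry: dict) -> Optional[str]:
    """Return integrity-like evidence for one already-parsed yarn.lock entry."""
    integrity = entry.get("integrity")
    if integrity:
        # Yarn v1 style: an explicit integrity field wins.
        return integrity
    checksum = entry.get("checksum")
    if checksum:
        # Assumption: the raw Berry checksum follows the "checksum:" prefix.
        return f"checksum:{checksum}"
    return None


def collect_evidence(entries: dict) -> dict:
    """Map lock entry keys to evidence, skipping the Berry __metadata section."""
    result = {}
    for key in sorted(entries):  # sorted iteration keeps output stable
        if key == "__metadata":
            continue
        result[key] = integrity_evidence(entries[key])
    return result
```

Entries that carry neither field map to `None` in this sketch; how the real analyzer reports that case for Yarn is not specified in this document.
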
### `pnpm-lock.yaml` (pnpm)

- Parses modern `packages:` and `snapshots:` sections.
- Does not drop entries that lack `integrity` (workspace/link/file/git); instead it emits:
  - `lockIntegrityMissing=true`
  - `lockIntegrityMissingReason=`

## Workspaces

- Reads workspace members from the root `package.json` (`workspaces` array or `{ packages: [...] }` form).
- Supports glob patterns:
  - `*` (single segment)
  - `**` (multi-segment)
- Expansion is bounded and deterministic:
  - Skips `node_modules`
  - Caps traversal depth and total visited directories/members
  - Stable, sorted member output
- Dependency scopes (`production|development|peer|optional`) are derived from both the root and workspace manifests, with deterministic precedence.

## Import scanning (bounded)

- Import scanning runs only for the root package and workspace member packages (not `node_modules` packages).
- File types: `.js/.jsx/.mjs/.cjs/.ts/.tsx/.mts/.cts`.
- Parser behavior:
  - Attempts AST parsing as script/module; falls back to a bounded regex heuristic for TS when parsing fails.
- Hard caps per package:
  - `maxFiles=500`, `maxBytes=5MiB`, `maxFileBytes=512KiB`, `maxDepth=20`
- Skips `node_modules` and `.pnpm` directories during traversal
- If capped, the analyzer marks the package metadata with:
  - `importScanSkipped=true`
  - `importScan.filesScanned=`
  - `importScan.bytesScanned=`

## Container layer layouts

- Candidate layer roots under the analysis root:
  - `layers/*`, `.layers/*`, `layer*`
- Each candidate root is scanned independently.
- The analyzer also discovers `package.json` roots nested under layer roots (bounded depth) and includes their nested `node_modules` roots when present.

## Determinism & evidence hashing

- On-disk `package.json` manifests are hashed (sha256) when ≤ 1 MiB and attached to the root evidence for deterministic provenance.
- Output ordering is stable (componentKey ordering, sorted metadata/evidence).

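
The size-gated manifest hashing described above can be sketched as follows. This is an illustrative Python snippet under stated assumptions, not the analyzer's code: `hash_manifest` and `MAX_MANIFEST_BYTES` are hypothetical names, and the lowercase-hex encoding of the digest is an assumption, since this document does not specify how the hash is encoded.

```python
import hashlib
from pathlib import Path
from typing import Optional

# 1 MiB cap taken from the text above; the constant name is hypothetical.
MAX_MANIFEST_BYTES = 1024 * 1024


def hash_manifest(path: Path) -> Optional[str]:
    """Return the sha256 hex digest of a manifest, or None if it exceeds the cap."""
    if path.stat().st_size > MAX_MANIFEST_BYTES:
        # Oversized manifests are not hashed, which keeps the work bounded.
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Checking the file size before reading avoids loading an arbitrarily large file into memory, and hashing raw bytes (rather than parsed JSON) means the same file always yields the same digest.
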
## Benchmark

- Scenario id: `node_detection_gaps_fixture` (config: `src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json`)
- Fixture root: `samples/runtime/node-detection-gaps`
- Run:
  - `dotnet run --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --config src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json --json out/bench/scanner-analyzers/latest.json --prom out/bench/scanner-analyzers/latest.prom`
- Prometheus output includes additional metrics under `scanner_analyzer_bench_metric{scenario="...",name="node.importScan.*"}`.