# Node Analyzer (npm/Yarn/pnpm)

This document captures the Node language analyzer’s deterministic behavior guarantees and safety constraints (what it emits, what it refuses to emit, and how it stays bounded/offline).

## Component identity & precedence

### Installed vs declared-only

- The analyzer always emits **on-disk inventory** first (workspace member manifests + installed `node_modules`/pnpm/Yarn PnP cache packages).
- It then emits **declared-only** components for lockfile / manifest declarations that are **not backed by on-disk inventory**:
  - If a declared entry has a **concrete resolved version** from a lockfile, it emits a versioned `pkg:npm/...@<version>` PURL.
  - If the version is **non-concrete** (ranges/tags/git/file/workspace/link/path), it emits an **explicit-key** component (`purl=null`, `version=null`).

### Identity safety (PURL vs explicit-key)

- Concrete PURLs are emitted only when the analyzer can prove a **concrete version** from local evidence (installed `package.json` or a lockfile-resolved entry).
- Declared-only/non-concrete dependencies use `LanguageExplicitKey` (see `docs/modules/scanner/language-analyzers-contract.md`).

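The identity rule above can be illustrated with a small Python sketch. It is not the analyzer's code: the `is_concrete_version` check (a plain semver-shaped regex), the `component_identity` helper, and the `explicitKey` string format are all assumptions made for illustration. Only the `pkg:npm/...@<version>` shape and the `purl=null`/`version=null` outcome come from this document.

```python
import re
from typing import Optional

# Assumption: "concrete" means an exact semver-shaped string (optionally with
# pre-release/build suffixes). Ranges, tags, and protocol specs do not match.
_CONCRETE = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def is_concrete_version(spec: Optional[str]) -> bool:
    """Return True only for exact versions such as 1.2.3 or 2.0.0-beta.1."""
    return bool(spec) and _CONCRETE.match(spec) is not None


def component_identity(name: str, resolved_version: Optional[str]) -> dict:
    """Pick a versioned PURL or an explicit-key identity for one dependency."""
    if is_concrete_version(resolved_version):
        # Scoped names (@scope/pkg) percent-encode the leading '@' in a PURL.
        purl_name = "%40" + name[1:] if name.startswith("@") else name
        return {
            "purl": f"pkg:npm/{purl_name}@{resolved_version}",
            "version": resolved_version,
        }
    # Hypothetical explicit-key format; the real one is defined by the
    # language-analyzers contract referenced above.
    return {"purl": None, "version": None, "explicitKey": f"npm::{name}"}
```

With this sketch, `component_identity("left-pad", "1.3.0")` yields a versioned PURL, while `component_identity("left-pad", "^1.3.0")` falls back to the explicit-key form with `purl` and `version` both unset.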
### Lock metadata lookup precedence

When attaching lock metadata to an installed package:

1) `package-lock.json` path match (`packages["<relativePath>"]`),
2) `(name, version)` match (Yarn/pnpm multi-version support),
3) fallback to name-only (last-wins) for legacy locks.

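The three-tier lookup can be sketched as follows. The `LockIndex` class and its field names are hypothetical and exist only to make the precedence order concrete; the actual analyzer may store its lock data differently.

```python
from typing import Dict, Optional, Tuple


class LockIndex:
    """Illustrative index over lockfile entries (hypothetical structure)."""

    def __init__(self) -> None:
        self.by_path: Dict[str, dict] = {}
        self.by_name_version: Dict[Tuple[str, str], dict] = {}
        self.by_name: Dict[str, dict] = {}

    def add(self, entry: dict, path: Optional[str] = None) -> None:
        """Register an entry; `path` is set only for package-lock.json entries."""
        if path is not None:
            self.by_path[path] = entry
        self.by_name_version[(entry["name"], entry["version"])] = entry
        # Name-only tier is last-wins: a later entry overwrites an earlier one.
        self.by_name[entry["name"]] = entry

    def lookup(self, relative_path: str, name: str, version: str) -> Optional[dict]:
        """Apply the precedence: path match, then (name, version), then name."""
        entry = self.by_path.get(relative_path)
        if entry is not None:
            return entry
        entry = self.by_name_version.get((name, version))
        if entry is not None:
            return entry
        return self.by_name.get(name)
```

The path tier comes first because it is the only key that distinguishes two installs of the same package at different nesting levels; the name-only tier is the least specific and is therefore consulted last.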
## Lockfile parsing guarantees (offline)

### `package-lock.json` (npm)

- Supports v3+ `packages{}` layout and legacy `dependencies{}` traversal.
- Correctly extracts nested names from `node_modules/.../node_modules/...` paths (including scoped packages).

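As an illustration of the nested-name extraction, the sketch below derives a package name from a `packages{}` key by taking what follows the last `node_modules` segment. The function name `package_name_from_lock_path` is hypothetical, and returning `None` for keys without a `node_modules` segment (for example workspace paths) is an assumption of this sketch.

```python
from typing import Optional


def package_name_from_lock_path(path: str) -> Optional[str]:
    """Return the package name encoded in a package-lock `packages` key."""
    segments = [s for s in path.replace("\\", "/").split("/") if s]
    # Find the last "node_modules" segment; everything after it is the name.
    last = max(
        (i for i, s in enumerate(segments) if s == "node_modules"),
        default=-1,
    )
    if last == -1:
        return None
    tail = segments[last + 1:]
    if not tail:
        return None
    if tail[0].startswith("@"):
        # Scoped packages span two segments: @scope/name.
        return "/".join(tail[:2]) if len(tail) >= 2 else None
    return tail[0]
```

For example, `node_modules/a/node_modules/@scope/b` resolves to `@scope/b`, not `a`.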
### `yarn.lock` (Yarn v1 + Berry v2/v3)

- Supports both Yarn v1 (`resolved "https://..."`) and Berry fields (`resolution:`, `checksum:`).
- If `integrity` is absent but `checksum` is present, the analyzer records integrity-like evidence as `checksum:<value>`.
- Ignores the `__metadata` section.

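The integrity fallback is simple enough to show directly. This sketch assumes a lock entry has already been parsed into a dictionary with optional `integrity` and `checksum` keys; the `integrity_evidence` helper is hypothetical.

```python
from typing import Optional


def integrity_evidence(entry: dict) -> Optional[str]:
    """Prefer `integrity`; otherwise record a Berry checksum as checksum:<value>."""
    integrity = entry.get("integrity")
    if integrity:
        return integrity
    checksum = entry.get("checksum")
    if checksum:
        # The prefix keeps checksum-derived evidence distinguishable from SRI strings.
        return f"checksum:{checksum}"
    return None
```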
### `pnpm-lock.yaml` (pnpm)

- Parses modern `packages:` and `snapshots:` sections.
- Does not drop entries that lack `integrity` (workspace/link/file/git); instead it emits:
  - `lockIntegrityMissing=true`
  - `lockIntegrityMissingReason=<workspace|link|file|git|directory|missing>`

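This document does not say how the missing-integrity reason is derived, so the following is only one plausible sketch. It assumes a parsed entry exposes a `resolution` mapping (with optional `integrity` and `type` keys) and a `version` specifier, and that the reason is inferred from the resolution type or the specifier's protocol prefix. The `integrity_metadata` helper and the string-valued metadata are assumptions.

```python
def integrity_metadata(entry: dict) -> dict:
    """Return metadata for an entry without integrity, or {} when it has one."""
    resolution = entry.get("resolution") or {}
    if resolution.get("integrity"):
        return {}

    spec = str(entry.get("version") or "")
    res_type = resolution.get("type")
    if res_type in ("git", "directory"):
        reason = res_type
    else:
        # Fall back to the specifier's protocol prefix, else a generic reason.
        reason = "missing"
        for prefix in ("workspace", "link", "file"):
            if spec.startswith(prefix + ":"):
                reason = prefix
                break

    return {
        "lockIntegrityMissing": "true",
        "lockIntegrityMissingReason": reason,
    }
```

The key property is that the entry is kept and annotated rather than silently dropped.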
## Workspaces

- Reads workspace members from the root `package.json` (`workspaces` array or `{ packages: [...] }` form).
- Supports glob patterns:
  - `*` (single segment)
  - `**` (multi-segment)
- Expansion is bounded and deterministic:
  - Skips `node_modules`
  - Caps traversal depth and total visited directories/members
  - Stable, sorted member output
- Dependency scopes (`production|development|peer|optional`) are derived from both the root and workspace manifests, with deterministic precedence.

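A bounded, deterministic glob expansion along these lines might look like the sketch below. The function name, the default caps (`max_depth=10`, `max_visited=1000`), and the rule that a directory counts as a member only if it contains a `package.json` are assumptions for illustration; the document states only that caps exist, not their values.

```python
import fnmatch
import os
from typing import List


def expand_workspace_pattern(
    root: str, pattern: str, max_depth: int = 10, max_visited: int = 1000
) -> List[str]:
    """Expand one workspace glob into sorted, root-relative member paths."""
    segments = [s for s in pattern.split("/") if s]
    members = set()
    visited = 0

    def subdirs(path: str) -> List[str]:
        try:
            with os.scandir(path) as it:
                names = sorted(e.name for e in it if e.is_dir(follow_symlinks=False))
        except OSError:
            return []
        return [n for n in names if n != "node_modules"]  # never descend here

    def walk(path: str, rel: List[str], segs: List[str], depth: int) -> None:
        nonlocal visited
        if visited >= max_visited or depth > max_depth:
            return  # caps reached: stop expanding this branch
        visited += 1
        if not segs:
            if os.path.isfile(os.path.join(path, "package.json")):
                members.add("/".join(rel))
            return
        head, rest = segs[0], segs[1:]
        if head == "**":
            walk(path, rel, rest, depth)  # '**' may match zero segments
            for name in subdirs(path):
                walk(os.path.join(path, name), rel + [name], segs, depth + 1)
        else:
            # Literal segments and '*' both go through single-segment matching.
            for name in subdirs(path):
                if fnmatch.fnmatchcase(name, head):
                    walk(os.path.join(path, name), rel + [name], rest, depth + 1)

    walk(root, [], segments, 0)
    return sorted(members)
```

Sorting both the directory listings and the final result keeps the output independent of filesystem enumeration order, which is what makes the expansion reproducible across machines.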
## Import scanning (bounded)

- Import scanning runs only for the root package and workspace member packages (not `node_modules` packages).
- File types: `.js/.jsx/.mjs/.cjs/.ts/.tsx/.mts/.cts`.
- Parser behavior:
  - Attempts AST parsing as script/module; falls back to a bounded regex heuristic for TS when parsing fails.
- Hard caps per package:
  - `maxFiles=500`, `maxBytes=5MiB`, `maxFileBytes=512KiB`, `maxDepth=20`
  - Skips `node_modules` and `.pnpm` directories during traversal
- If capped, the analyzer marks the package metadata with:
  - `importScanSkipped=true`
  - `importScan.filesScanned=<n>`
  - `importScan.bytesScanned=<n>`

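The caps above can be illustrated with a traversal sketch that only collects candidate files (it does not parse them). The cap values, the extension list, the skipped directories, and the metadata keys come from this document. Everything else is assumed: the `ScanResult` type, the helper names, and the exact cap semantics (an oversized file is skipped and flagged, while hitting the file, byte, or depth budget stops the scan).

```python
import os
from dataclasses import dataclass, field
from typing import List

SCAN_EXTENSIONS = (".js", ".jsx", ".mjs", ".cjs", ".ts", ".tsx", ".mts", ".cts")
SKIP_DIRS = {"node_modules", ".pnpm"}


@dataclass
class ScanResult:
    files: List[str] = field(default_factory=list)
    files_scanned: int = 0
    bytes_scanned: int = 0
    capped: bool = False


def collect_sources(package_root: str, max_files: int = 500,
                    max_bytes: int = 5 * 1024 * 1024,
                    max_file_bytes: int = 512 * 1024,
                    max_depth: int = 20) -> ScanResult:
    """Walk one package in sorted order, stopping once a budget is exhausted."""
    result = ScanResult()
    stop = False

    def walk(path: str, depth: int) -> None:
        nonlocal stop
        if stop:
            return
        if depth > max_depth:
            result.capped = stop = True
            return
        try:
            with os.scandir(path) as it:
                entries = sorted(it, key=lambda e: e.name)
        except OSError:
            return
        for entry in entries:
            if stop:
                return
            if entry.is_dir(follow_symlinks=False):
                if entry.name not in SKIP_DIRS:
                    walk(entry.path, depth + 1)
            elif entry.is_file(follow_symlinks=False) and entry.name.endswith(SCAN_EXTENSIONS):
                size = entry.stat().st_size
                if size > max_file_bytes:
                    result.capped = True  # assumption: skip the file, keep scanning
                    continue
                if result.files_scanned >= max_files or result.bytes_scanned + size > max_bytes:
                    result.capped = stop = True
                    return
                result.files.append(entry.path)
                result.files_scanned += 1
                result.bytes_scanned += size

    walk(package_root, 0)
    return result


def scan_metadata(result: ScanResult) -> dict:
    """Emit the capped-scan markers described above (string-valued by assumption)."""
    if not result.capped:
        return {}
    return {
        "importScanSkipped": "true",
        "importScan.filesScanned": str(result.files_scanned),
        "importScan.bytesScanned": str(result.bytes_scanned),
    }
```

Because entries are visited in sorted order, a capped scan always covers the same prefix of files for the same tree, so the reported counts stay reproducible.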
## Container layer layouts

- Candidate layer roots under the analysis root:
  - `layers/*`, `.layers/*`, `layer*`
- Each candidate root is scanned independently.
- The analyzer also discovers `package.json` roots nested under layer roots (bounded depth) and includes their nested `node_modules` roots when present.

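A rough sketch of candidate layer-root discovery is shown below. The three patterns come from the list above; the helper names and the choice to treat `layers` and `.layers` as containers (expanding their children) rather than as `layer*` matches themselves are assumptions of this sketch.

```python
import os
from typing import List


def _subdirs(path: str) -> List[str]:
    """Return sorted immediate subdirectory names, or [] if unreadable."""
    try:
        with os.scandir(path) as it:
            return sorted(e.name for e in it if e.is_dir(follow_symlinks=False))
    except OSError:
        return []


def candidate_layer_roots(analysis_root: str) -> List[str]:
    """List root-relative layer directories matching layers/*, .layers/*, layer*."""
    roots: List[str] = []
    for name in _subdirs(analysis_root):
        if name in ("layers", ".layers"):
            # Container directories: each child is its own candidate root.
            children = _subdirs(os.path.join(analysis_root, name))
            roots.extend(f"{name}/{child}" for child in children)
        elif name.startswith("layer"):
            roots.append(name)
    return sorted(roots)
```

Each returned root would then be scanned on its own, so packages found in one layer are not merged with another layer's `node_modules` during discovery.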
## Determinism & evidence hashing

- On-disk `package.json` manifests are hashed (sha256) when ≤ 1 MiB and attached to the root evidence for deterministic provenance.
- Output ordering is stable (componentKey ordering, sorted metadata/evidence).

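The size-gated hashing rule can be sketched in a few lines. The `manifest_sha256` helper and its `max_bytes` parameter are hypothetical; only the sha256 algorithm and the 1 MiB threshold come from this document.

```python
import hashlib
import os
from typing import Optional

MAX_MANIFEST_BYTES = 1024 * 1024  # 1 MiB threshold stated above


def manifest_sha256(path: str, max_bytes: int = MAX_MANIFEST_BYTES) -> Optional[str]:
    """Return the hex sha256 of a manifest, or None if it exceeds the size cap."""
    if os.path.getsize(path) > max_bytes:
        return None
    with open(path, "rb") as fh:
        # Hash raw bytes so the digest does not depend on text decoding.
        return hashlib.sha256(fh.read()).hexdigest()
```

Hashing the raw bytes rather than a re-serialized JSON object means the digest reflects exactly what is on disk.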
## Benchmark

- Scenario id: `node_detection_gaps_fixture` (config: `src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json`)
- Fixture root: `samples/runtime/node-detection-gaps`
- Run:
  - `dotnet run --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --config src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json --json out/bench/scanner-analyzers/latest.json --prom out/bench/scanner-analyzers/latest.prom`
- Prometheus output includes additional metrics under `scanner_analyzer_bench_metric{scenario="...",name="node.importScan.*"}`.