StellaOps Bot 6e45066e37
2025-12-13 09:37:15 +02:00

# Node Analyzer (npm/Yarn/pnpm)
This document captures the Node language analyzer's deterministic behavior guarantees and safety constraints: what it emits, what it refuses to emit, and how it stays bounded and offline.
## Component identity & precedence
### Installed vs declared-only
- The analyzer always emits **on-disk inventory** first (workspace member manifests + installed `node_modules`/PNPM/Yarn PnP cache packages).
- It then emits **declared-only** components for lockfile / manifest declarations that are **not backed by on-disk inventory**:
  - If a declared entry has a **concrete resolved version** from a lockfile, it emits a versioned `pkg:npm/...@<version>` PURL.
  - If the version is **non-concrete** (ranges/tags/git/file/workspace/link/path), it emits an **explicit-key** component (`purl=null`, `version=null`).
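
A minimal Python sketch of the identity decision for a declared-only entry, for illustration only. It assumes that "concrete" means an exact SemVer string, which is a simplification. The helper names and the explicit-key string are hypothetical; the real key format is the one defined in `language-analyzers-contract.md`. Scoped names are percent-encoded (`@` becomes `%40`) following the purl specification's npm convention.

```python
import re
from typing import Optional

# Simplifying assumption: only an exact SemVer string counts as "concrete".
# Ranges (^1.2.0, ~1.2, 1.x), dist-tags (latest), and protocol specs
# (git+..., file:, workspace:, link:) all fail this check.
_EXACT_SEMVER = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def is_concrete_version(value: str) -> bool:
    return _EXACT_SEMVER.match(value.strip()) is not None


def npm_purl(name: str, version: str) -> str:
    # purl spec (npm type): the "@" of a scope is percent-encoded as "%40".
    # Other characters are passed through unencoded in this sketch.
    encoded = "%40" + name[1:] if name.startswith("@") else name
    return f"pkg:npm/{encoded}@{version}"


def declared_only_identity(name: str, declared_spec: str, lock_version: Optional[str]) -> dict:
    """Hypothetical helper: a PURL only when a lockfile proves a concrete version."""
    if lock_version is not None and is_concrete_version(lock_version):
        return {"purl": npm_purl(name, lock_version), "version": lock_version}
    # Never guess a version. The key string is illustrative, not the contract format.
    return {"purl": None, "version": None, "explicitKey": f"npm:{name}:{declared_spec}"}
```
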
### Identity safety (PURL vs explicit-key)
- Concrete PURLs are emitted only when the analyzer can prove a **concrete version** from local evidence (installed `package.json` or a lockfile-resolved entry).
- Declared-only/non-concrete dependencies use `LanguageExplicitKey` (see `docs/modules/scanner/language-analyzers-contract.md`).
### Lock metadata lookup precedence
When attaching lock metadata to an installed package:
1) `package-lock.json` path match (`packages["<relativePath>"]`),
2) `(name, version)` match (Yarn/pnpm multi-version support),
3) fall back to a name-only match (last entry wins) for legacy locks.
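
The three-step precedence can be modeled as three indexes consulted in order. This is a Python sketch; `LockIndex` and its fields are hypothetical names, and "last-wins" is modeled simply as later entries overwriting earlier ones in the name-only index.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LockIndex:
    """Hypothetical in-memory index of lock entries, keyed three ways."""
    by_path: dict = field(default_factory=dict)          # "node_modules/a/node_modules/b" -> entry
    by_name_version: dict = field(default_factory=dict)  # ("b", "1.0.0") -> entry
    by_name: dict = field(default_factory=dict)          # "b" -> entry (last write wins)

    def add(self, path: Optional[str], name: str, version: str, entry: dict) -> None:
        if path is not None:
            self.by_path[path] = entry
        self.by_name_version[(name, version)] = entry
        self.by_name[name] = entry  # later entries overwrite earlier ones

    def lookup(self, relative_path: str, name: str, version: str) -> Optional[dict]:
        # 1) exact package-lock.json path match
        if relative_path in self.by_path:
            return self.by_path[relative_path]
        # 2) (name, version) match, so multiple installed versions stay distinct
        if (name, version) in self.by_name_version:
            return self.by_name_version[(name, version)]
        # 3) legacy fallback: name only
        return self.by_name.get(name)
```
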
## Lockfile parsing guarantees (offline)
### `package-lock.json` (npm)
- Supports v3+ `packages{}` layout and legacy `dependencies{}` traversal.
- Correctly extracts nested names from `node_modules/.../node_modules/...` paths (including scoped packages).
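
The nested-name rule comes down to: the package name starts after the last `node_modules` segment, and spans two segments when it begins with `@`. A sketch with a hypothetical function name, operating on `packages{}` keys:

```python
from typing import Optional


def name_from_lock_path(path: str) -> Optional[str]:
    """Derive the package name from a package-lock.json `packages` key."""
    segments = path.split("/")
    # Index of the last "node_modules" segment; the package name follows it.
    last = max((i for i, s in enumerate(segments) if s == "node_modules"), default=-1)
    if last < 0 or last + 1 >= len(segments):
        return None  # root key ("") or a workspace folder, not an installed package
    first = segments[last + 1]
    if first.startswith("@") and last + 2 < len(segments):
        return f"{first}/{segments[last + 2]}"  # scoped package: keep "@scope/name"
    return first
```

Matching whole segments (rather than searching for the substring `node_modules/`) avoids misreading a directory whose name merely contains that text.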
### `yarn.lock` (Yarn v1 + Berry v2/v3)
- Supports both Yarn v1 (`resolved "https://..."`) and Berry fields (`resolution:`, `checksum:`).
- If `integrity` is absent but `checksum` is present, the analyzer records integrity-like evidence as `checksum:<value>`.
- Ignores the `__metadata` section.
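
A sketch of the integrity fallback, starting from an already-parsed lockfile mapping (parsing Yarn v1's custom format or Berry's YAML is out of scope here). The function names are hypothetical.

```python
from typing import Optional


def yarn_integrity_evidence(entry: dict) -> Optional[str]:
    """Prefer `integrity`; fall back to a Berry `checksum`, tagged with a prefix."""
    if entry.get("integrity"):
        return entry["integrity"]
    if entry.get("checksum"):
        # The "checksum:" prefix keeps Berry hashes distinguishable from SRI strings.
        return f"checksum:{entry['checksum']}"
    return None


def collect_evidence(entries: dict) -> dict:
    """`entries` maps lockfile descriptors to their parsed fields."""
    return {
        key: yarn_integrity_evidence(fields)
        for key, fields in sorted(entries.items())
        if key != "__metadata"  # Berry's lockfile header, not a package
    }
```
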
### `pnpm-lock.yaml` (pnpm)
- Parses modern `packages:` and `snapshots:` sections.
- Does not drop entries that lack `integrity` (workspace/link/file/git); instead it emits:
  - `lockIntegrityMissing=true`
  - `lockIntegrityMissingReason=<workspace|link|file|git|directory|missing>`
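
How the reason is derived is not specified above. The classifier below is a hypothetical heuristic over pnpm `version` strings and `resolution` objects (`link:`/`file:`/`workspace:` prefixes, `type: directory`, `type: git`), shown only to make the two flags concrete. Metadata values are modeled as strings, which is also an assumption.

```python
def integrity_missing_reason(version: str, resolution: dict) -> str:
    """Hypothetical heuristic; the analyzer's actual classification rules may differ."""
    if resolution.get("type") == "directory" or "directory" in resolution:
        return "directory"
    if resolution.get("type") == "git" or version.startswith(("git+", "git:", "github:")):
        return "git"
    if version.startswith("workspace:"):
        return "workspace"
    if version.startswith("link:"):
        return "link"
    if version.startswith("file:") or str(resolution.get("tarball", "")).startswith("file:"):
        return "file"
    return "missing"


def annotate_lock_entry(entry: dict) -> dict:
    """Keep the entry (never drop it) and add flags when it has no integrity."""
    annotated = dict(entry)
    resolution = entry.get("resolution") or {}
    if not resolution.get("integrity"):
        annotated["lockIntegrityMissing"] = "true"
        annotated["lockIntegrityMissingReason"] = integrity_missing_reason(
            entry.get("version", ""), resolution
        )
    return annotated
```
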
## Workspaces
- Reads workspace members from the root `package.json` (`workspaces` array or `{ packages: [...] }` form).
- Supports glob patterns:
  - `*` (single segment)
  - `**` (multi-segment)
- Expansion is bounded and deterministic:
  - Skips `node_modules`
  - Caps traversal depth and total visited directories/members
  - Stable, sorted member output
- Dependency scopes (`production|development|peer|optional`) are derived from both the root and workspace manifests, with deterministic precedence.
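
The bounded expansion can be sketched as a segment-by-segment walk. This is a hypothetical illustration: it treats `*` and `**` only as whole path segments (no partial patterns such as `app-*`), and the `MAX_DEPTH`, `MAX_VISITED`, and `MAX_MEMBERS` values are placeholders, since the analyzer's actual caps are not documented here.

```python
from pathlib import Path
from typing import List

# Placeholder caps: the analyzer's real limits are not documented here.
MAX_DEPTH = 8
MAX_VISITED = 10_000
MAX_MEMBERS = 1_000


def _subdirs(path: Path) -> List[Path]:
    # Sorted for deterministic traversal; node_modules is never entered.
    return sorted(p for p in path.iterdir() if p.is_dir() and p.name != "node_modules")


def expand_workspaces(root: str, patterns: List[str]) -> List[str]:
    """Expand whole-segment `*` / `**` workspace globs into sorted member paths."""
    root_path = Path(root)
    members = set()
    visited = 0

    def walk(base: Path, segments: List[str], depth: int) -> None:
        nonlocal visited
        if visited >= MAX_VISITED or len(members) >= MAX_MEMBERS or depth > MAX_DEPTH:
            return
        visited += 1  # counts walk steps, so work stays bounded even when ** revisits paths
        if not segments:
            if base != root_path and (base / "package.json").is_file():
                members.add(base.relative_to(root_path).as_posix())
            return
        head, rest = segments[0], segments[1:]
        if head == "**":
            walk(base, rest, depth)  # ** matches zero segments...
            for child in _subdirs(base):
                walk(child, segments, depth + 1)  # ...or one more, then tries again
        elif head == "*":
            for child in _subdirs(base):
                walk(child, rest, depth + 1)
        elif head != "node_modules" and (base / head).is_dir():
            walk(base / head, rest, depth + 1)

    for pattern in sorted(set(patterns)):
        segments = [s for s in pattern.strip("/").split("/") if s not in ("", ".")]
        walk(root_path, segments, 0)
    return sorted(members)
```

Only directories that contain a `package.json` become members, and the final `sorted` makes the output independent of filesystem enumeration order.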
## Import scanning (bounded)
- Import scanning runs only for the root package and workspace member packages (not `node_modules` packages).
- File types: `.js/.jsx/.mjs/.cjs/.ts/.tsx/.mts/.cts`.
- Parser behavior:
  - Attempts AST parsing as script or module; when parsing fails, falls back to a bounded regex heuristic for TypeScript.
- Hard caps per package:
  - `maxFiles=500`, `maxBytes=5MiB`, `maxFileBytes=512KiB`, `maxDepth=20`
  - Skips `node_modules` and `.pnpm` directories during traversal
- If any cap is reached, the analyzer marks the package metadata with:
  - `importScanSkipped=true`
  - `importScan.filesScanned=<n>`
  - `importScan.bytesScanned=<n>`
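
The documented caps can be enforced with a deterministic, depth-limited walk like the one below. It is a sketch under stated assumptions: the traversal order, the choice to skip (rather than truncate) files over `maxFileBytes`, and representing metadata values as strings are illustrative, and import extraction itself is left as a comment.

```python
from pathlib import Path
from typing import Dict

# Documented per-package caps; the traversal is a hypothetical sketch.
MAX_FILES = 500
MAX_BYTES = 5 * 1024 * 1024   # 5 MiB
MAX_FILE_BYTES = 512 * 1024   # 512 KiB
MAX_DEPTH = 20
EXTENSIONS = {".js", ".jsx", ".mjs", ".cjs", ".ts", ".tsx", ".mts", ".cts"}
SKIP_DIRS = {"node_modules", ".pnpm"}


def bounded_import_scan(package_root: str) -> Dict[str, str]:
    files_scanned = 0
    bytes_scanned = 0
    capped = False  # set when any limit is hit
    stack = [(Path(package_root), 0)]
    while stack:
        directory, depth = stack.pop()
        children = sorted(directory.iterdir())  # sorted => deterministic order
        exhausted = False
        for f in children:
            if not (f.is_file() and f.suffix in EXTENSIONS):
                continue
            size = f.stat().st_size
            if size > MAX_FILE_BYTES:
                capped = True  # assumption: oversized files are skipped and flagged
                continue
            if files_scanned + 1 > MAX_FILES or bytes_scanned + size > MAX_BYTES:
                capped = exhausted = True  # package budget used up
                break
            # A real scanner would extract import specifiers from f here.
            files_scanned += 1
            bytes_scanned += size
        if exhausted:
            break
        subdirs = [c for c in children if c.is_dir() and c.name not in SKIP_DIRS]
        if subdirs and depth >= MAX_DEPTH:
            capped = True  # deeper directories are not visited
            continue
        # Push in reverse so the smallest name is popped (visited) first.
        stack.extend((d, depth + 1) for d in reversed(subdirs))
    if not capped:
        return {}
    return {
        "importScanSkipped": "true",
        "importScan.filesScanned": str(files_scanned),
        "importScan.bytesScanned": str(bytes_scanned),
    }
```
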
## Container layer layouts
- Candidate layer roots under the analysis root:
  - `layers/*`, `.layers/*`, `layer*`
- Each candidate root is scanned independently.
- The analyzer also discovers `package.json` roots nested under layer roots (bounded depth) and includes their nested `node_modules` roots when present.
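
Candidate discovery can be sketched with `pathlib` globbing over the three patterns listed above. The helper name is hypothetical, and the nested `package.json` discovery step is not shown.

```python
from pathlib import Path
from typing import List

LAYER_PATTERNS = ("layers/*", ".layers/*", "layer*")  # patterns from the list above


def candidate_layer_roots(analysis_root: str) -> List[str]:
    """Hypothetical helper: directories matching the layer patterns, sorted."""
    root = Path(analysis_root)
    found = set()
    for pattern in LAYER_PATTERNS:
        for match in root.glob(pattern):
            if match.is_dir():
                found.add(match.relative_to(root).as_posix())
    return sorted(found)
```

Read literally, `layer*` also matches the `layers` directory itself; this document does not say whether the analyzer excludes it.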
## Determinism & evidence hashing
- On-disk `package.json` manifests are hashed (sha256) when ≤ 1 MiB and attached to the root evidence for deterministic provenance.
- Output ordering is stable (componentKey ordering, sorted metadata/evidence).
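
A sketch of the size-gated manifest hash, assuming lowercase hex encoding of the SHA-256 digest (the encoding the analyzer actually records is not stated here). `ordered_metadata` is a hypothetical helper showing the kind of key sorting that keeps output byte-stable.

```python
import hashlib
from pathlib import Path
from typing import List, Optional, Tuple

MAX_HASH_BYTES = 1024 * 1024  # documented limit: hash only manifests of at most 1 MiB


def manifest_sha256(path: str) -> Optional[str]:
    """Return a hex SHA-256 of package.json, or None when the file exceeds the limit."""
    p = Path(path)
    if p.stat().st_size > MAX_HASH_BYTES:
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def ordered_metadata(metadata: dict) -> List[Tuple[str, str]]:
    """Hypothetical helper: emit metadata sorted by key so output is stable across runs."""
    return sorted(metadata.items())
```
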
## Benchmark
- Scenario id: `node_detection_gaps_fixture` (config: `src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json`)
- Fixture root: `samples/runtime/node-detection-gaps`
- Run:
  - `dotnet run --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --config src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json --json out/bench/scanner-analyzers/latest.json --prom out/bench/scanner-analyzers/latest.prom`
- Prometheus output includes additional metrics under `scanner_analyzer_bench_metric{scenario="...",name="node.importScan.*"}`.