up
Some checks failed
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
Policy Simulation / policy-simulate (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
@@ -4,7 +4,7 @@ Scanner analyses container images layer-by-layer, producing deterministic SBOM f

## Latest updates (2025-12-12)
- Deterministic SBOM composition fixture published at `docs/modules/scanner/fixtures/deterministic-compose/` with DSSE, `_composition.json`, BOM, and hashes; doc `deterministic-sbom-compose.md` promoted to Ready v1.0 with offline verification steps.
- Node analyzer now ingests npm/yarn/pnpm lockfiles, emitting `DeclaredOnly` components with lock provenance. The CLI companion command `stella node lock-validate` runs the collector offline, surfaces declared-only or missing-lock packages, and emits telemetry via `stellaops.cli.node.lock_validate.count`. See `docs/modules/scanner/analyzers-node.md` and bench scenario `node_detection_gaps_fixture`.
- Python analyzer picks up `requirements*.txt`, `Pipfile.lock`, and `poetry.lock`, tagging installed distributions with lock provenance and generating declared-only components for policy. Use `stella python lock-validate` to run the same checks locally before images are built.
- Java analyzer now parses `gradle.lockfile`, `gradle/dependency-locks/**/*.lockfile`, and `pom.xml` dependencies via the new `JavaLockFileCollector`, merging lock metadata onto jar evidence and emitting declared-only components when jars are absent. The new CLI verb `stella java lock-validate` reuses that collector offline (table/JSON output) and records `stellaops.cli.java.lock_validate.count{outcome}` for observability.
- Worker/WebService now resolve cache roots and feature flags via `StellaOps.Scanner.Surface.Env`; misconfiguration warnings are documented in `docs/modules/scanner/design/surface-env.md` and surfaced through startup validation.
@@ -37,6 +37,7 @@ Scanner analyses container images layer-by-layer, producing deterministic SBOM f
- ./operations/analyzers-grafana-dashboard.json
- ./operations/rustfs-migration.md
- ./operations/entrypoint.md
- ./analyzers-node.md
- ./operations/secret-leak-detection.md
- ./operations/dsse-rekor-operator-guide.md
- ./os-analyzers-evidence.md

81
docs/modules/scanner/analyzers-bun.md
Normal file
@@ -0,0 +1,81 @@
# Bun Analyzer (Scanner)

## What it does
- Inventories npm-ecosystem dependencies from Bun-managed projects without executing `bun`.
- Supports installed inventory (`node_modules/**/package.json`), lockfile-only inventory (`bun.lock`), and declared-only fallback from `package.json`.
- Enriches output with deterministic scope signals (`dev`, `optional`, `peer`, `scopeUnknown`), patch attribution, and bounded sha256 evidence.

## Inputs and precedence
1. **Installed inventory** (`node_modules/` present): traverse installed packages and emit components from installed `package.json` (uses `bun.lock` for resolved/integrity + scope enrichment when present).
2. **Lockfile-only** (`bun.lock` present, no install): parse `bun.lock` and emit components from lock entries.
3. **Declared-only fallback** (project markers present but no `bun.lock`/install): emit explicit-key components from `package.json` dependency sections.
4. **Unsupported** (`bun.lockb` only): emit a remediation record explaining how to produce `bun.lock`.
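
As a rough illustration of the precedence above, the sketch below picks an inventory mode from the files present in a project root. It is written in Python for brevity and is not the analyzer's actual (.NET) code: `select_bun_input_mode` and the returned mode labels are hypothetical, and checking `bun.lockb` before the declared-only fallback is an assumption about how rules 3 and 4 interact.

```python
from pathlib import Path


def select_bun_input_mode(project_root: str) -> str:
    """Pick an inventory mode for one project root (hypothetical helper)."""
    root = Path(project_root)
    if (root / "node_modules").is_dir():
        return "installed"       # rule 1: installed inventory wins
    if (root / "bun.lock").is_file():
        return "lockfile-only"   # rule 2: text lockfile, nothing installed
    if (root / "bun.lockb").is_file():
        return "unsupported"     # rule 4: binary lockfile only (assumed to be checked first)
    if (root / "package.json").is_file():
        return "declared-only"   # rule 3: project marker without lock or install
    return "none"
```

Nothing here executes `bun` or touches the network; the decision depends only on which files exist.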

## Project discovery (including container roots)
The analyzer discovers Bun project roots under:
- The analysis root (`context.RootPath`)
- Common OCI unpack layouts: `layers/*`, `.layers/*`, and `layer*` (direct children)

Discovery is bounded and deterministic:
- Sorted directory enumeration
- Explicit depth and root caps
- Never recurses into `node_modules/`

## Identity rules (PURL vs explicit key)
Concrete versions emit a PURL:
- `purl = pkg:npm/<name>@<version>`
- Concrete versions follow the Node-style guardrail (no ranges/tags/paths embedded as a "version"; see `Internal/BunVersionSpec.IsConcreteNpmVersion`).

Non-concrete versions emit an explicit key:
- `componentKey = explicit::<analyzerId>::npm::<name>::sha256:<digest>`
- `purl = null`, `version = null`
- Used for declared-only dependencies and any lock/installed records whose `version` is not concrete (e.g., `workspace:*`, `link:../...`, `file:../...`).

Explicit-key digest input (canonical, UTF-8):
```
npm\n<name>\n<spec>\n<originLocator>
```
Generated via `LanguageExplicitKey.Create(...)` and aligned with `docs/modules/scanner/language-analyzers-contract.md`.

## Evidence and locators
All evidence locators are relative and use `/` separators.

### File evidence
- Installed packages: `node_modules/.../package.json`
- Hashing: sha256 is computed for `package.json` only when size is within 1 MiB; when skipped, metadata includes:
  - `packageJson.hashSkipped=true`
  - `packageJson.hashSkipReason=<missing|unauthorized|io|size>...`

### Lockfile entry evidence
- Locator format: `<lockfileRelativePath>:packages[<name>@<version>]`
- Example: `bun.lock:packages[lodash@4.17.21]`
- Hashing: sha256 is computed for `bun.lock` only when size is within 50 MiB; when skipped, metadata includes:
  - `bunLock.hashSkipped=true`
  - `bunLock.hashSkipReason=<missing|unauthorized|io|size>...`
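
The bounded hashing described for `package.json` (1 MiB) and `bun.lock` (50 MiB) could look roughly like the sketch below. It is illustrative Python, not the analyzer's code: `bounded_sha256`, its `prefix` parameter, and the `<prefix>.sha256` key are hypothetical, and the skip reason is reduced to the four documented leading tokens.

```python
import hashlib
import os


def bounded_sha256(path: str, max_bytes: int, prefix: str) -> dict:
    """Hash a file only if it fits the cap; otherwise return skip markers."""
    def skipped(reason: str) -> dict:
        return {f"{prefix}.hashSkipped": "true", f"{prefix}.hashSkipReason": reason}

    try:
        if os.path.getsize(path) > max_bytes:
            return skipped("size")
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Stream in chunks so memory use stays bounded.
            for chunk in iter(lambda: handle.read(64 * 1024), b""):
                digest.update(chunk)
    except FileNotFoundError:
        return skipped("missing")
    except PermissionError:
        return skipped("unauthorized")
    except OSError:
        return skipped("io")
    return {f"{prefix}.sha256": digest.hexdigest()}
```

A caller would use something like `bounded_sha256("bun.lock", 50 * 1024 * 1024, "bunLock")`; the point of returning markers instead of raising is that a skipped hash stays visible in the output.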

## Scope semantics (dev/optional/peer)
Scope is derived deterministically from the `bun.lock` dependency graph rooted at `package.json` declarations:
- `dev=true` only when dev reachability is provable.
- `optional=true` and `peer=true` are preserved when present in lock data or derived from declared scopes.
- If the graph cannot disambiguate (multiple candidates/specifier mismatch), the record is marked:
  - `scopeUnknown=true`
  - `dev=false` (do not guess)

`includeDev=false` filters only packages proven to be dev-only; unknown-scope packages are kept but marked `scopeUnknown=true`.

## Patches and workspaces
- Workspace patterns come from root `package.json` (`workspaces`).
- Patch attribution supports Bun's `patchedDependencies` and patch directories.
- Patch keys preserve version specificity (`name@version`) and patch paths are emitted as deterministic project-relative paths.
- Patch matching precedence: `name@version` first; then name-only only when unambiguous.
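
One possible reading of the patch matching precedence is sketched below in Python. The helper names are hypothetical, and "unambiguous" is interpreted here as "exactly one patch key refers to that package name", which is an assumption rather than something the document spells out.

```python
def patch_key_name(key: str) -> str:
    """Strip a trailing @version from a patch key, keeping npm scopes intact."""
    at = key.rfind("@")
    return key[:at] if at > 0 else key  # at == 0 means a bare scoped name


def match_patch(name: str, version: str, patched: dict):
    """Return the patch path for name@version, or None when nothing matches safely."""
    exact = patched.get(f"{name}@{version}")
    if exact is not None:
        return exact  # 1) exact name@version key wins
    # 2) name-only fallback, accepted only when a single key names this package
    candidates = [path for key, path in sorted(patched.items())
                  if patch_key_name(key) == name]
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` for the ambiguous case mirrors the "do not guess" stance the analyzer takes elsewhere.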

## Known limitations
- `bun.lockb` (binary lockfile) is not parsed; a remediation record is emitted instead.
- The analyzer does not execute `bun` and does not fetch registries; offline-only behavior is enforced.

## References
- Sprint: `docs/implplan/SPRINT_0407_0001_0001_scanner_bun_detection_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Design notes: `docs/modules/scanner/prep/bun-analyzer-design.md`
- Gotchas: `docs/modules/scanner/bun-analyzer-gotchas.md`

65
docs/modules/scanner/analyzers-java.md
Normal file
@@ -0,0 +1,65 @@
# Java Analyzer (Scanner)

## What it does
- Inventories Maven coordinates from JVM archives (JAR/WAR/EAR/fat JAR) without executing build tools.
- Prefers installed artifact metadata (`META-INF/maven/**/pom.properties`), with a `pom.xml` fallback when properties are missing.
- Enriches output with bounded embedded-library scan metadata and JNI usage hints.

## Inputs and precedence
1. **Installed archive inventory**: parse Maven coordinates from `META-INF/maven/**/pom.properties` in each discovered archive.
2. **`pom.xml` fallback**: when the archive has no `pom.properties`, parse `META-INF/maven/**/pom.xml` and emit a Maven PURL only when `groupId`, `artifactId`, and `version` are concrete (no placeholders like `${...}`).
3. **Lock augmentation (current)**: when a lock entry matches an installed artifact, merge lock metadata onto the component; unmatched lock entries still emit declared-only components.
4. **Multi-module lock precedence (pending)**: deterministic precedence rules are tracked in `SCAN-JAVA-403-003` (blocked).
5. **Runtime images (pending)**: runtime component identity is tracked in `SCAN-JAVA-403-004` (blocked).

## Embedded archives (fat JAR / WAR / EAR layouts)
The analyzer scans embedded library jars without extracting them to disk:
- `BOOT-INF/lib/*.jar`
- `WEB-INF/lib/*.jar`
- `APP-INF/lib/*.jar`
- `lib/*.jar`

### Locator format
Evidence locators are nested deterministically using `!` separators:
- `outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties`

### Bounds and skip markers
Embedded scanning is bounded and deterministic:
- Max embedded jars per archive: `256`
- Max embedded jar bytes: `25 MiB`

When embedded scanning is skipped or truncated, the outer component metadata includes deterministic markers:
- `embeddedScan.candidateJars`, `embeddedScan.scannedJars`, `embeddedScan.emittedComponents`
- `embeddedScanSkipped=true`, `embeddedScan.skippedJars`, `embeddedScanSkipReasons=<...>` (when applicable)

Embedded components include:
- `embedded=true`
- `embedded.containerJarPath=<outerRelativePath>`
- `embedded.entryPath=<embeddedEntryPath>`
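
To make the embedded scan concrete, here is a minimal Python sketch (not the .NET implementation) that reads nested jars in memory, applies the two documented caps, and builds `!`-separated locators. `scan_embedded` and its return shape are hypothetical, and malformed inner archives are not handled.

```python
import io
import posixpath
import zipfile

EMBEDDED_LIB_DIRS = ("BOOT-INF/lib", "WEB-INF/lib", "APP-INF/lib", "lib")
MAX_EMBEDDED_JARS = 256                    # documented cap per archive
MAX_EMBEDDED_JAR_BYTES = 25 * 1024 * 1024  # documented cap per embedded jar


def scan_embedded(outer_locator: str, outer: zipfile.ZipFile):
    """Return (pom.properties locators, skipped jar count) for embedded jars."""
    found, skipped = [], 0
    candidates = sorted(
        (info for info in outer.infolist()
         if info.filename.endswith(".jar")
         and posixpath.dirname(info.filename) in EMBEDDED_LIB_DIRS),
        key=lambda info: info.filename,  # sorted for deterministic output
    )
    for index, info in enumerate(candidates):
        if index >= MAX_EMBEDDED_JARS or info.file_size > MAX_EMBEDDED_JAR_BYTES:
            skipped += 1
            continue
        # Read the inner jar into memory instead of extracting it to disk.
        with zipfile.ZipFile(io.BytesIO(outer.read(info))) as inner:
            for name in sorted(inner.namelist()):
                if name.startswith("META-INF/maven/") and name.endswith("/pom.properties"):
                    found.append(f"{outer_locator}!{info.filename}!{name}")
    return found, skipped
```

The skipped count is what would feed markers such as `embeddedScan.skippedJars` on the outer component.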

## Evidence and hashing
- Evidence locators are project-relative, use `/` separators, and use `!` for nested artifact paths.
- `sha256` for `pom.properties` and `pom.xml` evidence is computed over the raw entry bytes.

## `pom.xml` with incomplete coordinates
When `pom.xml` is present but coordinates are incomplete (missing values or `${...}` placeholders), the analyzer emits an explicit-key component:
- `purl=null`, `version=null`
- `metadata.unresolvedCoordinates=true`
- `componentKey` follows the cross-analyzer explicit-key scheme via `LanguageExplicitKey.Create("java", "maven", ...)`
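
The "concrete coordinates" test can be sketched as follows. This is an illustrative Python approximation with hypothetical names; it only looks at the project's own `groupId`, `artifactId`, and `version` elements and, as a simplifying assumption, ignores values inherited from `<parent>`.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def concrete_coordinates(pom_xml: str):
    """Return (groupId, artifactId, version), or None if any value is missing or templated."""
    root = ET.fromstring(pom_xml)
    # Maven POMs usually carry a default namespace; reuse whatever the root tag has.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    values = []
    for field in ("groupId", "artifactId", "version"):
        text = (root.findtext(f"{ns}{field}") or "").strip()
        if not text or PLACEHOLDER.search(text):
            return None  # incomplete: caller falls back to an explicit-key component
        values.append(text)
    return tuple(values)


def maven_purl(group_id: str, artifact_id: str, version: str) -> str:
    return f"pkg:maven/{group_id}/{artifact_id}@{version}"
```

When `concrete_coordinates` returns `None`, no versioned PURL should be produced for that `pom.xml`.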

## JNI metadata (bytecode-based)
JNI hints are derived from parsed bytecode (native method flags and load call sites), not raw ASCII scanning.

When bytecode analysis finds JNI edges (`jni.edgeCount > 0`), components are annotated with bounded, deterministic metadata:
- `jni.edgeCount`, `jni.nativeMethodCount`, `jni.loadCallCount`, optional `jni.warningCount`
- `jni.reasons` (distinct reason codes)
- `jni.targetLibraries` (top-N stable sample; currently 12)

## Known limitations
- Shaded jars that strip Maven metadata remain best-effort; embedded libs without Maven metadata do not emit components.
- Gradle multi-module lock precedence and runtime image component identity remain blocked until explicit decisions land.

## References
- Sprint: `docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java/JavaLanguageAnalyzer.cs`
79
docs/modules/scanner/analyzers-node.md
Normal file
@@ -0,0 +1,79 @@
# Node Analyzer (npm/Yarn/pnpm)

This document captures the Node language analyzer’s deterministic behavior guarantees and safety constraints (what it emits, what it refuses to emit, and how it stays bounded/offline).

## Component identity & precedence

### Installed vs declared-only
- The analyzer always emits **on-disk inventory** first (workspace member manifests + installed `node_modules`/PNPM/Yarn PnP cache packages).
- It then emits **declared-only** components for lockfile / manifest declarations that are **not backed by on-disk inventory**:
  - If a declared entry has a **concrete resolved version** from a lockfile, it emits a versioned `pkg:npm/...@<version>` PURL.
  - If the version is **non-concrete** (ranges/tags/git/file/workspace/link/path), it emits an **explicit-key** component (`purl=null`, `version=null`).

### Identity safety (PURL vs explicit-key)
- Concrete PURLs are emitted only when the analyzer can prove a **concrete version** from local evidence (installed `package.json` or a lockfile-resolved entry).
- Declared-only/non-concrete dependencies use `LanguageExplicitKey` (see `docs/modules/scanner/language-analyzers-contract.md`).

### Lock metadata lookup precedence
When attaching lock metadata to an installed package:
1) `package-lock.json` path match (`packages["<relativePath>"]`),
2) `(name, version)` match (Yarn/pnpm multi-version support),
3) fallback to name-only (last-wins) for legacy locks.
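
The three-step lookup can be pictured as three indexes consulted in order. The Python sketch below is illustrative only; `find_lock_entry` and the index shapes are hypothetical, and the "last-wins" behavior of the name-only index is assumed to come from how that dictionary was populated.

```python
def find_lock_entry(relative_path, name, version, by_path, by_name_version, by_name):
    """Return (entry, match_kind) following the documented lookup precedence."""
    entry = by_path.get(relative_path)            # 1) package-lock.json path key
    if entry is not None:
        return entry, "path"
    entry = by_name_version.get((name, version))  # 2) exact (name, version) pair
    if entry is not None:
        return entry, "name+version"
    entry = by_name.get(name)                     # 3) legacy name-only fallback
    if entry is not None:
        return entry, "name"
    return None, "none"
```

Reporting which step matched makes it easier to audit how much evidence backs a given lock attribution.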

## Lockfile parsing guarantees (offline)

### `package-lock.json` (npm)
- Supports v3+ `packages{}` layout and legacy `dependencies{}` traversal.
- Correctly extracts nested names from `node_modules/.../node_modules/...` paths (including scoped packages).
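
Extracting the package name from a `packages{}` key amounts to taking what follows the last `node_modules` segment, keeping two segments for scoped packages. A minimal Python sketch (the function name is hypothetical):

```python
def package_name_from_lock_path(path: str):
    """Derive the package name from a package-lock `packages` key, or None."""
    segments = path.replace("\\", "/").split("/")
    if "node_modules" not in segments:
        return None  # root project ("") or a workspace path
    # Position of the last node_modules segment.
    last = len(segments) - 1 - segments[::-1].index("node_modules")
    tail = segments[last + 1:]
    if not tail or not tail[0]:
        return None
    if tail[0].startswith("@"):
        # Scoped package: the name spans two segments (@scope/name).
        return "/".join(tail[:2]) if len(tail) >= 2 else None
    return tail[0]
```

Matching on whole path segments avoids false hits on directories whose names merely contain `node_modules`.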

### `yarn.lock` (Yarn v1 + Berry v2/v3)
- Supports both Yarn v1 (`resolved "https://..."`) and Berry fields (`resolution:`, `checksum:`).
- If `integrity` is absent but `checksum` is present, the analyzer records integrity-like evidence as `checksum:<value>`.
- Ignores the `__metadata` section.

### `pnpm-lock.yaml` (pnpm)
- Parses modern `packages:` and `snapshots:` sections.
- Does not drop entries that lack `integrity` (workspace/link/file/git); instead it emits:
  - `lockIntegrityMissing=true`
  - `lockIntegrityMissingReason=<workspace|link|file|git|directory|missing>`

## Workspaces
- Reads workspace members from the root `package.json` (`workspaces` array or `{ packages: [...] }` form).
- Supports glob patterns:
  - `*` (single segment)
  - `**` (multi-segment)
- Expansion is bounded and deterministic:
  - Skips `node_modules`
  - Caps traversal depth and total visited directories/members
  - Stable, sorted member output
- Dependency scopes (`production|development|peer|optional`) are derived from both the root and workspace manifests, with deterministic precedence.

## Import scanning (bounded)
- Import scanning runs only for the root package and workspace member packages (not `node_modules` packages).
- File types: `.js/.jsx/.mjs/.cjs/.ts/.tsx/.mts/.cts`.
- Parser behavior:
  - Attempts AST parsing as script/module; falls back to a bounded regex heuristic for TS when parsing fails.
- Hard caps per package:
  - `maxFiles=500`, `maxBytes=5MiB`, `maxFileBytes=512KiB`, `maxDepth=20`
  - Skips `node_modules` and `.pnpm` directories during traversal
- If capped, the analyzer marks the package metadata with:
  - `importScanSkipped=true`
  - `importScan.filesScanned=<n>`
  - `importScan.bytesScanned=<n>`
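
The caps above suggest a traversal along the following lines. This Python sketch is an approximation under stated assumptions: the function name is hypothetical, the root counts as depth 0, an oversized single file also counts as "capped", and the walk keeps going after a cap is hit where a production implementation would likely stop early.

```python
import os

CAPS = {"maxFiles": 500, "maxBytes": 5 * 1024 * 1024,
        "maxFileBytes": 512 * 1024, "maxDepth": 20}
EXTENSIONS = (".js", ".jsx", ".mjs", ".cjs", ".ts", ".tsx", ".mts", ".cts")
SKIP_DIRS = {"node_modules", ".pnpm"}


def collect_import_scan_files(package_root: str, caps: dict = CAPS):
    """Return (relative file list, metadata) for one package, honoring the caps."""
    files, total, capped = [], 0, False
    for current, dirs, names in os.walk(package_root):
        rel = os.path.relpath(current, package_root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)  # stable order
        if depth >= caps["maxDepth"] and dirs:
            capped = True
            dirs[:] = []  # do not descend any further
        for name in sorted(names):
            if not name.endswith(EXTENSIONS):
                continue
            path = os.path.join(current, name)
            size = os.path.getsize(path)
            if (size > caps["maxFileBytes"] or len(files) >= caps["maxFiles"]
                    or total + size > caps["maxBytes"]):
                capped = True
                continue
            files.append(os.path.relpath(path, package_root).replace(os.sep, "/"))
            total += size
    metadata = {"importScan.filesScanned": len(files), "importScan.bytesScanned": total}
    if capped:
        metadata["importScanSkipped"] = True
    return files, metadata
```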

## Container layer layouts
- Candidate layer roots under the analysis root:
  - `layers/*`, `.layers/*`, `layer*`
- Each candidate root is scanned independently.
- The analyzer also discovers `package.json` roots nested under layer roots (bounded depth) and includes their nested `node_modules` roots when present.

## Determinism & evidence hashing
- On-disk `package.json` manifests are hashed (sha256) when ≤ 1 MiB and attached to the root evidence for deterministic provenance.
- Output ordering is stable (componentKey ordering, sorted metadata/evidence).

## Benchmark
- Scenario id: `node_detection_gaps_fixture` (config: `src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json`)
- Fixture root: `samples/runtime/node-detection-gaps`
- Run:
  - `dotnet run --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj -- --repo-root . --config src/Bench/StellaOps.Bench/Scanner.Analyzers/config.json --json out/bench/scanner-analyzers/latest.json --prom out/bench/scanner-analyzers/latest.prom`
- Prometheus output includes additional metrics under `scanner_analyzer_bench_metric{scenario="...",name="node.importScan.*"}`.
69
docs/modules/scanner/analyzers-python.md
Normal file
@@ -0,0 +1,69 @@
# Python Analyzer (Scanner)

## What it does
- Inventories Python distributions without executing `python`/`pip` (static inspection only).
- Prefers installed distribution metadata (`*.dist-info/`) and validates `RECORD` when present (bounded, streaming IO).
- Emits deterministic component metadata (`pkg.kind`, `pkg.confidence`, `pkg.location`) and evidence locators for replay/audit.

## Inputs and precedence
1. **Installed inventory (preferred)**: detect site-packages roots and parse `*.dist-info/` / `*.egg-info/` metadata for concrete `pkg:pypi/<name>@<version>` components.
2. **Archive inventory**: mount wheels (`*.whl`) and zipapps (`*.pyz`, `*.pyzw`) into the Python VFS and enrich any in-archive `*.dist-info/` metadata (including `RECORD` verification).
3. **Lock augmentation (current)**: parse root-level `requirements*.txt` pinned entries (`==`/`===`), `Pipfile.lock` `default` section, and `poetry.lock`; when a lock entry matches an installed component, merge lock metadata.
4. **Declared-only (current)**: lock entries not present in installed inventory still emit components:
   - concrete versions emit a versioned `pkg:pypi/...@<version>` PURL
   - non-concrete declarations (e.g., editable paths) emit explicit-key components (see Identity Rules)
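
As an illustration of how `requirements*.txt` lines might be split into pinned and editable declarations, consider the Python sketch below. It is a simplified assumption-laden parser, not the analyzer's: the names are hypothetical, wildcard pins such as `==1.*` are treated as non-concrete, and options other than `-e` / `--editable` are ignored.

```python
import re

# name, optional [extras], then == or ===, then the version token
PINNED = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(===?)\s*([^\s;#]+)"
)


def parse_requirements(text: str) -> list:
    """Classify requirement lines as pinned or editable; skip everything else."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith(("-e ", "--editable ")):
            spec = stripped.split(None, 1)[1].strip()
            entries.append({"kind": "editable", "spec": spec})
            continue
        match = PINNED.match(stripped)
        if match and "*" not in match.group(3):
            entries.append({"kind": "pinned", "name": match.group(1),
                            "version": match.group(3)})
    return entries
```

Pinned entries are candidates for a versioned `pkg:pypi/...` PURL; editable entries would go down the explicit-key path instead.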

## Project discovery (including container roots)
The analyzer is layout-aware and bounded:
- Virtualenv layout roots are detected via `pyvenv.cfg` or `venv/`-style directories.
- Site-packages roots include `lib/python*/site-packages` and `lib/python*/dist-packages`.
- Container unpack layouts are supported as additional candidate roots:
  - `layers/*` (direct children)
  - `.layers/*` (direct children)
  - `layer*` (direct children of the analysis root)

## Virtual filesystem (VFS) and determinism
- Inputs are normalized deterministically (dedupe + stable ordering); later/higher-confidence inputs override earlier ones in the VFS overlay.
- Archive virtual roots are stable and collision-safe:
  - `archives/wheel/<file>`
  - `archives/zipapp/<file>`
  - `archives/sdist/<file>`
  - collisions use a deterministic `~N` suffix
- Evidence locators are always analysis-root relative and use `/` separators.

## Identity rules (PURL vs explicit key)
Concrete versions emit a PURL:
- `purl = pkg:pypi/<normalizedName>@<version>`

Non-concrete declarations emit an explicit key:
- `componentKey = explicit::<analyzerId>::pypi::<name>::sha256:<digest>`
- `purl = null`, `version = null`
- generated via `LanguageExplicitKey.Create(...)` and aligned with `docs/modules/scanner/language-analyzers-contract.md`

Editable declarations (from requirements `--editable` / `-e`) normalize the specifier:
- project-relative paths stay relative (`editable-src`)
- absolute/host paths are redacted and never appear in the digest input

## Evidence and metadata
Installed and archive distributions emit evidence for (when present):
- `METADATA`, `RECORD`, `WHEEL`, `INSTALLER`, `entry_points.txt`, `direct_url.json`

`RECORD` verification emits deterministic counters:
- `record.totalEntries`, `record.hashedEntries`, `record.missingFiles`, `record.hashMismatches`, `record.ioErrors`
- plus `record.unsupportedAlgorithms` when algorithms outside the supported set are present
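
A `RECORD` check that produces counters of this kind might look like the Python sketch below. `RECORD` rows are CSV triples of path, `algorithm=urlsafe-base64-digest` (unpadded), and size. Everything else is an assumption made for illustration: the function name, `sha256` being the only supported algorithm, `record.hashedEntries` counting rows that carry a supported hash, and `record.unsupportedAlgorithms` being a sorted list of names. Paths that try to escape the root are not guarded against here.

```python
import base64
import csv
import hashlib
import io
import os

SUPPORTED_ALGORITHMS = {"sha256"}  # assumed supported set


def verify_record(site_packages: str, record_text: str) -> dict:
    """Verify RECORD rows against files on disk and return counters."""
    counts = {"record.totalEntries": 0, "record.hashedEntries": 0,
              "record.missingFiles": 0, "record.hashMismatches": 0,
              "record.ioErrors": 0}
    unsupported = set()
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue
        counts["record.totalEntries"] += 1
        hash_spec = row[1] if len(row) > 1 else ""
        if not hash_spec:
            continue  # e.g. the RECORD file itself carries no hash
        algorithm, _, expected = hash_spec.partition("=")
        if algorithm not in SUPPORTED_ALGORITHMS:
            unsupported.add(algorithm)
            continue
        counts["record.hashedEntries"] += 1
        digest = hashlib.new(algorithm)
        try:
            with open(os.path.join(site_packages, *row[0].split("/")), "rb") as handle:
                for chunk in iter(lambda: handle.read(64 * 1024), b""):
                    digest.update(chunk)  # streaming keeps memory bounded
        except FileNotFoundError:
            counts["record.missingFiles"] += 1
            continue
        except OSError:
            counts["record.ioErrors"] += 1
            continue
        actual = base64.urlsafe_b64encode(digest.digest()).rstrip(b"=").decode("ascii")
        if actual != expected:
            counts["record.hashMismatches"] += 1
    if unsupported:
        counts["record.unsupportedAlgorithms"] = sorted(unsupported)
    return counts
```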

Declared-only/lock-only components include:
- `declaredOnly=true`
- `lockSource`, `lockLocator`, optional `lockResolved`, `lockIndex`, `lockExtras`, `lockEditablePath`

## Container overlay semantics (pending contract)
When scanning raw OCI layer trees, correct overlay/whiteout handling is contract-driven. Until that contract lands, treat per-layer inventory as best-effort and do not rely on it as a merged-rootfs truth source.

## Vendored/bundled packages (pending contract)
Vendored directory signals are detected but representation (separate components vs parent-only metadata) is contract-driven to avoid false vulnerability joins.

## References
- Sprint: `docs/implplan/SPRINT_0405_0001_0001_scanner_python_detection_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/PythonLanguageAnalyzer.cs`

@@ -42,9 +42,14 @@ src/
└─ Tools/
   ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/   # BuildKit generator (image referrer SBOMs)
   └─ StellaOps.Scanner.Sbomer.DockerImage/    # CLI‑driven scanner container
```
Per-analyzer notes (language analyzers):
- `docs/modules/scanner/analyzers-java.md`
- `docs/modules/scanner/analyzers-bun.md`
- `docs/modules/scanner/analyzers-python.md`

Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.

### 1.2 Native reachability upgrades (Nov 2026)
@@ -397,7 +402,9 @@ scanner:

---

## 12) Testing matrix

* **Analyzer contracts:** see `language-analyzers-contract.md` and per-analyzer docs (e.g., `analyzers-java.md`, Sprint 0403).

* **Determinism:** given same image + analyzers → byte‑identical **CDX Protobuf**; JSON normalized.
* **OS packages:** ground‑truth images per distro; compare to package DB.

110
docs/modules/scanner/language-analyzers-contract.md
Normal file
@@ -0,0 +1,110 @@
# Scanner Language Analyzer Contracts (Identity / Evidence / Container Layout)

This document freezes the cross-analyzer contracts that are shared by the language analyzers (Java, .NET, Python, Node, Bun). These rules exist to prevent false matches, keep outputs deterministic, and protect against host-path leakage.

## 1) Identity Safety Contract (PURL vs Explicit Key)

### 1.1 Goals
- **No fake versions**: never encode version ranges, tags, local paths, or git URLs as a versioned PURL.
- **No collisions**: explicit-key identities must not collide with concrete PURLs and must be deterministic across OS path separators.
- **Proof-first**: emit concrete PURLs only when the analyzer has concrete, replayable evidence for the version.

### 1.2 When to emit a concrete PURL
Emit a concrete (versioned) PURL only when **both** are true:
1) The analyzer can determine a **concrete version** (ecosystem-specific) for the component.
2) The version is backed by **replayable evidence** (e.g., installed artifact metadata or lockfile-resolved entry).

Typical sources that qualify:
- **Installed inventory** (e.g., `node_modules/**/package.json`, Python `*.dist-info/METADATA`, .NET `deps.json` entries).
- **Lockfile-resolved inventory** (e.g., `bun.lock` entry with `name@version` and integrity/resolved URL).

### 1.3 When to emit an explicit-key component (required)
Emit an explicit-key component when the dependency is **declared-only** or otherwise **non-concrete**:
- Version ranges / operators (`^`, `~`, `>=`, `<`, `*`, `x`, `latest`, etc.).
- Workspace/link/file dependencies (`workspace:*`, `link:`, `file:`, local path refs, editable installs).
- Git dependencies (git URL / commit / ref) when a concrete semantic version is not provable from local evidence.
- Unknown / missing version.

**Rule:** If the analyzer cannot prove a concrete version from local evidence, it must not emit a versioned PURL for that dependency.

### 1.4 Explicit-key format (canonical)
For declared-only / non-concrete identities, analyzers must emit:
- `componentKey`: `explicit::<analyzerId>::<ecosystem>::<name>::sha256:<digest>`
- `purl`: `null`
- `version`: `null`

Where `<digest>` is `sha256` of the canonical UTF-8 string:
```
<ecosystem>\n<normalizedName>\n<normalizedSpec>\n<originLocator>
```

Canonicalization rules:
- `<normalizedName>` uses ecosystem naming rules (e.g., npm scoped names keep `@scope/name`).
- `<normalizedSpec>` is the **original declared specifier** (range/tag/url/path), trimmed; for unknown, use `""`.
- `<originLocator>` is project-relative with `/` separators (e.g., `package.json#dependencies`, `requirements.txt`, `Directory.Packages.props#PackageVersion:Foo`).
- No absolute paths, drive letters, or host roots appear in any input to the digest.
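
The explicit-key construction can be sketched in a few lines of Python. This is not `LanguageExplicitKey.Create(...)` itself: the function name is hypothetical, the caller is expected to pass an already-normalized name, and rendering the digest as lowercase hex is an assumption.

```python
import hashlib
import re


def explicit_key(analyzer_id: str, ecosystem: str, name: str,
                 spec: str, origin_locator: str) -> str:
    """Build an explicit component key from the canonical digest input."""
    spec = (spec or "").strip()                   # unknown spec becomes ""
    locator = origin_locator.replace("\\", "/")   # same key on every OS
    if locator.startswith("/") or re.match(r"^[A-Za-z]:", locator):
        raise ValueError("origin locator must be project-relative")
    canonical = "\n".join((ecosystem, name, spec, locator))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"explicit::{analyzer_id}::{ecosystem}::{name}::sha256:{digest}"
```

Because the separator is normalized before hashing, `packages\app\package.json#dependencies` and `packages/app/package.json#dependencies` yield the same key.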

### 1.5 Required metadata for explicit-key components
Explicit-key components must include (at minimum) these metadata keys:
- `declaredOnly=true`
- `declared.source=<file>` (e.g., `package.json`, `Directory.Packages.props`)
- `declared.locator=<originLocator>` (same string used in digest)
- `declared.versionSpec=<normalizedSpec>` (original specifier or empty)
- `declared.scope=<prod|dev|peer|optional|unknown>` when applicable
- `declared.sourceType=<range|tag|git|tarball|file|link|workspace|path|editable|unknown>`

## 2) Evidence Locator Contract

### 2.1 General rules
- Evidence locators are **external-facing** and must be stable and parseable.
- Every locator is **project-relative** with `/` separators (never absolute).
- Evidence content/hashing must be bounded; when bounds are exceeded, emit deterministic `skipped` markers in metadata instead of silently omitting.

### 2.2 Locator formats (canonical)
**File evidence**
- `locator`: `<relativePath>` (e.g., `packages/app/package.json`)
- `source`: a stable discriminator (e.g., `package.json`, `pom.xml`, `METADATA`)

**Lockfile entry evidence**
- `locator`: `<lockfileRelativePath>:<selector>`
- Examples:
  - Node package-lock: `package-lock.json:packages/app/node_modules/foo`
  - Bun lock: `bun.lock:packages[foo@1.2.3]`
  - Maven/Gradle lock: `gradle.lockfile:com.example:foo:1.2.3`

**Nested artifact evidence**
- `locator`: `<outer>!<inner>!<path>`
- Example: `demo-jni.jar!META-INF/native-image/demo/jni-config.json`

**Derived evidence**
- `locator`: a stable synthetic name (e.g., `phase22.ndjson`)
- `source`: a stable synthetic source (e.g., `node.observation`)

### 2.3 Hashing rules (baseline)
- Hash only bounded inputs (default: 1 MiB per evidence value/file; analyzers may choose a tighter cap).
- Hash algorithm: `sha256` over UTF-8 bytes for textual evidence, raw bytes for file evidence.
- If hashing is skipped due to bounds or errors, emit deterministic metadata markers (e.g., `hashSkipped=true`, `hashSkipped.reason=sizeCap`).

## 3) Container Layout Discovery Contract

### 3.1 Layer root candidates
Language analyzers that support container-root discovery must treat these as **candidate roots** under the analysis root:
- `layers/*` (direct children)
- `.layers/*` (direct children; **must not be skipped**)
- `layer*` (direct children of the analysis root, e.g., `layer1/`, `layer2/`)

Each candidate root is scanned independently for projects.
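
Candidate enumeration under these rules could be sketched as below (Python, for illustration only). The function name is hypothetical, and excluding the `layers` directory itself from the `layer*` match is an assumption, since the contract lists its children separately.

```python
import os


def candidate_layer_roots(analysis_root: str) -> list:
    """List candidate layer roots, relative to the analysis root, in stable order."""
    roots = []
    for parent in ("layers", ".layers"):  # .layers is hidden but must not be skipped
        base = os.path.join(analysis_root, parent)
        if os.path.isdir(base):
            for child in sorted(os.listdir(base)):
                if os.path.isdir(os.path.join(base, child)):
                    roots.append(f"{parent}/{child}")
    for child in sorted(os.listdir(analysis_root)):
        if (child.startswith("layer") and child != "layers"
                and os.path.isdir(os.path.join(analysis_root, child))):
            roots.append(child)
    return roots
```

Only direct children are listed; depth and total-root caps from section 3.2 would apply to the project search inside each candidate.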

### 3.2 Bounds and traversal safety (required)
- Deterministic traversal (sorted directory enumeration).
- Depth caps per candidate root; hard cap on total discovered project roots.
- Must never recurse into `node_modules/` (Node/Bun) or equivalent heavy dirs.
- Hidden directories may be skipped **except** `.layers` which is treated as a top-level candidate root.
- No symlink escape: if symlinks are followed, resolved targets must remain within the candidate root prefix and cycles must be prevented.

### 3.3 Overlay/whiteout semantics
- If an analyzer implements overlay semantics (notably Python container adapters), whiteouts and precedence rules must be explicit, deterministic, and fixture-tested.
- If an analyzer does **not** implement overlay semantics, it must still keep discovery bounded and must not silently drop projects; emit deterministic "skipped" markers when bounds prevent full traversal.

## Compliance
Sprints `docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md` through `docs/implplan/SPRINT_0407_0001_0001_scanner_bun_detection_gaps.md` (and the program sprint `docs/implplan/SPRINT_0408_0001_0001_scanner_language_detection_gaps_program.md`) carry the per-analyzer implementation and test evidence required to enforce this contract.