# Python Analyzer (Scanner)

## What it does

- Inventories Python distributions without executing `python`/`pip` (static inspection only).
- Prefers installed distribution metadata (`*.dist-info/`) and, when present, validates `RECORD` using bounded, streaming I/O.
- Emits deterministic component metadata (`pkg.kind`, `pkg.confidence`, `pkg.location`) and evidence locators for replay/audit.

## Inputs and precedence

1. **Installed inventory (preferred)**: detect site-packages roots and parse `*.dist-info/` and `*.egg-info/` metadata into concrete `pkg:pypi/<name>@<version>` components.
2. **Archive inventory**: mount wheels (`*.whl`) and zipapps (`*.pyz`, `*.pyzw`) into the Python VFS and enrich any in-archive `*.dist-info/` metadata (including `RECORD` verification).
3. **Lock augmentation (current)**: parse pinned entries (`==`/`===`) from root-level `requirements*.txt` files, the `default` section of `Pipfile.lock`, and `poetry.lock`; when a lock entry matches an installed component, merge the lock metadata into that component.
4. **Declared-only (current)**: lock entries that are not present in the installed inventory still emit components:
   - concrete versions emit a versioned `pkg:pypi/...@<version>` PURL
   - non-concrete declarations (e.g., editable paths) emit explicit-key components (see Identity rules)

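The lock-augmentation step can be sketched as follows. This is a minimal Python illustration, not the analyzer's C# implementation: the `DeclaredRequirement` type and both parser functions are hypothetical names, and the sketch assumes that only exact pins (`==`/`===`) without wildcards count as concrete. Editable entries keep their path and leave the name unset, and the `Pipfile.lock` reader looks only at the `default` section.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeclaredRequirement:
    """Hypothetical record for one requirements/lock declaration."""
    name: Optional[str]           # None for editable paths (name unknown without a build)
    version: Optional[str]        # exact pinned version, or None if not concrete
    editable_path: Optional[str]  # set only for -e / --editable entries


# Name, optional [extras], then an exact pin (`===` is tried before `==`).
_PINNED = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*"
    r"(?:===|==)\s*(?P<version>[^\s;#]+)"
)
_EDITABLE_FLAGS = ("--editable=", "--editable ", "-e ")


def parse_requirements(text: str) -> list[DeclaredRequirement]:
    entries = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("#"):
            continue
        flag = next((f for f in _EDITABLE_FLAGS if line.startswith(f)), None)
        if flag is not None:
            entries.append(DeclaredRequirement(None, None, line[len(flag):].strip()))
            continue
        m = _PINNED.match(line)
        # Range specifiers (>=, ~=), bare names and wildcard pins are not concrete.
        if m and "*" not in m["version"]:
            entries.append(DeclaredRequirement(m["name"], m["version"], None))
    return entries


def parse_pipfile_lock(text: str) -> list[DeclaredRequirement]:
    """Read only the `default` section; `develop` is ignored."""
    entries = []
    for name, spec in sorted(json.loads(text).get("default", {}).items()):
        if spec.get("editable") and "path" in spec:
            entries.append(DeclaredRequirement(name, None, spec["path"]))
        elif spec.get("version", "").startswith("=="):
            entries.append(DeclaredRequirement(name, spec["version"].lstrip("="), None))
    return entries
```

Range specifiers such as `flask>=2.0` are skipped because they do not name a single version; `poetry.lock` (TOML) is left out to keep the sketch short.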
## Project discovery (including container roots)

The analyzer is layout-aware and bounded:

- Virtualenv layout roots are detected via `pyvenv.cfg` or `venv/`-style directories.
- Site-packages roots include `lib/python*/site-packages` and `lib/python*/dist-packages`.
- Container unpack layouts are supported as additional candidate roots:
  - `layers/*` (direct children)
  - `.layers/*` (direct children)
  - `layer*` (direct children of the analysis root)

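A bounded discovery pass over these layouts might look like the sketch below. It assumes a locally unpacked tree; the prefixes (`usr`, `usr/local`) and the virtualenv directory names (`venv`, `.venv`) are illustrative guesses, not the analyzer's actual lists, and all function names are hypothetical. Every glob is anchored at a fixed depth (no recursive `**`), which is one way to keep the search bounded.

```python
from pathlib import Path

# Illustrative guesses, not the analyzer's real lists.
PREFIXES = (".", "usr", "usr/local")
VENV_NAMES = ("venv", ".venv")
SITE_PATTERNS = ("lib/python*/site-packages", "lib/python*/dist-packages")


def candidate_roots(analysis_root: Path) -> list[Path]:
    """Analysis root plus container-unpack layer roots (direct children only)."""
    roots = {analysis_root}
    for pattern in ("layers/*", ".layers/*", "layer*"):
        roots.update(p for p in analysis_root.glob(pattern) if p.is_dir())
    return sorted(roots)


def venv_dirs(root: Path) -> list[Path]:
    """`root` or its direct children that contain pyvenv.cfg or have a venv-style name."""
    dirs = [root, *(p for p in root.iterdir() if p.is_dir())]
    return [d for d in dirs if (d / "pyvenv.cfg").is_file() or d.name in VENV_NAMES]


def discover_site_packages(analysis_root: Path) -> list[Path]:
    found: set[Path] = set()
    for root in candidate_roots(analysis_root):
        bases = [root / prefix for prefix in PREFIXES] + venv_dirs(root)
        for base in bases:
            if not base.is_dir():
                continue
            for pattern in SITE_PATTERNS:
                found.update(p for p in base.glob(pattern) if p.is_dir())
    return sorted(found)  # stable ordering for deterministic output
```

Because `layer*` also matches a top-level `layers` directory, that directory itself becomes a candidate root in this sketch; the results are deduplicated, so this is harmless.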
## Virtual filesystem (VFS) and determinism

- Inputs are normalized deterministically (deduplicated and stably ordered); later or higher-confidence inputs override earlier ones in the VFS overlay.
- Archive virtual roots are stable and collision-safe:
  - `archives/wheel/<file>`
  - `archives/zipapp/<file>`
  - `archives/sdist/<file>`
  - collisions use a deterministic `~N` suffix
- Evidence locators are always relative to the analysis root and use `/` separators.

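The virtual-root naming rules could be sketched like this. The placement of `~N` (appended to the file name, counting from 1), the extension-to-kind mapping, and the function names are assumptions for illustration. Inputs are deduplicated and sorted before names are assigned, so the result does not depend on discovery order.

```python
def normalize_locator(path: str) -> str:
    """Root-relative locator with '/' separators; assumes input is already root-relative."""
    return "/".join(p for p in path.replace("\\", "/").split("/") if p not in ("", "."))


def archive_kind(file_name: str) -> str:
    lower = file_name.lower()
    if lower.endswith(".whl"):
        return "wheel"
    if lower.endswith((".pyz", ".pyzw")):
        return "zipapp"
    return "sdist"  # assumption: anything else (e.g. .tar.gz) is treated as an sdist


def assign_virtual_roots(locators: list[str]) -> dict[str, str]:
    """Map each archive locator to a stable, collision-safe virtual root."""
    roots: dict[str, str] = {}
    used: set[str] = set()
    # Dedupe and sort first so the assignment is independent of discovery order.
    for loc in sorted({normalize_locator(x) for x in locators}):
        file_name = loc.rsplit("/", 1)[-1]
        base = f"archives/{archive_kind(file_name)}/{file_name}"
        candidate, n = base, 1
        while candidate in used:
            candidate = f"{base}~{n}"
            n += 1
        used.add(candidate)
        roots[loc] = candidate
    return roots
```

Two wheels with the same file name in different directories get `archives/wheel/<file>` and `archives/wheel/<file>~1`, and the lexicographically first locator always keeps the unsuffixed name.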
## Identity rules (PURL vs explicit key)

Concrete versions emit a PURL:

- `purl = pkg:pypi/<normalizedName>@<version>`

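A sketch of PURL construction, assuming that `<normalizedName>` follows PEP 503 normalization (lowercase, with runs of `-`, `_` and `.` collapsed to `-`) and that the version is percent-encoded as the PURL specification requires. The helper names are hypothetical.

```python
import re
from urllib.parse import quote


def normalize_pypi_name(name: str) -> str:
    """PEP 503: lowercase and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_purl(name: str, version: str) -> str:
    # quote(..., safe="") percent-encodes '+', '!' and other reserved characters.
    return f"pkg:pypi/{normalize_pypi_name(name)}@{quote(version, safe='')}"
```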
Non-concrete declarations emit an explicit key:

- `componentKey = explicit::<analyzerId>::pypi::<name>::sha256:<digest>`
- `purl = null`, `version = null`
- keys are generated via `LanguageExplicitKey.Create(...)` and align with `docs/modules/scanner/language-analyzers-contract.md`

Editable declarations (from requirements `--editable` / `-e`) normalize the specifier:

- project-relative paths stay relative (`editable-src`)
- absolute/host paths are redacted and never appear in the digest input

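The explicit-key shape and the editable redaction rule can be illustrated as below. The digest input, the `REDACTED` placeholder and the analyzer id `python` are hypothetical; the real input is whatever `LanguageExplicitKey.Create(...)` and the cross-analyzer contract define. What the sketch demonstrates is the rule stated above: host paths never reach the hash, so two machines with different checkout locations produce the same key.

```python
import hashlib
import posixpath
import re

REDACTED = "<redacted>"  # hypothetical placeholder for host paths


def normalize_editable(spec: str) -> str:
    """Keep project-relative paths; redact absolute/host paths."""
    path = spec.replace("\\", "/")
    if path.startswith("/") or re.match(r"^[A-Za-z]:/", path):
        return REDACTED
    return posixpath.normpath(path)


def explicit_key(analyzer_id: str, name: str, editable_spec: str) -> str:
    # Hypothetical digest input: the real one is defined by LanguageExplicitKey.Create(...).
    digest_input = f"pypi\n{name}\n{normalize_editable(editable_spec)}".encode("utf-8")
    digest = hashlib.sha256(digest_input).hexdigest()
    return f"explicit::{analyzer_id}::pypi::{name}::sha256:{digest}"
```

Because `normpath` collapses `./editable-src` to `editable-src`, equivalent relative spellings also share a key.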
## Evidence and metadata

Installed and archive distributions emit evidence for each of these files when present:

- `METADATA`, `RECORD`, `WHEEL`, `INSTALLER`, `entry_points.txt`, `direct_url.json`

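For an installed `*.dist-info/` directory, reading identity and collecting evidence locators might look like the sketch below. It parses `METADATA` with the standard-library `email` parser (core metadata uses an email-header format) and reports evidence paths relative to the analysis root with `/` separators; `read_dist_info` is a hypothetical name.

```python
from email.parser import Parser
from pathlib import Path

EVIDENCE_FILES = ("METADATA", "RECORD", "WHEEL", "INSTALLER", "entry_points.txt", "direct_url.json")


def read_dist_info(dist_info: Path, analysis_root: Path) -> dict:
    """Read Name/Version from METADATA and list evidence files that exist."""
    text = (dist_info / "METADATA").read_text(encoding="utf-8")
    meta = Parser().parsestr(text, headersonly=True)  # headers only; body is the long description
    evidence = [
        (dist_info / name).relative_to(analysis_root).as_posix()
        for name in EVIDENCE_FILES
        if (dist_info / name).is_file()
    ]
    return {"name": meta["Name"], "version": meta["Version"], "evidence": evidence}
```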
`RECORD` verification emits deterministic counters:

- `record.totalEntries`, `record.hashedEntries`, `record.missingFiles`, `record.hashMismatches`, `record.ioErrors`
- plus `record.unsupportedAlgorithms` when algorithms outside the supported set are present

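A streaming `RECORD` check that produces these counters could be sketched as below. `RECORD` is a CSV file whose rows hold a path, an optional `algorithm=digest` field (urlsafe base64 without padding) and a size. The supported-algorithm set, the 64 KiB chunk size and the order of the checks are assumptions, and the function names are hypothetical; the sketch streams each file in fixed-size chunks but does not model any other bounds the analyzer applies.

```python
import base64
import csv
import hashlib
import io
from pathlib import Path

SUPPORTED_ALGORITHMS = {"sha256", "sha384", "sha512"}  # assumption
CHUNK_SIZE = 64 * 1024


def _urlsafe_digest(path: Path, algorithm: str) -> str:
    """Hash a file in fixed-size chunks; encode like RECORD (urlsafe base64, no padding)."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return base64.urlsafe_b64encode(h.digest()).rstrip(b"=").decode("ascii")


def verify_record(base_dir: Path, record_text: str) -> dict[str, int]:
    """Verify RECORD rows relative to `base_dir` (the site-packages directory)."""
    counters = dict.fromkeys(
        ["record.totalEntries", "record.hashedEntries", "record.missingFiles",
         "record.hashMismatches", "record.ioErrors", "record.unsupportedAlgorithms"], 0)
    for row in csv.reader(io.StringIO(record_text)):
        if not row or not row[0]:
            continue
        counters["record.totalEntries"] += 1
        hash_field = row[1] if len(row) > 1 else ""
        if not hash_field:
            continue  # RECORD itself (and some generated files) carry no hash
        counters["record.hashedEntries"] += 1
        algorithm, _, expected = hash_field.partition("=")
        if algorithm not in SUPPORTED_ALGORITHMS:
            counters["record.unsupportedAlgorithms"] += 1
            continue
        target = base_dir / row[0]
        if not target.is_file():
            counters["record.missingFiles"] += 1
            continue
        try:
            actual = _urlsafe_digest(target, algorithm)
        except OSError:
            counters["record.ioErrors"] += 1
            continue
        if actual != expected:
            counters["record.hashMismatches"] += 1
    if counters["record.unsupportedAlgorithms"] == 0:
        del counters["record.unsupportedAlgorithms"]  # only reported when present
    return counters
```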
Declared-only/lock-only components include:

- `declaredOnly=true`
- `lockSource`, `lockLocator`, plus optional `lockResolved`, `lockIndex`, `lockExtras`, `lockEditablePath`

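The merge behaviour from the input precedence list (enrich an installed component on match, otherwise emit a declared-only component) can be sketched as follows. The `LockEntry` shape, the first-lock-wins tie-break and the PEP 503 name matching are assumptions, and `merge_lock_entries` is a hypothetical name. Non-concrete entries are left without a `purl` here; they would receive an explicit key as described under Identity rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LockEntry:
    """Hypothetical lock declaration produced by a lock-file parser."""
    name: str
    version: Optional[str]  # None when the declaration is not concrete
    source: str             # e.g. "requirements.txt", "Pipfile.lock", "poetry.lock"
    locator: str            # analysis-root-relative path of the lock file


def _normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()


def merge_lock_entries(
    installed: dict[str, dict[str, str]], locks: list[LockEntry]
) -> dict[str, dict[str, str]]:
    """Merge lock metadata into installed components; unmatched entries become declared-only."""
    components = {_normalize(n): dict(meta) for n, meta in installed.items()}
    # Deterministic order: by lock locator, then normalized name.
    for entry in sorted(locks, key=lambda e: (e.locator, _normalize(e.name))):
        key = _normalize(entry.name)
        comp = components.setdefault(key, {"declaredOnly": "true"})
        if "lockSource" in comp:
            continue  # assumption: the first lock entry in sorted order wins
        comp["lockSource"] = entry.source
        comp["lockLocator"] = entry.locator
        if comp.get("declaredOnly") == "true" and entry.version:
            comp["purl"] = f"pkg:pypi/{key}@{entry.version}"  # no percent-encoding here
    return components
```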
## Container overlay semantics (pending contract)

When scanning raw OCI layer trees, correct overlay and whiteout handling will be defined by a pending contract. Until that contract lands, treat per-layer inventory as best-effort and do not rely on it as a source of truth for the merged root filesystem.

## Vendored/bundled packages (pending contract)

Vendored directories are detected, but whether they are represented as separate components or only as metadata on the parent package is left to a pending contract, to avoid false vulnerability joins.

## References

- Sprint: `docs/implplan/SPRINT_0405_0001_0001_scanner_python_detection_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/PythonLanguageAnalyzer.cs`