Python Analyzer (Scanner)
What it does
- Inventories Python distributions without executing python/pip (static inspection only).
- Prefers installed distribution metadata (*.dist-info/) and validates RECORD when present (bounded, streaming IO).
- Emits deterministic component metadata (pkg.kind, pkg.confidence, pkg.location) and evidence locators for replay/audit.
Inputs and precedence
- Installed inventory (preferred): detect site-packages roots and parse *.dist-info/ and *.egg-info/ metadata for concrete pkg:pypi/<name>@<version> components.
- Archive inventory: mount wheels (*.whl) and zipapps (*.pyz, *.pyzw) into the Python VFS and enrich any in-archive *.dist-info/ metadata (including RECORD verification).
- Lock augmentation (current): parse root-level requirements*.txt pinned entries (== / ===), the Pipfile.lock default section, and poetry.lock; when a lock entry matches an installed component, merge lock metadata.
- Declared-only (current): lock entries not present in installed inventory still emit components:
  - concrete versions emit a versioned pkg:pypi/...@<version> PURL
  - non-concrete declarations (e.g., editable paths) emit explicit-key components (see Identity Rules)
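The lock augmentation and declared-only rules above can be sketched roughly as follows. This is a minimal illustration, not the analyzer's implementation: the regular expression, the PEP 503 style name normalization, the name-plus-version matching rule, and the lockLocator format are all assumptions, and parse_pinned / merge_lock are hypothetical helper names.

```python
import re

# Assumed pattern: a name, optional extras, then == or === and a concrete version.
_PIN = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"(?:\[[^\]]*\])?"
    r"\s*===?\s*(?P<version>[^\s;#,]+)"
)


def normalize_name(name):
    """PEP 503 style normalization (assumed to match the analyzer's rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pinned(lines, source="requirements.txt"):
    """Return lock entries for lines pinned with == or ===; skip everything else."""
    entries = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "-")):
            continue  # comments and options such as -r / -e are out of scope here
        match = _PIN.match(line)
        if match is None or "*" in match["version"]:
            continue  # ranges and wildcards are not concrete pins
        entries.append({
            "name": normalize_name(match["name"]),
            "version": match["version"],
            "lockSource": source,
            "lockLocator": f"{source}:{number}",  # hypothetical locator format
        })
    return entries


def merge_lock(installed, lock_entries):
    """installed maps (name, version) -> component dict; returns a sorted list."""
    merged = {key: dict(value) for key, value in installed.items()}
    for entry in lock_entries:
        key = (entry["name"], entry["version"])
        extra = {"lockSource": entry["lockSource"], "lockLocator": entry["lockLocator"]}
        if key in merged:
            merged[key].update(extra)  # lock entry matches an installed component
        else:
            # Declared-only: not installed, but still emitted with a versioned PURL.
            merged[key] = {"purl": f"pkg:pypi/{key[0]}@{key[1]}", "declaredOnly": True, **extra}
    return [merged[key] for key in sorted(merged)]
```

Sorting the merged keys before returning keeps the output independent of input order, which is the property the determinism rules in this document ask for.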
Project discovery (including container roots)
The analyzer is layout-aware and bounded:
- Virtualenv layout roots are detected via pyvenv.cfg or venv/-style directories.
- Site-packages roots include lib/python*/site-packages and lib/python*/dist-packages.
- Container unpack layouts are supported as additional candidate roots:
  - layers/* (direct children)
  - .layers/* (direct children)
  - layer* (direct children of the analysis root)
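A bounded discovery pass over these layouts might look like the sketch below. It is an illustration under assumptions: the function names are hypothetical, virtualenvs are only looked for one level below each candidate root, and the real analyzer may search differently.

```python
from pathlib import Path

SITE_PATTERNS = ("lib/python*/site-packages", "lib/python*/dist-packages")


def candidate_roots(analysis_root):
    """Analysis root plus container-layer directories, in a stable order."""
    root = Path(analysis_root)
    candidates = [root]
    for container in ("layers", ".layers"):
        base = root / container
        if base.is_dir():
            candidates += sorted(p for p in base.iterdir() if p.is_dir())
    # layer* direct children; the layers/ container itself is handled above.
    candidates += sorted(
        p for p in root.glob("layer*") if p.is_dir() and p.name != "layers"
    )
    return candidates


def site_packages_roots(analysis_root):
    """Return root-relative site-packages paths using '/' separators."""
    root = Path(analysis_root)
    found = set()
    for candidate in candidate_roots(root):
        # A direct child holding pyvenv.cfg is treated as a virtualenv root.
        venvs = [p for p in candidate.iterdir() if (p / "pyvenv.cfg").is_file()]
        for base in [candidate, *venvs]:
            for pattern in SITE_PATTERNS:
                for hit in base.glob(pattern):
                    if hit.is_dir():
                        found.add(hit.relative_to(root).as_posix())
    return sorted(found)  # dedupe plus stable ordering
```

The search never recurses beyond fixed patterns, so the cost stays proportional to the number of candidate roots rather than the size of the tree.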
Virtual filesystem (VFS) and determinism
- Inputs are normalized deterministically (dedupe + stable ordering); later/higher-confidence inputs override earlier ones in the VFS overlay.
- Archive virtual roots are stable and collision-safe:
  - archives/wheel/<file>
  - archives/zipapp/<file>
  - archives/sdist/<file>
  - collisions use a deterministic ~N suffix
- Evidence locators are always analysis-root relative and use / separators.
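One way to get stable, collision-safe virtual roots is to sort the inputs first and then hand out suffixes in that order. The sketch below assumes the suffix is appended to the file name and starts at ~1, and that sdists are recognized by .tar.gz or .zip; none of these details are confirmed by this document, and assign_virtual_roots is a hypothetical name.

```python
from pathlib import PurePosixPath

# Assumed mapping from file-name endings to the archive kinds listed above.
KIND_BY_ENDING = (
    (".whl", "wheel"),
    (".pyzw", "zipapp"),
    (".pyz", "zipapp"),
    (".tar.gz", "sdist"),
    (".zip", "sdist"),
)


def classify(path):
    """Return the archive kind for a path, or None if it is not an archive."""
    lowered = path.lower()
    for ending, kind in KIND_BY_ENDING:
        if lowered.endswith(ending):
            return kind
    return None


def assign_virtual_roots(archive_paths):
    """Map each root-relative archive path to a unique virtual root."""
    roots = {}
    used = set()
    # Normalize separators, dedupe, and sort so the result ignores input order.
    for path in sorted({p.replace("\\", "/") for p in archive_paths}):
        kind = classify(path)
        if kind is None:
            continue
        base = f"archives/{kind}/{PurePosixPath(path).name}"
        candidate, n = base, 1
        while candidate in used:
            candidate = f"{base}~{n}"  # deterministic ~N suffix on collision
            n += 1
        used.add(candidate)
        roots[path] = candidate
    return roots
```

Because suffixes are assigned after sorting, two archives with the same file name in different directories always receive the same virtual roots across runs.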
Identity rules (PURL vs explicit key)
Concrete versions emit a PURL:
- purl = pkg:pypi/<normalizedName>@<version>
Non-concrete declarations emit an explicit key:
- componentKey = explicit::<analyzerId>::pypi::<name>::sha256:<digest>
- purl = null, version = null
- generated via LanguageExplicitKey.Create(...) and aligned with docs/modules/scanner/language-analyzers-contract.md
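The split between the two identity forms can be illustrated with a small sketch. The real key is produced by LanguageExplicitKey.Create(...), whose digest input is not described here; the digest input below (name plus a normalized specifier) and the PEP 503 style name normalization are assumptions made only for illustration, and component_identity is a hypothetical name.

```python
import hashlib
import re


def normalize_name(name):
    """PEP 503 style normalization (assumed, not confirmed by this document)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def component_identity(analyzer_id, name, version=None, spec=""):
    """Return identity fields for a concrete or non-concrete declaration."""
    if version:
        # Concrete version: identity is the PURL.
        return {"purl": f"pkg:pypi/{normalize_name(name)}@{version}", "version": version}
    # Non-concrete: hash a stable description of the declaration.
    # Hypothetical digest input; the contract document defines the real one.
    digest = hashlib.sha256(f"{name}\n{spec}".encode("utf-8")).hexdigest()
    return {
        "componentKey": f"explicit::{analyzer_id}::pypi::{name}::sha256:{digest}",
        "purl": None,
        "version": None,
    }
```

For example, component_identity("python", "Flask_Login", "0.6.3") yields a PURL, while component_identity("python", "mypkg", spec="editable-src") yields an explicit key with purl and version left empty.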
Editable declarations (from requirements --editable / -e) normalize the specifier:
- project-relative paths stay relative (editable-src)
- absolute/host paths are redacted and never appear in the digest input
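A possible shape for that normalization is sketched below. The rules for what counts as an absolute or host path (leading /, ~, a drive letter, a file:// URL, or any .. segment) are assumptions, as is returning None to signal redaction; normalize_editable is a hypothetical name.

```python
import re


def normalize_editable(spec):
    """Return a project-relative path with '/' separators, or None if redacted."""
    text = spec.strip().replace("\\", "/")
    looks_absolute = (
        text.startswith(("/", "~", "file://"))
        or re.match(r"^[A-Za-z]:/", text) is not None
    )
    parts = [part for part in text.split("/") if part not in ("", ".")]
    # Paths that are absolute or climb out of the project are treated as host
    # paths: the caller gets None and must keep them out of the digest input.
    if looks_absolute or ".." in parts:
        return None
    return "/".join(parts) or "."
```

With this sketch, normalize_editable("./editable-src") keeps the relative path, while an input such as /home/alice/proj is redacted so that machine-specific directories cannot change the component key.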
Evidence and metadata
Installed and archive distributions emit evidence for (when present):
- METADATA, RECORD, WHEEL, INSTALLER, entry_points.txt, direct_url.json
RECORD verification emits deterministic counters:
- record.totalEntries, record.hashedEntries, record.missingFiles, record.hashMismatches, record.ioErrors
- plus record.unsupportedAlgorithms when algorithms outside the supported set are present
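The counters can be pictured with a streaming verifier along these lines. RECORD rows follow the standard path,hash,size CSV layout with hashes written as algorithm=urlsafe-base64 without padding; everything else here is an assumption, including the supported algorithm set, the exact meaning of each counter, and the comma-joined value for record.unsupportedAlgorithms. verify_record is a hypothetical name.

```python
import base64
import csv
import hashlib
import io
from pathlib import Path

SUPPORTED = {"sha256", "sha384", "sha512"}  # assumed supported set


def verify_record(site_packages, record_text, chunk_size=65536):
    """Check RECORD rows against files on disk and return deterministic counters."""
    counters = dict.fromkeys(
        ("totalEntries", "hashedEntries", "missingFiles", "hashMismatches", "ioErrors"), 0
    )
    unsupported = set()
    root = Path(site_packages)
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue
        counters["totalEntries"] += 1
        hash_field = row[1] if len(row) > 1 else ""
        if not hash_field:
            continue  # rows such as RECORD itself carry no hash
        algorithm, _, expected = hash_field.partition("=")
        if algorithm not in SUPPORTED:
            unsupported.add(algorithm)
            continue
        counters["hashedEntries"] += 1
        target = root / row[0]
        if not target.is_file():
            counters["missingFiles"] += 1
            continue
        try:
            digest = hashlib.new(algorithm)
            with target.open("rb") as handle:
                # Stream in fixed-size chunks so memory use stays bounded.
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
        except OSError:
            counters["ioErrors"] += 1
            continue
        actual = base64.urlsafe_b64encode(digest.digest()).rstrip(b"=").decode("ascii")
        if actual != expected:
            counters["hashMismatches"] += 1
    result = {f"record.{name}": value for name, value in counters.items()}
    if unsupported:
        result["record.unsupportedAlgorithms"] = ",".join(sorted(unsupported))
    return result
```

A production verifier would also need to reject RECORD paths that escape the site-packages root and cap the number of rows it reads; both are omitted here to keep the sketch short.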
Declared-only/lock-only components include:
- declaredOnly=true
- lockSource, lockLocator, optional lockResolved, lockIndex, lockExtras, lockEditablePath
Container overlay semantics (pending contract)
When scanning raw OCI layer trees, correct overlay/whiteout handling is contract-driven. Until that contract lands, treat per-layer inventory as best-effort and do not rely on it as a merged-rootfs truth source.
Vendored/bundled packages (pending contract)
Vendored directory signals are detected, but their representation (separate components vs. parent-only metadata) is contract-driven to avoid false vulnerability joins.
References
- Sprint: docs/implplan/SPRINT_0405_0001_0001_scanner_python_detection_gaps.md
- Cross-analyzer contract: docs/modules/scanner/language-analyzers-contract.md
- Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python/PythonLanguageAnalyzer.cs