feat(python-analyzer): Enhance deterministic output tests and add new fixtures

- Updated TASKS.md to reflect changes in test fixtures for SCAN-PY-405-007.
- Added multiple test cases to ensure deterministic output for various Python package scenarios, including conda environments, requirements files, and vendored directories.
- Created new expected output files for conda packages (numpy, requests) and updated existing test fixtures for container whiteouts, wheel workspaces, and zipapp embedded requirements.
- Introduced helper methods to create wheel and zipapp packages for testing purposes.
- Added metadata files for new test fixtures to validate package detection and dependencies.
2025-12-21 17:51:19 +02:00
parent 22d67f203f
commit 292a6e94e8
29 changed files with 1043 additions and 25 deletions
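
The commit message mentions helper methods that create wheel and zipapp packages for the deterministic-output tests. Those helpers are not part of the diff shown here, so the following is only a minimal Python sketch of the general idea, not the repository's implementation. It assumes that a byte-stable fixture comes from sorting entries and pinning every zip header field that would otherwise vary. All names in it (`build_archive`, `build_wheel_fixture`, `build_zipapp_fixture`, the `demo` package and its contents) are hypothetical.

```python
import io
import zipfile

# Hypothetical constants: pin everything that would otherwise vary per run.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format allows
SHEBANG = b"#!/usr/bin/env python3\n"


def build_archive(entries: dict[str, bytes], prefix: bytes = b"") -> bytes:
    """Write entries into a zip with sorted names and fixed header fields."""
    buf = io.BytesIO()
    buf.write(prefix)  # a zipapp is a zip archive preceded by a shebang line
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(entries):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_STORED  # no compressor variance
            info.create_system = 3                   # same value on every OS
            info.external_attr = 0o644 << 16         # fixed file permissions
            zf.writestr(info, entries[name])
    return buf.getvalue()


def build_wheel_fixture() -> bytes:
    """A wheel-like archive: package code plus dist-info metadata."""
    return build_archive({
        "demo/__init__.py": b"__version__ = '1.0.0'\n",
        "demo-1.0.0.dist-info/METADATA": (
            b"Metadata-Version: 2.1\nName: demo\nVersion: 1.0.0\n"
        ),
    })


def build_zipapp_fixture() -> bytes:
    """A .pyz-like archive with an embedded requirements file."""
    return build_archive(
        {
            "__main__.py": b"print('hello')\n",
            "requirements.txt": b"requests==2.31.0\n",
        },
        prefix=SHEBANG,
    )
```

Under these assumptions, two calls to the same builder return identical bytes, so a test can generate the fixture on the fly and still compare the analyzer's output against a committed golden file.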
--- a/docs/implplan/archived/SPRINT_0170_0001_0001_notifications_telemetry.md
+++ b/docs/implplan/archived/SPRINT_0170_0001_0001_notifications_telemetry.md
@@ -96,10 +96,10 @@
 ## Interlocks (External Dependencies)
 | Dependency | Source sprint / doc | Current state | Impact on waves |
 | --- | --- | --- | --- |
-| Sprint 150.A – Orchestrator (wave table) | `SPRINT_150_scheduling_automation.md` | TODO | Blocks visibility of job events for Notify templates and Telemetry samples until orchestration telemetry lands. |
-| ORCH-OBS-50-001 `orchestrator instrumentation` | Sprint 150 backlog | TODO | Needed for Telemetry.Core sample + Notify SLO hooks; monitor for slip. |
-| POLICY-OBS-50-001 `policy instrumentation` | Sprint 150 backlog | TODO | Required before Telemetry helpers can be adopted by Policy + risk routing. |
-| WEB-OBS-50-001 `gateway telemetry core adoption` | Sprint 214/215 backlogs | TODO | Ensures web/gateway emits trace IDs that Notify incident payload references. |
+| Sprint 150.A – Orchestrator (wave table) | `docs/implplan/archived/SPRINT_0150_0001_0001_scheduling_automation.md` | DONE (2025-12-10) | Unblocked: orchestration baseline landed; job/telemetry events available for Notify templates and Telemetry samples. |
+| ORCH-OBS-50-001 `orchestrator instrumentation` | `docs/implplan/archived/SPRINT_0151_0001_0001_orchestrator_i.md` | DONE (2025-12-10) | Telemetry.Core wiring complete; Notify SLO hooks and Telemetry.Core sample integration unblocked. |
+| POLICY-OBS-50-001 `policy instrumentation` | `docs/implplan/archived/SPRINT_0127_0001_0001_policy_reasoning.md` | DONE (2025-11-27) | Telemetry helpers available for Policy + risk routing adoption. |
+| WEB-OBS-50-001 `gateway telemetry core adoption` | `docs/implplan/archived/SPRINT_0214_0001_0001_web_iii.md` | DONE (2025-12-11) | Gateway emits trace IDs; Notify incident payloads can reference end-to-end trace context. |
 | POLICY-RISK-40-002 `risk profile metadata export` | Sprint 215+ (Policy) | DONE (2025-12-04) | Provides metadata enrichment for NOTIFY-RISK routes; unblocked. |

 ## Upcoming Checkpoints (historical)
@@ -149,3 +149,4 @@
 | 2025-12-05 | Merged legacy sprint content into canonical template, refreshed statuses to DONE, and reconfirmed external dependency states; legacy file stubbed to point here. | Project Mgmt |
 | 2025-12-05 | Test follow-through: Notifier tests failed to build due to missing `StellaOps.Notify.Storage.Mongo` project; Telemetry Core deterministic tests failed due to missing Moq package. Actions added to tracker (#2, #3); statuses remain DONE pending evidence. | Implementer |
 | 2025-12-06 | Telemetry Core tests verified GREEN; Moq restored from curated feed; evidence path recorded. Action tracker #3/#4 closed. | Telemetry Core Guild |
+| 2025-12-21 | Refreshed Interlocks (External Dependencies) table with upstream sprint outcomes; removed stale TODO rows (Orchestrator/Policy/Web telemetry adoption now DONE). | Implementer |
--- a/docs/implplan/archived/SPRINT_0405_0001_0001_scanner_python_detection_gaps.md
+++ b/docs/implplan/archived/SPRINT_0405_0001_0001_scanner_python_detection_gaps.md
@@ -26,7 +26,7 @@
 | 4 | SCAN-PY-405-004 | DONE | Whiteout/overlay semantics implemented in `ContainerOverlayHandler` + `ContainerLayerAdapter`. | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Correct container-layer inventory semantics**: when scanning raw OCI layer trees (`layers/`, `.layers/`, `layer*/`), honor whiteouts/overlay ordering so removed packages are not reported. Use/extend `Internal/Packaging/Adapters/ContainerLayerAdapter` semantics as the source of truth for precedence. Emit explicit metadata markers when inventory is partial due to missing overlay context (e.g., `container.overlayIncomplete=true`). |
 | 5 | SCAN-PY-405-005 | DONE | VendoredPackageDetector integrated; `VendoringMetadataBuilder` added. | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Surface vendored (bundled) Python deps**: integrate `VendoredPackageDetector` so known vendoring patterns (`*_vendor`, `third_party`, `requests.packages`, etc.) are detected. Emit either (a) separate "embedded" components with bounded evidence locators (preferred) or (b) a bounded metadata summary on the parent package (`vendored.detected=true`, `vendored.packages`, `vendored.paths`). Never emit unbounded file/module lists; cap to top-N deterministic samples. |
 | 6 | SCAN-PY-405-006 | DONE | Scope classification added from lock entries (Scope enum) per Interlock 4. | Python Analyzer Guild (`src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Python`) | **Improve "used by entrypoint" and scope classification**: today `usedByEntrypoint` primarily comes from RECORD/script hints. Extend this by optionally mapping source-tree imports (`PythonImportAnalysis`) and/or runtime evidence (`PythonRuntimeEvidenceCollector`) to packages (via `TopLevelModules`) so "likely used" can be signaled deterministically (bounded, opt-in). Add `scope` metadata using `PythonScopeClassifier` (prod/dev/docs/build) based on lock sections and requirements file names. |
-| 7 | SCAN-PY-405-007 | TODO | Core implementation complete; fixtures pending. | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Python.Tests`) | **Fixtures + golden outputs**: add fixtures proving new detection paths: (a) conda env (`conda-meta/*.json`) without dist-info, (b) requirements with `-r` includes + `-e .` editable, (c) Pipfile.lock with `default` + `develop`, (d) wheel file in workspace (no extraction), (e) zipapp/pyz with embedded requirements, (f) container layers with whiteouts hiding a dist-info dir, (g) vendored dependency directory under a package. Extend `PythonLanguageAnalyzerTests.cs` to assert deterministic ordering, stable identities, and bounded metadata. |
+| 7 | SCAN-PY-405-007 | DONE | Fixtures + goldens landed; tests pass. | QA Guild (`src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Python.Tests`) | **Fixtures + golden outputs**: add fixtures proving new detection paths: (a) conda env (`conda-meta/*.json`) without dist-info, (b) requirements with `-r` includes + `-e .` editable, (c) Pipfile.lock with `default` + `develop`, (d) wheel file in workspace (no extraction), (e) zipapp/pyz with embedded requirements, (f) container layers with whiteouts hiding a dist-info dir, (g) vendored dependency directory under a package. Extend `PythonLanguageAnalyzerTests.cs` to assert deterministic ordering, stable identities, and bounded metadata. |
 | 8 | SCAN-PY-405-008 | DONE | After core behavior lands, update docs + perf guard. | Docs Guild + Bench Guild (`docs/modules/scanner`, `src/Bench/StellaOps.Bench/Scanner.Analyzers`) | **Document + benchmark Python analyzer contract**: update `docs/modules/scanner/architecture.md` (or add a Python analyzer sub-doc) describing detection sources & precedence, lock parsing rules, container overlay semantics, vendoring representation, and identity rules for non-versioned components. Add a deterministic offline bench scanning a representative fixture (many packages + lockfiles) and record baseline ceilings (time + components count). |

 ## Wave Coordination
@@ -279,4 +279,6 @@ When import/runtime analysis contributes to usage signals:
 | 2025-12-13 | **Decided Actions 1-4 and Interlock 4** to unblock SCAN-PY-405-002 through SCAN-PY-405-007. Action 1: explicit-key identity scheme using `LanguageExplicitKey.Create`. Action 2: lock precedence order (poetry.lock > Pipfile.lock > pdm.lock > uv.lock > requirements.txt) with first-wins dedupe. Action 3: OCI whiteout semantics with deterministic layer ordering. Action 4: vendored deps emit parent metadata by default, separate components only with High confidence + known version. Interlock 4: usage/scope classification is opt-in, RECORD/entry_points signals remain default. | Implementer |
 | 2025-12-13 | Started implementation of SCAN-PY-405-002 through SCAN-PY-405-007 in parallel (all waves now unblocked). | Implementer |
 | 2025-12-13 | **Completed SCAN-PY-405-002 through SCAN-PY-405-006**: (1) `PythonLockFileCollector` upgraded with full precedence order, `-r` includes with cycle detection, PEP 508 parsing, `name @ url` direct refs, Pipenv develop section, pdm.lock/uv.lock support. (2) `ContainerOverlayHandler` + `ContainerLayerAdapter` updated with OCI whiteout semantics. (3) `VendoringMetadataBuilder` added for bounded parent metadata. (4) Scope/SourceType metadata added to analyzer. Build passes. SCAN-PY-405-007 (fixtures) remains TODO. | Implementer |
+| 2025-12-21 | Started SCAN-PY-405-007 (add deterministic fixtures + update goldens). | Implementer |
+| 2025-12-21 | Completed SCAN-PY-405-007: fixtures for conda env, requirements includes+editable, Pipfile.lock default+develop, wheel workspace, zipapp embedded requirements, container whiteouts, and vendored directories; updated goldens; verified `dotnet test src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Python.Tests/StellaOps.Scanner.Analyzers.Lang.Python.Tests.csproj -c Release`. | Implementer |

--- a/docs/implplan/archived/SPRINT_0406_0001_0001_scanner_node_detection_gaps.md
+++ b/docs/implplan/archived/SPRINT_0406_0001_0001_scanner_node_detection_gaps.md
@@ -35,11 +35,11 @@
 ## Wave Coordination
 | Wave | Guild owners | Shared prerequisites | Status | Notes |
 | --- | --- | --- | --- | --- |
-| A: Declared-only & identity | Node Analyzer Guild + QA Guild | Action 1 | TODO | Emit declared-only safely; avoid invalid PURLs. |
-| B: Lock fidelity | Node Analyzer Guild + QA Guild | None | TODO | Multi-version lock correctness + Yarn Berry + pnpm hardening + nested path fixes. |
-| C: Workspaces & containers | Node Analyzer Guild + QA Guild | Action 2 | TODO | Workspace glob support + scope attribution + container app-root discovery. |
-| D: Imports & evidence | Node Analyzer Guild + QA Guild | Action 4 | TODO | ESM/TS import correctness + bounded scanning + package.json hashing. |
-| E: Docs & bench | Docs Guild + Bench Guild | Waves A–D | TODO | Contract + performance ceiling. |
+| A: Declared-only & identity | Node Analyzer Guild + QA Guild | Action 1 | DONE | Emit declared-only safely; avoid invalid PURLs. |
+| B: Lock fidelity | Node Analyzer Guild + QA Guild | None | DONE | Multi-version lock correctness + Yarn Berry + pnpm hardening + nested path fixes. |
+| C: Workspaces & containers | Node Analyzer Guild + QA Guild | Action 2 | DONE | Workspace glob support + scope attribution + container app-root discovery. |
+| D: Imports & evidence | Node Analyzer Guild + QA Guild | Action 4 | DONE | ESM/TS import correctness + bounded scanning + package.json hashing. |
+| E: Docs & bench | Docs Guild + Bench Guild | Waves A–D | DONE | Contract + performance ceiling. |

 ## Wave Detail Snapshots
 - **Wave A:** Declared-only dependencies become visible and safely keyed (no range-as-version PURLs).
@@ -70,7 +70,7 @@
 | 4 | Decide import-scanning policy: default enabled/disabled, scope (workspace only vs all packages), and caps to enforce. | Project Mgmt + Node Analyzer Guild | 2025-12-13 | Done | Scope: root + workspace members only; caps + skip markers; bench exports `node.importScan.*` metrics (see `docs/modules/scanner/analyzers-node.md`). |

 ## Decisions & Risks
- **Decision (pending):** Declared-only identity scheme, workspace glob bounds, lock precedence, and import scanning caps (Action Tracker 1–4).
+- **DECIDED (2025-12-13):** Declared-only identity scheme, workspace glob bounds, lock precedence, and import scanning caps (Action Tracker 1–4).

 | Risk ID | Risk | Impact | Likelihood | Mitigation | Owner | Trigger / Signal |
 | --- | --- | --- | --- | --- | --- | --- |
@@ -92,4 +92,5 @@
 | 2025-12-13 | Updated declared-only emission to use the cross-analyzer explicit-key format and expanded fixtures for `layers/`, `.layers/`, and `layer*/` discovery. | Implementer |
 | 2025-12-13 | Completed task 406-010 (fixtures + goldens: lock-only package-lock/yarn-berry/pnpm, workspace globs, container app-root discovery) with regression tests. | Implementer |
 | 2025-12-13 | Completed task 406-011 (docs + offline bench: `docs/modules/scanner/analyzers-node.md`, scenario `node_detection_gaps_fixture`, import-scan metrics) with bench/test coverage. | Implementer |
+| 2025-12-21 | Normalised Wave Coordination statuses to `DONE` (they were left `TODO`); verified `dotnet test src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Lang.Node.Tests/StellaOps.Scanner.Analyzers.Lang.Node.Tests.csproj -c Release` (365/365). | Implementer |

--- a/docs/implplan/archived/SPRINT_0411_0001_0001_semantic_entrypoint_engine.md
+++ b/docs/implplan/archived/SPRINT_0411_0001_0001_semantic_entrypoint_engine.md
@@ -75,10 +75,10 @@
 ## Action Tracker
 | # | Action | Owner | Due (UTC) | Status | Notes |
 |---|--------|-------|-----------|--------|-------|
-| 1 | Review existing entrypoint detection code | Scanner Guild | 2025-12-16 | TODO | Understand integration points |
-| 2 | Draft ApplicationIntent enum with cross-team input | Scanner Guild | 2025-12-17 | TODO | Need input from all language teams |
-| 3 | Create AGENTS.md for EntryTrace module | Scanner Guild | 2025-12-16 | TODO | Implementer guidance |
-| 4 | Validate semantic schema against richgraph-v1 | Platform Guild | 2025-12-18 | TODO | Ensure compatibility |
+| 1 | Review existing entrypoint detection code | Scanner Guild | 2025-12-16 | DONE | Covered by Delivery Tracker + sprint close notes. |
+| 2 | Draft ApplicationIntent enum with cross-team input | Scanner Guild | 2025-12-17 | DONE | Covered by Delivery Tracker + sprint close notes. |
+| 3 | Create AGENTS.md for EntryTrace module | Scanner Guild | 2025-12-16 | DONE | Covered by Delivery Tracker + sprint close notes. |
+| 4 | Validate semantic schema against richgraph-v1 | Platform Guild | 2025-12-18 | DONE | Covered by Delivery Tracker + sprint close notes. |

 ## Decisions & Risks

@@ -162,3 +162,4 @@ public enum CapabilityClass : long
 |------------|--------|-------|
 | 2025-12-13 | Created sprint from program sprint 0410; defined 25 tasks across schema, adapters, integration, QA/docs; included schema previews. | Planning |
 | 2025-12-13 | Completed tasks 17-25: DI registration (AddSemanticEntryTraceAnalyzer), LanguageComponentRecord semantic fields (intent, capabilities, threatVectors), verified richgraph-v1 semantic extensions and SBOM property extensions already implemented, verified test fixtures exist, created semantic-entrypoint-schema.md documentation, updated architecture.md with semantic engine section, verified CLI --semantic flag implementation. Sprint 100% complete. | Scanner Guild |
+| 2025-12-21 | Normalised Action Tracker statuses to `DONE` (they were left `TODO`); no semantic changes. | Implementer |