# Windows Analyzer Design Brief (Draft)

> Owners: Scanner Guild, Policy Guild, Offline Kit Guild, Security Guild
> Related backlog (proposed): SCANNER-ENG-0024..0027, DOCS-SCANNER-BENCH-62-002
> Status: Draft, contingent on Windows demand threshold (see `docs/benchmarks/scanner/windows-macos-demand.md`)

## 1. Objectives & boundaries

- Provide deterministic inventory for Windows Server/container images covering MSI/WinSxS assemblies, Chocolatey packages, and registry-derived installers.
- Preserve replayability (layer fragments, provenance metadata) and align outputs with existing SBOM/policy pipelines.
- Respect sovereignty constraints: offline-friendly, signed rule bundles, no reliance on Windows APIs unavailable in containerized scans.

Out of scope (Phase 1):

- Live registry queries on running Windows hosts (requires runtime agent; defer to Zastava/Runtime roadmap).
- Windows Update patch baseline comparison (tracked separately under Runtime/Posture).
- UWP/MSIX packages (flagged for follow-up once MSI parity is complete).

## 2. Architecture overview

```
Scanner.Worker (Windows profile)
 ├─ Surface.Validation (enforce layer size, path allowlists)
 ├─ Surface.FS (materialized NTFS image via 7z/guestmount)
 ├─ MsiCollector            -> LayerComponentFragment (windows-msi)
 ├─ WinSxSCollector         -> LayerComponentFragment (windows-winsxs)
 ├─ ChocolateyCollector     -> LayerComponentFragment (windows-choco)
 ├─ RegistryCollector       -> Evidence overlays (uninstall/services)
 ├─ DriverCapabilityMapper  -> Capability overlays (kernel/user drivers)
 └─ WindowsComponentMapper  -> ComponentGraph + capability metadata
```

- Collectors operate on extracted filesystem snapshots; registry access is performed on exported hive files produced during image extraction (document in ops runbooks).
- `WindowsComponentMapper` normalizes component identities (ProductCode, AssemblyIdentity, Chocolatey package ID) and merges overlapping evidence into deterministic fragments.

## 3. Collectors

### 3.1 MSI collector

- Input: `Windows/Installer/*.msi` database files (OLE compound-file databases), registry hive exports for product mapping.
- Implementation approach:
  - Use open-source MSI parser (custom or MIT-compatible) to avoid COM dependencies.
  - Extract Product, Component, File, Feature, Media tables.
  - Compute SHA256 for installed files via Component table, linking to WinSxS manifests.
- Output metadata: `productCode`, `upgradeCode`, `productVersion`, `manufacturer`, `language`, `installContext`, `packageCode`, `sourceList`.
- Evidence: file paths with digests, component IDs, CAB/patch references.

### 3.2 WinSxS collector

- Input: `Windows/WinSxS/Manifests/*.manifest`, `Windows/WinSxS/` payload directories, catalog (.cat) files.
- Parse XML assembly identities (name, version, processor architecture, public key token, language).
- Map to MSI components when file hashes match.
- Capture catalog signature thumbprint and optional patch KB references for policy gating.

### 3.3 Chocolatey collector

- Input: `ProgramData/Chocolatey/lib/**`, `ProgramData/Chocolatey/package.backup`, `chocolateyinstall.ps1`, `.nuspec`.
- Extract package ID, version, checksum, source feed, installed files and scripts.
- Note whether install used cache or remote feed; record script hash for determinism.

### 3.4 Registry collector

- Input: Exported `SOFTWARE` hive covering:
  - `Microsoft\Windows\CurrentVersion\Uninstall`
  - `Microsoft\Windows\CurrentVersion\Installer\UserData`
  - `Microsoft\Windows\CurrentVersion\Run` (startup apps)
  - Service/driver configuration from `SYSTEM` hive under `Services`.
- Emit fallback evidence for installers not captured by MSI/Chocolatey (legacy EXE installers).
- Record uninstall strings, install dates, publisher, estimated size, install location.

### 3.5 Driver & service mapper

- Parse `SYSTEM` hive `Services` entries to detect drivers (type=1 or 2) and critical services (start mode auto/boot).
- Output capability overlays (e.g., `windows.driver.kernelMode(true)`, `windows.service.autoStart("Spooler")`) for Policy Engine.

## 4. Component mapping & output

- `WindowsComponentMapper`:
  - Generate `LayerComponentFragment`s with synthetic layer digests (e.g., `sha256:stellaops-windows-msi`).
  - Build `ComponentIdentity` with PURL-like scheme: `pkg:msi/` or `pkg:winsxs/`.
  - Include metadata: signature thumbprint, catalog hash, KB references, install context, manufacturer.
- Capability overlays stored under `ScanAnalysisKeys.capability.windows` for policy consumption.
- Export Center bundling:
  - Include MSI manifest extracts, WinSxS assembly manifests, Chocolatey nuspec snapshots, and service/driver capability CSV.

## 5. Policy integration

- Predicates to introduce:
  - `windows.package.signed(expectedThumbprint?)`
  - `windows.package.unsupportedInstallerType`
  - `windows.driver.kernelMode`, `windows.driver.unsigned`
  - `windows.service.autoStart(name)`
  - `windows.choco.sourceAllowed(feed)`
- Lattice approach:
  - Unsigned kernel drivers → default `fail`.
  - Unknown installer sources → `warn` with escalation on critical services.
  - Chocolatey packages from non-whitelisted feeds → configurable severity.
- Waiver semantics bind to product code + signature thumbprint; waivers expire when package version changes.

## 6. Offline kit & distribution

- Package:
  - MSI schema definitions and parser binaries (signed).
  - Chocolatey feed snapshot (nupkg archives + index) for allow-listed feeds.
  - Windows catalog certificate chains + optional CRL/OCSP caches.
- Documentation:
  - Provide instructions for exporting registry hives during image extraction (PowerShell script included).
  - Note disk space expectations (Chocolatey snapshot size, WinSxS manifest volume).

## 7. Testing strategy

- Fixtures:
  - Sample MSI packages (with/without transforms), WinSxS manifests, Chocolatey packages.
  - Registry hive exports representing mixed installer types.
- Tests:
  - Unit tests for each collector parsing edge cases (language-specific manifests, transforms, script hashing).
  - Integration tests using synthetic Windows container image layers (generated via CI on Windows worker).
  - Determinism checks ensuring repeated runs produce identical fragments.
- Security review:
  - Validate script execution paths (collectors must never execute Chocolatey scripts; inspect only).

## 8. Dependencies & open questions

| Item | Description | Owner | Status |
| --- | --- | --- | --- |
| MSI parser choice | Select MIT/Apache-compatible parser or build internal reader | Scanner Guild | TBD |
| Registry export tooling | Determine standard script/utility for hive exports in container context | Ops Guild | TBD |
| Authenticode verification locus | Decide scanner vs policy responsibility for signature verification | Security Guild | TBD |
| Feed mirroring policy | Which Chocolatey feeds to mirror by default | Product + Security Guilds | TBD |

## 9. Proposed backlog entries

| ID (proposed) | Title | Summary |
| --- | --- | --- |
| SCANNER-ENG-0024 | Implement Windows MSI collector | Parse MSI databases, emit component fragments with provenance metadata. |
| SCANNER-ENG-0025 | Implement WinSxS manifest collector | Correlate assemblies with MSI components and catalog signatures. |
| SCANNER-ENG-0026 | Implement Chocolatey & registry collectors | Harvest nuspec metadata and uninstall/service registry data. |
| SCANNER-ENG-0027 | Policy & Offline integration for Windows | Define predicates, CLI toggles, Offline Kit packaging, documentation. |

## 10. References

- `docs/benchmarks/scanner/deep-dives/windows.md`
- `docs/benchmarks/scanner/windows-macos-demand.md`
- `docs/modules/scanner/design/macos-analyzer.md` (structure/composition parallels)
- Surface design docs (`surface-fs.md`, `surface-validation.md`, `surface-secrets.md`) for interfacing expectations.


Further reading: `../../api/scanner/windows-coverage.md` (summary) and `../../api/scanner/windows-macos-summary.md` (metrics dashboard).

Policy readiness alignment: see `../policy/windows-package-readiness.md` (POLICY-READINESS-0002).

Upcoming milestone: FinSecure Corp PCI review requires Authenticode/feed decision by 2025-11-07 before Windows analyzer spike kickoff.
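
Illustrative sketch for the driver & service mapper (section 3.5): the snippet below shows one way parsed `Services` entries could be turned into the capability overlay strings quoted in this brief. It is a minimal sketch, not the planned implementation. The `ServiceEntry` shape and the `capability_overlays` function are hypothetical names introduced for illustration, and the choice to emit `windows.driver.kernelMode(true)` for both driver types (1 and 2) is an assumption drawn from the "type=1 or 2" wording above. Output is sorted and de-duplicated so that repeated runs produce identical results.

```python
from dataclasses import dataclass

# `Type` values under a Services subkey that denote drivers (per section 3.5).
KERNEL_DRIVER = 0x1
FILE_SYSTEM_DRIVER = 0x2
# `Start` values treated as "boot" and "auto" start modes.
START_BOOT = 0
START_AUTO = 2


@dataclass(frozen=True)
class ServiceEntry:
    """Hypothetical shape for one parsed `Services` subkey."""
    name: str
    type: int
    start: int


def capability_overlays(entries):
    """Return sorted, de-duplicated overlay strings for the given entries."""
    overlays = set()
    for entry in entries:
        if entry.type in (KERNEL_DRIVER, FILE_SYSTEM_DRIVER):
            # Assumption: both driver types map to the kernel-mode overlay.
            overlays.add("windows.driver.kernelMode(true)")
        elif entry.start in (START_BOOT, START_AUTO):
            overlays.add(f'windows.service.autoStart("{entry.name}")')
    # Sorting makes the result independent of hive enumeration order.
    return sorted(overlays)


entries = [
    ServiceEntry(name="Spooler", type=0x10, start=2),  # auto-start service
    ServiceEntry(name="disk", type=0x1, start=0),      # boot-start kernel driver
    ServiceEntry(name="Fax", type=0x10, start=3),      # manual start, ignored
]
overlays = capability_overlays(entries)
print(overlays)
```

Returning a sorted list rather than a set keeps the overlay order stable regardless of how the hive is enumerated, which is the property the determinism checks in section 7 would exercise.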