Windows Analyzer Design Brief (Draft)
Owners: Scanner Guild, Policy Guild, Offline Kit Guild, Security Guild
Related backlog (proposed): SCANNER-ENG-0024..0027, DOCS-SCANNER-BENCH-62-002
Status: Draft, contingent on Windows demand threshold (see docs/benchmarks/scanner/windows-macos-demand.md)
1. Objectives & boundaries
- Provide deterministic inventory for Windows Server/container images covering MSI/WinSxS assemblies, Chocolatey packages, and registry-derived installers.
- Preserve replayability (layer fragments, provenance metadata) and align outputs with existing SBOM/policy pipelines.
- Respect sovereignty constraints: offline-friendly, signed rule bundles, no reliance on Windows APIs unavailable in containerized scans.
Out of scope (Phase 1):
- Live registry queries on running Windows hosts (requires runtime agent; defer to Zastava/Runtime roadmap).
- Windows Update patch baseline comparison (tracked separately under Runtime/Posture).
- UWP/MSIX packages (flagged for follow-up once MSI parity is complete).
2. Architecture overview
Scanner.Worker (Windows profile)
├─ Surface.Validation (enforce layer size, path allowlists)
├─ Surface.FS (materialized NTFS image via 7z/guestmount)
├─ MsiCollector -> LayerComponentFragment (windows-msi)
├─ WinSxSCollector -> LayerComponentFragment (windows-winsxs)
├─ ChocolateyCollector -> LayerComponentFragment (windows-choco)
├─ RegistryCollector -> Evidence overlays (uninstall/services)
├─ DriverCapabilityMapper -> Capability overlays (kernel/user drivers)
└─ WindowsComponentMapper -> ComponentGraph + capability metadata
- Collectors operate on extracted filesystem snapshots; registry access performed on exported hive files produced during image extraction (document in ops runbooks).
- WindowsComponentMapper normalizes component identities (ProductCode, AssemblyIdentity, Chocolatey package ID) and merges overlapping evidence into deterministic fragments.
3. Collectors
3.1 MSI collector
- Input: Windows/Installer/*.msi database files (Jet OLE DB), registry hive exports for product mapping.
- Implementation approach:
  - Use open-source MSI parser (custom or MIT-compatible) to avoid COM dependencies.
  - Extract Product, Component, File, Feature, Media tables.
  - Compute SHA256 for installed files via Component table, linking to WinSxS manifests.
- Output metadata: productCode, upgradeCode, productVersion, manufacturer, language, installContext, packageCode, sourceList.
- Evidence: file paths with digests, component IDs, CAB/patch references.
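The digest evidence described above could be gathered roughly as follows. This is a minimal sketch, not the analyzer's actual code: `sha256_file` and `collect_file_evidence` are hypothetical names, and `relative_paths` stands in for the file list that would be resolved from the MSI tables (that lookup is not shown).

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large payloads are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file_evidence(snapshot_root: Path, relative_paths: list) -> list:
    """Return path/digest records in sorted order so repeated runs match.

    relative_paths is a hypothetical input: the installed-file list that a
    real collector would derive from the MSI Component/File tables.
    """
    evidence = []
    for rel in sorted(set(relative_paths)):
        target = snapshot_root / rel
        if target.is_file():  # files absent from the snapshot are skipped
            evidence.append({"path": rel, "sha256": sha256_file(target)})
    return evidence
```

Sorting and de-duplicating the paths before hashing keeps the evidence list stable regardless of the order in which the tables are read.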
3.2 WinSxS collector
- Input: Windows/WinSxS/Manifests/*.manifest, Windows/WinSxS/ payload directories, catalog (.cat) files.
- Parse XML assembly identities (name, version, processor architecture, public key token, language).
- Map to MSI components when file hashes match.
- Capture catalog signature thumbprint and optional patch KB references for policy gating.
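As an illustration of the manifest parsing step, the sketch below reads the identity attributes with Python's standard XML parser. It assumes the first `assemblyIdentity` element in document order describes the assembly itself, and it matches elements by local name so the XML namespace does not matter; `parse_assembly_identity` is a hypothetical helper, not part of the analyzer.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{urn:...}assemblyIdentity'."""
    return tag.rsplit("}", 1)[-1]


def parse_assembly_identity(manifest_xml: str) -> dict:
    """Return the identity attributes of the first assemblyIdentity element.

    Assumption: the first match in document order is the assembly's own
    identity rather than that of a dependency.
    """
    root = ET.fromstring(manifest_xml)
    for element in root.iter():
        if local_name(element.tag) == "assemblyIdentity":
            attrs = element.attrib
            return {
                "name": attrs.get("name", ""),
                "version": attrs.get("version", ""),
                "processorArchitecture": attrs.get("processorArchitecture", ""),
                "publicKeyToken": attrs.get("publicKeyToken", ""),
                "language": attrs.get("language", ""),
            }
    raise ValueError("manifest has no assemblyIdentity element")
```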
3.3 Chocolatey collector
- Input: ProgramData/Chocolatey/lib/**, ProgramData/Chocolatey/package.backup, chocolateyinstall.ps1, .nuspec.
- Extract package ID, version, checksum, source feed, installed files and scripts.
- Note whether install used cache or remote feed; record script hash for determinism.
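A minimal sketch of the two inspection steps above, assuming the nuspec keeps `id` and `version` as child elements of `metadata`. The helper names `parse_nuspec` and `script_hash` are hypothetical, and the install script is only read as bytes, never run.

```python
import hashlib
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_nuspec(nuspec_xml: str) -> dict:
    """Read id and version from the first <metadata> block, ignoring namespaces."""
    root = ET.fromstring(nuspec_xml)
    fields = {"id": None, "version": None}
    for element in root.iter():
        if _local(element.tag) != "metadata":
            continue
        for child in element:
            name = _local(child.tag)
            if name in fields:
                fields[name] = (child.text or "").strip()
        break  # only the first metadata block is considered
    return fields


def script_hash(script_bytes: bytes) -> str:
    """Hash the install script as raw bytes; the script is never executed."""
    return hashlib.sha256(script_bytes).hexdigest()
```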
3.4 Registry collector
- Input: Exported SOFTWARE hive covering:
  - Microsoft\Windows\CurrentVersion\Uninstall
  - Microsoft\Windows\CurrentVersion\Installer\UserData
  - Microsoft\Windows\CurrentVersion\Run (startup apps)
- Service/driver configuration from SYSTEM hive under Services.
- Emit fallback evidence for installers not captured by MSI/Chocolatey (legacy EXE installers).
- Record uninstall strings, install dates, publisher, estimated size, install location.
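The fallback behaviour might look like the sketch below. It assumes the Uninstall subkeys have already been read from the exported hive into plain dictionaries (hive parsing is not shown) and that MSI-installed products appear under a subkey named after their product code. `fallback_installer_evidence` and the output field names are hypothetical.

```python
def fallback_installer_evidence(uninstall_entries: dict, known_product_codes: set) -> list:
    """Emit evidence for Uninstall entries the MSI collector did not cover.

    uninstall_entries maps subkey name -> {value name: value}, as it might be
    read from an exported SOFTWARE hive (hypothetical input shape).
    """
    known = {code.upper() for code in known_product_codes}
    evidence = []
    for subkey in sorted(uninstall_entries):  # sorted for stable output
        if subkey.upper() in known:
            continue  # already reported by the MSI collector
        values = uninstall_entries[subkey]
        evidence.append({
            "key": subkey,
            "displayName": values.get("DisplayName"),
            "publisher": values.get("Publisher"),
            "uninstallString": values.get("UninstallString"),
            "installDate": values.get("InstallDate"),
            "estimatedSizeKb": values.get("EstimatedSize"),
            "installLocation": values.get("InstallLocation"),
        })
    return evidence
```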
3.5 Driver & service mapper
- Parse SYSTEM hive Services entries to detect drivers (type=1 or 2) and critical services (start mode auto/boot).
- Output capability overlays (e.g., windows.driver.kernelMode(true), windows.service.autoStart("Spooler")) for Policy Engine.
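One way to express this mapping is sketched below. It assumes the Services entries are already available as dictionaries carrying the numeric `Type` and `Start` registry values (1 and 2 are the kernel and file-system driver types; 0 and 2 are the boot and automatic start modes). `capability_overlays` is a hypothetical name, and treating drivers and auto-start services as mutually exclusive is a simplification made here.

```python
DRIVER_TYPES = (1, 2)      # 1 = kernel driver, 2 = file system driver
CRITICAL_STARTS = (0, 2)   # 0 = boot start, 2 = automatic start


def capability_overlays(services: dict) -> list:
    """Map Services entries to overlay strings, sorted for determinism.

    services maps service name -> {"Type": int, "Start": int}, a hypothetical
    shape for data read from an exported SYSTEM hive.
    """
    overlays = set()
    for name, values in services.items():
        if values.get("Type") in DRIVER_TYPES:
            overlays.add("windows.driver.kernelMode(true)")
        elif values.get("Start") in CRITICAL_STARTS:
            overlays.add(f'windows.service.autoStart("{name}")')
    return sorted(overlays)
```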
4. Component mapping & output
- WindowsComponentMapper:
  - Generate LayerComponentFragments with synthetic layer digests (e.g., sha256:stellaops-windows-msi).
  - Build ComponentIdentity with PURL-like scheme: pkg:msi/<productCode> or pkg:winsxs/<assemblyIdentity>.
  - Include metadata: signature thumbprint, catalog hash, KB references, install context, manufacturer.
- Capability overlays stored under ScanAnalysisKeys.capability.windows for policy consumption.
- Export Center bundling:
  - Include MSI manifest extracts, WinSxS assembly manifests, Chocolatey nuspec snapshots, and service/driver capability CSV.
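The identity and merge behaviour can be illustrated with a small sketch. The record shape (`kind`, `key`, `evidence`) and the function names are assumptions made for the example, not the mapper's real types.

```python
def component_identity(kind: str, key: str) -> str:
    """Build the PURL-like identity string for a supported component kind."""
    if kind not in ("msi", "winsxs"):
        raise ValueError(f"unsupported component kind: {kind}")
    return f"pkg:{kind}/{key}"


def merge_fragments(records: list) -> list:
    """Merge records that share an identity and return them in sorted order.

    Each record is a dict with 'kind', 'key', and an optional 'evidence' list
    (hypothetical shape). Sorting identities and evidence keeps the output
    independent of collector ordering.
    """
    merged = {}
    for record in records:
        identity = component_identity(record["kind"], record["key"])
        merged.setdefault(identity, set()).update(record.get("evidence", []))
    return [
        {"identity": identity, "evidence": sorted(merged[identity])}
        for identity in sorted(merged)
    ]
```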
5. Policy integration
- Predicates to introduce:
  - windows.package.signed(expectedThumbprint?)
  - windows.package.unsupportedInstallerType
  - windows.driver.kernelMode, windows.driver.unsigned
  - windows.service.autoStart(name)
  - windows.choco.sourceAllowed(feed)
- Lattice approach:
  - Unsigned kernel drivers → default fail.
  - Unknown installer sources → warn with escalation on critical services.
  - Chocolatey packages from non-whitelisted feeds → configurable severity.
- Waiver semantics bind to product code + signature thumbprint; waivers expire when package version changes.
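The lattice and waiver rules in this section could be modelled as in the sketch below. The fact names, the three-level `pass`/`warn`/`fail` ordering, and the reading of "escalation" as warn becoming fail are all assumptions made for illustration; the Policy Engine's real predicate API is not shown.

```python
SEVERITY_ORDER = ["pass", "warn", "fail"]


def worst(a: str, b: str) -> str:
    """Return the more severe of two verdicts."""
    return max(a, b, key=SEVERITY_ORDER.index)


def evaluate(facts: dict, choco_feed_severity: str = "warn") -> str:
    """Combine the three default rules into one verdict (hypothetical fact names)."""
    verdict = "pass"
    if facts.get("kernel_driver") and not facts.get("signed", True):
        verdict = worst(verdict, "fail")
    if facts.get("unknown_installer_source"):
        # Assumed escalation: warn becomes fail when a critical service is involved.
        verdict = worst(verdict, "fail" if facts.get("critical_service") else "warn")
    if facts.get("choco_feed_allowed") is False:
        verdict = worst(verdict, choco_feed_severity)
    return verdict


def waiver_applies(waiver: dict, package: dict) -> bool:
    """A waiver binds to product code + thumbprint and lapses on a version change."""
    return all(waiver[k] == package[k] for k in ("product_code", "thumbprint", "version"))
```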
6. Offline kit & distribution
- Package:
  - MSI schema definitions and parser binaries (signed).
  - Chocolatey feed snapshot (nupkg archives + index) for allow-listed feeds.
  - Windows catalog certificate chains + optional CRL/OCSP caches.
- Documentation:
  - Provide instructions for exporting registry hives during image extraction (PowerShell script included).
  - Note disk space expectations (Chocolatey snapshot size, WinSxS manifest volume).
7. Testing strategy
- Fixtures:
  - Sample MSI packages (with/without transforms), WinSxS manifests, Chocolatey packages.
  - Registry hive exports representing mixed installer types.
- Tests:
  - Unit tests for each collector parsing edge cases (language-specific manifests, transforms, script hashing).
  - Integration tests using synthetic Windows container image layers (generated via CI on Windows worker).
  - Determinism checks ensuring repeated runs produce identical fragments.
- Security review:
  - Validate script execution paths (collectors must never execute Chocolatey scripts; inspect only).
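A determinism check of the kind listed under Tests might be sketched as follows. It assumes a fragment can be represented as JSON-serializable data and that `run_analyzer` is a callable producing one fragment per invocation; both are hypothetical stand-ins for the real test harness.

```python
import hashlib
import json


def fragment_digest(fragment: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot affect the digest."""
    canonical = json.dumps(fragment, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_deterministic(run_analyzer, runs: int = 3) -> str:
    """Run the analyzer several times and require one distinct digest."""
    digests = {fragment_digest(run_analyzer()) for _ in range(runs)}
    if len(digests) != 1:
        raise AssertionError(f"non-deterministic output: {len(digests)} distinct digests")
    return digests.pop()
```

Canonical serialization only removes key-order noise; list ordering still has to be made stable by the collectors themselves.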
8. Dependencies & open questions
| Item | Description | Owner | Status |
|---|---|---|---|
| MSI parser choice | Select MIT/Apache-compatible parser or build internal reader | Scanner Guild | TBD |
| Registry export tooling | Determine standard script/utility for hive exports in container context | Ops Guild | TBD |
| Authenticode verification locus | Decide scanner vs policy responsibility for signature verification | Security Guild | TBD |
| Feed mirroring policy | Which Chocolatey feeds to mirror by default | Product + Security Guilds | TBD |
9. Implementation status
| ID | Title | Status | Notes |
|---|---|---|---|
| SCANNER-ENG-0024 | Windows MSI collector | DONE | StellaOps.Scanner.Analyzers.OS.Windows.Msi - OLE compound document parser, extracts Product/File tables, 22 tests passing |
| SCANNER-ENG-0025 | WinSxS manifest collector | DONE | StellaOps.Scanner.Analyzers.OS.Windows.WinSxS - XML manifest parser, assembly identity extraction, 18 tests passing |
| SCANNER-ENG-0026 | Chocolatey collector | DONE | StellaOps.Scanner.Analyzers.OS.Windows.Chocolatey - nuspec parser with directory fallback, 44 tests passing |
| SCANNER-ENG-0026 | Registry collector | DEFERRED | Requires exported hive parsing; tracked separately |
| SCANNER-ENG-0027 | Policy predicates | PENDING | Requires Policy module integration (see §5) |
| SCANNER-ENG-0027 | Offline kit packaging | DONE | All analyzers work offline (local file parsing only) |
Implementation details
MSI collector (windows-msi analyzer ID):
- Parses MSI database files using OLE compound document signature detection
- Extracts ProductCode, UpgradeCode, ProductName, Manufacturer, ProductVersion
- PURL format: pkg:generic/windows-msi/{normalized-name}@{version}?upgrade_code={code}
- Vendor metadata: msi:product_code, msi:upgrade_code, msi:manufacturer, etc.
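The signature detection mentioned above can be illustrated by comparing the first eight bytes of a file with the compound file header signature. This is a sketch with hypothetical function names; a match only shows that the file is an OLE compound document, which other formats share, so it does not by itself prove the file is an MSI database.

```python
from pathlib import Path

# Header signature of the Compound File Binary format (first 8 bytes).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole_compound(header: bytes) -> bool:
    """Return True when the buffer starts with the compound file signature."""
    return header[:8] == OLE_MAGIC


def is_candidate_msi(path: Path) -> bool:
    """Read only the first 8 bytes of the file and test the signature."""
    with path.open("rb") as handle:
        return looks_like_ole_compound(handle.read(8))
```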
WinSxS collector (windows-winsxs analyzer ID):
- Scans Windows/WinSxS/Manifests/*.manifest files
- Parses XML assembly identity with multiple namespace support (2006/2009/2016)
- Extracts name, version, architecture, public key token, language, type
- PURL format: pkg:generic/windows-winsxs/{assembly-name}@{version}?arch={arch}
- Vendor metadata: winsxs:name, winsxs:version, winsxs:public_key_token, etc.
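For illustration, the PURL template above can be filled in as shown below. Percent-encoding each part is an assumption made for the sketch; the analyzer's own normalization rules (for example, casing) are not documented here, and `winsxs_purl` is a hypothetical helper.

```python
from urllib.parse import quote


def winsxs_purl(name: str, version: str, arch: str) -> str:
    """Fill the WinSxS PURL template, percent-encoding each component."""
    return (
        f"pkg:generic/windows-winsxs/{quote(name, safe='')}"
        f"@{quote(version, safe='')}?arch={quote(arch, safe='')}"
    )
```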
Chocolatey collector (windows-chocolatey analyzer ID):
- Scans ProgramData/Chocolatey/lib/ and ProgramData/chocolatey/lib/
- Parses .nuspec files with multiple schema namespace support (2010/2011/2015)
- Falls back to directory name parsing when nuspec missing
- Computes SHA256 hash of chocolateyinstall.ps1 for determinism
- PURL format: pkg:chocolatey/{package-id}@{version}
- Vendor metadata: choco:id, choco:authors, choco:install_script_hash, etc.
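The directory-name fallback could work along the lines of this sketch. It assumes directory names of the form `<id>.<version>` where the version starts at the first dot-separated run of numeric segments; names without such a suffix yield no version. The pattern and both function names are illustrative guesses, not the analyzer's actual rules.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "<package-id>.<numeric version>[-prerelease]".
_DIR_PATTERN = re.compile(
    r"^(?P<id>.+?)\.(?P<version>\d+(?:\.\d+)+(?:-[0-9A-Za-z.-]+)?)$"
)


def identity_from_directory(dir_name: str) -> Tuple[str, Optional[str]]:
    """Split a lib/ directory name into (package id, version or None)."""
    match = _DIR_PATTERN.match(dir_name)
    if match is None:
        return dir_name, None
    return match.group("id"), match.group("version")


def chocolatey_purl(package_id: str, version: Optional[str]) -> str:
    """Fill the Chocolatey PURL template; the version part is omitted if unknown."""
    base = f"pkg:chocolatey/{package_id}"
    return f"{base}@{version}" if version else base
```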
10. References
- docs/benchmarks/scanner/deep-dives/windows.md
- docs/benchmarks/scanner/windows-macos-demand.md
- docs/modules/scanner/design/macos-analyzer.md (structure/composition parallels)
- Surface design docs (surface-fs.md, surface-validation.md, surface-secrets.md) for interfacing expectations.
Further reading: ../../api/scanner/windows-coverage.md (summary) and ../../api/scanner/windows-macos-summary.md (metrics dashboard).
Policy readiness alignment: see ../policy/windows-package-readiness.md (POLICY-READINESS-0002).
Upcoming milestone: FinSecure Corp PCI review requires Authenticode/feed decision by 2025-11-07 before Windows analyzer spike kickoff.