git.stella-ops.org/docs/modules/scanner/design/windows-analyzer.md
StellaOps Bot d040c001ac
2025-11-28 19:23:54 +02:00


# Windows Analyzer Design Brief (Draft)
> Owners: Scanner Guild, Policy Guild, Offline Kit Guild, Security Guild
> Related backlog (proposed): SCANNER-ENG-0024..0027, DOCS-SCANNER-BENCH-62-002
> Status: Draft — contingent on Windows demand threshold (see `docs/benchmarks/scanner/windows-macos-demand.md`)
## 1. Objectives & boundaries
- Provide deterministic inventory for Windows Server/container images covering MSI packages, WinSxS assemblies, Chocolatey packages, and registry-derived installers.
- Preserve replayability (layer fragments, provenance metadata) and align outputs with existing SBOM/policy pipelines.
- Respect sovereignty constraints: offline-friendly, signed rule bundles, no reliance on Windows APIs unavailable in containerized scans.

Out of scope (Phase 1):
- Live registry queries on running Windows hosts (requires runtime agent; defer to Zastava/Runtime roadmap).
- Windows Update patch baseline comparison (tracked separately under Runtime/Posture).
- UWP/MSIX packages (flagged for follow-up once MSI parity is complete).
## 2. Architecture overview
```
Scanner.Worker (Windows profile)
├─ Surface.Validation (enforce layer size, path allowlists)
├─ Surface.FS (materialized NTFS image via 7z/guestmount)
├─ MsiCollector -> LayerComponentFragment (windows-msi)
├─ WinSxSCollector -> LayerComponentFragment (windows-winsxs)
├─ ChocolateyCollector -> LayerComponentFragment (windows-choco)
├─ RegistryCollector -> Evidence overlays (uninstall/services)
├─ DriverCapabilityMapper -> Capability overlays (kernel/user drivers)
└─ WindowsComponentMapper -> ComponentGraph + capability metadata
```
- Collectors operate on extracted filesystem snapshots; registry access is performed on exported hive files produced during image extraction (the export procedure is to be documented in the ops runbooks).
- `WindowsComponentMapper` normalizes component identities (ProductCode, AssemblyIdentity, Chocolatey package ID) and merges overlapping evidence into deterministic fragments.
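
The merge step in `WindowsComponentMapper` can be sketched as follows, assuming each collector emits flat evidence records that carry a raw identity string. The `Evidence` and `Component` types, their fields, and the case-insensitive normalization rule are hypothetical illustrations, not the mapper's actual API. The point is that every list is sorted before output, so the merged fragment does not depend on collector order or filesystem enumeration order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    # Hypothetical evidence record: reporting collector, raw identity, one file path.
    source: str
    identity: str
    path: str


@dataclass
class Component:
    identity: str
    sources: list[str] = field(default_factory=list)
    paths: list[str] = field(default_factory=list)


def normalize_identity(raw: str) -> str:
    # Assumption: identities are compared case-insensitively.
    return raw.strip().lower()


def merge_evidence(records: list[Evidence]) -> list[Component]:
    merged: dict[str, Component] = {}
    for rec in records:
        key = normalize_identity(rec.identity)
        comp = merged.setdefault(key, Component(identity=key))
        if rec.source not in comp.sources:
            comp.sources.append(rec.source)
        if rec.path not in comp.paths:
            comp.paths.append(rec.path)
    # Sort inner lists and the component list so output order never depends
    # on which collector ran first or how the filesystem listed files.
    for comp in merged.values():
        comp.sources.sort()
        comp.paths.sort()
    return [merged[k] for k in sorted(merged)]
```

Sorting by normalized identity, rather than preserving discovery order, is what makes two scans of the same layer byte-comparable.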
## 3. Collectors
### 3.1 MSI collector
- Input: `Windows/Installer/*.msi` database files (OLE compound documents, i.e. Compound File Binary format) and registry hive exports for product mapping.
- Implementation approach:
  - Use an open-source MSI parser (in-house, or third-party under an MIT-compatible license) to avoid COM dependencies.
  - Extract product properties (Property table) and the Component, File, Feature, and Media tables.
  - Compute SHA-256 digests for installed files resolved through the Component table, and link them to WinSxS manifests.
- Output metadata: `productCode`, `upgradeCode`, `productVersion`, `manufacturer`, `language`, `installContext`, `packageCode`, `sourceList`.
- Evidence: file paths with digests, component IDs, CAB/patch references.
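
Because MSI databases are OLE compound documents, a cheap pre-filter can check the fixed 8-byte Compound File Binary signature (`D0 CF 11 E0 A1 B1 1A E1`) before handing a file to the full table parser. This is a minimal sketch; `is_compound_file` and `candidate_msi_files` are hypothetical helper names, and the `*.msi` glob is case-sensitive on case-sensitive filesystems.

```python
from pathlib import Path

# Fixed signature at offset 0 of every Compound File Binary (OLE2) file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_compound_file(path: Path) -> bool:
    """Return True if the file starts with the OLE compound document signature."""
    try:
        with path.open("rb") as fh:
            return fh.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE
    except OSError:
        # Missing or unreadable files are simply not candidates.
        return False


def candidate_msi_files(root: Path) -> list[Path]:
    """List Windows/Installer/*.msi files under an extracted image root."""
    installer_dir = root / "Windows" / "Installer"
    if not installer_dir.is_dir():
        return []
    # Sorted so repeated scans visit files in the same order.
    return sorted(p for p in installer_dir.glob("*.msi") if is_compound_file(p))
```

The signature only proves the file is a compound document (legacy Office files share it), so a positive result still has to be confirmed by opening the database tables.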
### 3.2 WinSxS collector
- Input: `Windows/WinSxS/Manifests/*.manifest`, `Windows/WinSxS/` payload directories, catalog (.cat) files.
- Parse XML assembly identities (name, version, processor architecture, public key token, language).
- Map to MSI components when file hashes match.
- Capture catalog signature thumbprint and optional patch KB references for policy gating.
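
Identity extraction can be sketched with the standard-library XML parser, assuming the manifest has already been read as plain XML text. Matching on the local tag name sidesteps the different namespace versions, and only the direct `<assemblyIdentity>` child of `<assembly>` is read, so dependency identities nested under `<dependency>` are not mistaken for the assembly itself. `parse_assembly_identity` is a hypothetical name; unset attributes come back as empty strings.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Identity attributes listed in the brief.
IDENTITY_FIELDS = ("name", "version", "processorArchitecture",
                   "publicKeyToken", "language", "type")


def _local(tag: str) -> str:
    # "{urn:schemas-microsoft-com:asm.v3}assemblyIdentity" -> "assemblyIdentity"
    return tag.rsplit("}", 1)[-1]


def parse_assembly_identity(manifest_xml: str) -> dict[str, str] | None:
    root = ET.fromstring(manifest_xml)
    if _local(root.tag) != "assembly":
        return None
    # Only direct children: nested dependency identities are skipped.
    for child in root:
        if _local(child.tag) == "assemblyIdentity":
            return {f: child.get(f, "") for f in IDENTITY_FIELDS}
    return None
```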
### 3.3 Chocolatey collector
- Input: `ProgramData/Chocolatey/lib/**`, `ProgramData/Chocolatey/package.backup`, `chocolateyinstall.ps1`, `.nuspec`.
- Extract package ID, version, checksum, source feed, installed files and scripts.
- Record whether the install used the local cache or a remote feed, and record the install-script hash for determinism.
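
A minimal sketch of the per-package step, assuming each package lives in its own directory under `lib/` with an optional `.nuspec` and a `tools/chocolateyinstall.ps1` script (matched case-insensitively here). The helper names and the directory-name fallback rule are hypothetical. The script is only read and hashed, never executed, in line with the security review requirement in §7.

```python
from __future__ import annotations

import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def _local(tag: str) -> str:
    # Drop any "{namespace}" prefix so every nuspec schema version matches.
    return tag.rsplit("}", 1)[-1]


def read_nuspec(nuspec: Path) -> dict[str, str]:
    root = ET.parse(nuspec).getroot()
    meta = next((c for c in root if _local(c.tag) == "metadata"), None)
    if meta is None:
        return {}
    return {_local(c.tag): (c.text or "").strip() for c in meta}


def install_script_sha256(package_dir: Path) -> str | None:
    # Read and hash the script bytes only; the collector never executes it.
    tools = package_dir / "tools"
    if not tools.is_dir():
        return None
    for entry in sorted(tools.iterdir()):
        if entry.is_file() and entry.name.lower() == "chocolateyinstall.ps1":
            return hashlib.sha256(entry.read_bytes()).hexdigest()
    return None


def collect_package(package_dir: Path) -> dict[str, str | None]:
    nuspecs = sorted(package_dir.glob("*.nuspec"))
    meta = read_nuspec(nuspecs[0]) if nuspecs else {}
    return {
        # Hypothetical fallback: with no nuspec, use the directory name as the ID.
        "id": meta.get("id") or package_dir.name,
        "version": meta.get("version") or None,
        "install_script_sha256": install_script_sha256(package_dir),
    }
```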
### 3.4 Registry collector
- Input: exported `SOFTWARE` hive, covering:
  - `Microsoft\Windows\CurrentVersion\Uninstall`
  - `Microsoft\Windows\CurrentVersion\Installer\UserData`
  - `Microsoft\Windows\CurrentVersion\Run` (startup apps)
- Input: service/driver configuration from the exported `SYSTEM` hive under `Services`.
- Emit fallback evidence for installers not captured by MSI/Chocolatey (legacy EXE installers).
- Record uninstall strings, install dates, publisher, estimated size, install location.
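
Decoding the binary hive is out of scope for this sketch: it assumes a hypothetical hive reader has already turned the `Uninstall` key into a mapping of subkey name to values. The value names (`DisplayName`, `DisplayVersion`, `Publisher`, `InstallDate`, `EstimatedSize` in kilobytes, `InstallLocation`, `UninstallString`) are the standard Add/Remove Programs values. Treating `WindowsInstaller = 1` or a known ProductCode subkey as already covered by the MSI collector is an assumption about how deduplication would work, and `fallback_evidence` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any

# Uninstall value names mapped to evidence field names (field names hypothetical).
FIELDS = {
    "DisplayName": "name",
    "DisplayVersion": "version",
    "Publisher": "publisher",
    "InstallDate": "install_date",
    "EstimatedSize": "estimated_size_kb",
    "InstallLocation": "install_location",
    "UninstallString": "uninstall_string",
}


def fallback_evidence(uninstall: dict[str, dict[str, Any]],
                      msi_product_codes: set[str]) -> list[dict[str, Any]]:
    """Emit evidence for Uninstall entries not already owned by the MSI collector.

    msi_product_codes is assumed to hold upper-cased ProductCode GUIDs.
    """
    evidence = []
    for subkey in sorted(uninstall):  # stable order across runs
        values = uninstall[subkey]
        if not values.get("DisplayName"):
            continue  # no display name: not a user-visible installer entry
        if values.get("WindowsInstaller") == 1 or subkey.upper() in msi_product_codes:
            continue  # already reported by the MSI collector
        record: dict[str, Any] = {"registry_key": subkey}
        for value_name, field_name in FIELDS.items():
            if value_name in values:
                record[field_name] = values[value_name]
        evidence.append(record)
    return evidence
```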
### 3.5 Driver & service mapper
- Parse `SYSTEM` hive `Services` entries to detect drivers (`Type` = 1 kernel driver or 2 file-system driver) and critical services (`Start` = 0 boot or 2 automatic).
- Output capability overlays (e.g., `windows.driver.kernelMode(true)`, `windows.service.autoStart("Spooler")`) for Policy Engine.
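
The type and start-mode checks map directly onto the `Services` value constants (`Type` 0x1/0x2 for kernel/file-system drivers, `Start` 0 for boot and 2 for automatic). The sketch assumes a hypothetical hive reader has already resolved the active control set (an offline `SYSTEM` hive has numbered sets such as `ControlSet001` rather than `CurrentControlSet`), and returns `(service, overlay)` pairs using the overlay strings from the examples above. `capability_overlays` is a hypothetical name.

```python
from __future__ import annotations

SERVICE_KERNEL_DRIVER = 0x1
SERVICE_FILE_SYSTEM_DRIVER = 0x2
SERVICE_BOOT_START = 0
SERVICE_AUTO_START = 2


def capability_overlays(services: dict[str, dict[str, int]]) -> list[tuple[str, str]]:
    """Map decoded Services entries to (service, overlay) pairs."""
    overlays: list[tuple[str, str]] = []
    for name in sorted(services):  # ordinal sort keeps output stable
        values = services[name]
        svc_type = values.get("Type")
        start = values.get("Start")
        if svc_type in (SERVICE_KERNEL_DRIVER, SERVICE_FILE_SYSTEM_DRIVER):
            overlays.append((name, "windows.driver.kernelMode(true)"))
        elif start in (SERVICE_BOOT_START, SERVICE_AUTO_START):
            overlays.append((name, f'windows.service.autoStart("{name}")'))
    return overlays
```

In this sketch drivers get only the kernel-mode overlay even when auto-started; whether they should also carry `autoStart` is a policy design choice.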
## 4. Component mapping & output
- `WindowsComponentMapper`:
  - Generate `LayerComponentFragment`s with synthetic layer digests (e.g., `sha256:stellaops-windows-msi`).
  - Build `ComponentIdentity` with a PURL-like scheme: `pkg:msi/<productCode>` or `pkg:winsxs/<assemblyIdentity>`.
  - Include metadata: signature thumbprint, catalog hash, KB references, install context, manufacturer.
  - Store capability overlays under `ScanAnalysisKeys.capability.windows` for policy consumption.
- Export Center bundling:
  - Include MSI manifest extracts, WinSxS assembly manifests, Chocolatey nuspec snapshots, and a service/driver capability CSV.
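
For the capability CSV in the export bundle, determinism mostly comes down to fixed row order, de-duplication, and a fixed line terminator. A minimal sketch; the three-column layout and the `capability_csv` name are hypothetical.

```python
import csv
import io


def capability_csv(rows: list[tuple[str, str, str]]) -> str:
    """Render (component, capability, value) rows as byte-stable CSV text."""
    buf = io.StringIO()
    # csv defaults to "\r\n"; pinning "\n" keeps output identical across hosts.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["component", "capability", "value"])
    for row in sorted(set(rows)):  # de-duplicate, then ordinal sort
        writer.writerow(row)
    return buf.getvalue()
```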
## 5. Policy integration
- Predicates to introduce:
  - `windows.package.signed(expectedThumbprint?)`
  - `windows.package.unsupportedInstallerType`
  - `windows.driver.kernelMode`, `windows.driver.unsigned`
  - `windows.service.autoStart(name)`
  - `windows.choco.sourceAllowed(feed)`
- Lattice approach:
  - Unsigned kernel drivers → default `fail`.
  - Unknown installer sources → `warn`, escalated when critical services are affected.
  - Chocolatey packages from feeds not on the allow-list → configurable severity.
- Waivers bind to product code plus signature thumbprint and expire when the package version changes.
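
The lattice can be modeled as an ordered verdict where combining findings takes the worst one (`max`), and waivers match on product code, signature thumbprint, and the version they were granted for, so a version bump makes the waiver stop matching. This is a sketch under assumptions: the `Verdict`, `Finding`, and `Waiver` types and the `classify`/`overall` helpers are hypothetical, a matched waiver is treated as `pass`, and escalation for unknown sources on critical services is read as raising `warn` to `fail`.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Verdict(IntEnum):
    # Ordered so that max() is the lattice join: the worst verdict wins.
    PASS = 0
    WARN = 1
    FAIL = 2


@dataclass(frozen=True)
class Finding:
    product_code: str
    thumbprint: str
    version: str
    verdict: Verdict


@dataclass(frozen=True)
class Waiver:
    product_code: str
    thumbprint: str
    version: str  # version the waiver was granted for


def classify(*, kernel_driver: bool, signed: bool,
             known_source: bool, critical_service: bool) -> Verdict:
    if kernel_driver and not signed:
        return Verdict.FAIL  # unsigned kernel driver: default fail
    if not known_source:
        # Assumption: escalation on critical services raises warn to fail.
        return Verdict.FAIL if critical_service else Verdict.WARN
    return Verdict.PASS


def effective(finding: Finding, waivers: list[Waiver]) -> Verdict:
    key = (finding.product_code, finding.thumbprint, finding.version)
    if any((w.product_code, w.thumbprint, w.version) == key for w in waivers):
        return Verdict.PASS
    # A changed version no longer matches any waiver, so the waiver lapses.
    return finding.verdict


def overall(findings: list[Finding], waivers: list[Waiver]) -> Verdict:
    return max((effective(f, waivers) for f in findings), default=Verdict.PASS)
```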
## 6. Offline kit & distribution
- Package:
  - MSI schema definitions and parser binaries (signed).
  - Chocolatey feed snapshot (nupkg archives + index) for allow-listed feeds.
  - Windows catalog certificate chains + optional CRL/OCSP caches.
- Documentation:
  - Provide instructions for exporting registry hives during image extraction (PowerShell script included).
  - Note disk space expectations (Chocolatey snapshot size, WinSxS manifest volume).
## 7. Testing strategy
- Fixtures:
  - Sample MSI packages (with/without transforms), WinSxS manifests, Chocolatey packages.
  - Registry hive exports representing mixed installer types.
- Tests:
  - Unit tests for each collector covering parsing edge cases (language-specific manifests, transforms, script hashing).
  - Integration tests using synthetic Windows container image layers (generated in CI on a Windows worker).
  - Determinism checks ensuring repeated runs produce identical fragments.
- Security review:
  - Validate script execution paths (collectors must never execute Chocolatey scripts; inspect only).
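
A determinism check can hash a canonical serialization of the fragments from several runs and require a single digest. The sketch assumes fragments are JSON-serializable; `sort_keys` removes dict-ordering noise, while list order is deliberately kept so that unstable element ordering is still caught. `canonical_digest` and `assert_deterministic` are hypothetical helpers, not existing test utilities.

```python
import hashlib
import json
from typing import Any, Callable


def canonical_digest(fragments: Any) -> str:
    # sort_keys + fixed separators give one byte sequence per logical value.
    payload = json.dumps(fragments, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def assert_deterministic(scan: Callable[[], Any], runs: int = 3) -> str:
    """Run the scan several times and fail unless every run hashes the same."""
    digests = {canonical_digest(scan()) for _ in range(runs)}
    if len(digests) != 1:
        raise AssertionError(f"non-deterministic output: {sorted(digests)}")
    return digests.pop()
```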
## 8. Dependencies & open questions
| Item | Description | Owner | Status |
| --- | --- | --- | --- |
| MSI parser choice | Select MIT/Apache-compatible parser or build internal reader | Scanner Guild | TBD |
| Registry export tooling | Determine standard script/utility for hive exports in container context | Ops Guild | TBD |
| Authenticode verification locus | Decide whether the scanner or the Policy Engine is responsible for signature verification | Security Guild | TBD |
| Feed mirroring policy | Which Chocolatey feeds to mirror by default | Product + Security Guilds | TBD |
## 9. Implementation status
| ID | Title | Status | Notes |
| --- | --- | --- | --- |
| SCANNER-ENG-0024 | Windows MSI collector | **DONE** | `StellaOps.Scanner.Analyzers.OS.Windows.Msi` - OLE compound document parser, extracts Product/File tables, 22 tests passing |
| SCANNER-ENG-0025 | WinSxS manifest collector | **DONE** | `StellaOps.Scanner.Analyzers.OS.Windows.WinSxS` - XML manifest parser, assembly identity extraction, 18 tests passing |
| SCANNER-ENG-0026 | Chocolatey collector | **DONE** | `StellaOps.Scanner.Analyzers.OS.Windows.Chocolatey` - nuspec parser with directory fallback, 44 tests passing |
| SCANNER-ENG-0026 | Registry collector | DEFERRED | Requires exported hive parsing; tracked separately |
| SCANNER-ENG-0027 | Policy predicates | PENDING | Requires Policy module integration (see §5) |
| SCANNER-ENG-0027 | Offline kit packaging | **DONE** | All analyzers work offline (local file parsing only) |
### Implementation details
**MSI collector** (`windows-msi` analyzer ID):
- Parses MSI database files using OLE compound document signature detection
- Extracts ProductCode, UpgradeCode, ProductName, Manufacturer, ProductVersion
- PURL format: `pkg:generic/windows-msi/{normalized-name}@{version}?upgrade_code={code}`
- Vendor metadata: `msi:product_code`, `msi:upgrade_code`, `msi:manufacturer`, etc.

**WinSxS collector** (`windows-winsxs` analyzer ID):
- Scans `Windows/WinSxS/Manifests/*.manifest` files
- Parses XML assembly identities across multiple manifest namespace versions (2006/2009/2016)
- Extracts name, version, architecture, public key token, language, type
- PURL format: `pkg:generic/windows-winsxs/{assembly-name}@{version}?arch={arch}`
- Vendor metadata: `winsxs:name`, `winsxs:version`, `winsxs:public_key_token`, etc.

**Chocolatey collector** (`windows-chocolatey` analyzer ID):
- Scans `ProgramData/Chocolatey/lib/` and `ProgramData/chocolatey/lib/`
- Parses `.nuspec` files across multiple schema namespace versions (2010/2011/2015)
- Falls back to directory name parsing when nuspec missing
- Computes SHA256 hash of `chocolateyinstall.ps1` for determinism
- PURL format: `pkg:chocolatey/{package-id}@{version}`
- Vendor metadata: `choco:id`, `choco:authors`, `choco:install_script_hash`, etc.
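
The three PURL formats above can be built with a few string helpers. This sketch percent-encodes each segment and qualifier value in the PURL style; the MSI `normalize_name` rule (lowercase, whitespace runs collapsed to `-`) is a hypothetical stand-in, since the brief does not spell out the normalization or escaping the analyzers actually apply. All function names are hypothetical.

```python
import re
from urllib.parse import quote


def _enc(segment: str) -> str:
    # Percent-encode everything outside the unreserved set (A-Z a-z 0-9 - . _ ~).
    return quote(segment, safe="")


def normalize_name(product_name: str) -> str:
    # Hypothetical rule: lowercase and collapse whitespace runs to "-".
    return re.sub(r"\s+", "-", product_name.strip().lower())


def msi_purl(product_name: str, version: str, upgrade_code: str) -> str:
    return (f"pkg:generic/windows-msi/{_enc(normalize_name(product_name))}"
            f"@{_enc(version)}?upgrade_code={_enc(upgrade_code)}")


def winsxs_purl(assembly_name: str, version: str, arch: str) -> str:
    return (f"pkg:generic/windows-winsxs/{_enc(assembly_name)}"
            f"@{_enc(version)}?arch={_enc(arch)}")


def chocolatey_purl(package_id: str, version: str) -> str:
    return f"pkg:chocolatey/{_enc(package_id)}@{_enc(version)}"
```

Encoding the upgrade code matters because MSI GUIDs are usually written with braces, which are not in the unreserved set.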
## 10. References
- `docs/benchmarks/scanner/deep-dives/windows.md`
- `docs/benchmarks/scanner/windows-macos-demand.md`
- `docs/modules/scanner/design/macos-analyzer.md` (structure/composition parallels)
- Surface design docs (`surface-fs.md`, `surface-validation.md`, `surface-secrets.md`) for interfacing expectations.

Further reading: `../../api/scanner/windows-coverage.md` (summary) and `../../api/scanner/windows-macos-summary.md` (metrics dashboard).

Policy readiness alignment: see `../policy/windows-package-readiness.md` (POLICY-READINESS-0002).

Upcoming milestone: the FinSecure Corp PCI review requires the Authenticode/feed decision by 2025-11-07, before the Windows analyzer spike kicks off.