git.stella-ops.org/docs/modules/scanner/design/windows-analyzer.md
2025-11-28 19:23:54 +02:00

Windows Analyzer Design Brief (Draft)

Owners: Scanner Guild, Policy Guild, Offline Kit Guild, Security Guild
Related backlog (proposed): SCANNER-ENG-0024..0027, DOCS-SCANNER-BENCH-62-002
Status: Draft — contingent on Windows demand threshold (see docs/benchmarks/scanner/windows-macos-demand.md)

1. Objectives & boundaries

  • Provide deterministic inventory for Windows Server/container images covering MSI/WinSxS assemblies, Chocolatey packages, and registry-derived installers.
  • Preserve replayability (layer fragments, provenance metadata) and align outputs with existing SBOM/policy pipelines.
  • Respect sovereignty constraints: offline-friendly, signed rule bundles, no reliance on Windows APIs unavailable in containerized scans.

Out of scope (Phase 1):

  • Live registry queries on running Windows hosts (requires runtime agent; defer to Zastava/Runtime roadmap).
  • Windows Update patch baseline comparison (tracked separately under Runtime/Posture).
  • UWP/MSIX packages (flagged for follow-up once MSI parity is complete).

2. Architecture overview

Scanner.Worker (Windows profile)
 ├─ Surface.Validation (enforce layer size, path allowlists)
 ├─ Surface.FS (materialized NTFS image via 7z/guestmount)
 ├─ MsiCollector              -> LayerComponentFragment (windows-msi)
 ├─ WinSxSCollector           -> LayerComponentFragment (windows-winsxs)
 ├─ ChocolateyCollector       -> LayerComponentFragment (windows-choco)
 ├─ RegistryCollector         -> Evidence overlays (uninstall/services)
 ├─ DriverCapabilityMapper    -> Capability overlays (kernel/user drivers)
 └─ WindowsComponentMapper    -> ComponentGraph + capability metadata

  • Collectors operate on extracted filesystem snapshots. Registry access is performed on exported hive files produced during image extraction (the export procedure is to be documented in ops runbooks).
  • WindowsComponentMapper normalizes component identities (ProductCode, AssemblyIdentity, Chocolatey package ID) and merges overlapping evidence into deterministic fragments.

3. Collectors

3.1 MSI collector

  • Input: Windows/Installer/*.msi database files (OLE compound document format), registry hive exports for product mapping.
  • Implementation approach:
    • Use open-source MSI parser (custom or MIT-compatible) to avoid COM dependencies.
    • Extract Product, Component, File, Feature, Media tables.
    • Compute SHA-256 digests for installed files resolved via the Component table, and link them to WinSxS manifests.
  • Output metadata: productCode, upgradeCode, productVersion, manufacturer, language, installContext, packageCode, sourceList.
  • Evidence: file paths with digests, component IDs, CAB/patch references.
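
The digest evidence described above could be produced along these lines. This is a minimal sketch, not the shipped collector: the function names and the record shape (`path`, `sha256`) are assumptions made for illustration, and the list of relative paths is assumed to come from an MSI table reader that is not shown here.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_evidence(root: Path, relative_paths: list[str]) -> list[dict]:
    """Build evidence records for files present under the snapshot root.

    Paths are de-duplicated and sorted so repeated runs yield the same list.
    Files missing from the snapshot are skipped (hypothetical policy).
    """
    records = []
    for rel in sorted(set(relative_paths)):
        candidate = root / rel
        if candidate.is_file():
            records.append({"path": rel, "sha256": sha256_file(candidate)})
    return records
```

Sorting before hashing keeps the evidence order independent of table or filesystem enumeration order, which matters for the replayability goal in §1.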

3.2 WinSxS collector

  • Input: Windows/WinSxS/Manifests/*.manifest, Windows/WinSxS/ payload directories, catalog (.cat) files.
  • Parse XML assembly identities (name, version, processor architecture, public key token, language).
  • Map to MSI components when file hashes match.
  • Capture catalog signature thumbprint and optional patch KB references for policy gating.

3.3 Chocolatey collector

  • Input: ProgramData/Chocolatey/lib/**, ProgramData/Chocolatey/package.backup, chocolateyinstall.ps1, .nuspec.
  • Extract package ID, version, checksum, source feed, installed files and scripts.
  • Note whether install used cache or remote feed; record script hash for determinism.

3.4 Registry collector

  • Input: Exported SOFTWARE hive covering:
    • Microsoft\Windows\CurrentVersion\Uninstall
    • Microsoft\Windows\CurrentVersion\Installer\UserData
    • Microsoft\Windows\CurrentVersion\Run (startup apps)
    • Service/driver configuration from SYSTEM hive under Services.
  • Emit fallback evidence for installers not captured by MSI/Chocolatey (legacy EXE installers).
  • Record uninstall strings, install dates, publisher, estimated size, install location.

3.5 Driver & service mapper

  • Parse SYSTEM hive Services entries to detect drivers (Type=1 kernel driver, Type=2 file system driver) and critical services (Start=0 boot or Start=2 automatic).
  • Output capability overlays (e.g., windows.driver.kernelMode(true), windows.service.autoStart("Spooler")) for Policy Engine.
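
The classification above can be sketched as follows. The input is assumed to be an already-parsed mapping of service name to its `Type` and `Start` registry values (hive parsing is out of scope here). The overlay strings follow the examples in this section, while the `subject`/`overlay` record shape and the choice to report auto-start only for non-driver entries are assumptions.

```python
# Service "Type" and "Start" values as stored under the Services key.
DRIVER_TYPES = {0x1, 0x2}   # kernel driver, file system driver
AUTO_START_MODES = {0, 2}   # boot start, automatic start


def capability_overlays(services: dict[str, dict[str, int]]) -> list[dict]:
    """Map parsed Services entries to capability overlay records.

    Entries are visited in sorted name order so the output is deterministic.
    """
    overlays = []
    for name in sorted(services):
        entry = services[name]
        if entry.get("Type") in DRIVER_TYPES:
            # Both driver types run in kernel mode.
            overlays.append(
                {"subject": name, "overlay": "windows.driver.kernelMode(true)"}
            )
        elif entry.get("Start") in AUTO_START_MODES:
            overlays.append(
                {"subject": name, "overlay": f'windows.service.autoStart("{name}")'}
            )
    return overlays
```

For example, an entry `{"Spooler": {"Type": 0x110, "Start": 2}}` would yield a single `windows.service.autoStart("Spooler")` record.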

4. Component mapping & output

  • WindowsComponentMapper:
    • Generate LayerComponentFragments with synthetic layer digests (e.g., sha256:stellaops-windows-msi).
    • Build ComponentIdentity with PURL-like scheme: pkg:msi/<productCode> or pkg:winsxs/<assemblyIdentity> (proposed; the PURL formats actually implemented are listed under §9, Implementation details).
    • Include metadata: signature thumbprint, catalog hash, KB references, install context, manufacturer.
  • Capability overlays stored under ScanAnalysisKeys.capability.windows for policy consumption.
  • Export Center bundling:
    • Include MSI manifest extracts, WinSxS assembly manifests, Chocolatey nuspec snapshots, and service/driver capability CSV.
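
One way to picture the merge step is keying every evidence record by its normalized identity and emitting the result in sorted order. The sketch below uses the proposed `pkg:<kind>/<identifier>` scheme from this section. Lower-casing the identifier, the input record shape (`kind`, `id`, `evidence`), and the function names are assumptions for illustration only.

```python
def identity_key(kind: str, identifier: str) -> str:
    """Build a normalized identity key (lower-casing is an assumption)."""
    return f"pkg:{kind}/{identifier.strip().lower()}"


def merge_evidence(records: list[dict]) -> list[dict]:
    """Merge records that share an identity into one deterministic entry."""
    merged: dict[str, set] = {}
    for record in records:
        key = identity_key(record["kind"], record["id"])
        merged.setdefault(key, set()).update(record.get("evidence", []))
    # Sort identities and evidence so input order never affects the output.
    return [
        {"identity": key, "evidence": sorted(merged[key])}
        for key in sorted(merged)
    ]
```

With this shape, an MSI product reported by both the MSI collector and the registry collector collapses into one component whose evidence list is the union of both sources.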

5. Policy integration

  • Predicates to introduce:
    • windows.package.signed(expectedThumbprint?)
    • windows.package.unsupportedInstallerType
    • windows.driver.kernelMode, windows.driver.unsigned
    • windows.service.autoStart(name)
    • windows.choco.sourceAllowed(feed)
  • Lattice approach:
    • Unsigned kernel drivers → default fail.
    • Unknown installer sources → warn with escalation on critical services.
    • Chocolatey packages from non-allow-listed feeds → configurable severity.
  • Waiver semantics bind to product code + signature thumbprint; waivers expire when package version changes.
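
The lattice rules above can be read as "take the most severe verdict that applies". The following sketch illustrates that reading under stated assumptions: the `Verdict` and `Finding` types are hypothetical, "escalation on critical services" is interpreted as raising warn to fail, and waivers are not modelled.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Verdict(IntEnum):
    PASS = 0
    WARN = 1
    FAIL = 2


@dataclass(frozen=True)
class Finding:
    """Hypothetical flattened view of one component's policy-relevant facts."""
    kernel_driver: bool = False
    signed: bool = True
    installer_source_known: bool = True
    critical_service: bool = False
    choco_feed: Optional[str] = None


def evaluate(
    finding: Finding,
    allowed_feeds: frozenset,
    feed_severity: Verdict = Verdict.WARN,
) -> Verdict:
    """Return the join (maximum) of all verdicts triggered by the finding."""
    verdicts = [Verdict.PASS]
    if finding.kernel_driver and not finding.signed:
        verdicts.append(Verdict.FAIL)
    if not finding.installer_source_known:
        # Assumed escalation: warn becomes fail for critical services.
        verdicts.append(Verdict.FAIL if finding.critical_service else Verdict.WARN)
    if finding.choco_feed is not None and finding.choco_feed not in allowed_feeds:
        verdicts.append(feed_severity)
    return max(verdicts)
```

Using an ordered enum keeps the combination rule a single `max`, so adding a predicate never changes how existing verdicts compose.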

6. Offline kit & distribution

  • Package:
    • MSI schema definitions and parser binaries (signed).
    • Chocolatey feed snapshot (nupkg archives + index) for allow-listed feeds.
    • Windows catalog certificate chains + optional CRL/OCSP caches.
  • Documentation:
    • Provide instructions for exporting registry hives during image extraction (PowerShell script included).
    • Note disk space expectations (Chocolatey snapshot size, WinSxS manifest volume).

7. Testing strategy

  • Fixtures:
    • Sample MSI packages (with/without transforms), WinSxS manifests, Chocolatey packages.
    • Registry hive exports representing mixed installer types.
  • Tests:
    • Unit tests for each collector parsing edge cases (language-specific manifests, transforms, script hashing).
    • Integration tests using synthetic Windows container image layers (generated via CI on Windows worker).
    • Determinism checks ensuring repeated runs produce identical fragments.
  • Security review:
    • Validate script execution paths (collectors must never execute Chocolatey scripts; inspect only).
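
A determinism check of the kind listed above might hash a canonical serialization of each run's output and require all digests to agree. This is a sketch only: `run_collector` stands in for any callable returning a JSON-serializable fragment, and both function names are hypothetical.

```python
import hashlib
import json


def fragment_digest(fragment: dict) -> str:
    """Hash a fragment via canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(
        fragment, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_deterministic(run_collector, runs: int = 3) -> str:
    """Run the collector several times and require identical digests."""
    digests = {fragment_digest(run_collector()) for _ in range(runs)}
    if len(digests) != 1:
        raise AssertionError(f"non-deterministic output: {sorted(digests)}")
    return digests.pop()
```

Canonical JSON removes key-order noise from the comparison. List order is still significant, so collectors remain responsible for sorting their components and evidence.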

8. Dependencies & open questions

| Item | Description | Owner | Status |
| --- | --- | --- | --- |
| MSI parser choice | Select MIT/Apache-compatible parser or build internal reader | Scanner Guild | TBD |
| Registry export tooling | Determine standard script/utility for hive exports in container context | Ops Guild | TBD |
| Authenticode verification locus | Decide scanner vs policy responsibility for signature verification | Security Guild | TBD |
| Feed mirroring policy | Which Chocolatey feeds to mirror by default | Product + Security Guilds | TBD |

9. Implementation status

| ID | Title | Status | Notes |
| --- | --- | --- | --- |
| SCANNER-ENG-0024 | Windows MSI collector | DONE | StellaOps.Scanner.Analyzers.OS.Windows.Msi - OLE compound document parser, extracts Product/File tables, 22 tests passing |
| SCANNER-ENG-0025 | WinSxS manifest collector | DONE | StellaOps.Scanner.Analyzers.OS.Windows.WinSxS - XML manifest parser, assembly identity extraction, 18 tests passing |
| SCANNER-ENG-0026 | Chocolatey collector | DONE | StellaOps.Scanner.Analyzers.OS.Windows.Chocolatey - nuspec parser with directory fallback, 44 tests passing |
| SCANNER-ENG-0026 | Registry collector | DEFERRED | Requires exported hive parsing; tracked separately |
| SCANNER-ENG-0027 | Policy predicates | PENDING | Requires Policy module integration (see §5) |
| SCANNER-ENG-0027 | Offline kit packaging | DONE | All analyzers work offline (local file parsing only) |

Implementation details

MSI collector (windows-msi analyzer ID):

  • Parses MSI database files using OLE compound document signature detection
  • Extracts ProductCode, UpgradeCode, ProductName, Manufacturer, ProductVersion
  • PURL format: pkg:generic/windows-msi/{normalized-name}@{version}?upgrade_code={code}
  • Vendor metadata: msi:product_code, msi:upgrade_code, msi:manufacturer, etc.
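
For illustration, signature detection and PURL construction could look like the sketch below. The eight-byte magic value is the standard OLE compound file header. The name normalization rule (lower-case, runs of non-alphanumerics collapsed to `-`) and the percent-encoding of the upgrade code are assumptions, since the brief does not spell them out.

```python
import re
from urllib.parse import quote

# Header signature shared by all OLE compound files.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_ole_compound(header: bytes) -> bool:
    """Check whether the first eight bytes match the OLE compound signature."""
    return header[:8] == OLE_MAGIC


def msi_purl(product_name: str, version: str, upgrade_code: str) -> str:
    """Build a PURL in the documented windows-msi shape.

    The normalization applied to the product name is hypothetical.
    """
    normalized = re.sub(r"[^a-z0-9]+", "-", product_name.lower()).strip("-")
    code = quote(upgrade_code, safe="")
    return f"pkg:generic/windows-msi/{normalized}@{version}?upgrade_code={code}"
```

The signature check only proves the file is an OLE compound document. Other formats share the same header, so a real reader still has to confirm that the MSI tables are present.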

WinSxS collector (windows-winsxs analyzer ID):

  • Scans Windows/WinSxS/Manifests/*.manifest files
  • Parses XML assembly identity with multiple namespace support (2006/2009/2016)
  • Extracts name, version, architecture, public key token, language, type
  • PURL format: pkg:generic/windows-winsxs/{assembly-name}@{version}?arch={arch}
  • Vendor metadata: winsxs:name, winsxs:version, winsxs:public_key_token, etc.
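
Namespace-tolerant identity extraction can be sketched by matching elements on their local name only. The sketch below is not the shipped parser: it reads just the top-level `assemblyIdentity` element (so identities nested under dependencies are ignored), and the function names and the choice to leave the assembly name's casing untouched are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import quote

FIELDS = ("name", "version", "processorArchitecture",
          "publicKeyToken", "language", "type")


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def parse_assembly_identity(manifest: bytes) -> Optional[dict]:
    """Return the assembly's own identity attributes, or None if absent."""
    root = ET.fromstring(manifest)
    for child in root:  # direct children only: skip dependency identities
        if local_name(child.tag) == "assemblyIdentity":
            identity = {field: child.attrib.get(field) for field in FIELDS}
            if identity["name"] and identity["version"]:
                return identity
    return None


def winsxs_purl(identity: dict) -> str:
    """Build a PURL in the documented windows-winsxs shape."""
    name = quote(identity["name"], safe="")
    purl = f"pkg:generic/windows-winsxs/{name}@{identity['version']}"
    arch = identity.get("processorArchitecture")
    return f"{purl}?arch={arch}" if arch else purl
```

Matching on local names means one code path handles every manifest namespace variant, at the cost of not validating which schema a manifest claims to use.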

Chocolatey collector (windows-chocolatey analyzer ID):

  • Scans ProgramData/Chocolatey/lib/ and ProgramData/chocolatey/lib/ (both directory casings)
  • Parses .nuspec files with multiple schema namespace support (2010/2011/2015)
  • Falls back to directory name parsing when nuspec missing
  • Computes SHA256 hash of chocolateyinstall.ps1 for determinism
  • PURL format: pkg:chocolatey/{package-id}@{version}
  • Vendor metadata: choco:id, choco:authors, choco:install_script_hash, etc.
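
The nuspec-first, directory-name-fallback behaviour can be sketched as below. This is an illustration under assumptions, not the shipped analyzer: the fallback regex treats everything after the first `.<digit>` boundary that starts a dotted version as the version (inherently ambiguous for IDs that end in a digit), and the function names and returned keys beyond the documented `choco:*` names are hypothetical. The install script is only read and hashed, never executed.

```python
import hashlib
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Hypothetical fallback pattern for "<id>.<version>" directory names.
DIR_PATTERN = re.compile(r"^(?P<id>.+?)\.(?P<version>\d+(?:\.\d+)+.*)$")


def _local(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def read_nuspec(nuspec: Path) -> dict:
    """Collect <metadata> child elements, ignoring the schema namespace."""
    fields = {}
    for section in ET.parse(str(nuspec)).getroot():
        if _local(section.tag) == "metadata":
            for item in section:
                fields[_local(item.tag)] = (item.text or "").strip()
    return fields


def describe_package(package_dir: Path) -> Optional[dict]:
    """Describe one package directory under the Chocolatey lib folder."""
    nuspecs = sorted(package_dir.glob("*.nuspec"))
    if nuspecs:
        meta = read_nuspec(nuspecs[0])
        package_id, version = meta.get("id"), meta.get("version")
    else:
        match = DIR_PATTERN.match(package_dir.name)
        if match is None:
            return None
        package_id, version = match["id"], match["version"]
    if not package_id or not version:
        return None
    script_hash = None
    for script in sorted(package_dir.rglob("*.ps1")):
        if script.name.lower() == "chocolateyinstall.ps1":
            script_hash = hashlib.sha256(script.read_bytes()).hexdigest()
            break
    return {
        "purl": f"pkg:chocolatey/{package_id}@{version}",
        "choco:id": package_id,
        "choco:install_script_hash": script_hash,
    }
```

Returning `None` when neither source yields both an ID and a version keeps unparseable directories out of the inventory rather than emitting a partial identity.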

10. References

  • docs/benchmarks/scanner/deep-dives/windows.md
  • docs/benchmarks/scanner/windows-macos-demand.md
  • docs/modules/scanner/design/macos-analyzer.md (structure/composition parallels)
  • Surface design docs (surface-fs.md, surface-validation.md, surface-secrets.md) for interfacing expectations.

Further reading: ../../api/scanner/windows-coverage.md (summary) and ../../api/scanner/windows-macos-summary.md (metrics dashboard).

Policy readiness alignment: see ../policy/windows-package-readiness.md (POLICY-READINESS-0002).

Upcoming milestone: the FinSecure Corp PCI review requires the Authenticode and feed-mirroring decisions (see §8) by 2025-11-07, before the Windows analyzer spike kicks off.