git.stella-ops.org/docs/modules/scanner/README.md
2025-12-25 18:50:33 +02:00

# StellaOps Scanner
Scanner analyses container images layer-by-layer, producing deterministic SBOM fragments, diffs, and signed reports.
## Latest updates (2025-12-12)
- Deterministic SBOM composition fixture published at `docs/modules/scanner/fixtures/deterministic-compose/` with DSSE, `_composition.json`, BOM, and hashes; doc `deterministic-sbom-compose.md` promoted to Ready v1.0 with offline verification steps.
- Node analyzer now ingests npm/yarn/pnpm lockfiles, emitting `DeclaredOnly` components with lock provenance. The CLI companion command `stella node lock-validate` runs the collector offline, surfaces declared-only or missing-lock packages, and emits telemetry via `stellaops.cli.node.lock_validate.count`. See `docs/modules/scanner/analyzers-node.md` and bench scenario `node_detection_gaps_fixture`.
- Python analyzer picks up `requirements*.txt`, `Pipfile.lock`, and `poetry.lock`, tagging installed distributions with lock provenance and generating declared-only components for policy. Use `stella python lock-validate` to run the same checks locally before images are built.
- Java analyzer now parses `gradle.lockfile`, `gradle/dependency-locks/**/*.lockfile`, and `pom.xml` dependencies via the new `JavaLockFileCollector`, merging lock metadata onto jar evidence and emitting declared-only components when jars are absent. The new CLI verb `stella java lock-validate` reuses that collector offline (table/JSON output) and records `stellaops.cli.java.lock_validate.count{outcome}` for observability.
- Worker/WebService now resolve cache roots and feature flags via `StellaOps.Scanner.Surface.Env`; misconfiguration warnings are documented in `docs/modules/scanner/design/surface-env.md` and surfaced through startup validation.
- Platform events rollout (2025-10-19) continues to publish `scanner.report.ready@1` and `scanner.scan.completed@1` envelopes with embedded DSSE payloads (see `docs/updates/2025-10-19-scanner-policy.md` and `docs/updates/2025-10-19-platform-events.md`). Service and consumer tests should round-trip the canonical samples under `docs/events/samples/`.
- OS/non-language analyzers: evidence is rootfs-relative, warnings are structured/capped, hashing is bounded, and Linux OS analyzers support surface-cache reuse. See `os-analyzers-evidence.md`.
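
The lock-provenance behaviour described above can be pictured as a comparison between what a lockfile declares and what is found installed in the image. The following is a minimal sketch of that idea, not the analyzers' actual implementation: the `Component` type, the `classify` function, and the `Installed` and `MissingLock` labels are hypothetical names chosen for illustration (only `DeclaredOnly` appears in this document).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    # "DeclaredOnly" comes from this README; the other two labels are assumptions.
    status: str


def classify(declared: dict[str, str], installed: dict[str, str]) -> list[Component]:
    """Compare lockfile entries (declared) with packages found on disk (installed).

    Both arguments map package name -> version. Output is sorted by name so
    repeated runs over the same input yield the same order.
    """
    components = []
    for name in sorted(declared.keys() | installed.keys()):
        if name in declared and name in installed:
            components.append(Component(name, installed[name], "Installed"))
        elif name in declared:
            # Present in the lockfile but absent from the image.
            components.append(Component(name, declared[name], "DeclaredOnly"))
        else:
            # Present in the image but not covered by any lockfile.
            components.append(Component(name, installed[name], "MissingLock"))
    return components


if __name__ == "__main__":
    declared = {"left-pad": "1.3.0", "lodash": "4.17.21"}
    installed = {"lodash": "4.17.21", "chalk": "5.3.0"}
    for component in classify(declared, installed):
        print(component.status, component.name, component.version)
```

Sorting by name keeps the output stable across runs, in line with the determinism goals stated elsewhere in this README.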
## Responsibilities
- Expose APIs (WebService) for scan orchestration, diffing, and artifact retrieval.
- Run Worker analyzers for OS, language, and native ecosystems with restart-only plug-ins.
- Store SBOM fragments and artifacts in RustFS/object storage.
- Publish DSSE-ready metadata for Signer/Attestor and downstream policy evaluation.
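
To make "DSSE-ready metadata" concrete, the sketch below builds a DSSE envelope using the pre-authentication encoding (PAE) defined by the DSSE specification. It is an illustration under stated assumptions, not Scanner's signing path: the HMAC-SHA256 signer is a stand-in chosen to keep the example stdlib-only (real deployments would normally sign with an asymmetric key via Signer/Attestor), and the `make_envelope` and `verify_envelope` names and the key ID are hypothetical.

```python
import base64
import hashlib
import hmac


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def make_envelope(payload_type: str, payload: bytes, key: bytes, keyid: str) -> dict:
    # ASSUMPTION: HMAC-SHA256 stands in for a real signature scheme.
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(base64.b64decode(s["sig"]), expected)
        for s in envelope["signatures"]
    )


if __name__ == "__main__":
    env = make_envelope("application/vnd.in-toto+json", b'{"example":true}', b"demo-key", "demo")
    print(verify_envelope(env, b"demo-key"))
```

Because the signature covers the PAE bytes rather than the JSON envelope itself, a consumer can re-serialise the envelope without invalidating the signature.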
## Key components
- `StellaOps.Scanner.WebService`: minimal API host.
- `StellaOps.Scanner.Worker`: analyzer executor.
- Analyzer libraries under `StellaOps.Scanner.Analyzers.*`.
## Integrations & dependencies
- Scheduler for job intake and retries.
- Policy Engine for evidence handoff.
- Export Center / Offline Kit for artifact packaging.
## Operational notes
- CAS caches, bounded retries, DSSE integration.
- Monitoring dashboards (see ./operations/analyzers-grafana-dashboard.json).
- RustFS migration playbook.
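
For readers unfamiliar with content-addressable storage (CAS), the sketch below shows the general technique: blobs are stored under the SHA-256 digest of their content, so identical layers deduplicate automatically and every read can be integrity-checked. The `CasStore` class and its directory layout are assumptions for illustration only; they do not describe Scanner's cache format or the RustFS backend.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class CasStore:
    """Hypothetical filesystem-backed content-addressable store."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, digest: str) -> Path:
        # Fan out on the first two hex characters to keep directories small.
        return self.root / "sha256" / digest[:2] / digest

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():  # identical content is written only once
            path.parent.mkdir(parents=True, exist_ok=True)
            tmp = path.with_name(path.name + ".tmp")
            tmp.write_bytes(data)
            tmp.replace(path)  # rename so readers never see a partial file
        return f"sha256:{digest}"

    def get(self, key: str) -> Optional[bytes]:
        algorithm, _, digest = key.partition(":")
        if algorithm != "sha256":
            raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
        path = self._path(digest)
        if not path.exists():
            return None
        data = path.read_bytes()
        # Treat a corrupted entry as a cache miss rather than returning bad data.
        return data if hashlib.sha256(data).hexdigest() == digest else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        store = CasStore(Path(directory))
        key = store.put(b"layer contents")
        print(key, store.get(key) == b"layer contents")
```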
## Related resources
- ./operations/analyzers.md
- ./operations/analyzers-grafana-dashboard.json
- ./operations/rustfs-migration.md
- ./operations/entrypoint.md
- ./analyzers-node.md
- ./analyzers-go.md
- ./operations/secret-leak-detection.md
- ./operations/dsse-rekor-operator-guide.md
- ./os-analyzers-evidence.md
- ./design/macos-analyzer.md
- ./design/windows-analyzer.md
- ../benchmarks/scanner/deep-dives/macos.md
- ../benchmarks/scanner/deep-dives/windows.md
- ../benchmarks/scanner/windows-macos-demand.md
- ../benchmarks/scanner/windows-macos-interview-template.md
- ./operations/field-engagement.md
- ./design/README.md
## Backlog references
- DOCS-SCANNER updates tracked in `../../TASKS.md`.
- Analyzer parity work in `src/Scanner/**/TASKS.md`.
## Implementation Status
### Phase 1: Control plane & job queue (Complete)
- Scanner WebService with queue abstraction (Valkey/NATS)
- Job leasing with retries and dead-letter handling
- CAS layer cache and artifact catalog
- REST API endpoints for scan management
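
Job leasing with retries and dead-letter handling can be sketched as follows. This is an in-memory illustration of the pattern only; the real queue abstraction sits on Valkey/NATS, and the `LeaseQueue` class, its method names, and the default limits here are hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    attempts: int = 0


class LeaseQueue:
    """Hypothetical lease queue: jobs are leased, then acked, nacked, or expired."""

    def __init__(self, max_attempts: int = 3, lease_seconds: float = 30.0) -> None:
        self.max_attempts = max_attempts
        self.lease_seconds = lease_seconds
        self.pending: deque[Job] = deque()
        self.leased: dict[str, tuple[Job, float]] = {}  # job_id -> (job, expiry)
        self.dead_letter: list[Job] = []

    def enqueue(self, job_id: str) -> None:
        self.pending.append(Job(job_id))

    def lease(self, now: float) -> Optional[Job]:
        self._reclaim_expired(now)
        if not self.pending:
            return None
        job = self.pending.popleft()
        job.attempts += 1
        self.leased[job.job_id] = (job, now + self.lease_seconds)
        return job

    def ack(self, job_id: str) -> None:
        self.leased.pop(job_id, None)  # success: drop the lease

    def nack(self, job_id: str) -> None:
        entry = self.leased.pop(job_id, None)
        if entry is not None:
            self._retry_or_bury(entry[0])

    def _reclaim_expired(self, now: float) -> None:
        # A worker that crashed never acks; its lease expires and the job returns.
        for job_id, (job, expires_at) in list(self.leased.items()):
            if expires_at <= now:
                del self.leased[job_id]
                self._retry_or_bury(job)

    def _retry_or_bury(self, job: Job) -> None:
        if job.attempts >= self.max_attempts:
            self.dead_letter.append(job)  # retries are bounded
        else:
            self.pending.append(job)
```

A worker would call `lease`, run the scan, then `ack` on success or `nack` on failure; jobs that exhaust `max_attempts` land in `dead_letter` for inspection instead of retrying forever.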
### Phase 2: Analyzer parity & SBOM assembly (In Progress)
- OS analyzers: apk/dpkg/rpm with deterministic metadata
- Language analyzers: Java, Node, Python, Go, .NET, Rust with lock file support
- Native analyzers: ELF/PE/MachO for binary analysis
- SBOM views: inventory/usage with CycloneDX/SPDX emitters
- Entry trace resolution and dependency analysis
### Phase 3: Diff & attestations (In Progress)
- Three-way diff engine (base, target, runtime)
- DSSE SBOM/report signing pipeline
- Attestation hand-off to Signer/Attestor
- Metadata for Export Center integration
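
This README does not define the semantics of the three-way diff, so the sketch below is one plausible reading rather than the engine's behaviour: `base` and `target` are the component sets of two images, and `runtime` is the set of components observed in use. The `ThreeWayDiff` type, its field names, and the example package URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreeWayDiff:
    added: frozenset[str]         # in target, not in base
    removed: frozenset[str]       # in base, not in target
    unchanged: frozenset[str]     # in both images
    added_in_use: frozenset[str]  # newly added AND observed at runtime


def three_way_diff(base: set[str], target: set[str], runtime: set[str]) -> ThreeWayDiff:
    """Compare component identifiers (for example package URLs) across three views."""
    added = target - base
    return ThreeWayDiff(
        added=frozenset(added),
        removed=frozenset(base - target),
        unchanged=frozenset(base & target),
        added_in_use=frozenset(added & runtime),
    )


if __name__ == "__main__":
    diff = three_way_diff(
        base={"pkg:npm/lodash@4.17.20", "pkg:npm/chalk@5.3.0"},
        target={"pkg:npm/lodash@4.17.21", "pkg:npm/chalk@5.3.0"},
        runtime={"pkg:npm/lodash@4.17.21"},
    )
    for name in sorted(diff.added_in_use):
        print("added and in use:", name)
```

Under this reading, the runtime view narrows attention to changes that affect code actually exercised, which is the kind of signal a downstream policy evaluation could prioritise.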
### Phase 4: Integrations & exports (Planned)
- Policy Engine integration for evaluation
- Vuln Explorer metadata delivery
- Export Center artifact packaging
- CLI/Console workflows and buildx plugin
### Phase 5: Observability & resilience (Planned)
- Metrics: queue depth, scan latency, cache hit/miss, analyzer timing
- Queue backpressure handling and cache eviction
- SLO dashboards and alerting
- Smoke tests and runbooks
### Key Acceptance Criteria
- Scans produce deterministic SBOM inventory/usage with stable component identity
- Queue/worker pipeline handles retries, backpressure, offline kits
- DSSE attestations exported for Signer/Attestor without transformation
- CLI/Console parity for scan submission, diffing, exports, verification
- Offline scanning supported with local caches and manifest verification
### Technical Decisions & Risks
- Analyzer drift prevented via golden fixtures, hash-based regression tests, deterministic sorting
- Queue overload mitigated with adaptive backpressure, worker scaling, priority lanes
- Storage growth managed via CAS dedupe, ILM policies, offline bundle pruning
- Lock file integration (npm/yarn/pnpm, pip/poetry, gradle) with declared-only components
- Surface cache reuse for Linux OS analyzers with rootfs-relative evidence
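
The drift-prevention approach above (deterministic sorting plus hash-based regression tests) can be illustrated with a short sketch. It assumes components are plain dictionaries with a `purl` key and uses compact, key-sorted JSON as the canonical form; these choices and the function names are hypothetical and are not the format of Scanner's fixtures.

```python
import hashlib
import json


def canonical_sbom_bytes(components: list[dict]) -> bytes:
    """Serialise components so that input order and key order do not matter."""
    ordered = sorted(components, key=lambda c: (c["purl"], c.get("version", "")))
    return json.dumps(
        {"components": ordered},
        sort_keys=True,          # stable key order inside each object
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,
    ).encode("utf-8")


def sbom_digest(components: list[dict]) -> str:
    return hashlib.sha256(canonical_sbom_bytes(components)).hexdigest()


if __name__ == "__main__":
    run_a = [
        {"purl": "pkg:npm/lodash@4.17.21", "version": "4.17.21"},
        {"purl": "pkg:npm/chalk@5.3.0", "version": "5.3.0"},
    ]
    # Same content, different component order and key order.
    run_b = [
        {"version": "5.3.0", "purl": "pkg:npm/chalk@5.3.0"},
        {"version": "4.17.21", "purl": "pkg:npm/lodash@4.17.21"},
    ]
    print(sbom_digest(run_a) == sbom_digest(run_b))
```

A regression test can then pin the digest of a golden fixture: any change in analyzer output, ordering, or serialisation changes the hash and fails the test.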
### Recent Enhancements (2025-12-12)
- Deterministic SBOM composition with DSSE fixtures and offline verification
- Node/Python/Java lock file collectors with CLI validation commands
- Platform events rollout with `scanner.report.ready@1` and `scanner.scan.completed@1`
- Surface-cache environment resolution with startup validation
## Epic alignment
- **Epic 6 (Vulnerability Explorer):** provide policy-aware scan outputs, explain traces, and findings ledger hooks for triage workflows.
- **Epic 10 (Export Center):** generate export-ready artefacts, manifests, and DSSE metadata for bundles.