Refactor code structure for improved readability and maintainability; optimize performance in key functions.
@@ -1,20 +1,20 @@
# component_architecture_scanner.md — **Stella Ops Scanner** (2025Q4)

> Aligned with Epic 6 – Vulnerability Explorer and Epic 10 – Export Center.

> **Scope.** Implementation‑ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per‑layer caching, three‑way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation hand‑off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).

---

## 0) Mission & boundaries

**Mission.** Produce **deterministic**, **explainable** SBOMs and diffs for container images and filesystems, quickly and repeatedly, without guessing. Emit two views: **Inventory** (everything present) and **Usage** (entrypoint closure + actually linked libs). Attach attestations through **Signer→Attestor→Rekor v2**.

**Boundaries.**

* Scanner **does not** produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
* Scanner **does not** keep third‑party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug‑ins (e.g., patch‑presence) run under explicit flags and never contaminate the core SBOM.

---

@@ -22,41 +22,41 @@

```
src/
 ├─ StellaOps.Scanner.WebService/  # REST control plane, catalog, diff, exports
 ├─ StellaOps.Scanner.Worker/  # queue consumer; executes analyzers
 ├─ StellaOps.Scanner.Models/  # DTOs, evidence, graph nodes, CDX/SPDX adapters
 ├─ StellaOps.Scanner.Storage/  # PostgreSQL repositories; RustFS object client (default) + S3 fallback; ILM/GC
 ├─ StellaOps.Scanner.Queue/  # queue abstraction (Redis/NATS/RabbitMQ)
 ├─ StellaOps.Scanner.Cache/  # layer cache; file CAS; bloom/bitmap indexes
 ├─ StellaOps.Scanner.EntryTrace/  # ENTRYPOINT/CMD → terminal program resolver (shell AST)
 ├─ StellaOps.Scanner.Analyzers.OS.[Apk|Dpkg|Rpm]/
 ├─ StellaOps.Scanner.Analyzers.Lang.[Java|Node|Bun|Python|Go|DotNet|Rust|Ruby|Php]/
 ├─ StellaOps.Scanner.Analyzers.Native.[ELF|PE|MachO]/  # PE/Mach-O planned (M2)
 ├─ StellaOps.Scanner.Symbols.Native/  # NEW – native symbol reader/demangler (Sprint 401)
 ├─ StellaOps.Scanner.CallGraph.Native/  # NEW – function/call-edge builder + CAS emitter
 ├─ StellaOps.Scanner.Emit.CDX/  # CycloneDX (JSON + Protobuf)
 ├─ StellaOps.Scanner.Emit.SPDX/  # SPDX 3.0.1 JSON
 ├─ StellaOps.Scanner.Diff/  # image→layer→component three‑way diff
 ├─ StellaOps.Scanner.Index/  # BOM‑Index sidecar (purls + roaring bitmaps)
 ├─ StellaOps.Scanner.Tests.*  # unit/integration/e2e fixtures
 └─ Tools/
    ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/  # BuildKit generator (image referrer SBOMs)
    └─ StellaOps.Scanner.Sbomer.DockerImage/  # CLI‑driven scanner container
```

Per-analyzer notes (language analyzers):
- `docs/modules/scanner/analyzers-java.md` — Java/Kotlin (Maven, Gradle, fat archives)
- `docs/modules/scanner/dotnet-analyzer.md` — .NET (deps.json, NuGet, packages.lock.json, declared-only)
- `docs/modules/scanner/analyzers-python.md` — Python (pip, Poetry, pipenv, conda, editables, vendored)
- `docs/modules/scanner/analyzers-node.md` — Node.js (npm, Yarn, pnpm, multi-version locks)
- `docs/modules/scanner/analyzers-bun.md` — Bun (bun.lock v1, dev classification, patches)
- `docs/modules/scanner/analyzers-go.md` — Go (build info, modules)

Cross-analyzer contract (identity safety, evidence locators, container layout):
- `docs/modules/scanner/language-analyzers-contract.md` — PURL vs explicit-key rules, evidence formats, bounded scanning

Semantic entrypoint analysis (Sprint 0411):
- `docs/modules/scanner/semantic-entrypoint-schema.md` — Schema for intent, capabilities, threat vectors, and data boundaries

Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.

@@ -64,15 +64,15 @@ Analyzer assemblies and buildx generators are packaged as **restart-time plug-in

The **Semantic Entrypoint Engine** enriches scan results with application-level understanding:

- **Intent Classification** — Infers application type (WebServer, Worker, CliTool, Serverless, etc.) from framework detection and entrypoint analysis
- **Capability Detection** — Identifies system resource access patterns (network, filesystem, database, crypto)
- **Threat Vector Inference** — Maps capabilities to potential attack vectors with CWE/OWASP references
- **Data Boundary Mapping** — Tracks data flow boundaries with sensitivity classification

Components:
- `StellaOps.Scanner.EntryTrace/Semantic/` — Core semantic types and orchestrator
- `StellaOps.Scanner.EntryTrace/Semantic/Adapters/` — Language-specific adapters (Python, Java, Node, .NET, Go)
- `StellaOps.Scanner.EntryTrace/Semantic/Analysis/` — Capability detection, threat inference, boundary mapping

Integration points:
- `LanguageComponentRecord` includes semantic fields (`intent`, `capabilities[]`, `threatVectors[]`)
@@ -88,8 +88,8 @@ CLI usage: `stella scan --semantic <image>` enables semantic analysis in output.
- **Build-id capture**: read `.note.gnu.build-id` for every ELF, store hex build-id alongside soname/path, propagate into `SymbolID`/`code_id`, and expose it to SBOM + runtime joiners. If missing, fall back to file hash and mark source accordingly.
- **PURL-resolved edges**: annotate call edges with the callee purl and `symbol_digest` so graphs merge with SBOM components. See `docs/reachability/purl-resolved-edges.md` for schema rules and acceptance tests.
- **Symbol hints in evidence**: reachability union and richgraph payloads emit `symbol {mangled,demangled,source,confidence}` plus optional `code_block_hash` for stripped/heuristic functions; serializers clamp confidence to [0,1] and uppercase `source` (`DWARF|PDB|SYM|NONE`) for determinism.
- **Unknowns emission**: when symbol → purl mapping or edge targets remain unresolved, emit structured Unknowns to Signals (see `docs/signals/unknowns-registry.md`) instead of dropping evidence.
- **Hybrid attestation**: emit **graph-level DSSE** for every `richgraph-v1` (mandatory) and optional **edge-bundle DSSE** (≤512 edges) for runtime/init-root/contested edges or third-party provenance. Publish graph DSSE digests to Rekor by default; edge-bundle Rekor publish is policy-driven. CAS layout: `cas://reachability/graphs/{blake3}` for graph body, `.../{blake3}.dsse` for envelope, and `cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse]` for bundles. Deterministic ordering before hashing/signing is required.
- **Deterministic call-graph manifest**: capture analyzer versions, feed hashes, toolchain digests, and flags in a manifest stored alongside `richgraph-v1`; replaying with the same manifest MUST yield identical node/edge sets and hashes (see `docs/reachability/lead.md`).
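
As a rough illustration of the symbol-hint and CAS-layout rules in the list above, the sketch below clamps `confidence` to [0,1], uppercases `source`, and builds the `cas://reachability/...` locations. The function names, the dict shape, the defaults for missing fields, and the choice to reject unknown sources are all assumptions made for this example; they are not taken from the Scanner's serializers.

```python
ALLOWED_SOURCES = {"DWARF", "PDB", "SYM", "NONE"}


def normalize_symbol_hint(hint: dict) -> dict:
    """Return a copy with confidence clamped to [0, 1] and source uppercased."""
    # Assumption: a missing source defaults to NONE, a missing confidence to 0.0.
    source = str(hint.get("source", "NONE")).upper()
    if source not in ALLOWED_SOURCES:
        # Assumption: unknown sources are rejected rather than silently remapped.
        raise ValueError(f"unsupported symbol source: {source}")
    confidence = min(1.0, max(0.0, float(hint.get("confidence", 0.0))))
    normalized = dict(hint)
    normalized["source"] = source
    normalized["confidence"] = confidence
    return normalized


def graph_cas_uris(graph_blake3: str) -> tuple:
    """Locations of a richgraph body and its DSSE envelope."""
    body = f"cas://reachability/graphs/{graph_blake3}"
    return body, body + ".dsse"


def edge_bundle_cas_uri(graph_hash: str, bundle_id: str, dsse: bool = False) -> str:
    """Location of an edge bundle, or of its DSSE envelope when dsse=True."""
    uri = f"cas://reachability/edges/{graph_hash}/{bundle_id}"
    return uri + ".dsse" if dsse else uri
```

The hash is passed in as a string because computing BLAKE3 needs a third-party library; how the digest is produced is outside this sketch.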

### 1.1 Queue backbone (Redis / NATS)
@@ -121,10 +121,10 @@ scanner:

The DI extension (`AddScannerQueue`) wires the selected transport, so future additions (e.g., RabbitMQ) only implement the same contract and register.

**Runtime form‑factor:** two deployables

* **Scanner.WebService** (stateless REST)
* **Scanner.Worker** (N replicas; queue‑driven)

---

@@ -134,30 +134,30 @@ The DI extension (`AddScannerQueue`) wires the selected transport, so future add
* **RustFS** (default, offline-first) for SBOM artifacts; optional S3/MinIO compatibility retained for migration; **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
* **PostgreSQL** for catalog, job state, diffs, ILM rules.
* **Queue** (Redis Streams/NATS/RabbitMQ).
* **Authority** (on‑prem OIDC) for **OpToks** (DPoP/mTLS).
* **Signer** + **Attestor** (+ **Fulcio/KMS** + **Rekor v2**) for DSSE + transparency.

---

## 3) Contracts & data model

### 3.1 Evidence‑first component model

**Nodes**

* `Image`, `Layer`, `File`
* `Component` (`purl?`, `name`, `version?`, `type`, `id` — may be `bin:{sha256}`)
* `Executable` (ELF/PE/Mach‑O), `Library` (native or managed), `EntryScript` (shell/launcher)

**Edges** (all carry **Evidence**)

* `contains(Image|Layer → File)`
* `installs(PackageDB → Component)` (OS database row)
* `declares(InstalledMetadata → Component)` (dist‑info, pom.properties, deps.json…)
* `links_to(Executable → Library)` (ELF `DT_NEEDED`, PE imports)
* `calls(EntryScript → Program)` (file:line from shell AST)
* `attests(Rekor → Component|Image)` (SBOM/predicate binding)
* `bound_from_attestation(Component_attested → Component_observed)` (hash equality proof)
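
One way to picture the node and edge model above is a small evidence-carrying edge type plus a reachability walk that approximates the **Usage** view (entrypoint closure over `calls` and `links_to`). This is only a sketch: the `Evidence` fields, the class names, and the choice of which edge kinds the closure follows are assumptions for illustration, not the types in `StellaOps.Scanner.Models`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str   # hypothetical field, e.g. "ELF DT_NEEDED" or "shell AST"
    locator: str  # hypothetical field, e.g. "/entry.sh:12"


@dataclass(frozen=True)
class Edge:
    kind: str          # one of the edge names listed above, e.g. "links_to"
    src: str
    dst: str
    evidence: Evidence  # every edge carries evidence


def binary_component_id(sha256_hex: str) -> str:
    """Identity for a component known only by its file hash."""
    return f"bin:{sha256_hex}"


def usage_closure(entry: str, edges: list) -> set:
    """Nodes reachable from `entry` by following calls/links_to edges."""
    follow = {"calls", "links_to"}  # assumption: only these kinds define usage
    adjacency = {}
    for edge in edges:
        if edge.kind in follow:
            adjacency.setdefault(edge.src, []).append(edge.dst)
    seen, stack = {entry}, [entry]
    while stack:
        for nxt in adjacency.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

Files that are present in a layer but never reached from the entrypoint stay out of the returned set, which is the distinction between Inventory and Usage that the mission statement draws.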

**Evidence**

@@ -211,17 +211,20 @@ migrations.
All under `/api/v1/scanner`. Auth: **OpTok** (DPoP/mTLS); RBAC scopes.

```
POST /scans { imageRef|digest, force?:bool } → { scanId }
GET /scans/{id} → { status, imageDigest, artifacts[], rekor? }
GET /sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage → bytes
POST /sbom/upload { artifactRef, sbom|sbomBase64, format?, source? } -> { sbomId, analysisJobId }
GET /sbom/uploads/{sbomId} -> upload record + provenance
GET /scans/{id}/ruby-packages → { scanId, imageDigest, generatedAt, packages[] }
GET /scans/{id}/bun-packages → { scanId, imageDigest, generatedAt, packages[] }
GET /diff?old=<digest>&new=<digest>&view=inventory|usage → diff.json
POST /exports { imageDigest, format, view, attest?:bool } → { artifactId, rekor? }
POST /reports { imageDigest, policyRevision? } → { reportId, rekor? } # delegates to backend policy+vex
GET /catalog/artifacts/{id} → { meta }
GET /healthz | /readyz | /metrics
```
See docs/modules/scanner/byos-ingestion.md for BYOS workflow, formats, and troubleshooting.
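
For orientation, the following sketch shows how a client might assemble request URLs for two of the endpoints listed above. The helper names and the example host are hypothetical, and leaving the `:` in digests unescaped is an assumption; the sketch only builds URLs and does not handle OpTok authentication.

```python
from urllib.parse import quote, urlencode

API_PREFIX = "/api/v1/scanner"
SBOM_FORMATS = {"cdx-json", "cdx-pb", "spdx-json"}
VIEWS = {"inventory", "usage"}


def sbom_url(base_url: str, image_digest: str,
             fmt: str = "cdx-json", view: str = "inventory") -> str:
    """URL for GET /sboms/{imageDigest}?format=...&view=..."""
    if fmt not in SBOM_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    # Assumption: the digest's ':' is sent unescaped in the path.
    path = f"{API_PREFIX}/sboms/{quote(image_digest, safe=':')}"
    query = urlencode({"format": fmt, "view": view})
    return f"{base_url.rstrip('/')}{path}?{query}"


def diff_url(base_url: str, old_digest: str, new_digest: str,
             view: str = "inventory") -> str:
    """URL for GET /diff?old=<digest>&new=<digest>&view=..."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    query = urlencode({"old": old_digest, "new": new_digest, "view": view}, safe=":")
    return f"{base_url.rstrip('/')}{API_PREFIX}/diff?{query}"
```

Validating `format` and `view` on the client side keeps obviously malformed requests from reaching the WebService, though the server remains the authority on accepted values.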

### Report events

@@ -233,13 +236,13 @@ When `scanner.events.enabled = true`, the WebService serialises the signed repor

### 5.1 Acquire & verify

1. **Resolve image** (prefer `repo@sha256:…`).
2. **(Optional) verify image signature** per policy (cosign).
3. **Pull blobs**, compute layer digests; record metadata.

### 5.2 Layer union FS

* Apply whiteouts; materialize final filesystem; map **file → first introducing layer**.
* Windows layers (MSI/SxS/GAC) planned in **M2**.
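
The whiteout handling described above can be sketched over in-memory layer listings. The example follows the OCI convention that `.wh.<name>` hides `<name>` from lower layers and `.wh..wh..opq` hides everything a directory inherited from lower layers. Modelling a layer as a `path -> content digest` dict, using absolute paths, and re-attributing a path to a later layer only when its digest changes are assumptions for illustration.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def union_layers(layers: list) -> dict:
    """Merge layers bottom-up; return path -> (digest, introducing layer index)."""
    merged = {}
    for index, layer in enumerate(layers):
        # Pass 1: whiteouts only hide entries that came from lower layers.
        for path in layer:
            parent, name = posixpath.split(path)
            if name == OPAQUE_MARKER:
                prefix = parent.rstrip("/") + "/"
                for victim in [p for p in merged if p.startswith(prefix)]:
                    del merged[victim]
            elif name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
                doomed = [p for p in merged
                          if p == target or p.startswith(target + "/")]
                for victim in doomed:
                    del merged[victim]
        # Pass 2: add this layer's regular entries.
        for path, digest in layer.items():
            if posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                continue  # whiteout markers never appear in the final filesystem
            previous = merged.get(path)
            if previous is None or previous[0] != digest:
                # Assumption: changed content makes this layer the introducer.
                merged[path] = (digest, index)
    return merged
```

Processing whiteouts before regular entries within each layer matters: a layer may delete a directory and recreate files under it, and those new files must survive.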

### 5.3 Evidence harvest (parallel analyzers; deterministic only)
@@ -259,32 +262,32 @@ When `scanner.events.enabled = true`, the WebService serialises the signed repor

**B) Language ecosystems (installed state only)**

* **Java**: `META-INF/maven/*/pom.properties`, MANIFEST → `pkg:maven/...`
* **Node**: `node_modules/**/package.json` → `pkg:npm/...`
* **Bun**: `bun.lock` (JSONC text) + `node_modules/**/package.json` + `node_modules/.bun/**/package.json` (isolated linker) → `pkg:npm/...`; `bun.lockb` (binary) emits remediation guidance
* **Python**: `*.dist-info/{METADATA,RECORD}` → `pkg:pypi/...`
* **Go**: Go **buildinfo** in binaries → `pkg:golang/...`
* **.NET**: `*.deps.json` + assembly metadata → `pkg:nuget/...`
* **Rust**: crates only when **explicitly present** (embedded metadata or cargo/registry traces); otherwise binaries reported as `bin:{sha256}`.

> **Rule:** We only report components proven **on disk** with authoritative metadata. Lockfiles are evidence only.

**C) Native link graph**

* **ELF**: parse `PT_INTERP`, `DT_NEEDED`, RPATH/RUNPATH, **GNU symbol versions**; map **SONAMEs** to file paths; link executables → libs.
* **PE/Mach‑O** (planned M2): import table, delay‑imports; version resources; code signatures.
* Map libs back to **OS packages** if possible (via file lists); else emit `bin:{sha256}` components.
* The exported metadata (`stellaops.os.*` properties, license list, source package) feeds policy scoring and export pipelines
  directly – Policy evaluates quiet rules against package provenance while Exporters forward the enriched fields into
  downstream JSON/Trivy payloads.
* **Reachability lattice**: analyzers + runtime probes emit `Evidence`/`Mitigation` records (see `docs/reachability/lattice.md`). The lattice engine joins static path evidence, runtime hits (EventPipe/JFR), taint flows, environment gates, and mitigations into `ReachDecision` documents that feed VEX gating and event graph storage.
* Sprint 401 introduces `StellaOps.Scanner.Symbols.Native` (DWARF/PDB reader + demangler) and `StellaOps.Scanner.CallGraph.Native`
  (function boundary detector + call-edge builder). These libraries feed `FuncNode`/`CallEdge` CAS bundles and enrich reachability
  graphs with `{code_id, confidence, evidence}` so Signals/Policy/UI can cite function-level justifications.

**D) EntryTrace (ENTRYPOINT/CMD → terminal program)**

* Read image config; parse shell (POSIX/Bash subset) with AST: `source`/`.` includes; `case/if`; `exec`/`command`; `run‑parts`.
* Resolve commands via **PATH** within the **built rootfs**; follow language launchers (Java/Node/Python) to identify the terminal program (ELF/JAR/venv script).
* Record **file:line** and choices for each hop; output chain graph.
* Unresolvable dynamic constructs are recorded as **unknown** edges with reasons (e.g., `$FOO` unresolved).
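
A single EntryTrace hop, as described in the bullets above, might look like the sketch below: resolve a command name against `PATH` inside the image's root filesystem, and record an **unknown** edge with a reason when that is not possible. The function names, the edge dict shape, and representing the rootfs as a set of executable paths are assumptions for illustration; shell parsing itself is out of scope here.

```python
import posixpath
from typing import Optional


def resolve_in_rootfs(command: str, path_env: str, executables: set) -> Optional[str]:
    """Return the first PATH match for `command` among the rootfs executables."""
    if "/" in command:
        candidate = posixpath.normpath(command)
        return candidate if candidate in executables else None
    for directory in path_env.split(":"):
        if not directory:
            continue  # assumption: empty PATH entries (current dir) are skipped
        candidate = posixpath.join(directory, command)
        if candidate in executables:
            return candidate
    return None


def trace_hop(script: str, line: int, command: str,
              path_env: str, executables: set) -> dict:
    """Produce a `calls` edge, or an `unknown` edge that states why."""
    origin = f"{script}:{line}"
    if "$" in command:
        # Dynamic constructs are recorded, never guessed.
        return {"kind": "unknown", "from": origin, "reason": f"{command} unresolved"}
    target = resolve_in_rootfs(command, path_env, executables)
    if target is None:
        return {"kind": "unknown", "from": origin,
                "reason": f"{command} not found on PATH"}
    return {"kind": "calls", "from": origin, "to": target}
```

Resolution never touches the host filesystem; it consults only the paths known to exist in the scanned image, which keeps the result deterministic.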
@@ -293,11 +296,11 @@ When `scanner.events.enabled = true`, the WebService serialises the signed repor

Post-resolution, the `SemanticEntrypointOrchestrator` enriches entry trace results with semantic understanding:

* **Application Intent** — Infers the purpose (WebServer, CliTool, Worker, Serverless, BatchJob, etc.) from framework detection and command patterns.
* **Capability Classes** — Detects capabilities (NetworkListen, DatabaseSql, ProcessSpawn, SecretAccess, etc.) via import/dependency analysis and framework signatures.
* **Attack Surface** — Maps capabilities to potential threat vectors (SqlInjection, Xss, Ssrf, Rce, PathTraversal) with CWE IDs and OWASP Top 10 categories.
* **Data Boundaries** — Traces I/O edges (HttpRequest, DatabaseQuery, FileInput, EnvironmentVar) with direction and sensitivity classification.
* **Confidence Scoring** — Each inference carries a score (0.0–1.0), tier (Definitive/High/Medium/Low/Unknown), and reasoning chain.

Language-specific adapters (`PythonSemanticAdapter`, `JavaSemanticAdapter`, `NodeSemanticAdapter`, `DotNetSemanticAdapter`, `GoSemanticAdapter`) recognize framework patterns:
* **Python**: Django, Flask, FastAPI, Celery, Click/Typer, Lambda handlers
@@ -316,7 +319,7 @@ See `docs/modules/scanner/operations/entrypoint-semantic.md` for full schema ref
**E) Attestation & SBOM bind (optional)**

* For each **file hash** or **binary hash**, query local cache of **Rekor v2** indices; if an SBOM attestation is found for **exact hash**, bind it to the component (origin=`attested`).
* For the **image** digest, likewise bind SBOM attestations (build‑time referrers).

### 5.4 Component normalization (exact only)

@@ -326,25 +329,25 @@ See `docs/modules/scanner/operations/entrypoint-semantic.md` for full schema ref
### 5.5 SBOM assembly & emit

* **Per-layer SBOM fragments**: components introduced by the layer (+ relationships).
* **Image SBOMs**: merge fragments; refer back to them via **CycloneDX BOM‑Link** (or SPDX ExternalRef).
* Emit both **Inventory** & **Usage** views.
* When the native analyzer reports an ELF `buildId`, attach it to component metadata and surface it as `stellaops:buildId` in CycloneDX properties (and diff metadata). This keeps SBOM/diff output in lockstep with runtime events and the debug-store manifest.
* Serialize **CycloneDX 1.7 JSON** and **CycloneDX 1.7 Protobuf**; optionally **SPDX 3.0.1 JSON-LD** (`application/spdx+json; version=3.0.1`) with legacy tag-value output (`text/spdx`) when enabled (1.6 accepted for ingest compatibility).
* Build **BOM‑Index** sidecar: purl table + roaring bitmap; flag `usedByEntrypoint` components for fast backend joins.

The emitted `buildId` metadata is preserved in component hashes, diff payloads, and `/policy/runtime` responses so operators can pivot from SBOM entries → runtime events → `debug/.build-id/<aa>/<rest>.debug` within the Offline Kit or release bundle.
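
To make the BOM‑Index idea concrete, here is a minimal sketch of a purl table paired with a bitmap that flags `usedByEntrypoint` components. It deliberately substitutes a plain Python integer for a roaring bitmap, since roaring bitmaps require a third-party library, and the index layout, field names, and sorted purl ordering are assumptions rather than the actual sidecar format.

```python
def build_bom_index(components: list) -> dict:
    """Build a purl table and a bitmask of entrypoint-used components."""
    # Sorting gives each purl a stable position, so the index is deterministic.
    purls = sorted({c["purl"] for c in components})
    position = {purl: i for i, purl in enumerate(purls)}
    used_mask = 0  # stand-in for a roaring bitmap: bit i set means purls[i] is used
    for component in components:
        if component.get("usedByEntrypoint"):
            used_mask |= 1 << position[component["purl"]]
    return {"purls": purls, "usedByEntrypoint": used_mask}


def used_purls(index: dict) -> list:
    """Decode the bitmask back into the purls it flags."""
    mask = index["usedByEntrypoint"]
    return [purl for i, purl in enumerate(index["purls"]) if (mask >> i) & 1]
```

A backend join then reduces to integer positions and bit operations rather than repeated string comparisons over purls.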

### 5.6 DSSE attestation (via Signer/Attestor)

* WebService constructs **predicate** with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (when emitting **final reports**), timestamps.
* Calls **Signer** (requires **OpTok + PoE**); Signer verifies **entitlement + scanner image integrity** and returns **DSSE bundle**.
* **Attestor** logs to **Rekor v2**; returns `{uuid,index,proof}` → stored in `artifacts.rekor`.
* **Hybrid reachability attestations**: graph-level DSSE (mandatory) plus optional edge-bundle DSSEs for runtime/init/contested edges. See [`docs/reachability/hybrid-attestation.md`](../../reachability/hybrid-attestation.md) for verification runbooks and Rekor guidance.
* Operator enablement runbooks (toggles, env-var map, rollout guidance) live in [`operations/dsse-rekor-operator-guide.md`](operations/dsse-rekor-operator-guide.md) per SCANNER-ENG-0015.
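
The sketch below illustrates the predicate-building step and the byte string a DSSE signer signs over. The pre-authentication encoding (PAE) follows the DSSE v1 specification; the `created_at` key, the sorted-key JSON serialization, and the helper names are assumptions, and actual signing stays with **Signer** as described above.

```python
import json


def build_predicate(image_digest: str, stellaops_version: str, license_id: str,
                    created_at: str, policy_digest: str = None) -> dict:
    """Assemble the predicate fields named in the list above."""
    predicate = {
        "image_digest": image_digest,
        "stellaops_version": stellaops_version,
        "license_id": license_id,
        "created_at": created_at,  # hypothetical key for the timestamp
    }
    if policy_digest is not None:  # only present for final reports
        predicate["policy_digest"] = policy_digest
    return predicate


def canonical_json(value: dict) -> bytes:
    """Deterministic serialization: sorted keys, no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding of (payload_type, payload)."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)
```

Because the signature covers the PAE bytes, two runs must serialize the predicate identically to produce comparable envelopes, which is why the sketch fixes key order and separators.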

---

## 6) Three‑way diff (image → layer → component)

### 6.1 Keys & classification

@@ -360,7 +363,7 @@ B = components(imageNew, key)

added = B \ A
removed = A \ B
changed = { k in A∩B : version(A[k]) != version(B[k]) || origin changed }

for each item in added/removed/changed:
  layer = attribute_to_layer(item, imageOld|imageNew)
@@ -372,13 +375,13 @@ Diffs are stored as artifacts and feed **UI** and **CLI**.
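The set arithmetic this section uses for `added`, `removed`, and `changed` translates almost directly into code. The sketch below assumes each image is represented as a dict from component key to a record with `version`, `origin`, and `layer` fields, and that layer attribution simply reads the recorded `layer`; both the record shape and that attribution rule are assumptions for illustration.

```python
def diff_components(old: dict, new: dict) -> list:
    """Classify component keys as added, removed, or changed, with a layer each."""
    entries = []
    # added = B \ A: attributed to the layer in the new image
    for key in sorted(new.keys() - old.keys()):
        entries.append({"key": key, "change": "added", "layer": new[key]["layer"]})
    # removed = A \ B: attributed to the layer in the old image
    for key in sorted(old.keys() - new.keys()):
        entries.append({"key": key, "change": "removed", "layer": old[key]["layer"]})
    # changed = keys in both whose version or origin differs
    for key in sorted(old.keys() & new.keys()):
        before, after = old[key], new[key]
        if before["version"] != after["version"] or before["origin"] != after["origin"]:
            entries.append({"key": key, "change": "changed", "layer": after["layer"]})
    return entries
```

Sorting the keys before emitting entries keeps the diff output stable across runs, in line with the determinism goal stated for the Scanner.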

---

## 7) Build‑time SBOMs (fast CI path)

**Scanner.Sbomer.BuildXPlugin** can act as a BuildKit **generator**:

* During `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, run analyzers on the build context/output; attach SBOMs as OCI **referrers** to the built image.
* Optionally request **Signer/Attestor** to produce **Stella Ops‑verified** attestation immediately; else, Scanner.WebService can verify and re‑attest post‑push.
* Scanner.WebService trusts build‑time SBOMs per policy, enabling **no‑rescan** for unchanged bases.

---

@@ -420,26 +423,26 @@ scanner:

## 9) Scale & performance

* **Parallelism**: per‑analyzer concurrency; bounded directory walkers; file CAS dedupe by sha256.
* **Distributed locks** per **layer digest** to prevent duplicate work across Workers.
* **Registry throttles**: per‑host concurrency budgets; exponential backoff on 429/5xx.
* **Targets**:

  * **Build‑time**: P95 ≤ 3–5 s on warmed bases (CI generator).
  * **Post‑build delta**: P95 ≤ 10 s for 200 MB images with cache hit.
  * **Emit**: CycloneDX Protobuf ≤ 150 ms for 5k components; JSON ≤ 500 ms.
  * **Diff**: ≤ 200 ms for 5k vs 5k components.
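
The registry throttle rule above (exponential backoff on 429/5xx) can be sketched as two small helpers. The base delay, the cap, and the optional jitter parameter are illustrative assumptions; the document does not state the Scanner's actual values.

```python
import random


def should_retry(status: int) -> bool:
    """Retry only on throttling (429) and server errors (5xx)."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: float = 0.0, rng=random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    delay = min(cap, base * (2 ** attempt))  # doubles each attempt, up to the cap
    # Optional jitter spreads retries from many Workers; 0.0 keeps it deterministic.
    return delay + rng.uniform(0.0, jitter * delay)
```

With the defaults, successive delays run 0.5 s, 1 s, 2 s, 4 s and so on until they reach the 30 s cap; a non-zero `jitter` adds up to that fraction of the delay on top.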
|
||||
|
||||
---
|
||||
|
||||
## 10) Security posture
|
||||
|
||||
* **AuthN**: Authority‑issued short OpToks (DPoP/mTLS).
|
||||
* **AuthN**: Authority‑issued short OpToks (DPoP/mTLS).
|
||||
* **AuthZ**: scopes (`scanner.scan`, `scanner.export`, `scanner.catalog.read`).
|
||||
* **mTLS** to **Signer**/**Attestor**; only **Signer** can sign.
|
||||
* **No network fetches** during analysis (except registry pulls and optional Rekor index reads).
|
||||
* **Sandboxing**: non‑root containers; read‑only FS; seccomp profiles; disable execution of scanned content.
|
||||
* **Release integrity**: all first‑party images are **cosign‑signed**; Workers/WebService self‑verify at startup.
|
||||
* **Sandboxing**: non‑root containers; read‑only FS; seccomp profiles; disable execution of scanned content.
|
||||
* **Release integrity**: all first‑party images are **cosign‑signed**; Workers/WebService self‑verify at startup.

---

@@ -451,8 +454,8 @@ scanner:

* `scanner.layer_cache_hits_total`, `scanner.file_cas_hits_total`
* `scanner.artifact_bytes_total{format}`
* `scanner.attestation_latency_seconds`, `scanner.rekor_failures_total`
* `scanner_analyzer_golang_heuristic_total{indicator,version_hint}` — increments whenever the Go analyzer falls back to heuristics (build-id or runtime markers). Grafana panel: `sum by (indicator) (rate(scanner_analyzer_golang_heuristic_total[5m]))`; alert when the rate is ≥ 1 for 15 minutes to highlight unexpected stripped binaries.
* **Tracing**: spans for acquire→union→analyzers→compose→emit→sign→log.
* **Audit logs**: DSSE requests log `license_id`, `image_digest`, `artifactSha256`, `policy_digest?`, Rekor UUID on success.
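
The audit log bullet lists the fields recorded for DSSE requests. A minimal sketch of such a record, serialized as one JSON line, might look like the following. The function name and the `rekor_uuid` key are assumptions for illustration; the section does not say how the Rekor UUID field is actually named or how records are serialized.

```python
import json


def audit_record(license_id, image_digest, artifact_sha256,
                 policy_digest=None, rekor_uuid=None):
    """Build one audit log line with the fields named in the list above.

    Optional fields are omitted when absent: policy_digest is marked optional
    in the source, and the Rekor UUID is only logged on success.
    """
    record = {
        "license_id": license_id,
        "image_digest": image_digest,
        "artifactSha256": artifact_sha256,
    }
    if policy_digest is not None:
        record["policy_digest"] = policy_digest
    if rekor_uuid is not None:
        record["rekor_uuid"] = rekor_uuid  # key name is an assumption
    # Sorted keys and compact separators keep the line stable across runs.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```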

---

@@ -461,12 +464,12 @@ scanner:

* **Analyzer contracts:** see `language-analyzers-contract.md` for cross-analyzer identity safety, evidence locators, and container layout rules. Per-analyzer docs: `analyzers-java.md`, `dotnet-analyzer.md`, `analyzers-python.md`, `analyzers-node.md`, `analyzers-bun.md`, `analyzers-go.md`. Implementation: `docs/implplan/SPRINT_0408_0001_0001_scanner_language_detection_gaps_program.md`.

* **Determinism:** given same image + analyzers → byte‑identical **CDX Protobuf**; JSON normalized.
* **OS packages:** ground‑truth images per distro; compare to package DB.
* **Lang ecosystems:** sample images per ecosystem (Java/Node/Python/Go/.NET/Rust) with installed metadata; negative tests w/ lockfile‑only.
* **Native & EntryTrace:** ELF graph correctness; shell AST cases (includes, run‑parts, exec, case/if).
* **Diff:** layer attribution against synthetic two‑image sequences.
* **Performance:** cold vs warm cache; large `node_modules` and `site‑packages`.
* **Security:** ensure no code execution from image; fuzz parser inputs; path traversal resistance on layer extract.
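
The determinism bullet could be exercised with a test helper along these lines. This is a sketch under stated assumptions: the section does not define the JSON normalization rules, so sorting components by `purl`, `name`, and `version` and sorting object keys is a stand-in chosen for this example, not the Scanner's documented behavior.

```python
import hashlib
import json


def normalize_sbom(doc: dict) -> bytes:
    """Serialize an SBOM-like dict so equal content yields identical bytes.

    Assumed rules (illustrative only): components ordered by purl/name/version,
    object keys sorted, compact separators, UTF-8 output.
    """
    components = sorted(
        doc.get("components", []),
        key=lambda c: (c.get("purl", ""), c.get("name", ""), c.get("version", "")),
    )
    normalized = dict(doc, components=components)
    text = json.dumps(normalized, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def sbom_digest(doc: dict) -> str:
    """SHA-256 of the normalized form, usable in a byte-identity assertion."""
    return hashlib.sha256(normalize_sbom(doc)).hexdigest()
```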

---

@@ -474,16 +477,16 @@ scanner:

## 13) Failure modes & degradations

* **Missing OS DB** (files exist, DB removed): record **files**; do **not** fabricate package components; emit `bin:{sha256}` where unavoidable; flag in evidence.
* **Unreadable metadata** (corrupt dist‑info): record file evidence; skip component creation; annotate.
* **Dynamic shell constructs**: mark unresolved edges with reasons (env var unknown) and continue; **Usage** view may be partial.
* **Registry rate limits**: honor backoff; queue job retries with jitter.
* **Signer refusal** (license/plan/version): scan completes; artifact produced; **no attestation**; WebService marks result as **unverified**.
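
The signer refusal case is a good example of degrading instead of failing. A minimal sketch of that control flow follows. `SignerRefused`, `ScanResult`, and `finalize` are hypothetical names invented for this illustration, and the injected `sign` callable stands in for the real Signer hand-off, whose interface is not described in this section.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class SignerRefused(Exception):
    """Hypothetical error raised when the Signer declines (license/plan/version)."""


@dataclass
class ScanResult:
    artifact_sha256: str
    attestation: Optional[str] = None
    verified: bool = False
    note: str = ""


def finalize(artifact_sha256: str, sign: Callable[[str], str]) -> ScanResult:
    """Keep the artifact even when signing is refused; mark it unverified."""
    try:
        attestation = sign(artifact_sha256)
    except SignerRefused as exc:
        # Scan completed and the artifact exists, but there is no attestation.
        return ScanResult(artifact_sha256, note=f"signer refused: {exc}")
    return ScanResult(artifact_sha256, attestation=attestation, verified=True)
```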

---

## 14) Optional plug‑ins (off by default)

* **Patch‑presence detector** (signature‑based backport checks). Reads curated function‑level signatures from advisories; inspects binaries for patched code snippets to lower false‑positives for backported fixes. Runs as a sidecar analyzer that **annotates** components; never overrides core identities.
* **Runtime probes** (with Zastava): when allowed, compare **/proc/<pid>/maps** (DSOs actually loaded) with static **Usage** view for precision.
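
The runtime probe comparison can be sketched as a set difference between file-backed mappings in `/proc/<pid>/maps` and the paths in the static Usage view. The function names and the shape of the result are assumptions for this example. The parser relies only on the standard maps layout (address, perms, offset, dev, inode, optional pathname) and keeps every file-backed mapping, so the main executable appears alongside the shared objects.

```python
def loaded_objects(maps_text: str) -> set[str]:
    """Extract file-backed paths from the text of /proc/<pid>/maps."""
    paths = set()
    for line in maps_text.splitlines():
        # Five fixed fields, then an optional pathname that may contain spaces.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5].strip())
    return paths  # pseudo-entries such as [heap] or [stack] are skipped


def compare_with_usage(maps_text: str, usage_paths) -> dict:
    """Compare runtime mappings with the static Usage view (a set of paths)."""
    loaded = loaded_objects(maps_text)
    usage = set(usage_paths)
    return {
        "confirmed": sorted(loaded & usage),
        "runtime_only": sorted(loaded - usage),
        "static_only": sorted(usage - loaded),
    }
```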

---

@@ -506,14 +509,14 @@ scanner:

## 17) Roadmap (Scanner)

* **M2**: Windows containers (MSI/SxS/GAC analyzers), PE/Mach‑O native analyzer, deeper Rust metadata.
* **M2**: Buildx generator GA (certified external registries), cross‑registry trust policies.
* **M3**: Patch‑presence plug‑in GA (opt‑in), cross‑image corpus clustering (evidence‑only; not identity).
* **M3**: Advanced EntryTrace (POSIX shell features breadth, busybox detection).

---

### Appendix A — EntryTrace resolution (pseudo)

```csharp
ResolveEntrypoint(ImageConfig cfg, RootFs fs):
@@ -544,9 +547,9 @@ ResolveEntrypoint(ImageConfig cfg, RootFs fs):
  return Unknown(reason)
```

### Appendix A.1 — EntryTrace Explainability

### Appendix A.0 — Replay / Record mode

- WebService ships a **RecordModeService** that assembles replay manifests (schema v1) with policy/feed/tool pins and reachability references, then writes deterministic input/output bundles to the configured object store (RustFS default, S3/Minio fallback) under `replay/<head>/<digest>.tar.zst`.
- Bundles contain canonical manifest JSON plus inputs (policy/feed/tool/analyzer digests) and outputs (SBOM, findings, optional VEX/logs); CAS URIs follow `cas://replay/...` and are attached to scan snapshots as `ReplayArtifacts`.

@@ -567,12 +570,12 @@ EntryTrace emits structured diagnostics and metrics so operators can quickly und

Diagnostics drive two metrics published by `EntryTraceMetrics`:

- `entrytrace_resolutions_total{outcome}` — resolution attempts segmented by outcome (`resolved`, `partiallyresolved`, `unresolved`).
- `entrytrace_unresolved_total{reason}` — diagnostic counts keyed by reason.

Structured logs include `entrytrace.path`, `entrytrace.command`, `entrytrace.reason`, and `entrytrace.depth`, all correlated with scan/job IDs. Timestamps are normalized to UTC (microsecond precision) to keep DSSE attestations and UI traces explainable.
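
The two EntryTrace counters described above can be modeled in a few lines. The class below is an in-memory stand-in for illustration only: it is not the `EntryTraceMetrics` type, and the reason strings used when calling it are hypothetical. Only the three outcome labels come from the text.

```python
from collections import Counter

# Outcome labels as listed for entrytrace_resolutions_total{outcome}.
OUTCOMES = ("resolved", "partiallyresolved", "unresolved")


class EntryTraceCounters:
    """In-memory model of the two counters, keyed by their single label."""

    def __init__(self):
        self.resolutions = Counter()  # entrytrace_resolutions_total{outcome}
        self.unresolved = Counter()   # entrytrace_unresolved_total{reason}

    def record(self, outcome, reasons=()):
        """Count one resolution attempt plus one tick per diagnostic reason."""
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.resolutions[outcome] += 1
        for reason in reasons:
            self.unresolved[reason] += 1
```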

### Appendix B — BOM‑Index sidecar

```
struct Header { magic, version, imageDigest, createdAt }
