up

2025-12-13 18:08:55 +02:00
parent 6e45066e37
commit f1a39c4ce3
234 changed files with 24038 additions and 6910 deletions
--- a/docs/modules/scanner/architecture.md
+++ b/docs/modules/scanner/architecture.md
@@ -42,14 +42,44 @@ src/
 └─ Tools/
     ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/   # BuildKit generator (image referrer SBOMs)
     └─ StellaOps.Scanner.Sbomer.DockerImage/    # CLI‑driven scanner container
-```
-
-Per-analyzer notes (language analyzers):
- `docs/modules/scanner/analyzers-java.md`
- `docs/modules/scanner/analyzers-bun.md`
- `docs/modules/scanner/analyzers-python.md`
-
-Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
+```
+
+Per-analyzer notes (language analyzers):
+- `docs/modules/scanner/analyzers-java.md` — Java/Kotlin (Maven, Gradle, fat archives)
+- `docs/modules/scanner/dotnet-analyzer.md` — .NET (deps.json, NuGet, packages.lock.json, declared-only)
+- `docs/modules/scanner/analyzers-python.md` — Python (pip, Poetry, pipenv, conda, editables, vendored)
+- `docs/modules/scanner/analyzers-node.md` — Node.js (npm, Yarn, pnpm, multi-version locks)
+- `docs/modules/scanner/analyzers-bun.md` — Bun (bun.lock v1, dev classification, patches)
+- `docs/modules/scanner/analyzers-go.md` — Go (build info, modules)
+
+Cross-analyzer contract (identity safety, evidence locators, container layout):
+- `docs/modules/scanner/language-analyzers-contract.md` — PURL vs explicit-key rules, evidence formats, bounded scanning
+
+Semantic entrypoint analysis (Sprint 0411):
+- `docs/modules/scanner/semantic-entrypoint-schema.md` — Schema for intent, capabilities, threat vectors, and data boundaries
+
+Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
+
+### 1.3 Semantic Entrypoint Engine (Sprint 0411)
+
+The **Semantic Entrypoint Engine** enriches scan results with application-level understanding:
+
+- **Intent Classification** — Infers application type (WebServer, Worker, CliTool, Serverless, etc.) from framework detection and entrypoint analysis
+- **Capability Detection** — Identifies system resource access patterns (network, filesystem, database, crypto)
+- **Threat Vector Inference** — Maps capabilities to potential attack vectors with CWE/OWASP references
+- **Data Boundary Mapping** — Tracks data flow boundaries with sensitivity classification
+
+Components:
+- `StellaOps.Scanner.EntryTrace/Semantic/` — Core semantic types and orchestrator
+- `StellaOps.Scanner.EntryTrace/Semantic/Adapters/` — Language-specific adapters (Python, Java, Node, .NET, Go)
+- `StellaOps.Scanner.EntryTrace/Semantic/Analysis/` — Capability detection, threat inference, boundary mapping
+
+Integration points:
+- `LanguageComponentRecord` includes semantic fields (`intent`, `capabilities[]`, `threatVectors[]`)
+- `richgraph-v1` nodes carry semantic attributes via `semantic_*` keys
+- CycloneDX/SPDX SBOMs include `stellaops:semantic.*` property extensions
+
+CLI usage: `stella scan --semantic <image>` adds semantic analysis to the scan output.

 ### 1.2 Native reachability upgrades (Nov 2026)

@@ -259,6 +289,30 @@ When `scanner.events.enabled = true`, the WebService serialises the signed repor
 * Record **file:line** and choices for each hop; output chain graph.
 * Unresolvable dynamic constructs are recorded as **unknown** edges with reasons (e.g., `$FOO` unresolved).

+**D.1) Semantic Entrypoint Analysis (Sprint 0411)**
+
+Post-resolution, the `SemanticEntrypointOrchestrator` enriches entry trace results with semantic understanding:
+
+* **Application Intent** — Infers the purpose (WebServer, CliTool, Worker, Serverless, BatchJob, etc.) from framework detection and command patterns.
+* **Capability Classes** — Detects capabilities (NetworkListen, DatabaseSql, ProcessSpawn, SecretAccess, etc.) via import/dependency analysis and framework signatures.
+* **Attack Surface** — Maps capabilities to potential threat vectors (SqlInjection, Xss, Ssrf, Rce, PathTraversal) with CWE IDs and OWASP Top 10 categories.
+* **Data Boundaries** — Traces I/O edges (HttpRequest, DatabaseQuery, FileInput, EnvironmentVar) with direction and sensitivity classification.
+* **Confidence Scoring** — Each inference carries a score (0.0–1.0), a tier (Definitive/High/Medium/Low/Unknown), and a reasoning chain.
+
+Language-specific adapters (`PythonSemanticAdapter`, `JavaSemanticAdapter`, `NodeSemanticAdapter`, `DotNetSemanticAdapter`, `GoSemanticAdapter`) recognize framework patterns:
+* **Python**: Django, Flask, FastAPI, Celery, Click/Typer, Lambda handlers
+* **Java**: Spring Boot, Quarkus, Micronaut, Kafka Streams
+* **Node**: Express, NestJS, Fastify, CLI bin entries
+* **.NET**: ASP.NET Core, Worker services, Azure Functions
+* **Go**: net/http, Cobra, gRPC
+
+Semantic data flows into:
+* **RichGraph nodes** via `semantic_intent`, `semantic_capabilities`, `semantic_threats` attributes
+* **CycloneDX properties** via `stellaops:semantic.*` namespace
+* **LanguageComponentRecord** metadata for reachability scoring
+
+See `docs/modules/scanner/operations/entrypoint-semantic.md` for full schema reference.
+
 **E) Attestation & SBOM bind (optional)**

 * For each **file hash** or **binary hash**, query local cache of **Rekor v2** indices; if an SBOM attestation is found for **exact hash**, bind it to the component (origin=`attested`).
@@ -402,9 +456,9 @@ scanner:

 ---

-## 12) Testing matrix
-
-* **Analyzer contracts:** see `language-analyzers-contract.md` and per-analyzer docs (e.g., `analyzers-java.md`, Sprint 0403).
+## 12) Testing matrix
+
+* **Analyzer contracts:** see `language-analyzers-contract.md` for cross-analyzer identity safety, evidence locators, and container layout rules. Per-analyzer docs: `analyzers-java.md`, `dotnet-analyzer.md`, `analyzers-python.md`, `analyzers-node.md`, `analyzers-bun.md`, `analyzers-go.md`. Implementation plan: `docs/implplan/SPRINT_0408_0001_0001_scanner_language_detection_gaps_program.md`.

 * **Determinism:** given same image + analyzers → byte‑identical **CDX Protobuf**; JSON normalized.
 * **OS packages:** ground‑truth images per distro; compare to package DB.