up
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Notify Smoke Test / Notify Unit Tests (push) Has been cancelled
Notify Smoke Test / Notifier Service Tests (push) Has been cancelled
Notify Smoke Test / Notification Smoke Test (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
Manifest Integrity / Validate Schema Integrity (push) Has been cancelled
Manifest Integrity / Validate Contract Documents (push) Has been cancelled
Manifest Integrity / Validate Pack Fixtures (push) Has been cancelled
Manifest Integrity / Audit SHA256SUMS Files (push) Has been cancelled
Manifest Integrity / Verify Merkle Roots (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
Mirror Thin Bundle Sign & Verify / mirror-sign (push) Has been cancelled
This commit is contained in:
@@ -22,7 +22,7 @@
2. Capture the test output (`ttl-validation-<timestamp>.log`) and attach it to the sprint evidence folder (`docs/modules/attestor/evidence/`).

## Result handling
- **Success:** Tests complete in ~3–4 minutes with `Total tests: 2, Passed: 2`. Store the log and note the run in `SPRINT_100_identity_signing.md` under ATTESTOR-72-003.
- **Success:** Tests complete in ~3–4 minutes with `Total tests: 2, Passed: 2`. Store the log and note the run in `docs/implplan/archived/SPRINT_0100_0001_0001_identity_signing.md` under ATTESTOR-72-003.
- **Failure:** Preserve:
  - `docker compose logs` for both services.
  - `mongosh` output of `db.dedupe.getIndexes()` and sample documents.

@@ -38,6 +38,7 @@ Scanner analyses container images layer-by-layer, producing deterministic SBOM f
- ./operations/rustfs-migration.md
- ./operations/entrypoint.md
- ./analyzers-node.md
- ./analyzers-go.md
- ./operations/secret-leak-detection.md
- ./operations/dsse-rekor-operator-guide.md
- ./os-analyzers-evidence.md

115
docs/modules/scanner/analyzers-go.md
Normal file
@@ -0,0 +1,115 @@
# Go Analyzer (Scanner)

## What it does
- Inventories Go components from **binaries** (embedded buildinfo) and **source** (go.mod/go.sum/go.work/vendor) without executing `go`.
- Emits `pkg:golang/<module>@<version>` when a concrete version is available; otherwise emits deterministic explicit-key components (no "range-as-version" PURLs).
- Records VCS/build metadata and bounded evidence for audit/replay; remains offline-first.
- Detects security-relevant capabilities in Go source code (exec, filesystem, network, native code, etc.).

## Inputs and precedence
The analyzer processes inputs in the following order, with binary evidence taking precedence:

1. **Binary inventory (Phase 1, authoritative)**: Extract embedded build info (`runtime/debug` buildinfo blob) and emit Go modules (main + deps) with concrete versions and build settings evidence. Binary-derived components include `provenance=binary` metadata.
2. **Source inventory (Phase 2, supplementary)**: Parse `go.mod`, `go.sum`, `go.work`, and `vendor/modules.txt` to emit modules not already covered by binary evidence. Source-derived components include `provenance=source` metadata.
3. **Heuristic fallback (stripped binaries)**: When buildinfo is missing, emit deterministic `bin` components keyed by sha256 plus minimal classification evidence.

**Precedence rules:**
- Binary evidence is scanned first and takes precedence over source evidence.
- When both source and binary evidence exist for the same module path@version, only the binary-derived component is emitted.
- Main modules are tracked separately: if a binary emits `module@version`, source `module@(devel)` is suppressed.
- This ensures deterministic, non-duplicative output.
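
As an illustration of the precedence rules above, the following Python sketch merges a binary and a source inventory. It is not the analyzer's implementation: the `Component` type, its fields, and `merge_inventories` are hypothetical names, and matching is simplified to module path plus version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    module: str
    version: str
    provenance: str  # "binary" or "source"
    is_main: bool = False


def merge_inventories(binary, source):
    """Keep all binary components; add source components not already covered."""
    seen = {(c.module, c.version) for c in binary}
    binary_mains = {c.module for c in binary if c.is_main}
    merged = list(binary)
    for c in source:
        if (c.module, c.version) in seen:
            continue  # same module path@version: the binary component wins
        if c.is_main and c.version == "(devel)" and c.module in binary_mains:
            continue  # binary already emitted module@version for this main module
        merged.append(c)
    # Sorting makes the result independent of discovery order.
    return sorted(merged, key=lambda c: (c.module, c.version, c.provenance))
```

Sorting the merged list is one simple way to keep the output stable across runs; the real analyzer may order components differently.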

## Project discovery (modules + workspaces)
- Standalone modules are discovered by locating `go.mod` files (bounded recursion depth 10; vendor directories skipped).
- Workspaces are discovered via `go.work` at the analysis root; `use` members become additional module roots.
- Vendored dependencies are detected via `vendor/modules.txt` when present.

## Workspace replace directive propagation
`go.work` files may contain `replace` directives that apply to all workspace members:
- Workspace-level replaces are inherited by all member modules.
- Module-level replaces take precedence over workspace-level replaces for the same module path.
- Duplicate replace keys are handled deterministically (last-one-wins within each scope).
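
The propagation rules above can be sketched as a two-level merge. This is a simplified illustration, assuming replaces are keyed by module path only (real `replace` directives may also carry a version); `effective_replaces` is a hypothetical helper, not part of the analyzer.

```python
def effective_replaces(workspace_replaces, module_replaces):
    """Each argument is a list of (old_path, new_target) pairs in file order."""
    merged = {}
    for old, new in workspace_replaces:
        merged[old] = new  # last-one-wins within the workspace scope
    module_scope = {}
    for old, new in module_replaces:
        module_scope[old] = new  # last-one-wins within the module scope
    merged.update(module_scope)  # module-level overrides workspace-level
    return dict(sorted(merged.items()))
```

For example, a workspace replace of `example.com/b` is overridden when the member module declares its own replace for the same path.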

## Identity rules (PURL vs explicit key)
Concrete versions emit a PURL:
- `purl = pkg:golang/<modulePath>@<version>`

Non-concrete identities emit an explicit key:
- Used for source-only main modules (`(devel)`) and for any non-versioned module identity.
- PURL is omitted (`purl=null`) and the component is keyed deterministically via `AddFromExplicitKey`.
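
A minimal sketch of this decision follows. The explicit-key layout used here (`explicit:golang/<modulePath>#<hash>`) is an assumption made for illustration; the document only states that the key is deterministic and that `AddFromExplicitKey` is used, so the actual format may differ.

```python
import hashlib


def component_identity(module_path, version):
    """Return (purl, explicit_key); exactly one of the two is None."""
    concrete = bool(version) and version != "(devel)"
    if concrete:
        return f"pkg:golang/{module_path}@{version}", None
    # Hypothetical explicit key: derived only from stable inputs, so it is deterministic.
    digest = hashlib.sha256(f"golang|{module_path}".encode("utf-8")).hexdigest()
    return None, f"explicit:golang/{module_path}#{digest[:16]}"
```

Because the key never takes the `pkg:golang/...@...` shape, a non-versioned identity cannot be mistaken for a real PURL.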

## Evidence and metadata

### Binary-derived components
Binary components include (when present):
- `provenance=binary`
- `go.version`
- `modulePath.main` and `build.*` settings
- VCS fields (`build.vcs*` from build settings and/or `go.dwarf` tokens)
- `moduleSum` and replacement metadata when available
- CGO signals (`cgo.enabled`, flags, compiler hints; plus adjacent native libs when detected)

### Source-derived components
Source components include:
- `provenance=source`
- `moduleSum` from `go.sum` (when present)
- vendor signals (`vendored=true`) and `vendor` evidence locators
- replacement/exclude flags with stable metadata keys
- best-effort license signals for main module and vendored modules
- `capabilities` metadata listing detected capability kinds (exec, filesystem, network, etc.)
- `capabilities.maxRisk` indicating highest risk level (critical/high/medium/low)

### Heuristic fallback components
Fallback components include:
- `type=bin`, deterministic `sha256` identity, and a classification evidence marker
- Metric `scanner_analyzer_golang_heuristic_total{indicator,version_hint}` increments per heuristic emission

## Capability scanning
The analyzer detects security-relevant capabilities in Go source code:

| Capability | Risk | Examples |
|------------|------|----------|
| Exec | Critical | `exec.Command`, `syscall.Exec`, `os.StartProcess` |
| NativeCode | Critical | `unsafe.Pointer`, `//go:linkname`, `syscall.Syscall` |
| PluginLoading | Critical | `plugin.Open` |
| Filesystem | High/Medium | `os.Remove`, `os.Chmod`, `os.WriteFile` |
| Network | Medium | `net.Dial`, `http.Get`, `http.ListenAndServe` |
| Environment | High/Medium | `os.Setenv`, `os.Getenv` |
| Database | Medium | `sql.Open`, `db.Query` |
| DynamicCode | High | `reflect.Value.Call`, `template.Execute` |
| Serialization | Medium | `gob.NewDecoder`, `xml.Unmarshal` |
| Reflection | Low/Medium | `reflect.TypeOf`, `reflect.New` |
| Crypto | Low | Hash functions, cipher operations |

Capabilities are emitted as:
- Metadata: `capabilities=exec,filesystem,network` (comma-separated list of kinds)
- Metadata: `capabilities.maxRisk=critical|high|medium|low`
- Evidence: Top 10 capability locations with pattern and line number
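
The emission format above could be produced roughly as follows. This is a sketch under stated assumptions: the finding dictionaries, the alphabetical ordering of kinds, and the "highest risk first" ranking of evidence are illustrative choices, not confirmed behavior of `GoCapabilityScanner`.

```python
RISK_ORDER = ["low", "medium", "high", "critical"]


def summarize_capabilities(findings, max_evidence=10):
    """findings: list of dicts with keys kind, risk, pattern, file, line."""
    if not findings:
        return {}, []
    kinds = sorted({f["kind"] for f in findings})
    max_risk = max((f["risk"] for f in findings), key=RISK_ORDER.index)
    metadata = {
        "capabilities": ",".join(kinds),
        "capabilities.maxRisk": max_risk,
    }
    # Assumed ranking: highest risk first, then file and line for a stable order.
    ranked = sorted(
        findings,
        key=lambda f: (-RISK_ORDER.index(f["risk"]), f["file"], f["line"]),
    )
    return metadata, ranked[:max_evidence]
```

Capping the evidence list keeps the component record bounded even when a module contains many matching call sites.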

## IO/Memory bounds
Binary and DWARF scanning uses bounded windowed reads to limit memory usage:
- **Build info scanning**: 16 MB windows with 4 KB overlap; max file size 128 MB.
- **DWARF token scanning**: 8 MB windows with 1 KB overlap; max file size 256 MB.
- Small files (below window size) are read directly for efficiency.
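
The windowed-read technique can be sketched as below. The defaults mirror the build info limits listed above; `find_marker` is a hypothetical helper, and the overlap is what allows a marker that straddles two windows to be found.

```python
import io


def find_marker(stream, marker, window=16 * 1024 * 1024, overlap=4 * 1024,
                max_size=128 * 1024 * 1024):
    """Return the absolute offset of marker, -1 if absent, or None if the file is too large."""
    if not (0 < len(marker) <= overlap < window):
        raise ValueError("marker must fit inside the overlap, and overlap inside the window")
    size = stream.seek(0, io.SEEK_END)
    stream.seek(0)
    if size > max_size:
        return None  # skipped: exceeds the configured size bound
    offset = 0  # absolute offset of the first byte in buf
    buf = stream.read(window)
    while True:
        hit = buf.find(marker)
        if hit != -1:
            return offset + hit
        tail = buf[-overlap:]  # carry the last bytes into the next window
        nxt = stream.read(window - len(tail))
        if not nxt:
            return -1
        offset += len(buf) - len(tail)
        buf = tail + nxt
```

At most one window of data is held in memory at a time, regardless of file size.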

## Retract semantics
Go's `retract` directive only applies to versions of the declaring module itself, not to dependencies:
- The `RetractedVersions` field in inventory results contains only versions of the main module that are retracted.
- Dependency retraction cannot be determined offline (would require fetching each module's go.mod).
- No false-positive retraction warnings are emitted for dependencies.

## Cache key correctness
Binary build info is cached using a composite key:
- File path (normalized for OS case sensitivity)
- File length
- Last modification time
- 4 KB header hash (FNV-1a)

The header hash ensures correct behavior in containerized/layered filesystem environments where files may have identical metadata but different content.
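
A composite key of this shape might look like the following sketch. The 64-bit FNV-1a variant and the tuple layout are assumptions (the document does not state the hash width), and `cache_key` is a hypothetical helper.

```python
import os

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def cache_key(path, header_bytes=4096):
    """Combine path, length, mtime, and a hash of the first 4 KB of the file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        header = f.read(header_bytes)
    # normcase folds case on case-insensitive platforms and is a no-op on POSIX.
    return (os.path.normcase(path), st.st_size, st.st_mtime_ns, fnv1a_64(header))
```

Two files with the same length and timestamp still produce different keys as long as their first 4 KB differ, which is the layered-filesystem case described above.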

## References
- Sprint: `docs/implplan/SPRINT_0402_0001_0001_scanner_go_analyzer_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Go/GoLanguageAnalyzer.cs`
- Capability scanner: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Go/Internal/GoCapabilityScanner.cs`

@@ -42,14 +42,44 @@ src/
└─ Tools/
   ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/ # BuildKit generator (image referrer SBOMs)
   └─ StellaOps.Scanner.Sbomer.DockerImage/ # CLI‑driven scanner container
```

Per-analyzer notes (language analyzers):
- `docs/modules/scanner/analyzers-java.md`
- `docs/modules/scanner/analyzers-bun.md`
- `docs/modules/scanner/analyzers-python.md`

Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
```

Per-analyzer notes (language analyzers):
- `docs/modules/scanner/analyzers-java.md` - Java/Kotlin (Maven, Gradle, fat archives)
- `docs/modules/scanner/dotnet-analyzer.md` - .NET (deps.json, NuGet, packages.lock.json, declared-only)
- `docs/modules/scanner/analyzers-python.md` - Python (pip, Poetry, pipenv, conda, editables, vendored)
- `docs/modules/scanner/analyzers-node.md` - Node.js (npm, Yarn, pnpm, multi-version locks)
- `docs/modules/scanner/analyzers-bun.md` - Bun (bun.lock v1, dev classification, patches)
- `docs/modules/scanner/analyzers-go.md` - Go (build info, modules)

Cross-analyzer contract (identity safety, evidence locators, container layout):
- `docs/modules/scanner/language-analyzers-contract.md` - PURL vs explicit-key rules, evidence formats, bounded scanning

Semantic entrypoint analysis (Sprint 0411):
- `docs/modules/scanner/semantic-entrypoint-schema.md` - Schema for intent, capabilities, threat vectors, and data boundaries

Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.

### 1.3 Semantic Entrypoint Engine (Sprint 0411)

The **Semantic Entrypoint Engine** enriches scan results with application-level understanding:

- **Intent Classification** - Infers application type (WebServer, Worker, CliTool, Serverless, etc.) from framework detection and entrypoint analysis
- **Capability Detection** - Identifies system resource access patterns (network, filesystem, database, crypto)
- **Threat Vector Inference** - Maps capabilities to potential attack vectors with CWE/OWASP references
- **Data Boundary Mapping** - Tracks data flow boundaries with sensitivity classification

Components:
- `StellaOps.Scanner.EntryTrace/Semantic/` - Core semantic types and orchestrator
- `StellaOps.Scanner.EntryTrace/Semantic/Adapters/` - Language-specific adapters (Python, Java, Node, .NET, Go)
- `StellaOps.Scanner.EntryTrace/Semantic/Analysis/` - Capability detection, threat inference, boundary mapping

Integration points:
- `LanguageComponentRecord` includes semantic fields (`intent`, `capabilities[]`, `threatVectors[]`)
- `richgraph-v1` nodes carry semantic attributes via `semantic_*` keys
- CycloneDX/SPDX SBOMs include `stellaops:semantic.*` property extensions

CLI usage: `stella scan --semantic <image>` enables semantic analysis in output.

### 1.2 Native reachability upgrades (Nov 2026)

@@ -259,6 +289,30 @@ When `scanner.events.enabled = true`, the WebService serialises the signed repor
* Record **file:line** and choices for each hop; output chain graph.
* Unresolvable dynamic constructs are recorded as **unknown** edges with reasons (e.g., `$FOO` unresolved).

**D.1) Semantic Entrypoint Analysis (Sprint 0411)**

Post-resolution, the `SemanticEntrypointOrchestrator` enriches entry trace results with semantic understanding:

* **Application Intent** - Infers the purpose (WebServer, CliTool, Worker, Serverless, BatchJob, etc.) from framework detection and command patterns.
* **Capability Classes** - Detects capabilities (NetworkListen, DatabaseSql, ProcessSpawn, SecretAccess, etc.) via import/dependency analysis and framework signatures.
* **Attack Surface** - Maps capabilities to potential threat vectors (SqlInjection, Xss, Ssrf, Rce, PathTraversal) with CWE IDs and OWASP Top 10 categories.
* **Data Boundaries** - Traces I/O edges (HttpRequest, DatabaseQuery, FileInput, EnvironmentVar) with direction and sensitivity classification.
* **Confidence Scoring** - Each inference carries a score (0.0–1.0), tier (Definitive/High/Medium/Low/Unknown), and reasoning chain.

Language-specific adapters (`PythonSemanticAdapter`, `JavaSemanticAdapter`, `NodeSemanticAdapter`, `DotNetSemanticAdapter`, `GoSemanticAdapter`) recognize framework patterns:
* **Python**: Django, Flask, FastAPI, Celery, Click/Typer, Lambda handlers
* **Java**: Spring Boot, Quarkus, Micronaut, Kafka Streams
* **Node**: Express, NestJS, Fastify, CLI bin entries
* **.NET**: ASP.NET Core, Worker services, Azure Functions
* **Go**: net/http, Cobra, gRPC

Semantic data flows into:
* **RichGraph nodes** via `semantic_intent`, `semantic_capabilities`, `semantic_threats` attributes
* **CycloneDX properties** via `stellaops:semantic.*` namespace
* **LanguageComponentRecord** metadata for reachability scoring

See `docs/modules/scanner/operations/entrypoint-semantic.md` for full schema reference.

**E) Attestation & SBOM bind (optional)**

* For each **file hash** or **binary hash**, query local cache of **Rekor v2** indices; if an SBOM attestation is found for **exact hash**, bind it to the component (origin=`attested`).
@@ -402,9 +456,9 @@ scanner:

---

## 12) Testing matrix

* **Analyzer contracts:** see `language-analyzers-contract.md` and per-analyzer docs (e.g., `analyzers-java.md`, Sprint 0403).
## 12) Testing matrix

* **Analyzer contracts:** see `language-analyzers-contract.md` for cross-analyzer identity safety, evidence locators, and container layout rules. Per-analyzer docs: `analyzers-java.md`, `dotnet-analyzer.md`, `analyzers-python.md`, `analyzers-node.md`, `analyzers-bun.md`, `analyzers-go.md`. Implementation: `docs/implplan/SPRINT_0408_0001_0001_scanner_language_detection_gaps_program.md`.

* **Determinism:** given same image + analyzers → byte‑identical **CDX Protobuf**; JSON normalized.
* **OS packages:** ground‑truth images per distro; compare to package DB.

149
docs/modules/scanner/dotnet-analyzer.md
Normal file
@@ -0,0 +1,149 @@
# .NET Analyzer

The .NET analyzer detects NuGet package dependencies in .NET applications by analyzing multiple dependency sources with defined precedence rules.

## Detection Sources and Precedence

The analyzer uses the following sources in order of precedence (highest to lowest fidelity):

| Priority | Source | Description |
|----------|--------|-------------|
| 1 | `packages.lock.json` | Locked resolved versions; highest trust for version accuracy |
| 2 | `*.deps.json` | Installed/published packages; authoritative for "what shipped" |
| 3 | SDK-style project files | `*.csproj/*.fsproj/*.vbproj` + `Directory.Packages.props` (CPM) + `Directory.Build.props` |
| 4 | `packages.config` | Legacy format; lowest precedence |

## Operating Modes

### Installed Mode (deps.json present)

When `*.deps.json` files exist, the analyzer operates in **installed mode**:

- Installed packages are emitted with `pkg:nuget/<id>@<ver>` PURLs
- Declared packages not matching any installed package are emitted with `declaredOnly=true` and `installed.missing=true`
- Installed packages without corresponding declared records are tagged with `declared.missing=true`
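
The installed-mode reconciliation above can be sketched as a comparison of two maps. This illustration assumes packages are matched by normalized ID only and that every declared version is resolved; `reconcile` and the component dictionaries are hypothetical.

```python
def reconcile(installed, declared):
    """installed/declared: dicts mapping normalized package id -> version."""
    components = []
    for pkg_id, version in sorted(installed.items()):
        metadata = {}
        if pkg_id not in declared:
            metadata["declared.missing"] = "true"  # shipped but never declared
        components.append({"purl": f"pkg:nuget/{pkg_id}@{version}", "metadata": metadata})
    for pkg_id, version in sorted(declared.items()):
        if pkg_id not in installed:
            components.append({
                "purl": f"pkg:nuget/{pkg_id}@{version}",
                "metadata": {"declaredOnly": "true", "installed.missing": "true"},
            })
    return components
```

Packages present on both sides are emitted once, from the installed record, with no extra flags.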

### Declared-Only Mode (no deps.json)

When no `*.deps.json` files exist, the analyzer falls back to **declared-only mode**:

- Dependencies are collected from declared sources in precedence order
- All packages are emitted with `declaredOnly=true`
- Resolved versions use `pkg:nuget/<id>@<ver>` PURLs
- Unresolved versions use explicit keys (see below)

## Declared-Only Components

Components emitted from declared sources include these metadata fields:

| Field | Description |
|-------|-------------|
| `declaredOnly` | Always `"true"` for declared-only components |
| `declared.source` | Source file type (e.g., `csproj`, `packages.lock.json`, `packages.config`) |
| `declared.locator` | Relative path to source file |
| `declared.versionSource` | How version was determined: `direct`, `centralpkg`, `lockfile`, `property`, `unresolved` |
| `declared.tfm[N]` | Target framework(s) |
| `declared.isDevelopmentDependency` | `"true"` if marked as development dependency |
| `provenance` | `"declared"` for declared-only components |

## Unresolved Version Identity

When a version cannot be resolved (e.g., CPM enabled but missing version, unresolved property placeholder), the component uses an explicit key format:

```
declared:nuget/<normalized-id>/<version-source-hash>
```

Where `version-source-hash` = first 8 characters of SHA-256(`<source>|<locators>|<raw-version-string>`)

Additional metadata for unresolved versions:

| Field | Description |
|-------|-------------|
| `declared.versionResolved` | `"false"` |
| `declared.unresolvedReason` | One of: `cpm-missing`, `property-unresolved`, `version-omitted` |
| `declared.rawVersion` | Original unresolved string (e.g., `$(SerilogVersion)`) |

This explicit key format prevents collisions with real `pkg:nuget/<id>@<ver>` PURLs.
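
As a rough illustration, the key could be computed as follows. Two details are assumptions not stated in the document: IDs are normalized by lowercasing, and multiple locators are sorted and joined with commas before hashing. `unresolved_key` is a hypothetical name.

```python
import hashlib


def unresolved_key(package_id, source, locators, raw_version):
    """Build declared:nuget/<normalized-id>/<first 8 hex chars of SHA-256>."""
    normalized_id = package_id.strip().lower()  # assumed normalization
    payload = f"{source}|{','.join(sorted(locators))}|{raw_version}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"declared:nuget/{normalized_id}/{digest[:8]}"
```

Because the hash covers the source, locators, and raw version string, the same unresolved placeholder in two different project files yields two distinct keys.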

## Bundling Detection

The analyzer detects bundled executables (single-file apps, ILMerge/ILRepack assemblies) using bounded candidate selection:

### Candidate Selection Rules

- Only scan files in the **same directory** as `*.deps.json` or `*.runtimeconfig.json`
- Only scan files with executable extensions: `.exe`, `.dll`, or no extension
- Only scan files whose name matches the app name (e.g., if `MyApp.deps.json` exists, check `MyApp`, `MyApp.exe`, `MyApp.dll`)
- Skip files > 500 MB (emit `bundle.skipped=true` with `bundle.skipReason=size-exceeded`)
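
The selection rules above can be sketched over an in-memory file listing. This is a simplified illustration: `bundle_candidates` is hypothetical, paths are POSIX-style relative paths, and 500 MB is interpreted as 500 × 1024 × 1024 bytes.

```python
from pathlib import PurePosixPath

MAX_BUNDLE_BYTES = 500 * 1024 * 1024  # assumed interpretation of "500 MB"
MARKER_SUFFIXES = (".deps.json", ".runtimeconfig.json")


def bundle_candidates(files):
    """files: dict mapping relative path -> size in bytes."""
    apps = set()
    for path in files:
        p = PurePosixPath(path)
        for suffix in MARKER_SUFFIXES:
            if p.name.endswith(suffix) and len(p.name) > len(suffix):
                apps.add((p.parent, p.name[: -len(suffix)]))
    results = []
    for parent, app in sorted(apps):
        # Only the app name with no extension, .exe, or .dll, in the same directory.
        for name in (app, app + ".exe", app + ".dll"):
            candidate = str(parent / name)
            if candidate not in files:
                continue
            if files[candidate] > MAX_BUNDLE_BYTES:
                results.append((candidate, {"bundle.skipped": "true",
                                            "bundle.skipReason": "size-exceeded"}))
            else:
                results.append((candidate, {}))
    return results
```

Restricting candidates this way bounds the number of files opened per application to at most three.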

### Bundling Metadata

When bundling is detected, metadata is attached to entrypoint components (or synthetic bundle markers):

| Field | Description |
|-------|-------------|
| `bundle.detected` | `"true"` |
| `bundle.filePath` | Relative path to bundled executable |
| `bundle.kind` | `singlefile`, `ilmerge`, `ilrepack`, `costurafody`, `unknown` |
| `bundle.sizeBytes` | File size in bytes |
| `bundle.estimatedAssemblies` | Estimated number of bundled assemblies |
| `bundle.indicator[N]` | Detection indicators (top 5) |
| `bundle.skipped` | `"true"` if file was skipped |
| `bundle.skipReason` | Reason for skipping (e.g., `size-exceeded`) |

## Dependency Edges

When `emitDependencyEdges=true` is set in the analyzer configuration (`dotnet-il.config.json`), the analyzer emits dependency edge metadata for both installed and declared packages.

### Edge Metadata Format

Each edge is emitted with the following metadata fields:

| Field | Description |
|-------|-------------|
| `edge[N].target` | Normalized package ID of the dependency |
| `edge[N].reason` | Relationship type (e.g., `declared-dependency`) |
| `edge[N].confidence` | Confidence level (`high`, `medium`, `low`) |
| `edge[N].source` | Source of the edge information (`deps.json`, `packages.lock.json`) |
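
To make the indexed key layout concrete, the sketch below flattens a list of edges into `edge[N].*` metadata. Zero-based indexing and sorting by target are assumptions made for this illustration; `flatten_edges` is a hypothetical helper.

```python
def flatten_edges(edges):
    """edges: list of dicts with keys target, reason, confidence, source."""
    # Sort so that the index assigned to each edge is stable across runs.
    ordered = sorted(edges, key=lambda e: (e["target"], e["source"]))
    metadata = {}
    for index, edge in enumerate(ordered):
        for field in ("target", "reason", "confidence", "source"):
            metadata[f"edge[{index}].{field}"] = edge[field]
    return metadata
```

A package with two dependencies therefore carries eight metadata entries, four per edge.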

### Edge Sources

- **`deps.json`**: Dependencies from the runtime dependencies section
- **`packages.lock.json`**: Dependencies from the lock file's per-package dependencies

### Example Configuration

```json
{
  "emitDependencyEdges": true
}
```

## Central Package Management (CPM)

The analyzer supports .NET CPM via `Directory.Packages.props`:

1. CPM is considered active when `ManagePackageVersionsCentrally=true` is set in the project or props file.
2. Package versions are then resolved from `<PackageVersion>` items in `Directory.Packages.props`.
3. If a package version cannot be found in CPM, the package is marked as unresolved with `declared.unresolvedReason=cpm-missing`.
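
The three steps above can be sketched with lightweight XML parsing, in the same spirit as the analyzer's no-MSBuild approach. This is an illustration only: `central_versions` and `resolve_reference` are hypothetical, IDs are lowercased for matching, and features such as `VersionOverride` or XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def central_versions(props_xml):
    """Map lowercased package id -> version from <PackageVersion> items."""
    root = ET.fromstring(props_xml)
    return {
        el.get("Include").lower(): el.get("Version")
        for el in root.iter("PackageVersion")
        if el.get("Include") and el.get("Version")
    }


def resolve_reference(package_id, inline_version, cpm_enabled, central):
    """Return (version or None, metadata) for one declared package reference."""
    if cpm_enabled:
        version = central.get(package_id.lower())
        if version:
            return version, {"declared.versionSource": "centralpkg"}
        return None, {"declared.versionResolved": "false",
                      "declared.unresolvedReason": "cpm-missing"}
    if inline_version:
        return inline_version, {"declared.versionSource": "direct"}
    return None, {"declared.versionResolved": "false",
                  "declared.unresolvedReason": "version-omitted"}
```

An unresolved result would then be keyed with the explicit `declared:nuget/...` format rather than a PURL.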

## Known Limitations

1. **No full MSBuild evaluation**: The analyzer uses lightweight XML parsing, not MSBuild evaluation. Complex conditions and imports may not be fully resolved.

2. **No restore/feed access**: The analyzer does not perform NuGet restore or access package feeds. Only locally available information is used.

3. **Property resolution**: Property placeholders (`$(PropertyName)`) are resolved using `Directory.Build.props` and project properties, but transitive or complex property evaluation is not supported.

4. **Bundled content**: Bundling detection identifies likely bundles but cannot extract embedded dependency information.

## Files Created/Modified

- `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.DotNet/DotNetLanguageAnalyzer.cs`
- `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.DotNet/Internal/DotNetDeclaredDependencyCollector.cs`
- `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.DotNet/Internal/Bundling/DotNetBundlingSignalCollector.cs`

## Related Sprint

See [SPRINT_0404_0001_0001_scanner_dotnet_detection_gaps.md](../../implplan/SPRINT_0404_0001_0001_scanner_dotnet_detection_gaps.md) for implementation details and decisions.
280
docs/modules/scanner/operations/entrypoint-semantic.md
Normal file
@@ -0,0 +1,280 @@
# Semantic Entrypoint Analysis

> Part of Sprint 0411 - Semantic Entrypoint Engine

## Overview

The Semantic Entrypoint Engine provides deep understanding of container entrypoints by inferring:
- **Application Intent** - What the application is designed to do (web server, CLI tool, worker, etc.)
- **Capabilities** - What system resources and external services the application uses
- **Attack Surface** - Potential security vulnerabilities based on detected patterns
- **Data Boundaries** - I/O edges where data enters or leaves the application

This semantic layer enables more accurate vulnerability prioritization, reachability analysis, and policy decisioning.

## Schema Definition

### SemanticEntrypoint Record

The core output of semantic analysis:

```csharp
public sealed record SemanticEntrypoint
{
    public required string Id { get; init; }
    public required EntrypointSpecification Specification { get; init; }
    public required ApplicationIntent Intent { get; init; }
    public required CapabilityClass Capabilities { get; init; }
    public required ImmutableArray<ThreatVector> AttackSurface { get; init; }
    public required ImmutableArray<DataFlowBoundary> DataBoundaries { get; init; }
    public required SemanticConfidence Confidence { get; init; }
    public string? Language { get; init; }
    public string? Framework { get; init; }
    public string? FrameworkVersion { get; init; }
    public string? RuntimeVersion { get; init; }
    public ImmutableDictionary<string, string>? Metadata { get; init; }
}
```

### Application Intent

Enumeration of recognized application types:

| Intent | Description | Example Frameworks |
|--------|-------------|-------------------|
| `WebServer` | HTTP/HTTPS listener | Django, Express, ASP.NET Core |
| `CliTool` | Command-line utility | Click, Cobra, System.CommandLine |
| `Worker` | Background job processor | Celery, Sidekiq, Hangfire |
| `BatchJob` | One-shot data processing | MapReduce, ETL scripts |
| `Serverless` | FaaS handler | Lambda, Azure Functions |
| `Daemon` | Long-running background service | systemd units |
| `StreamProcessor` | Real-time data pipeline | Kafka Streams, Flink |
| `RpcServer` | gRPC/Thrift server | grpc-go, grpc-dotnet |
| `GraphQlServer` | GraphQL API | Apollo, Hot Chocolate |
| `DatabaseServer` | Database engine | PostgreSQL, Redis |
| `MessageBroker` | Message queue server | RabbitMQ, NATS |
| `CacheServer` | Cache/session store | Redis, Memcached |
| `ProxyGateway` | Reverse proxy, API gateway | Envoy, NGINX |

### Capability Classes

Flags enum representing detected capabilities:

| Capability | Description | Detection Signals |
|------------|-------------|-------------------|
| `NetworkListen` | Opens listening socket | `http.ListenAndServe`, `app.listen()` |
| `NetworkConnect` | Makes outbound connections | `requests`, `http.Client` |
| `FileRead` | Reads from filesystem | `open()`, `File.ReadAllText()` |
| `FileWrite` | Writes to filesystem | File write operations |
| `ProcessSpawn` | Spawns child processes | `subprocess`, `exec.Command` |
| `DatabaseSql` | SQL database access | `psycopg2`, `SqlConnection` |
| `DatabaseNoSql` | NoSQL database access | `pymongo`, `redis` |
| `MessageQueue` | Message broker client | `pika`, `kafka-python` |
| `CacheAccess` | Cache client operations | `redis`, `memcached` |
| `ExternalHttpApi` | External HTTP API calls | REST clients |
| `Authentication` | Auth operations | `passport`, `JWT` libraries |
| `SecretAccess` | Accesses secrets/credentials | Vault clients, env secrets |
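
A flags enum lets several capabilities be combined into one value. The Python analogue below is only a sketch of that idea: the member names are transliterated from the table, and the `SIGNALS` map and `detect` function are hypothetical stand-ins for the real import/dependency analysis.

```python
from enum import Flag, auto


class CapabilityClass(Flag):
    NONE = 0
    NETWORK_LISTEN = auto()
    NETWORK_CONNECT = auto()
    FILE_READ = auto()
    FILE_WRITE = auto()
    PROCESS_SPAWN = auto()
    DATABASE_SQL = auto()
    DATABASE_NOSQL = auto()
    MESSAGE_QUEUE = auto()
    CACHE_ACCESS = auto()
    EXTERNAL_HTTP_API = auto()
    AUTHENTICATION = auto()
    SECRET_ACCESS = auto()


# Illustrative signal map; one import can contribute several flags.
SIGNALS = {
    "psycopg2": CapabilityClass.DATABASE_SQL,
    "pymongo": CapabilityClass.DATABASE_NOSQL,
    "redis": CapabilityClass.DATABASE_NOSQL | CapabilityClass.CACHE_ACCESS,
    "pika": CapabilityClass.MESSAGE_QUEUE,
    "subprocess": CapabilityClass.PROCESS_SPAWN,
    "requests": CapabilityClass.NETWORK_CONNECT,
}


def detect(imports):
    """OR together the flags for every recognized import name."""
    caps = CapabilityClass.NONE
    for name in imports:
        caps |= SIGNALS.get(name, CapabilityClass.NONE)
    return caps
```

Note that `redis` appears under both `DatabaseNoSql` and `CacheAccess` in the table, which is why a combined flag value is a natural fit.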

### Threat Vectors

Inferred security threats:

| Threat Type | CWE ID | OWASP Category | Contributing Capabilities |
|------------|--------|----------------|--------------------------|
| `SqlInjection` | 89 | A03:2021 | `DatabaseSql` + `UserInput` |
| `Xss` | 79 | A03:2021 | `NetworkListen` + `UserInput` |
| `Ssrf` | 918 | A10:2021 | `ExternalHttpApi` + `UserInput` |
| `Rce` | 94 | A03:2021 | `ProcessSpawn` + `UserInput` |
| `PathTraversal` | 22 | A01:2021 | `FileRead` + `UserInput` |
| `InsecureDeserialization` | 502 | A08:2021 | Deserialization patterns |
| `AuthenticationBypass` | 287 | A07:2021 | Auth patterns detected |
| `CommandInjection` | 78 | A03:2021 | `ProcessSpawn` patterns |
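
The first five rows of the table read as "all listed capabilities must be present". A minimal rule-matching sketch of that reading follows; `THREAT_RULES` and `infer_threats` are hypothetical, capabilities are plain strings here, and the pattern-based rows (deserialization, auth, command injection) are left out because the table does not state their exact conditions.

```python
THREAT_RULES = [
    # (threat type, CWE id, OWASP category, required capabilities)
    ("SqlInjection", 89, "A03:2021", {"DatabaseSql", "UserInput"}),
    ("Xss", 79, "A03:2021", {"NetworkListen", "UserInput"}),
    ("Ssrf", 918, "A10:2021", {"ExternalHttpApi", "UserInput"}),
    ("Rce", 94, "A03:2021", {"ProcessSpawn", "UserInput"}),
    ("PathTraversal", 22, "A01:2021", {"FileRead", "UserInput"}),
]


def infer_threats(capabilities):
    """Return every rule whose required capabilities are all present."""
    present = set(capabilities)
    return [
        {"type": threat, "cwe": cwe, "owasp": owasp}
        for threat, cwe, owasp, required in THREAT_RULES
        if required <= present  # subset test: all requirements satisfied
    ]
```

With this reading, `DatabaseSql` alone does not imply `SqlInjection`; a user-input signal must also be present.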

### Data Flow Boundaries

I/O edges for data flow analysis:

| Boundary Type | Direction | Security Relevance |
|---------------|-----------|-------------------|
| `HttpRequest` | Inbound | User input entry point |
| `HttpResponse` | Outbound | Data exposure point |
| `DatabaseQuery` | Outbound | SQL injection surface |
| `FileInput` | Inbound | Path traversal surface |
| `EnvironmentVar` | Inbound | Config injection surface |
| `MessageReceive` | Inbound | Deserialization surface |
| `ProcessSpawn` | Outbound | Command injection surface |

### Confidence Scoring

All inferences include confidence scores:

```csharp
public sealed record SemanticConfidence
{
    public double Score { get; init; } // 0.0-1.0
    public ConfidenceTier Tier { get; init; } // Unknown, Low, Medium, High, Definitive
    public ImmutableArray<string> ReasoningChain { get; init; }
}
```

| Tier | Score Range | Description |
|------|-------------|-------------|
| `Definitive` | 0.95-1.0 | Framework explicitly declared |
| `High` | 0.8-0.95 | Strong pattern match |
| `Medium` | 0.5-0.8 | Multiple weak signals |
| `Low` | 0.2-0.5 | Heuristic inference |
| `Unknown` | 0.0-0.2 | No reliable signals |

## Language Adapters

Semantic analysis uses language-specific adapters:

### Python Adapter
- **Django**: Detects `manage.py`, `INSTALLED_APPS`, migrations
- **Flask/FastAPI**: Detects `Flask(__name__)`, `FastAPI()` patterns
- **Celery**: Detects `Celery()` app, `@task` decorators
- **Click/Typer**: Detects CLI decorators
- **Lambda**: Detects `lambda_handler` pattern

### Java Adapter
- **Spring Boot**: Detects `@SpringBootApplication`, starter dependencies
- **Quarkus**: Detects `io.quarkus` packages
- **Kafka Streams**: Detects `kafka-streams` dependency
- **Main-Class**: Falls back to manifest analysis

### Node Adapter
- **Express**: Detects `express()` + `listen()`
- **NestJS**: Detects `@nestjs/core` dependency
- **Fastify**: Detects `fastify()` patterns
- **CLI bin**: Detects `bin` field in package.json

### .NET Adapter
- **ASP.NET Core**: Detects `Microsoft.AspNetCore` references
- **Worker Service**: Detects `BackgroundService` inheritance
- **Console**: Detects `OutputType=Exe` without web deps

### Go Adapter
- **net/http**: Detects `http.ListenAndServe` patterns
- **Cobra**: Detects `github.com/spf13/cobra` import
- **gRPC**: Detects `google.golang.org/grpc` import

## Integration Points

### Entry Trace Pipeline

Semantic analysis integrates after entry trace resolution:

```
Container Image
    ↓
EntryTraceAnalyzer.ResolveAsync()
    ↓
EntryTraceGraph (nodes, edges, terminals)
    ↓
SemanticEntrypointOrchestrator.AnalyzeAsync()
    ↓
SemanticEntrypoint (intent, capabilities, threats)
```

### SBOM Output

Semantic data appears in CycloneDX properties:

```json
{
  "properties": [
    { "name": "stellaops:semantic.intent", "value": "WebServer" },
    { "name": "stellaops:semantic.capabilities", "value": "NetworkListen,DatabaseSql" },
    { "name": "stellaops:semantic.threats", "value": "[{\"type\":\"SqlInjection\",\"confidence\":0.7}]" },
    { "name": "stellaops:semantic.risk.score", "value": "0.7" },
    { "name": "stellaops:semantic.framework", "value": "django" }
  ]
}
```
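
A consumer has to decode these properties itself, because CycloneDX property values are always strings: the capability list is comma-separated and the threat list is JSON embedded in a string. The following is a minimal sketch of such a reader, not part of the scanner. `readSemanticProperties`, `SemanticSummary`, and `ThreatSummary` are hypothetical names, and the threat summary shape is assumed from the example above.

```typescript
// CycloneDX property entries carry string names and string values.
interface Property {
  name: string;
  value: string;
}

// Hypothetical shape, assumed from the `stellaops:semantic.threats` example.
interface ThreatSummary {
  type: string;
  confidence: number;
}

// Hypothetical decoded view of the semantic properties.
interface SemanticSummary {
  intent: string | undefined;
  capabilities: string[];
  threats: ThreatSummary[];
  riskScore: number | undefined;
  framework: string | undefined;
}

const PREFIX = "stellaops:semantic.";

function readSemanticProperties(properties: Property[]): SemanticSummary {
  // Index the semantic properties by their name without the namespace prefix.
  const byName = new Map<string, string>();
  for (const p of properties) {
    if (p.name.startsWith(PREFIX)) {
      byName.set(p.name.slice(PREFIX.length), p.value);
    }
  }
  const caps = byName.get("capabilities");
  const threats = byName.get("threats");
  const risk = byName.get("risk.score");
  return {
    intent: byName.get("intent"),
    capabilities: caps ? caps.split(",") : [],
    // The threats value is a JSON document stored inside a string.
    threats: threats ? (JSON.parse(threats) as ThreatSummary[]) : [],
    riskScore: risk !== undefined ? Number(risk) : undefined,
    framework: byName.get("framework"),
  };
}

const summary = readSemanticProperties([
  { name: "stellaops:semantic.intent", value: "WebServer" },
  { name: "stellaops:semantic.capabilities", value: "NetworkListen,DatabaseSql" },
  { name: "stellaops:semantic.threats", value: "[{\"type\":\"SqlInjection\",\"confidence\":0.7}]" },
  { name: "stellaops:semantic.risk.score", value: "0.7" },
  { name: "stellaops:semantic.framework", value: "django" },
]);
console.log(summary.intent, summary.capabilities, summary.threats);
```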

### RichGraph Output

Semantic attributes on entrypoint nodes:

```json
{
  "kind": "entrypoint",
  "attributes": {
    "semantic_intent": "WebServer",
    "semantic_capabilities": "NetworkListen,DatabaseSql,UserInput",
    "semantic_threats": "SqlInjection,Xss",
    "semantic_risk_score": "0.7",
    "semantic_confidence": "0.85",
    "semantic_confidence_tier": "High"
  }
}
```

## Usage Examples

### CLI Usage

```bash
# Scan with semantic analysis
stella scan myimage:latest --semantic

# Output includes semantic fields
stella scan myimage:latest --format json | jq '.semantic'
```

### Programmatic Usage

```csharp
// Create orchestrator
var orchestrator = new SemanticEntrypointOrchestrator();

// Create context from entry trace result
var context = orchestrator.CreateContext(entryTraceResult, fileSystem, containerMetadata);

// Run analysis
var result = await orchestrator.AnalyzeAsync(context);

if (result.Success && result.Entrypoint is not null)
{
    Console.WriteLine($"Intent: {result.Entrypoint.Intent}");
    Console.WriteLine($"Capabilities: {result.Entrypoint.Capabilities}");

    // Max() throws on an empty sequence, so fall back to 0.0 when no threats were inferred.
    var riskScore = result.Entrypoint.AttackSurface
        .Select(t => t.Confidence)
        .DefaultIfEmpty(0.0)
        .Max();
    Console.WriteLine($"Risk Score: {riskScore}");
}
```

## Extending the Engine

### Adding a New Language Adapter

1. Implement `ISemanticEntrypointAnalyzer`:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class RubySemanticAdapter : ISemanticEntrypointAnalyzer
{
    public IReadOnlyList<string> SupportedLanguages => new[] { "ruby" };
    public int Priority => 100;

    public ValueTask<SemanticEntrypoint> AnalyzeAsync(
        SemanticAnalysisContext context,
        CancellationToken cancellationToken)
    {
        // Detect Rails, Sinatra, Sidekiq, etc.
        // Placeholder so the skeleton compiles until detection is implemented.
        throw new NotImplementedException();
    }
}
```

2. Register in `SemanticEntrypointOrchestrator.CreateDefaultAdapters()`.

### Adding a New Capability

1. Add to `CapabilityClass` flags enum
2. Update `CapabilityDetector` with detection patterns
3. Update `ThreatVectorInferrer` if capability contributes to threats
4. Update `DataBoundaryMapper` if capability implies I/O boundaries

## Related Documentation

- [Entry Trace Problem Statement](./entrypoint-problem.md)
- [Static Analysis Approach](./entrypoint-static-analysis.md)
- [Language-Specific Guides](./entrypoint-lang-python.md)
- [Reachability Evidence](../../reachability/function-level-evidence.md)
308
docs/modules/scanner/semantic-entrypoint-schema.md
Normal file
@@ -0,0 +1,308 @@
# Semantic Entrypoint Schema

> Part of Sprint 0411 - Semantic Entrypoint Engine (Task 23)

This document defines the schema for semantic entrypoint analysis, which enriches container scan results with application-level intent, capabilities, and threat modeling.

---

## Overview

The Semantic Entrypoint Engine analyzes container entrypoints to infer:

1. **Application Intent** - What kind of application is running (web server, worker, CLI, etc.)
2. **Capabilities** - What system resources the application accesses (network, filesystem, database, etc.)
3. **Attack Surface** - Potential security threat vectors based on capabilities
4. **Data Boundaries** - Data flow boundaries with sensitivity classification

This semantic layer enables more precise vulnerability prioritization by understanding which code paths are actually reachable from the entrypoint.

---

## Schema Definitions

### SemanticEntrypoint

The root type representing semantic analysis of an entrypoint.

```typescript
interface SemanticEntrypoint {
  id: string;                         // Unique identifier for this analysis
  specification: EntrypointSpecification;
  intent: ApplicationIntent;
  capabilities: CapabilityClass;      // Bitmask of detected capabilities
  attackSurface: ThreatVector[];
  dataBoundaries: DataFlowBoundary[];
  confidence: SemanticConfidence;
  language?: string;                  // Primary language (python, java, node, dotnet, go)
  framework?: string;                 // Detected framework (django, spring-boot, express, etc.)
  frameworkVersion?: string;
  runtimeVersion?: string;
  analyzedAt: string;                 // ISO-8601 timestamp
}
```

### ApplicationIntent

Enumeration of application types.

| Value | Description | Common Indicators |
|-------|-------------|-------------------|
| `Unknown` | Intent could not be determined | Fallback |
| `WebServer` | HTTP/HTTPS server | Flask, Django, Express, ASP.NET Core, Gin |
| `Worker` | Background job processor | Celery, Sidekiq, BackgroundService |
| `CliTool` | Command-line interface | Click, argparse, Cobra, Picocli |
| `Serverless` | FaaS function | Lambda handler, Cloud Functions |
| `StreamProcessor` | Event stream handler | Kafka Streams, Flink |
| `RpcServer` | RPC/gRPC server | gRPC, Thrift |
| `Daemon` | Long-running service | Custom main loops |
| `TestRunner` | Test execution | pytest, JUnit, xunit |
| `BatchJob` | Scheduled/periodic task | Cron-style entry |
| `Proxy` | Network proxy/gateway | Envoy, nginx config |

### CapabilityClass (Bitmask)

Flags indicating detected capabilities. Multiple flags can be combined.

| Flag | Value | Description |
|------|-------|-------------|
| `None` | 0x0 | No capabilities detected |
| `NetworkListen` | 0x1 | Binds to network ports |
| `NetworkOutbound` | 0x2 | Makes outbound network requests |
| `FileRead` | 0x4 | Reads from filesystem |
| `FileWrite` | 0x8 | Writes to filesystem |
| `ProcessSpawn` | 0x10 | Spawns child processes |
| `DatabaseSql` | 0x20 | SQL database access |
| `DatabaseNoSql` | 0x40 | NoSQL database access |
| `MessageQueue` | 0x80 | Message queue producer/consumer |
| `CacheAccess` | 0x100 | Cache system access (Redis, Memcached) |
| `CryptoSign` | 0x200 | Cryptographic signing operations |
| `CryptoEncrypt` | 0x400 | Encryption/decryption operations |
| `UserInput` | 0x800 | Processes user input |
| `SecretAccess` | 0x1000 | Reads secrets/credentials |
| `CloudSdk` | 0x2000 | Cloud provider SDK usage |
| `ContainerApi` | 0x4000 | Container/orchestration API access |
| `SystemCall` | 0x8000 | Direct syscall/FFI usage |
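
Because each flag occupies its own bit, a set of capabilities is a single integer: flags are combined with bitwise OR and tested with bitwise AND. The sketch below illustrates this with the values from the table. `CAPABILITY_FLAGS`, `hasCapability`, and `capabilityNames` are hypothetical helper names, not part of the scanner's API.

```typescript
// Flag names and values copied from the CapabilityClass table (None = 0x0 is omitted).
const CAPABILITY_FLAGS: ReadonlyArray<readonly [string, number]> = [
  ["NetworkListen", 0x1],
  ["NetworkOutbound", 0x2],
  ["FileRead", 0x4],
  ["FileWrite", 0x8],
  ["ProcessSpawn", 0x10],
  ["DatabaseSql", 0x20],
  ["DatabaseNoSql", 0x40],
  ["MessageQueue", 0x80],
  ["CacheAccess", 0x100],
  ["CryptoSign", 0x200],
  ["CryptoEncrypt", 0x400],
  ["UserInput", 0x800],
  ["SecretAccess", 0x1000],
  ["CloudSdk", 0x2000],
  ["ContainerApi", 0x4000],
  ["SystemCall", 0x8000],
];

// True when every bit of `flag` is set in `mask`.
function hasCapability(mask: number, flag: number): boolean {
  return flag !== 0 && (mask & flag) === flag;
}

// Expand a bitmask into flag names, in ascending bit order.
function capabilityNames(mask: number): string[] {
  return CAPABILITY_FLAGS
    .filter(([, bit]) => hasCapability(mask, bit))
    .map(([name]) => name);
}

// A web server that talks to a SQL database and handles user input.
const mask = 0x1 | 0x20 | 0x800;
console.log(capabilityNames(mask).join(","));
```

Listing the flags in ascending bit order means the expanded names come out in a stable order regardless of how the mask was built.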

### ThreatVector

Represents a potential attack vector.

```typescript
interface ThreatVector {
  type: ThreatVectorType;
  confidence: number;                 // 0.0 to 1.0
  contributingCapabilities: CapabilityClass;
  evidence: string[];
  cweId?: number;                     // CWE identifier
  owaspCategory?: string;             // OWASP category
}
```

### ThreatVectorType

| Type | CWE | OWASP | Triggered By |
|------|-----|-------|--------------|
| `SqlInjection` | 89 | A03:Injection | DatabaseSql + UserInput |
| `CommandInjection` | 78 | A03:Injection | ProcessSpawn + UserInput |
| `PathTraversal` | 22 | A01:Broken Access Control | FileRead/FileWrite + UserInput |
| `Ssrf` | 918 | A10:SSRF | NetworkOutbound + UserInput |
| `Xss` | 79 | A03:Injection | NetworkListen + UserInput |
| `InsecureDeserialization` | 502 | A08:Software and Data Integrity | UserInput + dynamic types |
| `SensitiveDataExposure` | 200 | A02:Cryptographic Failures | SecretAccess + NetworkListen |
| `BrokenAuthentication` | 287 | A07:Identification and Auth | NetworkListen + SecretAccess |
| `InsufficientLogging` | 778 | A09:Logging Failures | NetworkListen without logging |
| `CryptoWeakness` | 327 | A02:Cryptographic Failures | CryptoSign/CryptoEncrypt |
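
The "Triggered By" column reads naturally as a rule table over the capability bitmask. The sketch below encodes the five rows that depend only on capability flags. It is an illustration of the idea under that assumption, not the engine's actual inference logic: `ThreatRule`, `RULES`, and `inferThreats` are hypothetical names, and confidence scoring is left out because the table does not define it.

```typescript
// Capability bits used by the rules below (values from the CapabilityClass table).
const Cap = {
  NetworkListen: 0x1,
  NetworkOutbound: 0x2,
  FileRead: 0x4,
  FileWrite: 0x8,
  ProcessSpawn: 0x10,
  DatabaseSql: 0x20,
  UserInput: 0x800,
} as const;

// Hypothetical rule shape: every bit in `allOf` must be present,
// and at least one bit in `anyOf` when it is given.
interface ThreatRule {
  type: string;
  cweId: number;
  owaspCategory: string;
  allOf: number;
  anyOf?: number;
}

const RULES: ThreatRule[] = [
  { type: "SqlInjection", cweId: 89, owaspCategory: "A03:Injection", allOf: Cap.DatabaseSql | Cap.UserInput },
  { type: "CommandInjection", cweId: 78, owaspCategory: "A03:Injection", allOf: Cap.ProcessSpawn | Cap.UserInput },
  { type: "PathTraversal", cweId: 22, owaspCategory: "A01:Broken Access Control", allOf: Cap.UserInput, anyOf: Cap.FileRead | Cap.FileWrite },
  { type: "Ssrf", cweId: 918, owaspCategory: "A10:SSRF", allOf: Cap.NetworkOutbound | Cap.UserInput },
  { type: "Xss", cweId: 79, owaspCategory: "A03:Injection", allOf: Cap.NetworkListen | Cap.UserInput },
];

// Return the rules whose capability requirements are met by the bitmask.
function inferThreats(capabilities: number): ThreatRule[] {
  return RULES.filter(
    (rule) =>
      (capabilities & rule.allOf) === rule.allOf &&
      (rule.anyOf === undefined || (capabilities & rule.anyOf) !== 0),
  );
}

const detected = inferThreats(Cap.NetworkListen | Cap.DatabaseSql | Cap.UserInput);
console.log(detected.map((t) => t.type).join(","));
```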

### DataFlowBoundary

Represents a data flow boundary crossing.

```typescript
interface DataFlowBoundary {
  type: DataFlowBoundaryType;
  direction: DataFlowDirection;       // Inbound | Outbound | Bidirectional
  sensitivity: DataSensitivity;       // Public | Internal | Confidential | Restricted
  confidence: number;
  port?: number;                      // For network boundaries
  protocol?: string;                  // http, grpc, amqp, etc.
  evidence: string[];
}
```

### DataFlowBoundaryType

| Type | Security Sensitive | Description |
|------|-------------------|-------------|
| `HttpRequest` | Yes | HTTP/HTTPS endpoint |
| `GrpcCall` | Yes | gRPC service |
| `WebSocket` | Yes | WebSocket connection |
| `DatabaseQuery` | Yes | Database queries |
| `MessageBroker` | No | Message queue pub/sub |
| `FileSystem` | No | File I/O boundary |
| `Cache` | No | Cache read/write |
| `ExternalApi` | Yes | Third-party API calls |
| `CloudService` | Yes | Cloud provider services |
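
The "Security Sensitive" column can be treated as a lookup set, for example when counting how many detected boundaries deserve closer review. A small sketch follows; `SECURITY_SENSITIVE_TYPES` and `countSensitiveBoundaries` are hypothetical names, and whether the scanner derives its sensitive-boundary count this way is an assumption.

```typescript
// Boundary types marked "Yes" in the Security Sensitive column above.
const SECURITY_SENSITIVE_TYPES: ReadonlySet<string> = new Set([
  "HttpRequest",
  "GrpcCall",
  "WebSocket",
  "DatabaseQuery",
  "ExternalApi",
  "CloudService",
]);

// Minimal view of a boundary: only the type matters for this check.
interface BoundaryLike {
  type: string;
}

// Count the boundaries whose type is flagged as security sensitive.
function countSensitiveBoundaries(boundaries: BoundaryLike[]): number {
  return boundaries.filter((b) => SECURITY_SENSITIVE_TYPES.has(b.type)).length;
}

const sensitiveCount = countSensitiveBoundaries([
  { type: "HttpRequest" },
  { type: "Cache" },
  { type: "DatabaseQuery" },
  { type: "FileSystem" },
]);
console.log(sensitiveCount);
```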

### SemanticConfidence

Confidence scoring for semantic analysis.

```typescript
interface SemanticConfidence {
  score: number;                      // 0.0 to 1.0
  tier: ConfidenceTier;
  reasons: string[];
}

enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4
}
```

| Tier | Score Range | Description |
|------|-------------|-------------|
| `Unknown` | 0.0 | No analysis possible |
| `Low` | 0.0-0.4 | Heuristic guess only |
| `Medium` | 0.4-0.7 | Partial evidence |
| `High` | 0.7-0.9 | Strong indicators |
| `Definitive` | 0.9-1.0 | Explicit declaration found |
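
The table maps a score to a tier but leaves the shared endpoints (0.4, 0.7, 0.9) ambiguous. The sketch below shows one possible mapping, assuming each tier's lower bound is inclusive and that a score of exactly 0.0 (or anything not above it) means `Unknown`. `tierForScore` is a hypothetical helper, and the real engine may draw the boundaries differently.

```typescript
enum ConfidenceTier {
  Unknown = 0,
  Low = 1,
  Medium = 2,
  High = 3,
  Definitive = 4,
}

// Assumption: lower bounds are inclusive, so 0.4 is Medium and 0.9 is Definitive.
function tierForScore(score: number): ConfidenceTier {
  // Covers 0.0, negative values, and NaN.
  if (!(score > 0)) return ConfidenceTier.Unknown;
  if (score < 0.4) return ConfidenceTier.Low;
  if (score < 0.7) return ConfidenceTier.Medium;
  if (score < 0.9) return ConfidenceTier.High;
  return ConfidenceTier.Definitive;
}

console.log(ConfidenceTier[tierForScore(0.85)]);
```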

---

## SBOM Property Extensions

When semantic data is included in CycloneDX or SPDX SBOMs, the following property namespace is used:

```
stellaops:semantic.*
```

### Property Names

| Property | Type | Description |
|----------|------|-------------|
| `stellaops:semantic.intent` | string | ApplicationIntent value |
| `stellaops:semantic.capabilities` | string | Comma-separated capability names |
| `stellaops:semantic.capability.count` | int | Number of detected capabilities |
| `stellaops:semantic.threats` | JSON | Array of threat vector summaries |
| `stellaops:semantic.threat.count` | int | Number of identified threats |
| `stellaops:semantic.risk.score` | float | Overall risk score (0.0-1.0) |
| `stellaops:semantic.confidence` | float | Confidence score (0.0-1.0) |
| `stellaops:semantic.confidence.tier` | string | Confidence tier name |
| `stellaops:semantic.language` | string | Primary language |
| `stellaops:semantic.framework` | string | Detected framework |
| `stellaops:semantic.framework.version` | string | Framework version |
| `stellaops:semantic.boundary.count` | int | Number of data boundaries |
| `stellaops:semantic.boundary.sensitive.count` | int | Security-sensitive boundaries |
| `stellaops:semantic.owasp.categories` | string | Comma-separated OWASP categories |
| `stellaops:semantic.cwe.ids` | string | Comma-separated CWE IDs |

---

## RichGraph Integration

Semantic data is attached to `richgraph-v1` nodes via the Attributes dictionary:

| Attribute Key | Description |
|---------------|-------------|
| `semantic_intent` | ApplicationIntent value |
| `semantic_capabilities` | Comma-separated capability flags |
| `semantic_threats` | Comma-separated threat types |
| `semantic_risk_score` | Risk score (formatted to 3 decimal places) |
| `semantic_confidence` | Confidence score |
| `semantic_confidence_tier` | Confidence tier name |
| `semantic_framework` | Framework name |
| `semantic_framework_version` | Framework version |
| `is_entrypoint` | "true" if node is an entrypoint |
| `semantic_boundaries` | JSON array of boundary types |
| `owasp_category` | OWASP category if applicable |
| `cwe_id` | CWE identifier if applicable |
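
Since the Attributes dictionary holds strings, every value has to be flattened before it is attached to a node. The following sketch shows one way to do that for a subset of the keys above, including the three-decimal risk score. `EntrypointSummary` and `toRichGraphAttributes` are hypothetical names, and the exact string format of `semantic_confidence` is an assumption because the table does not specify it.

```typescript
// Hypothetical, already-resolved view of an entrypoint's semantic data.
interface EntrypointSummary {
  intent: string;
  capabilities: string[];
  threats: string[];
  riskScore: number;
  confidence: number;
  confidenceTier: string;
}

// Flatten the summary into string attributes keyed as in the table above.
function toRichGraphAttributes(summary: EntrypointSummary): Record<string, string> {
  return {
    is_entrypoint: "true",
    semantic_intent: summary.intent,
    semantic_capabilities: summary.capabilities.join(","),
    semantic_threats: summary.threats.join(","),
    // The table specifies three decimal places for the risk score.
    semantic_risk_score: summary.riskScore.toFixed(3),
    // Assumption: the confidence score uses the default number formatting.
    semantic_confidence: String(summary.confidence),
    semantic_confidence_tier: summary.confidenceTier,
  };
}

const attributes = toRichGraphAttributes({
  intent: "WebServer",
  capabilities: ["NetworkListen", "DatabaseSql", "UserInput"],
  threats: ["SqlInjection", "Xss"],
  riskScore: 0.7,
  confidence: 0.85,
  confidenceTier: "High",
});
console.log(attributes);
```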

---

## Language Adapter Support

The following language-specific adapters are available:

| Language | Adapter | Supported Frameworks |
|----------|---------|---------------------|
| Python | `PythonSemanticAdapter` | Django, Flask, FastAPI, Celery, Click |
| Java | `JavaSemanticAdapter` | Spring Boot, Quarkus, Micronaut, Kafka Streams |
| Node.js | `NodeSemanticAdapter` | Express, NestJS, Fastify, Koa |
| .NET | `DotNetSemanticAdapter` | ASP.NET Core, Worker Service, Console |
| Go | `GoSemanticAdapter` | net/http, Gin, Echo, Cobra, gRPC |

---

## Configuration

Semantic analysis is configured via the `Scanner:EntryTrace:Semantic` configuration section:

```yaml
Scanner:
  EntryTrace:
    Semantic:
      Enabled: true
      ThreatConfidenceThreshold: 0.3
      MaxThreatVectors: 50
      IncludeLowConfidenceCapabilities: false
      EnabledLanguages: []  # Empty = all languages
```

| Option | Default | Description |
|--------|---------|-------------|
| `Enabled` | true | Enable semantic analysis |
| `ThreatConfidenceThreshold` | 0.3 | Minimum confidence for threat vectors |
| `MaxThreatVectors` | 50 | Maximum threats per entrypoint |
| `IncludeLowConfidenceCapabilities` | false | Include low-confidence capabilities |
| `EnabledLanguages` | [] | Languages to analyze (empty = all) |
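
To make the effect of `ThreatConfidenceThreshold` and `MaxThreatVectors` concrete, the sketch below applies both to a list of candidate threats. It rests on two assumptions that the table does not settle: the threshold is inclusive, and truncation keeps the first entries in their existing order. `SemanticOptions`, `DEFAULT_OPTIONS`, and `applyThreatLimits` are hypothetical names.

```typescript
// Hypothetical in-memory view of the options that affect threat output.
interface SemanticOptions {
  enabled: boolean;
  threatConfidenceThreshold: number;
  maxThreatVectors: number;
}

// Defaults taken from the table above.
const DEFAULT_OPTIONS: SemanticOptions = {
  enabled: true,
  threatConfidenceThreshold: 0.3,
  maxThreatVectors: 50,
};

interface Threat {
  type: string;
  confidence: number;
}

// Drop threats below the threshold, then cap the list length.
function applyThreatLimits(threats: Threat[], options: SemanticOptions = DEFAULT_OPTIONS): Threat[] {
  if (!options.enabled) return [];
  return threats
    .filter((t) => t.confidence >= options.threatConfidenceThreshold) // assumption: inclusive
    .slice(0, options.maxThreatVectors); // assumption: keeps existing order
}

const kept = applyThreatLimits([
  { type: "SqlInjection", confidence: 0.7 },
  { type: "Xss", confidence: 0.2 },
  { type: "Ssrf", confidence: 0.3 },
]);
console.log(kept.map((t) => t.type).join(","));
```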

---

## Determinism Guarantees

All semantic analysis outputs are deterministic:

1. **Capability ordering** - Flags are ordered by value (bitmask position)
2. **Threat vector ordering** - Ordered by ThreatVectorType enum value
3. **Data boundary ordering** - Ordered by (Type, Direction) tuple
4. **Evidence ordering** - Alphabetically sorted within each element
5. **JSON serialization** - Uses camelCase naming, consistent formatting

This enables reliable diffing of semantic analysis results across scan runs.
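
The ordering rules can be sketched as a canonicalization pass that runs before serialization, so that two runs that discover the same facts in a different order still produce identical JSON. This is an illustration only: `canonicalize` is a hypothetical helper, enum members are represented by their numeric values, and "alphabetically" is interpreted here as a locale-independent code-unit comparison.

```typescript
// Enum-valued fields are represented by their numeric values in this sketch.
interface Threat {
  type: number;
  evidence: string[];
}

interface Boundary {
  type: number;
  direction: number;
  evidence: string[];
}

// Locale-independent string comparison, so results do not vary by machine.
const ordinal = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

function canonicalize(threats: Threat[], boundaries: Boundary[]) {
  return {
    // Rules 2 and 4: sort threats by enum value, evidence alphabetically.
    threats: threats
      .map((t) => ({ ...t, evidence: [...t.evidence].sort(ordinal) }))
      .sort((a, b) => a.type - b.type),
    // Rules 3 and 4: sort boundaries by (type, direction), evidence alphabetically.
    boundaries: boundaries
      .map((b) => ({ ...b, evidence: [...b.evidence].sort(ordinal) }))
      .sort((a, b) => a.type - b.type || a.direction - b.direction),
  };
}

// The same findings in two different discovery orders.
const runA = canonicalize(
  [{ type: 2, evidence: ["b", "a"] }, { type: 1, evidence: [] }],
  [{ type: 1, direction: 1, evidence: [] }, { type: 1, direction: 0, evidence: ["z", "y"] }],
);
const runB = canonicalize(
  [{ type: 1, evidence: [] }, { type: 2, evidence: ["a", "b"] }],
  [{ type: 1, direction: 0, evidence: ["y", "z"] }, { type: 1, direction: 1, evidence: [] }],
);
console.log(JSON.stringify(runA) === JSON.stringify(runB));
```

The inputs are copied rather than sorted in place, so callers keep their original arrays untouched.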

---

## CLI Usage

Semantic analysis can be enabled via the CLI `--semantic` flag:

```bash
stella scan --semantic docker.io/library/python:3.12
```

Output includes semantic summary when enabled:

```
Semantic Analysis:
  Intent: WebServer
  Framework: flask (v3.0.0)
  Capabilities: NetworkListen, DatabaseSql, FileRead
  Threat Vectors: 2 (SqlInjection, Ssrf)
  Risk Score: 0.72
  Confidence: High (0.85)
```

---

## References

- [OWASP Top 10 2021](https://owasp.org/Top10/)
- [CWE/SANS Top 25](https://cwe.mitre.org/top25/)
- [CycloneDX Property Extensions](https://cyclonedx.org/docs/1.5/json/#properties)
- [SPDX 3.0 External Identifiers](https://spdx.github.io/spdx-spec/v3.0/annexes/external-identifier-types/)