2025-12-13 18:08:55 +02:00


# Go Analyzer (Scanner)
## What it does
- Inventories Go components from **binaries** (embedded buildinfo) and **source** (`go.mod`, `go.sum`, `go.work`, `vendor/`) without executing `go`.
- Emits `pkg:golang/<module>@<version>` when a concrete version is available; otherwise emits deterministic explicit-key components (no "range-as-version" PURLs).
- Records VCS/build metadata and bounded evidence for audit/replay; remains offline-first.
- Detects security-relevant capabilities in Go source code (exec, filesystem, network, native code, etc.).
## Inputs and precedence
The analyzer processes inputs in the following order, with binary evidence taking precedence:
1. **Binary inventory (Phase 1, authoritative)**: Extract embedded build info (`runtime/debug` buildinfo blob) and emit Go modules (main + deps) with concrete versions and build settings evidence. Binary-derived components include `provenance=binary` metadata.
2. **Source inventory (Phase 2, supplementary)**: Parse `go.mod`, `go.sum`, `go.work`, and `vendor/modules.txt` to emit modules not already covered by binary evidence. Source-derived components include `provenance=source` metadata.
3. **Heuristic fallback (stripped binaries)**: When buildinfo is missing, emit deterministic `bin` components keyed by sha256 plus minimal classification evidence.
**Precedence rules:**
- Binary evidence is scanned first and takes precedence over source evidence.
- When both source and binary evidence exist for the same module path@version, only the binary-derived component is emitted.
- Main modules are tracked separately: if a binary emits `module@version`, source `module@(devel)` is suppressed.
- Together, these rules keep the output deterministic and free of duplicate components.
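
The precedence rules above can be sketched as a small merge step. This is an illustrative sketch only, not the analyzer's implementation (which lives in `GoLanguageAnalyzer.cs`); the `Component` record and the `merge_inventories` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    # Hypothetical record type used only for this sketch.
    module: str        # module path, e.g. "example.com/app"
    version: str       # concrete version, or "(devel)" for source-only main modules
    provenance: str    # "binary" or "source"
    is_main: bool = False


def merge_inventories(binary, source):
    """Combine binary and source components so that binary evidence wins."""
    emitted = {(c.module, c.version): c for c in binary}
    binary_main_modules = {c.module for c in binary if c.is_main}

    for c in source:
        if (c.module, c.version) in emitted:
            continue  # same module path@version is already covered by a binary
        if c.is_main and c.version == "(devel)" and c.module in binary_main_modules:
            continue  # a binary already emitted module@version for this main module
        emitted[(c.module, c.version)] = c

    # Sort so the result does not depend on input order.
    return sorted(emitted.values(), key=lambda c: (c.module, c.version, c.provenance))
```

Given a binary that embeds `example.com/app@v1.2.0` (main) and `example.com/lib@v0.3.1`, a source tree declaring the same two modules plus `example.com/extra@v2.0.0` would contribute only `example.com/extra`.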
## Project discovery (modules + workspaces)
- Standalone modules are discovered by locating `go.mod` files (bounded recursion depth 10; vendor directories skipped).
- Workspaces are discovered via `go.work` at the analysis root; `use` members become additional module roots.
- Vendored dependencies are detected via `vendor/modules.txt` when present.
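
A minimal sketch of the `go.mod` discovery walk described above. It assumes the analysis root counts as depth 0 and that directories named `vendor` are pruned entirely; `discover_module_roots` is a hypothetical helper, and the real analyzer may count depth or order results differently.

```python
import os

MAX_DEPTH = 10  # recursion bound stated above; root is treated as depth 0 (assumption)


def discover_module_roots(root):
    """Return sorted relative directories under `root` that contain a go.mod file."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    roots = []
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= MAX_DEPTH:
            dirnames[:] = []  # stop descending once the depth bound is reached
        else:
            # Prune vendor directories in place so os.walk never enters them.
            dirnames[:] = sorted(d for d in dirnames if d != "vendor")
        if "go.mod" in filenames:
            roots.append(os.path.relpath(dirpath, root))
    return sorted(roots)
```

Editing `dirnames` in place is what makes `os.walk` skip a subtree, so a `go.mod` inside `vendor/` is never reported as a module root.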
## Workspace replace directive propagation
`go.work` files may contain `replace` directives that apply to all workspace members:
- Workspace-level replaces are inherited by all member modules.
- Module-level replaces take precedence over workspace-level replaces for the same module path.
- Duplicate replace keys are handled deterministically (last-one-wins within each scope).
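
The inheritance and override behaviour can be illustrated as follows. This sketch assumes replaces are keyed by module path only and are supplied in file order as `(old_path, new_target)` pairs; `effective_replaces` is a hypothetical name, not part of the analyzer.

```python
def effective_replaces(workspace_replaces, module_replaces):
    """Merge ordered (old_path, new_target) pairs into one effective mapping."""
    merged = {}
    for old, new in workspace_replaces:
        merged[old] = new          # last one wins within the workspace scope
    module_scope = {}
    for old, new in module_replaces:
        module_scope[old] = new    # last one wins within the module scope
    merged.update(module_scope)    # module-level entries override workspace-level ones
    return dict(sorted(merged.items()))
```

With a workspace that replaces `example.com/b` twice and a member module that replaces `example.com/a`, the member sees its own target for `a` and the workspace's final target for `b`.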
## Identity rules (PURL vs explicit key)
Concrete versions emit a PURL:
- `purl = pkg:golang/<modulePath>@<version>`
Non-concrete identities emit an explicit key:
- Used for source-only main modules (`(devel)`) and for any non-versioned module identity.
- PURL is omitted (`purl=null`) and the component is keyed deterministically via `AddFromExplicitKey`.
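
As a rough illustration of the two identity paths, the sketch below returns either a PURL or an explicit key, never both. Two things here are assumptions: the "concrete version" test (a non-empty version starting with `v`) and the explicit key layout, which is invented for the example because the format used by `AddFromExplicitKey` is not described in this document. PURL percent-encoding is also omitted.

```python
def component_identity(module_path, version):
    """Return (purl, explicit_key); exactly one of the two is None."""
    # Assumption: a concrete version is a non-empty string starting with "v".
    concrete = bool(version) and version.startswith("v")
    if concrete:
        return f"pkg:golang/{module_path}@{version}", None
    # Hypothetical key layout for illustration; the real format is not shown here.
    return None, f"golang::{module_path}::{version or 'unknown'}"
```

A source-only main module at `(devel)` therefore gets `purl=None` and a stable key derived from its module path.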
## Evidence and metadata
### Binary-derived components
Binary components include (when present):
- `provenance=binary`
- `go.version`
- `modulePath.main` and `build.*` settings
- VCS fields (`build.vcs*` from build settings and/or `go.dwarf` tokens)
- `moduleSum` and replacement metadata when available
- CGO signals (`cgo.enabled`, flags, compiler hints; plus adjacent native libs when detected)
### Source-derived components
Source components include:
- `provenance=source`
- `moduleSum` from `go.sum` (when present)
- vendor signals (`vendored=true`) and `vendor` evidence locators
- replacement/exclude flags with stable metadata keys
- best-effort license signals for main module and vendored modules
- `capabilities` metadata listing detected capability kinds (exec, filesystem, network, etc.)
- `capabilities.maxRisk` indicating highest risk level (critical/high/medium/low)
### Heuristic fallback components
Fallback components include:
- `type=bin`, deterministic `sha256` identity, and a classification evidence marker
- Each heuristic emission also increments the metric `scanner_analyzer_golang_heuristic_total{indicator,version_hint}`.
## Capability scanning
The analyzer detects security-relevant capabilities in Go source code:
| Capability | Risk | Examples |
|------------|------|----------|
| Exec | Critical | `exec.Command`, `syscall.Exec`, `os.StartProcess` |
| NativeCode | Critical | `unsafe.Pointer`, `//go:linkname`, `syscall.Syscall` |
| PluginLoading | Critical | `plugin.Open` |
| Filesystem | High/Medium | `os.Remove`, `os.Chmod`, `os.WriteFile` |
| Network | Medium | `net.Dial`, `http.Get`, `http.ListenAndServe` |
| Environment | High/Medium | `os.Setenv`, `os.Getenv` |
| Database | Medium | `sql.Open`, `db.Query` |
| DynamicCode | High | `reflect.Value.Call`, `template.Execute` |
| Serialization | Medium | `gob.NewDecoder`, `xml.Unmarshal` |
| Reflection | Low/Medium | `reflect.TypeOf`, `reflect.New` |
| Crypto | Low | Hash functions, cipher operations |
Capabilities are emitted as:
- Metadata: `capabilities=exec,filesystem,network` (comma-separated list of kinds)
- Metadata: `capabilities.maxRisk=critical|high|medium|low`
- Evidence: Top 10 capability locations with pattern and line number
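
The emitted shape can be approximated with a line-based pattern scan. This is a simplified sketch, not the logic in `GoCapabilityScanner.cs`: it uses only a handful of patterns from the table, matches naively (so hits inside comments or strings would also count), and assumes "top 10" means the first ten hits in source order.

```python
import re

RISK_ORDER = ["low", "medium", "high", "critical"]

# (kind, risk, regex): a small subset of the table above, for illustration only.
PATTERNS = [
    ("exec", "critical", re.compile(r"\bexec\.Command\b")),
    ("exec", "critical", re.compile(r"\bos\.StartProcess\b")),
    ("network", "medium", re.compile(r"\bnet\.Dial\b|\bhttp\.Get\b")),
    ("database", "medium", re.compile(r"\bsql\.Open\b")),
]


def scan_capabilities(source, max_evidence=10):
    """Scan Go source text and return capability metadata plus bounded evidence."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, risk, rx in PATTERNS:
            m = rx.search(line)
            if m:
                hits.append({"kind": kind, "risk": risk,
                             "pattern": m.group(0), "line": lineno})
    kinds = sorted({h["kind"] for h in hits})
    max_risk = max((h["risk"] for h in hits), key=RISK_ORDER.index, default=None)
    return {
        "capabilities": ",".join(kinds),      # comma-separated list of kinds
        "capabilities.maxRisk": max_risk,     # None when nothing was detected
        "evidence": hits[:max_evidence],      # bounded evidence list
    }
```

Sorting the kinds before joining keeps the `capabilities` value stable across runs, in line with the analyzer's determinism goal.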
## IO/Memory bounds
Binary and DWARF scanning uses bounded windowed reads to limit memory usage:
- **Build info scanning**: 16 MB windows with 4 KB overlap; max file size 128 MB.
- **DWARF token scanning**: 8 MB windows with 1 KB overlap; max file size 256 MB.
- Files smaller than the window size are read in a single pass instead of being windowed.
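
A bounded windowed search along these lines might look like the sketch below, using the build info defaults (16 MB windows, 4 KB overlap, 128 MB cap). The marker is the magic prefix Go writes before the build info blob. The function name `find_marker` is hypothetical, and treating oversized files as "not found" is an assumption about how the size cap is applied.

```python
import os

BUILDINFO_MAGIC = b"\xff Go buildinf:"  # prefix of the embedded Go build info blob


def find_marker(path, marker=BUILDINFO_MAGIC, window=16 * 1024 * 1024,
                overlap=4 * 1024, max_size=128 * 1024 * 1024):
    """Return the file offset of `marker`, or None if absent or the file is too large."""
    if overlap < len(marker) - 1 or window <= overlap:
        raise ValueError("overlap must cover the marker and be smaller than the window")
    size = os.path.getsize(path)
    if size > max_size:
        return None  # assumption: files over the cap are skipped
    with open(path, "rb") as f:
        if size <= window:
            pos = f.read().find(marker)  # small file: read it directly
            return pos if pos >= 0 else None
        offset = 0
        while offset < size:
            f.seek(offset)
            chunk = f.read(window)
            pos = chunk.find(marker)
            if pos >= 0:
                return offset + pos
            if offset + len(chunk) >= size:
                break
            # Step back by `overlap` so a marker straddling two windows is still seen.
            offset += window - overlap
    return None
```

The overlap only needs to be at least one byte shorter than the marker to catch a straddling match; a larger overlap such as 4 KB simply leaves room for reading a little context after the marker.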
## Retract semantics
Go's `retract` directive only applies to versions of the declaring module itself, not to dependencies:
- The `RetractedVersions` field in inventory results contains only versions of the main module that are retracted.
- Dependency retraction cannot be determined offline, because doing so would require fetching each dependency's `go.mod`.
- No false-positive retraction warnings are emitted for dependencies.
## Cache key correctness
Binary build info is cached using a composite key:
- File path (normalized for OS case sensitivity)
- File length
- Last modification time
- 4 KB header hash (FNV-1a)
The header hash ensures correct behavior in containerized/layered filesystem environments where files may have identical metadata but different content.
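
The composite key can be sketched as a tuple of the four parts listed above. The document does not say which FNV-1a width is used, so the 64-bit variant here is an assumption, as are the helper names `fnv1a_64` and `cache_key` and the use of nanosecond modification times.

```python
import os

FNV_OFFSET_64 = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME_64 = 0x100000001B3        # standard 64-bit FNV prime


def fnv1a_64(data):
    """Compute the 64-bit FNV-1a hash of a bytes object."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
    return h


def cache_key(path, header_bytes=4096):
    """Build a (path, length, mtime, header hash) key for a binary on disk."""
    st = os.stat(path)
    with open(path, "rb") as f:
        header = f.read(header_bytes)
    normalized = os.path.normcase(os.path.abspath(path))  # case-folds on Windows only
    return (normalized, st.st_size, st.st_mtime_ns, fnv1a_64(header))
```

Two files that share a path, length, and timestamp (as can happen across image layers) still produce different keys as long as their first 4 KB differ.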
## References
- Sprint: `docs/implplan/SPRINT_0402_0001_0001_scanner_go_analyzer_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Go/GoLanguageAnalyzer.cs`
- Capability scanner: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Go/Internal/GoCapabilityScanner.cs`