# Java Analyzer (Scanner)
## What it does
- Inventories Maven coordinates from JVM archives (JAR/WAR/EAR/fat JAR) without executing build tools.
- Prefers installed artifact metadata (`META-INF/maven/**/pom.properties`), with a `pom.xml` fallback when properties are missing.
- Enriches output with bounded embedded-library scan metadata and JNI usage hints.
## Inputs and precedence
1. **Installed archive inventory**: parse Maven coordinates from `META-INF/maven/**/pom.properties` in each discovered archive.
2. **`pom.xml` fallback**: when an archive contains no `pom.properties`, parse `META-INF/maven/**/pom.xml` and emit a Maven PURL only when `groupId`, `artifactId`, and `version` are all concrete (no `${...}` placeholders).
3. **Lock augmentation (current)**: when a lock entry matches an installed artifact, merge lock metadata onto the component; unmatched lock entries still emit declared-only components.
4. **Multi-module lock precedence (pending)**: deterministic precedence rules are tracked in `SCAN-JAVA-403-003` (blocked).
5. **Runtime images (pending)**: runtime component identity is tracked in `SCAN-JAVA-403-004` (blocked).
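
The first two steps can be sketched in Python as follows. This is a minimal illustration, not the analyzer's C# implementation: the helper names are hypothetical, the properties parser handles only the plain `key=value` lines Maven writes, `pom.xml` parent inheritance is deliberately not resolved, and the PURL is built as `pkg:maven/<groupId>/<artifactId>@<version>` without the percent-encoding the PURL specification requires for unusual characters. Lock augmentation is not shown.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches Maven property placeholders such as ${project.version}.
PLACEHOLDER = re.compile(r"\$\{[^}]*\}")
COORDINATE_KEYS = ("groupId", "artifactId", "version")


def parse_pom_properties(text: str) -> dict[str, str]:
    """Parse the simple key=value subset of the Java properties format that
    Maven writes to pom.properties (escapes and line continuations ignored)."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def parse_pom_xml(text: str) -> dict[str, str]:
    """Read the project's own groupId/artifactId/version (direct children of
    <project> only; values inherited from <parent> are not resolved)."""
    coords: dict[str, str] = {}
    for child in ET.fromstring(text):
        tag = child.tag.rsplit("}", 1)[-1]  # drop the {namespace} prefix
        if tag in COORDINATE_KEYS and child.text:
            coords[tag] = child.text.strip()
    return coords


def is_concrete(value: Optional[str]) -> bool:
    """True when the value is present, non-empty and free of ${...} placeholders."""
    return bool(value) and PLACEHOLDER.search(value) is None


def maven_purl(coords: dict[str, str]) -> Optional[str]:
    """Return pkg:maven/<groupId>/<artifactId>@<version>, or None when any
    coordinate is missing or unresolved."""
    g, a, v = (coords.get(k) for k in COORDINATE_KEYS)
    if not all(is_concrete(x) for x in (g, a, v)):
        return None
    return f"pkg:maven/{g}/{a}@{v}"
```

A `None` result corresponds to the incomplete-coordinates case described later in this document.
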
## Embedded archives (fat JAR / WAR / EAR layouts)
The analyzer scans embedded library jars without extracting them to disk:
- `BOOT-INF/lib/*.jar`
- `WEB-INF/lib/*.jar`
- `APP-INF/lib/*.jar`
- `lib/*.jar`
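
A minimal Python sketch of the in-memory walk, assuming the four layouts above match only direct `.jar` children (case-sensitive) of each directory; the function names are hypothetical. Sorting the entry names gives a stable order regardless of the order in which entries were written to the archive, and nested jars are opened from memory rather than extracted.

```python
import io
import zipfile

# Library directories listed above; only direct children ending in .jar count.
EMBEDDED_LIB_DIRS = ("BOOT-INF/lib/", "WEB-INF/lib/", "APP-INF/lib/", "lib/")


def embedded_jar_entries(outer: zipfile.ZipFile) -> list[str]:
    """Return embedded jar entry names in a stable (sorted) order."""
    names = []
    for info in outer.infolist():
        if info.is_dir():
            continue
        for prefix in EMBEDDED_LIB_DIRS:
            rest = info.filename[len(prefix):]
            if info.filename.startswith(prefix) and "/" not in rest and rest.endswith(".jar"):
                names.append(info.filename)
                break
    return sorted(names)


def open_embedded(outer: zipfile.ZipFile, entry: str) -> zipfile.ZipFile:
    """Open an embedded jar from an in-memory buffer; nothing touches disk."""
    return zipfile.ZipFile(io.BytesIO(outer.read(entry)))
```
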
### Locator format
Evidence locators are nested deterministically using `!` separators:
- `outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties`
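
Building such a locator is a join over the path segments. In this sketch `nested_locator` and `split_locator` are hypothetical helpers; they assume the outer path is already project-relative and that no segment itself contains `!`. The `org.example/demo` coordinates in the usage check are illustrative only.

```python
def nested_locator(outer_relative_path: str, *entry_paths: str) -> str:
    """Join an outer archive path and nested entry paths with '!'."""
    # Normalize Windows separators and drop leading/trailing '/' so every
    # segment stays relative and uses '/'.
    segments = [p.replace("\\", "/").strip("/") for p in (outer_relative_path, *entry_paths)]
    return "!".join(segments)


def split_locator(locator: str) -> list[str]:
    """Inverse of nested_locator: outermost path first, innermost entry last."""
    return locator.split("!")
```
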
### Bounds and skip markers
Embedded scanning is bounded and deterministic:
- Max embedded jars per archive: `256`
- Max embedded jar bytes: `25 MiB`

When embedded scanning is skipped or truncated, the outer component metadata includes deterministic markers:
- `embeddedScan.candidateJars`, `embeddedScan.scannedJars`, `embeddedScan.emittedComponents`
- `embeddedScanSkipped=true`, `embeddedScan.skippedJars`, `embeddedScanSkipReasons=<...>` (when applicable)

Embedded components include:
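
One way to apply these bounds deterministically, sketched in Python: walk candidates in sorted order, skip jars above the size cap, and stop accepting once the count cap is reached. The reason codes (`jar-too-large`, `max-jars-exceeded`), the string encoding of values, and the comma-joined reason list are assumptions. This sketch always records the counters and adds the skip markers only when something was skipped; `embeddedScan.emittedComponents` is omitted because it is known only after the selected jars are parsed.

```python
MAX_EMBEDDED_JARS = 256
MAX_EMBEDDED_JAR_BYTES = 25 * 1024 * 1024  # 25 MiB


def select_embedded(candidates: dict[str, int]) -> tuple[list[str], dict[str, str]]:
    """candidates maps embedded entry path -> uncompressed size in bytes."""
    selected: list[str] = []
    skipped = 0
    reasons: set[str] = set()
    for entry in sorted(candidates):  # stable order independent of zip layout
        if candidates[entry] > MAX_EMBEDDED_JAR_BYTES:
            skipped += 1
            reasons.add("jar-too-large")  # hypothetical reason code
        elif len(selected) >= MAX_EMBEDDED_JARS:
            skipped += 1
            reasons.add("max-jars-exceeded")  # hypothetical reason code
        else:
            selected.append(entry)
    metadata = {
        "embeddedScan.candidateJars": str(len(candidates)),
        "embeddedScan.scannedJars": str(len(selected)),
    }
    if skipped:
        metadata["embeddedScanSkipped"] = "true"
        metadata["embeddedScan.skippedJars"] = str(skipped)
        metadata["embeddedScanSkipReasons"] = ",".join(sorted(reasons))
    return selected, metadata
```
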
- `embedded=true`
- `embedded.containerJarPath=<outerRelativePath>`
- `embedded.entryPath=<embeddedEntryPath>`
## Evidence and hashing
- Evidence locators are project-relative, use `/` separators, and use `!` for nested artifact paths.
- `sha256` for `pom.properties` and `pom.xml` evidence is computed over the raw entry bytes.
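
A sketch of the hashing step, assuming "raw entry bytes" means the entry's decompressed content exactly as read from the archive, with no text decoding or newline normalization. Streaming in chunks keeps memory bounded for large entries; `entry_sha256` is a hypothetical name.

```python
import hashlib
import zipfile


def entry_sha256(archive: zipfile.ZipFile, entry: str) -> str:
    """Hex SHA-256 of an entry's bytes exactly as read from the archive
    (no text decoding, no newline normalization)."""
    digest = hashlib.sha256()
    with archive.open(entry) as stream:
        for chunk in iter(lambda: stream.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the bytes are not normalized, a `pom.properties` written with CRLF line endings hashes differently from the same content with LF endings.
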
## `pom.xml` with incomplete coordinates
When `pom.xml` is present but coordinates are incomplete (missing values or `${...}` placeholders), the analyzer emits an explicit-key component:
- `purl=null`, `version=null`
- `metadata.unresolvedCoordinates=true`
- `componentKey` follows the cross-analyzer explicit-key scheme via `LanguageExplicitKey.Create("java", "maven", ...)`
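
The choice between a PURL component and an explicit-key component can be sketched as below. `explicit_key` is a hypothetical placeholder for `LanguageExplicitKey.Create("java", "maven", ...)`, whose real key format is defined by the cross-analyzer contract and is not reproduced here; the dictionary shape and the string value `"true"` are illustrative only.

```python
import re
from typing import Optional

PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def explicit_key(analyzer: str, ecosystem: str, locator: str) -> str:
    # Hypothetical stand-in for LanguageExplicitKey.Create("java", "maven", ...);
    # the real scheme is defined by the cross-analyzer contract.
    return f"explicit::{analyzer}::{ecosystem}::{locator}"


def component_from_pom_xml(coords: dict[str, str], locator: str) -> dict:
    """Emit a PURL component when all coordinates are concrete, otherwise an
    explicit-key component flagged with unresolvedCoordinates."""
    g, a, v = (coords.get(k) for k in ("groupId", "artifactId", "version"))
    if all(x and not PLACEHOLDER.search(x) for x in (g, a, v)):
        return {"purl": f"pkg:maven/{g}/{a}@{v}", "version": v, "metadata": {}}
    return {
        "purl": None,
        "version": None,
        "componentKey": explicit_key("java", "maven", locator),
        "metadata": {"unresolvedCoordinates": "true"},
    }
```
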
## JNI metadata (bytecode-based)
JNI hints are derived from parsed bytecode (native method flags and load call sites), not raw ASCII scanning.
When bytecode analysis finds JNI edges (`jni.edgeCount > 0`), components are annotated with bounded, deterministic metadata:
- `jni.edgeCount`, `jni.nativeMethodCount`, `jni.loadCallCount`, optional `jni.warningCount`
- `jni.reasons` (distinct reason codes)
- `jni.targetLibraries` (top-N stable sample; currently 12)
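
The aggregation into bounded metadata can be sketched as follows. `ACC_NATIVE` (`0x0100`) is the method access flag defined by the JVM specification; everything else is an assumption for illustration: the `JniEdge` type, the reason codes, the comma-joined string values, and taking the first 12 names in sorted order as the "stable sample". Class-file parsing itself is not shown.

```python
from dataclasses import dataclass
from typing import Optional

ACC_NATIVE = 0x0100         # method access flag from the JVM specification
TARGET_LIBRARY_SAMPLE = 12  # current top-N bound for jni.targetLibraries


@dataclass(frozen=True)
class JniEdge:
    reason: str                    # hypothetical reason code, e.g. "load-library"
    target_library: Optional[str]  # library name when statically known


def is_native_method(access_flags: int) -> bool:
    """True when a method_info access_flags value has ACC_NATIVE set."""
    return bool(access_flags & ACC_NATIVE)


def jni_metadata(edges: list[JniEdge], native_methods: int,
                 load_calls: int, warnings: int = 0) -> dict[str, str]:
    """Build deterministic jni.* metadata; empty when there are no edges."""
    if not edges:
        return {}  # components are annotated only when jni.edgeCount > 0
    meta = {
        "jni.edgeCount": str(len(edges)),
        "jni.nativeMethodCount": str(native_methods),
        "jni.loadCallCount": str(load_calls),
        "jni.reasons": ",".join(sorted({e.reason for e in edges})),
    }
    libraries = sorted({e.target_library for e in edges if e.target_library})
    if libraries:
        meta["jni.targetLibraries"] = ",".join(libraries[:TARGET_LIBRARY_SAMPLE])
    if warnings:
        meta["jni.warningCount"] = str(warnings)
    return meta
```

Deduplicating and sorting before truncating means the sample does not depend on the order in which class files or call sites were visited.
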
## Known limitations
- Shaded jars that strip Maven metadata remain best-effort; no component is emitted for an embedded library that lacks Maven metadata.
- Gradle multi-module lock precedence and runtime image component identity remain blocked until explicit decisions land.
## References
- Sprint: `docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java/JavaLanguageAnalyzer.cs`