# Java Analyzer (Scanner)

## What it does
- Inventories Maven coordinates from JVM archives (JAR/WAR/EAR/fat JAR) without executing build tools.
- Prefers installed artifact metadata (`META-INF/maven/**/pom.properties`), with a `pom.xml` fallback when properties are missing.
- Enriches output with bounded embedded-library scan metadata and JNI usage hints.

## Inputs and precedence
1. **Installed archive inventory**: parse Maven coordinates from `META-INF/maven/**/pom.properties` in each discovered archive.
2. **`pom.xml` fallback**: when the archive contains no `pom.properties`, parse `META-INF/maven/**/pom.xml` and emit a Maven PURL only when `groupId`, `artifactId`, and `version` are all concrete (no placeholders such as `${...}`).
3. **Lock augmentation (current)**: when a lock entry matches an installed artifact, merge lock metadata onto the component; unmatched lock entries still emit declared-only components.
4. **Multi-module lock precedence (pending)**: deterministic precedence rules are tracked in `SCAN-JAVA-403-003` (blocked).
5. **Runtime images (pending)**: runtime component identity is tracked in `SCAN-JAVA-403-004` (blocked).
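
The first precedence step can be illustrated with a minimal sketch. It is not the analyzer's implementation (which lives in the C# library listed under References); the function names `parse_properties`, `maven_coordinates`, and `maven_purl` are hypothetical, and the PURL builder skips percent-encoding for brevity.

```python
import io
import zipfile


def parse_properties(text):
    """Parse the simple key=value subset of the Java .properties format."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def maven_coordinates(jar_bytes):
    """Return sorted (groupId, artifactId, version) tuples from pom.properties entries."""
    found = set()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in sorted(jar.namelist()):
            if name.startswith("META-INF/maven/") and name.endswith("/pom.properties"):
                text = jar.read(name).decode("utf-8", errors="replace")
                props = parse_properties(text)
                gav = tuple(props.get(k, "") for k in ("groupId", "artifactId", "version"))
                if all(gav):
                    found.add(gav)
    return sorted(found)


def maven_purl(group_id, artifact_id, version):
    """Build a Maven package URL (assumes no characters that need percent-encoding)."""
    return f"pkg:maven/{group_id}/{artifact_id}@{version}"
```

Sorting both the entry names and the resulting tuples keeps the output independent of the order in which entries were written to the archive.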

## Embedded archives (fat JAR / WAR / EAR layouts)
The analyzer scans embedded library jars without extracting them to disk:
- `BOOT-INF/lib/*.jar`
- `WEB-INF/lib/*.jar`
- `APP-INF/lib/*.jar`
- `lib/*.jar`

### Locator format
Evidence locators are nested deterministically using `!` separators:
- `outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties`
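
As an illustration of the locator shape, the sketch below joins an outer archive path and nested entry paths with `!`. The helper name `nested_locator` and the example coordinates are hypothetical; only the `!` separator and `/` path style come from this document.

```python
def nested_locator(*segments):
    """Join an outer archive path and nested entry paths with '!' separators."""
    # Normalize to forward slashes and trim stray leading/trailing slashes.
    cleaned = [segment.replace("\\", "/").strip("/") for segment in segments]
    if not cleaned or not all(cleaned):
        raise ValueError("locator segments must be non-empty")
    return "!".join(cleaned)


locator = nested_locator(
    "outer.jar",
    "BOOT-INF/lib/inner.jar",
    "META-INF/maven/org.example/demo/pom.properties",  # hypothetical coordinates
)
```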

### Bounds and skip markers
Embedded scanning is bounded and deterministic:
- Max embedded jars per archive: `256`
- Max embedded jar bytes: `25 MiB`

When embedded scanning is skipped or truncated, the outer component metadata includes deterministic markers:
- `embeddedScan.candidateJars`, `embeddedScan.scannedJars`, `embeddedScan.emittedComponents`
- `embeddedScanSkipped=true`, `embeddedScan.skippedJars`, `embeddedScanSkipReasons=<...>` (when applicable)
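
The sketch below shows one way such bounds could be applied. It rests on several assumptions: the byte limit is treated as a per-jar cap on uncompressed size, the reason codes `max-jars` and `max-bytes` are invented for illustration, and `embeddedScan.emittedComponents` is omitted because it depends on the scan itself. The function names are hypothetical.

```python
import io
import zipfile

EMBEDDED_PREFIXES = ("BOOT-INF/lib/", "WEB-INF/lib/", "APP-INF/lib/", "lib/")
MAX_EMBEDDED_JARS = 256
MAX_EMBEDDED_JAR_BYTES = 25 * 1024 * 1024  # 25 MiB, assumed to apply per jar


def is_embedded_jar(name):
    """True for `<prefix>*.jar` entries that sit directly under a known lib directory."""
    for prefix in EMBEDDED_PREFIXES:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            return rest.endswith(".jar") and "/" not in rest and len(rest) > 4
    return False


def plan_embedded_scan(jar_bytes, max_jars=MAX_EMBEDDED_JARS,
                       max_bytes=MAX_EMBEDDED_JAR_BYTES):
    """Pick embedded jars to scan and build the marker metadata for the outer component."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        candidates = sorted(
            (info for info in jar.infolist() if is_embedded_jar(info.filename)),
            key=lambda info: info.filename,  # stable order, independent of zip layout
        )
    to_scan, reasons = [], set()
    for info in candidates:
        if len(to_scan) >= max_jars:
            reasons.add("max-jars")    # hypothetical reason code
        elif info.file_size > max_bytes:  # file_size is the uncompressed size
            reasons.add("max-bytes")   # hypothetical reason code
        else:
            to_scan.append(info.filename)
    skipped = len(candidates) - len(to_scan)
    metadata = {
        "embeddedScan.candidateJars": str(len(candidates)),
        "embeddedScan.scannedJars": str(len(to_scan)),
    }
    if skipped:
        metadata["embeddedScanSkipped"] = "true"
        metadata["embeddedScan.skippedJars"] = str(skipped)
        metadata["embeddedScanSkipReasons"] = ",".join(sorted(reasons))
    return to_scan, metadata
```

Sorting candidates before applying the limits means the same archive always yields the same scanned subset and the same markers.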

Embedded components include:
- `embedded=true`
- `embedded.containerJarPath=<outerRelativePath>`
- `embedded.entryPath=<embeddedEntryPath>`
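
To show how embedded jars can be read without touching disk, the sketch below opens each inner jar from an in-memory buffer and attaches the three metadata keys listed above. The function name `embedded_components` and the returned dictionary shape are hypothetical, as is the representation of metadata values as strings.

```python
import io
import zipfile


def embedded_components(outer_relative_path, outer_bytes, entry_paths):
    """Describe Maven metadata found inside embedded jars, reading them in memory only."""
    components = []
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        for entry_path in sorted(entry_paths):
            inner_bytes = outer.read(entry_path)  # decompressed into memory, never to disk
            with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
                for name in sorted(inner.namelist()):
                    if name.startswith("META-INF/maven/") and name.endswith("/pom.properties"):
                        components.append({
                            "locator": f"{outer_relative_path}!{entry_path}!{name}",
                            "metadata": {
                                "embedded": "true",
                                "embedded.containerJarPath": outer_relative_path,
                                "embedded.entryPath": entry_path,
                            },
                        })
    return components
```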

## Evidence and hashing
- Evidence locators are project-relative, use `/` separators, and use `!` for nested artifact paths.
- `sha256` for `pom.properties` and `pom.xml` evidence is computed over the raw entry bytes.
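
A minimal sketch of hashing an evidence entry follows. It assumes "raw entry bytes" means the decompressed entry content with no decoding or newline normalization; the helper name `entry_sha256` is hypothetical.

```python
import hashlib
import io
import zipfile


def entry_sha256(jar_bytes, entry_name):
    """Hex SHA-256 of an archive entry's bytes, with no text decoding or normalization."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return hashlib.sha256(jar.read(entry_name)).hexdigest()
```

Because the bytes are hashed as stored, two `pom.properties` files that differ only in line endings produce different digests.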

## `pom.xml` with incomplete coordinates
When `pom.xml` is present but coordinates are incomplete (missing values or `${...}` placeholders), the analyzer emits an explicit-key component:
- `purl=null`, `version=null`
- `metadata.unresolvedCoordinates=true`
- `componentKey` follows the cross-analyzer explicit-key scheme via `LanguageExplicitKey.Create("java", "maven", ...)`
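
The completeness check can be sketched as below. This is illustrative only: it reads coordinates from the top-level project element and does not model parent inheritance or property resolution, and the function names are hypothetical. `componentKey` is left out because the format produced by `LanguageExplicitKey.Create` is not described here.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def _local(tag):
    """Strip an XML namespace such as {http://maven.apache.org/POM/4.0.0}."""
    return tag.rsplit("}", 1)[-1]


def pom_coordinates(pom_xml):
    """Return (groupId, artifactId, version) from the project element; None when absent."""
    root = ET.fromstring(pom_xml)
    values = {}
    for child in root:
        name = _local(child.tag)
        if name in ("groupId", "artifactId", "version"):
            values[name] = (child.text or "").strip() or None
    return values.get("groupId"), values.get("artifactId"), values.get("version")


def is_concrete(value):
    """A coordinate is concrete when present and free of ${...} placeholders."""
    return bool(value) and PLACEHOLDER.search(value) is None


def describe_component(pom_xml):
    """Emit a PURL only for concrete coordinates; otherwise flag them as unresolved."""
    group_id, artifact_id, version = pom_coordinates(pom_xml)
    if all(is_concrete(v) for v in (group_id, artifact_id, version)):
        purl = f"pkg:maven/{group_id}/{artifact_id}@{version}"
        return {"purl": purl, "version": version, "metadata": {}}
    return {"purl": None, "version": None,
            "metadata": {"unresolvedCoordinates": "true"}}
```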

## JNI metadata (bytecode-based)
JNI hints are derived from parsed bytecode (native method flags and load call sites), not from scanning raw bytes for ASCII strings.

When bytecode analysis finds JNI edges (`jni.edgeCount > 0`), components are annotated with bounded, deterministic metadata:
- `jni.edgeCount`, `jni.nativeMethodCount`, `jni.loadCallCount`, optional `jni.warningCount`
- `jni.reasons` (distinct reason codes)
- `jni.targetLibraries` (top-N stable sample; currently 12)
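
One way to obtain a bounded, stable sample of target libraries is sketched below. The selection rule (deduplicate, sort, take the first 12) and the comma-joined string encoding are assumptions made for illustration; the document only states that the sample is top-N, stable, and currently capped at 12. The function names are hypothetical.

```python
MAX_TARGET_LIBRARIES = 12


def sample_target_libraries(libraries, limit=MAX_TARGET_LIBRARIES):
    """Deduplicate, sort, and cap library names so repeated scans agree."""
    distinct = sorted({lib for lib in libraries if lib})
    return distinct[:limit]


def jni_metadata(edge_count, target_libraries):
    """Annotate only when JNI edges were found (edge_count > 0)."""
    if edge_count <= 0:
        return {}
    return {
        "jni.edgeCount": str(edge_count),
        "jni.targetLibraries": ",".join(sample_target_libraries(target_libraries)),
    }
```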

## Known limitations
- Shaded jars that strip Maven metadata are handled on a best-effort basis; embedded libs without Maven metadata do not emit components.
- Gradle multi-module lock precedence (`SCAN-JAVA-403-003`) and runtime image component identity (`SCAN-JAVA-403-004`) remain blocked until explicit decisions land.

## References
- Sprint: `docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md`
- Cross-analyzer contract: `docs/modules/scanner/language-analyzers-contract.md`
- Implementation: `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java/JavaLanguageAnalyzer.cs`