Java Analyzer (Scanner)

What it does

  • Inventories Maven coordinates from JVM archives (JAR/WAR/EAR/fat JAR) without executing build tools.
  • Prefers installed artifact metadata (META-INF/maven/**/pom.properties), with a pom.xml fallback when properties are missing.
  • Enriches output with bounded embedded-library scan metadata and JNI usage hints.

Inputs and precedence

  1. Installed archive inventory: parse Maven coordinates from META-INF/maven/**/pom.properties in each discovered archive.
  2. pom.xml fallback: when the archive contains no pom.properties, parse META-INF/maven/**/pom.xml and emit a Maven PURL only when groupId, artifactId, and version are all concrete (no placeholders such as ${...}).
  3. Lock augmentation (current): when a lock entry matches an installed artifact, merge lock metadata onto the component; unmatched lock entries still emit declared-only components.
  4. Multi-module lock precedence (pending): deterministic precedence rules are tracked in SCAN-JAVA-403-003 (blocked).
  5. Runtime images (pending): runtime component identity is tracked in SCAN-JAVA-403-004 (blocked).
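
The first two precedence steps can be sketched as follows. This is a minimal illustration in Python, not the analyzer's implementation (which lives in the C# file listed under References); the function names, the tuple layout, and the choice to read only direct children of <project> are assumptions made for the example.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

POM_PROPS = re.compile(r"^META-INF/maven/.+/pom\.properties$")
POM_XML = re.compile(r"^META-INF/maven/.+/pom\.xml$")


def parse_properties(raw):
    """Parse the simple key=value subset of the .properties format."""
    props = {}
    for line in raw.decode("utf-8", errors="replace").splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_pom_xml(raw):
    """Read groupId/artifactId/version from direct children of <project>."""
    coords = {}
    for child in ET.fromstring(raw):
        tag = child.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag in ("groupId", "artifactId", "version"):
            coords[tag] = (child.text or "").strip()
    return coords


def read_coordinates(archive):
    """Return (source, entry_name, coords) tuples, preferring pom.properties."""
    names = sorted(archive.namelist())  # sorted so output order is deterministic
    props_entries = [n for n in names if POM_PROPS.match(n)]
    if props_entries:
        return [("pom.properties", n, parse_properties(archive.read(n)))
                for n in props_entries]
    # Fallback: only consulted when the archive has no pom.properties at all.
    return [("pom.xml", n, parse_pom_xml(archive.read(n)))
            for n in names if POM_XML.match(n)]


# Usage: an in-memory jar that carries both files; pom.properties wins.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("META-INF/maven/com.example/demo/pom.properties",
                 "groupId=com.example\nartifactId=demo\nversion=1.2.3\n")
    jar.writestr("META-INF/maven/com.example/demo/pom.xml", "<project/>")
with zipfile.ZipFile(buf) as jar:
    found = read_coordinates(jar)
```

Lock augmentation and the two pending steps are deliberately left out, since their rules are not specified here.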

Embedded archives (fat JAR / WAR / EAR layouts)

The analyzer scans embedded library jars without extracting them to disk:

  • BOOT-INF/lib/*.jar
  • WEB-INF/lib/*.jar
  • APP-INF/lib/*.jar
  • lib/*.jar
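
One way to read embedded jars without touching the disk is to load each nested entry into memory and open it as a second archive. The sketch below shows the idea in Python; the helper names are hypothetical, and treating the patterns as "jars directly inside these directories at the archive root, case-sensitive" is an assumption about how the globs above are matched.

```python
import io
import zipfile

EMBEDDED_LIB_DIRS = ("BOOT-INF/lib/", "WEB-INF/lib/", "APP-INF/lib/", "lib/")


def is_embedded_jar(entry_name):
    """True for a .jar sitting directly in one of the known library dirs."""
    for prefix in EMBEDDED_LIB_DIRS:
        if entry_name.startswith(prefix):
            rest = entry_name[len(prefix):]
            return "/" not in rest and rest.endswith(".jar")
    return False


def iter_embedded_jars(outer):
    """Yield (entry_name, ZipFile) pairs; each ZipFile is backed by memory only.

    The yielded ZipFile is closed as soon as the consumer asks for the next one.
    """
    for name in sorted(outer.namelist()):
        if is_embedded_jar(name):
            with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                yield name, inner


def make_jar(entries):
    """Build a jar in memory from a {entry_name: content} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Usage: one real embedded lib, one class file, one jar outside the known dirs.
inner_bytes = make_jar({"META-INF/MANIFEST.MF": "Manifest-Version: 1.0\n"})
outer_bytes = make_jar({
    "BOOT-INF/lib/inner.jar": inner_bytes,
    "BOOT-INF/classes/App.class": b"",
    "docs/lib/other.jar": inner_bytes,
})
with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
    seen = [(name, inner.namelist()) for name, inner in iter_embedded_jars(outer)]
```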

Locator format

Evidence locators are nested deterministically using ! separators:

  • outer.jar!BOOT-INF/lib/inner.jar!META-INF/maven/.../pom.properties
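
A locator of this shape can be built by joining the path of each nesting level, outermost first. The helper below is a hypothetical sketch; rejecting segments that already contain ! and normalizing backslashes are assumptions made to keep the example unambiguous, not documented analyzer behavior.

```python
def make_locator(*segments):
    """Join path segments (outermost archive first) with '!' separators."""
    cleaned = []
    for segment in segments:
        normalized = segment.replace("\\", "/").strip("/")
        if not normalized:
            raise ValueError("empty locator segment")
        if "!" in normalized:
            raise ValueError("segment already contains '!': " + normalized)
        cleaned.append(normalized)
    return "!".join(cleaned)


def split_locator(locator):
    """Inverse of make_locator for locators built by it."""
    return locator.split("!")


locator = make_locator(
    "outer.jar",
    "BOOT-INF/lib/inner.jar",
    "META-INF/maven/com.example/demo/pom.properties",
)
```

Because the segments are joined in nesting order and never reordered, the same input always yields the same locator string.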

Bounds and skip markers

Embedded scanning is bounded and deterministic:

  • Max embedded jars per archive: 256
  • Max embedded jar bytes: 25 MiB

When embedded scanning is skipped or truncated, the outer component metadata includes deterministic markers:

  • embeddedScan.candidateJars, embeddedScan.scannedJars, embeddedScan.emittedComponents
  • embeddedScanSkipped=true, embeddedScan.skippedJars, embeddedScanSkipReasons=<...> (when applicable)

Embedded components include:

  • embedded=true
  • embedded.containerJarPath=<outerRelativePath>
  • embedded.entryPath=<embeddedEntryPath>
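
The bounds and markers above could be applied roughly as in the following sketch. Several details are assumptions for illustration only: that the 25 MiB limit applies per embedded jar, that candidates are considered in sorted entry order, that marker values are serialized as shown, and above all the two skip-reason strings, which are hypothetical placeholders because the real reason codes are not listed here.

```python
MAX_EMBEDDED_JARS = 256
MAX_EMBEDDED_JAR_BYTES = 25 * 1024 * 1024  # assumed to be a per-jar limit


def plan_embedded_scan(candidates):
    """Decide which embedded jars to scan.

    candidates: iterable of (entry_name, uncompressed_size) pairs.
    Returns (names_to_scan, outer_component_metadata).
    """
    ordered = sorted(candidates)  # fixed order keeps truncation deterministic
    to_scan = []
    reasons = set()
    for name, size in ordered:
        if size > MAX_EMBEDDED_JAR_BYTES:
            reasons.add("hypothetical-jar-too-large")    # placeholder code
        elif len(to_scan) >= MAX_EMBEDDED_JARS:
            reasons.add("hypothetical-jar-count-limit")  # placeholder code
        else:
            to_scan.append(name)

    skipped = len(ordered) - len(to_scan)
    metadata = {
        "embeddedScan.candidateJars": len(ordered),
        "embeddedScan.scannedJars": len(to_scan),
    }
    if skipped:
        metadata["embeddedScanSkipped"] = "true"
        metadata["embeddedScan.skippedJars"] = skipped
        metadata["embeddedScanSkipReasons"] = ",".join(sorted(reasons))
    return to_scan, metadata


# Usage: 300 small jars plus one oversized jar.
candidates = [("lib/%03d.jar" % i, 1024) for i in range(300)]
candidates.append(("lib/huge.jar", 26 * 1024 * 1024))
to_scan, metadata = plan_embedded_scan(candidates)
```

embeddedScan.emittedComponents is omitted because it is only known after the selected jars have actually been scanned.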

Evidence and hashing

  • Evidence locators are project-relative, use / separators, and use ! for nested artifact paths.
  • sha256 for pom.properties and pom.xml evidence is computed over the raw entry bytes.
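
Hashing the raw entry bytes means no decoding or newline normalization happens before the digest is taken. The sketch below illustrates this under the assumption that "raw" refers to the decompressed entry content exactly as stored; the function name is hypothetical.

```python
import hashlib
import io
import zipfile


def entry_sha256(archive, entry_name):
    """Hex SHA-256 of an entry's bytes, with no normalization applied."""
    return hashlib.sha256(archive.read(entry_name)).hexdigest()


# Usage: CRLF line endings are hashed as-is, not converted to LF.
payload = b"groupId=com.example\r\nartifactId=demo\r\nversion=1.2.3\r\n"
entry = "META-INF/maven/com.example/demo/pom.properties"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as jar:
    jar.writestr(entry, payload)
with zipfile.ZipFile(buf) as jar:
    digest = entry_sha256(jar, entry)

normalized_digest = hashlib.sha256(payload.replace(b"\r\n", b"\n")).hexdigest()
```

Here digest matches the hash of payload and differs from normalized_digest, so the same archive yields the same evidence hash wherever it is scanned.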

pom.xml with incomplete coordinates

When pom.xml is present but coordinates are incomplete (missing values or ${...} placeholders), the analyzer emits an explicit-key component:

  • purl=null, version=null
  • metadata.unresolvedCoordinates=true
  • componentKey follows the cross-analyzer explicit-key scheme via LanguageExplicitKey.Create("java", "maven", ...)
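
The split between a resolved and an unresolved component can be sketched as below. The dictionary layout and helper names are assumptions for illustration, and the PURL is built without percent-encoding, which is a simplification. componentKey is left out because the output of LanguageExplicitKey.Create is not described in this document.

```python
def is_concrete(value):
    """A coordinate is concrete when it is non-empty and has no ${...} placeholder."""
    return bool(value) and "${" not in value


def to_component(coords):
    """Map parsed pom.xml coordinates to a simplified component record."""
    group = coords.get("groupId")
    artifact = coords.get("artifactId")
    version = coords.get("version")
    if all(is_concrete(v) for v in (group, artifact, version)):
        return {
            "purl": "pkg:maven/%s/%s@%s" % (group, artifact, version),
            "version": version,
            "metadata": {},
        }
    # Incomplete coordinates: no PURL is guessed, the gap is recorded instead.
    return {
        "purl": None,
        "version": None,
        "metadata": {"unresolvedCoordinates": "true"},
    }


resolved = to_component(
    {"groupId": "com.example", "artifactId": "demo", "version": "1.2.3"})
placeholder = to_component(
    {"groupId": "com.example", "artifactId": "demo", "version": "${revision}"})
missing = to_component({"artifactId": "demo"})
```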

JNI metadata (bytecode-based)

JNI hints are derived from parsed bytecode (native method flags and load call sites), not raw ASCII scanning.
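
To illustrate what "native method flags" means at the bytecode level, the sketch below walks the class-file structure defined by the JVM specification and counts methods whose access flags include ACC_NATIVE (0x0100). It is a simplified Python illustration, not the analyzer's parser, and it does not cover load call sites, which would require decoding each method's Code attribute.

```python
import struct

ACC_NATIVE = 0x0100
# Payload size in bytes for each fixed-size constant pool tag (Utf8 is variable).
CP_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
            12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def count_native_methods(data):
    """Count methods flagged ACC_NATIVE in the bytes of one .class file."""
    pos = 0

    def read(fmt):
        nonlocal pos
        values = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        return values

    magic, _minor, _major, cp_count = read(">IHHH")
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")

    index = 1  # constant pool indices start at 1
    while index < cp_count:
        (tag,) = read(">B")
        if tag == 1:  # CONSTANT_Utf8: u2 length followed by that many bytes
            (length,) = read(">H")
            pos += length
        elif tag in CP_SIZES:
            pos += CP_SIZES[tag]
        else:
            raise ValueError("unknown constant pool tag %d" % tag)
        index += 2 if tag in (5, 6) else 1  # Long/Double occupy two slots

    _access, _this_class, _super_class, interface_count = read(">HHHH")
    pos += 2 * interface_count

    def read_member_flags():
        """Read a fields or methods table and return each member's access flags."""
        nonlocal pos
        flags = []
        (count,) = read(">H")
        for _member in range(count):
            access, _name, _descriptor, attr_count = read(">HHHH")
            for _attr in range(attr_count):
                _attr_name, attr_length = read(">HI")
                pos += attr_length  # skip the attribute body
            flags.append(access)
        return flags

    read_member_flags()  # fields come first and are skipped
    return sum(1 for access in read_member_flags() if access & ACC_NATIVE)
```

Given the bytes of a .class entry read from an archive, count_native_methods(class_bytes) returns the number of native methods declared in that class.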

When bytecode analysis finds JNI edges (jni.edgeCount > 0), components are annotated with bounded, deterministic metadata:

  • jni.edgeCount, jni.nativeMethodCount, jni.loadCallCount, optional jni.warningCount
  • jni.reasons (distinct reason codes)
  • jni.targetLibraries (top-N stable sample; currently 12)
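
The annotation step might look like the sketch below. The edge record layout, the kind and reason strings, and the comma-joined value format are all hypothetical, and "top-N stable sample" is interpreted here as "distinct names in sorted order, first 12", which is an assumption rather than documented behavior.

```python
JNI_TARGET_LIBRARY_SAMPLE = 12  # "currently 12" per the list above


def jni_metadata(edges, warning_count=0):
    """Build bounded JNI metadata from a list of edge dicts.

    Each edge is a dict with hypothetical keys: 'kind', 'reason', 'target'.
    """
    if not edges:
        return {}  # annotate only when jni.edgeCount > 0
    targets = sorted({e["target"] for e in edges if e.get("target")})
    metadata = {
        "jni.edgeCount": len(edges),
        "jni.nativeMethodCount": sum(1 for e in edges if e["kind"] == "native-method"),
        "jni.loadCallCount": sum(1 for e in edges if e["kind"] == "load-call"),
        "jni.reasons": ",".join(sorted({e["reason"] for e in edges})),
        "jni.targetLibraries": ",".join(targets[:JNI_TARGET_LIBRARY_SAMPLE]),
    }
    if warning_count:
        metadata["jni.warningCount"] = warning_count
    return metadata


# Usage: 15 load calls discovered in reverse order plus one native method.
edges = [{"kind": "load-call", "reason": "hypothetical-load", "target": "lib%02d" % i}
         for i in reversed(range(15))]
edges.append({"kind": "native-method", "reason": "hypothetical-native", "target": None})
meta = jni_metadata(edges)
```

Sorting before truncating means the sampled libraries do not depend on the order in which classes were visited.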

Known limitations

  • Shaded jars that strip Maven metadata remain best-effort; embedded libs without Maven metadata do not emit components.
  • Gradle multi-module lock precedence and runtime image component identity remain blocked until explicit decisions land.

References

  • Sprint: docs/implplan/SPRINT_0403_0001_0001_scanner_java_detection_gaps.md
  • Cross-analyzer contract: docs/modules/scanner/language-analyzers-contract.md
  • Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java/JavaLanguageAnalyzer.cs