Entry-Point Static Analysis

This guide captures the static half of the StellaOps entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.

1) Loading OCI images

1.1 Supported inputs

  • Registry references (repo:tag@sha256:digest) using the existing content store.
  • Local OCI/Docker v2 archives (docker save tarball, OCI layout directory with index.json + blobs/sha256/*).

1.2 Normalised model

sealed class OciImage {
  public required string Os;
  public required string Arch;
  public required string[] Entrypoint;
  public required string[] Cmd;
  public required string[] Shell;      // Windows / powershell overrides
  public required string WorkingDir;
  public required string[] Env;
  public required string[] ExposedPorts;
  public required LabelMap Labels;
  public required LayerRef[] Layers;   // ordered, compressed blobs
}

Compose the runtime argv as Entrypoint ++ Cmd, honouring shell-form vs exec-form (see §2.3).
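As a minimal sketch of the composition rule above (written in Python for brevity; `ImageConfig` and `compose_argv` are hypothetical names, not part of the scanner's API), the runtime argv is the image's Entrypoint followed by its Cmd, and either part may be empty:

```python
from dataclasses import dataclass, field


@dataclass
class ImageConfig:
    """Hypothetical subset of the normalised image model (illustration only)."""
    entrypoint: list[str] = field(default_factory=list)
    cmd: list[str] = field(default_factory=list)


def compose_argv(config: ImageConfig) -> list[str]:
    """Return Entrypoint ++ Cmd; an empty list means the image defines no command."""
    return [*config.entrypoint, *config.cmd]


# Example: an entrypoint script that receives the default command as arguments.
nginx = ImageConfig(
    entrypoint=["/docker-entrypoint.sh"],
    cmd=["nginx", "-g", "daemon off;"],
)
print(compose_argv(nginx))
```

Keeping the two parts separate until this point lets later stages record which arguments came from Cmd and are therefore the most likely to be overridden at run time.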

2) Overlay virtual filesystem

2.1 Whiteouts

  • Regular whiteout: path/.wh.<name> removes <name> from lower layers.
  • Opaque directory: path/.wh..wh..opq hides everything that lower layers placed under path; entries added by the same or later layers remain visible.
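The two whiteout rules can be illustrated with a small sketch. It assumes each layer is already reduced to a list of relative tar paths, ordered lowest layer first; `merge_layers` is a hypothetical helper, not the scanner's overlay implementation:

```python
WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def merge_layers(layers: list[list[str]]) -> set[str]:
    """Merge ordered layers (lowest first) into the set of visible paths."""
    visible: set[str] = set()
    for entries in layers:
        added: set[str] = set()
        for path in entries:
            parent, _, name = path.rpartition("/")
            if name == OPAQUE_MARKER:
                # Opaque directory: hide what lower layers placed under `parent`.
                visible = {p for p in visible if not p.startswith(parent + "/")}
            elif name.startswith(WHITEOUT_PREFIX):
                # Regular whiteout: hide the named entry and any subtree below it.
                stem = name[len(WHITEOUT_PREFIX):]
                target = f"{parent}/{stem}" if parent else stem
                visible = {
                    p for p in visible
                    if p != target and not p.startswith(target + "/")
                }
            else:
                added.add(path)
        # Entries from the current layer are applied after its whiteouts,
        # so a whiteout never hides a file added by the same layer.
        visible |= added
    return visible


layers = [
    ["etc/app/a.conf", "etc/app/b.conf", "usr/bin/tool"],
    ["etc/app/.wh..wh..opq", "etc/app/c.conf", "usr/bin/.wh.tool"],
]
print(sorted(merge_layers(layers)))
```

The opaque marker is checked before the generic `.wh.` prefix because it also starts with that prefix.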

2.2 Lazy extraction

  • First pass: build a tar index (path → layer, offset, size, mode, isWhiteout, isDir).
  • Decompress only when reading a file; optionally support eStargz TOC to accelerate random access.
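A rough sketch of the first pass, using Python's standard `tarfile` module on an uncompressed layer held in memory. Real layers are compressed blobs, so the offsets here refer to the decompressed stream. `IndexEntry`, `index_layer`, `read_file`, and `make_layer` are hypothetical names used only for illustration:

```python
import io
import tarfile
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    layer: int
    offset: int       # byte offset of the file data inside the uncompressed tar
    size: int
    mode: int
    is_whiteout: bool
    is_dir: bool


def index_layer(layer_no: int, tar_bytes: bytes, index: dict[str, IndexEntry]) -> None:
    """Read tar headers only; entries from later layers overwrite earlier ones."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            name = member.name.rsplit("/", 1)[-1]
            index[member.name] = IndexEntry(
                layer=layer_no,
                offset=member.offset_data,
                size=member.size,
                mode=member.mode,
                is_whiteout=name.startswith(".wh."),
                is_dir=member.isdir(),
            )


def read_file(tar_bytes: bytes, entry: IndexEntry) -> bytes:
    """Second pass: fetch file contents on demand using the recorded offset."""
    return tar_bytes[entry.offset:entry.offset + entry.size]


def make_layer(files: dict[str, bytes]) -> bytes:
    """Build a small uncompressed tar in memory (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(data)
            info.mode = 0o755
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


layer = make_layer({"app/run.sh": b"#!/bin/sh\nexec app\n"})
index: dict[str, IndexEntry] = {}
index_layer(0, layer, index)
print(read_file(layer, index["app/run.sh"]))
```

Applying whiteouts is left out of this sketch; the index only flags them so a later overlay pass can act on them.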

2.3 Shell-form composition

  • Dockerfile shell form is serialised as ["/bin/sh","-c","…"] (or Shell[] override on Windows).
  • Always trust config.json; no need to inspect the Dockerfile.
  • Working directory defaults to / if unspecified.
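Because shell form is already serialised into the config, detecting it reduces to a prefix check on the composed argv. The following sketch assumes the default POSIX shell prefix and an optional `Shell` override; `split_shell_form` is a hypothetical helper:

```python
from typing import Optional

DEFAULT_SHELL = ["/bin/sh", "-c"]


def split_shell_form(argv: list[str], shell: Optional[list[str]] = None) -> Optional[str]:
    """Return the script string when argv is shell-form, otherwise None.

    Any arguments after the script string become the shell's positional
    parameters ($0, $1, ...) and are ignored by this sketch.
    """
    prefix = shell or DEFAULT_SHELL
    if len(argv) > len(prefix) and argv[:len(prefix)] == prefix:
        return argv[len(prefix)]
    return None


print(split_shell_form(["/bin/sh", "-c", "exec java -jar app.jar"]))
print(split_shell_form(["java", "-jar", "app.jar"]))
```

A non-None result is the cue to hand the script string to shell analysis rather than treating `/bin/sh` as the entry point.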

3) Low-level primitives

3.1 PATH resolution

  • Extract PATH from environment (fallback /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin).
  • If argv[0] contains no /, walk the PATH to resolve an absolute file; if it is a relative path that contains /, resolve it against the working directory instead.
  • Verify execute bit (or Windows ACL) before accepting.
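A minimal sketch of the lookup for Linux images, assuming the overlay filesystem is represented as a mapping from absolute path to permission bits (`modes` is a stand-in for the real virtual filesystem, and `resolve_executable` is a hypothetical name):

```python
import posixpath
from typing import Optional

DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def path_dirs(env: list[str]) -> list[str]:
    """Extract PATH from KEY=VALUE entries, falling back to the default."""
    for item in env:
        key, _, value = item.partition("=")
        if key == "PATH":
            return value.split(":")
    return DEFAULT_PATH.split(":")


def resolve_executable(argv0: str, env: list[str], workdir: str,
                       modes: dict[str, int]) -> Optional[str]:
    """Return the absolute path of an executable regular file, or None."""
    def is_executable(path: str) -> bool:
        return (modes.get(path, 0) & 0o111) != 0

    base = workdir or "/"
    if "/" in argv0:
        # Absolute or relative path: no PATH search, resolve against workdir.
        candidate = posixpath.normpath(posixpath.join(base, argv0))
        return candidate if is_executable(candidate) else None
    for directory in path_dirs(env):
        # An empty PATH element means the current directory.
        candidate = posixpath.normpath(posixpath.join(directory or base, argv0))
        if is_executable(candidate):
            return candidate
    return None


modes = {
    "/usr/local/bin/app": 0o755,
    "/usr/bin/app": 0o755,
    "/srv/bin/start": 0o755,
    "/srv/run.sh": 0o644,
}
print(resolve_executable("app", [], "/", modes))
print(resolve_executable("./bin/start", [], "/srv", modes))
```

Symlink resolution and Windows ACL checks are deliberately omitted here.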

3.2 Shebang handling

  • For non-ELF/PE files: read first line; interpret #!interpreter args.
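  • Rewrite argv per kernel semantics: the interpreter becomes argv[0], followed by the optional shebang argument (Linux passes the rest of the line as a single argument), the script path, and then the original arguments.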
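The argv rewrite can be sketched as follows, assuming Linux behaviour where everything after the interpreter on the shebang line is passed as one argument. `apply_shebang` is a hypothetical helper:

```python
def apply_shebang(script_path: str, first_line: bytes, argv: list[str]) -> list[str]:
    """Rewrite argv the way the Linux kernel does for a #! script."""
    if not first_line.startswith(b"#!"):
        return argv
    line = first_line[2:].split(b"\n", 1)[0].decode("utf-8", "replace").strip()
    if not line:
        return argv
    parts = line.split(None, 1)
    interpreter = parts[0]
    # The remainder of the line, if any, is a single argument (not word-split).
    optional = [parts[1].strip()] if len(parts) > 1 else []
    return [interpreter, *optional, script_path, *argv[1:]]


print(apply_shebang("/app/run.py", b"#!/usr/bin/env python3\n", ["run.py", "--port", "80"]))
```

For `#!/usr/bin/env python3` the result still starts with `env`, so the rewritten argv is itself a candidate for another round of wrapper collapsing.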

3.3 Binary probes

  • Identify ELF via magic \x7FELF, parse .interp, .dynamic, linked libs, .note.go.buildid, DWARF producer.
  • Identify PE (Windows) and detect .NET single-file bundles via CLI header.
  • Record features for runtime scoring (Go vs Rust vs glibc vs musl).
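The first step of a probe is a magic-number check on the file header. This sketch covers only the coarse classification (ELF, PE, script). Parsing `.interp`, `.dynamic`, or build IDs is out of scope, and `classify` is a hypothetical name:

```python
import struct


def classify(header: bytes) -> str:
    """Classify a file from its leading bytes: 'elf', 'pe', 'script', or 'unknown'."""
    if header[:4] == b"\x7fELF":
        return "elf"
    if header[:2] == b"MZ" and len(header) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        if header[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "pe"
    if header[:2] == b"#!":
        return "script"
    return "unknown"


# Smallest header that satisfies the PE check, built for demonstration.
fake_pe = b"MZ" + b"\0" * 0x3A + struct.pack("<I", 0x40) + b"PE\0\0"
print(classify(b"\x7fELF\x02\x01\x01"), classify(fake_pe), classify(b"#!/bin/sh\n"))
```

Reading the first few hundred bytes through the lazy tar index is usually enough for this check, so no full file extraction is needed at this stage.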

4) Wrapper catalogue

Collapse known wrappers before analysing the target command:

  • Init shims: tini, dumb-init, s6-svscan, runit, supervisord.
  • Privilege droppers: gosu, su-exec, chpst.
  • Shells: sh, bash, dash, BusyBox variants.
  • Package runners: npm, yarn, pnpm, pip, pipenv, poetry, bundle, rake.

Rules:

  • If wrapper contains a -- sentinel (tini -- app …) drop the wrapper and record a reduction edge.
  • gosu user cmd … → collapse to cmd ….
  • For shell wrappers, delegate to the ShellFlow analyser (see separate guide).
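A small sketch of the first two rules, limited to a few wrappers whose command-line shape is simple (`tini`/`dumb-init` with a `--` sentinel, `gosu`/`su-exec` with a single user argument). `ReductionEdge` is named in this guide, but its fields here are assumptions, as is `collapse_wrappers`:

```python
import posixpath
from dataclasses import dataclass

INIT_SHIMS = {"tini", "dumb-init"}
PRIVILEGE_DROPPERS = {"gosu", "su-exec"}


@dataclass(frozen=True)
class ReductionEdge:
    wrapper: str
    before: tuple[str, ...]
    after: tuple[str, ...]


def collapse_wrappers(argv: list[str]) -> tuple[list[str], list[ReductionEdge]]:
    """Strip known wrappers from the front of argv, recording each step."""
    edges: list[ReductionEdge] = []
    current = list(argv)
    while current:
        name = posixpath.basename(current[0])
        if name in INIT_SHIMS and "--" in current:
            # tini -- app ...  ->  app ...
            rest = current[current.index("--") + 1:]
        elif name in PRIVILEGE_DROPPERS and len(current) >= 3:
            # gosu user cmd ...  ->  cmd ...
            rest = current[2:]
        else:
            break
        if not rest:
            break
        edges.append(ReductionEdge(name, tuple(current), tuple(rest)))
        current = rest
    return current, edges


target, edges = collapse_wrappers(
    ["/sbin/tini", "--", "gosu", "redis", "redis-server", "/etc/redis.conf"]
)
print(target, [edge.wrapper for edge in edges])
```

Wrappers with their own option grammar (for example `chpst -u user`) need per-wrapper argument parsing and are not handled by this sketch.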

5) ShellFlow integration

When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual exec target. Key capabilities:

  • Parses POSIX sh (and common Bash extensions).
  • Tracks environment mutations (set, export, set --).
  • Resolves $@, $1..9, ${VAR:-default}.
  • Recognises idioms from official Docker images (if [ "$1" = "server" ]; then …).
  • Emits multiple branches when predicates depend on unknown data, but tags them with lower confidence.

The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine.
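To give a feel for the parameter resolution listed above, here is a toy expander for `$@`, `$1`..`$9`, `$VAR`, `${VAR}`, and `${VAR:-default}`. It assumes the command is already tokenised with quotes removed, which a real shell parser would have to handle. This is not the ShellFlow implementation, and all names are hypothetical:

```python
import re

# ${NAME}, ${NAME:-default}, $NAME, or a single-digit positional such as $1.
_PARAM = re.compile(r"\$(?:\{(\w+)(?::-([^}]*))?\}|(\d|[A-Za-z_]\w*))")


def expand_word(word: str, env: dict[str, str], positional: list[str]) -> str:
    """Expand parameter references inside one word."""
    def replace(match: re.Match) -> str:
        name = match.group(1) or match.group(3)
        default = match.group(2)
        if name.isdigit():
            idx = int(name) - 1
            value = positional[idx] if 0 <= idx < len(positional) else ""
        else:
            value = env.get(name, "")
        # ${VAR:-default} substitutes the default when VAR is unset or empty.
        if not value and default is not None:
            return default
        return value

    return _PARAM.sub(replace, word)


def expand_command(words: list[str], env: dict[str, str],
                   positional: list[str]) -> list[str]:
    """Expand every word; a bare $@ splices in all positional parameters."""
    out: list[str] = []
    for word in words:
        if word == "$@":
            out.extend(positional)
        else:
            out.append(expand_word(word, env, positional))
    return out


print(expand_command(
    ["exec", "${APP_BIN:-/usr/bin/server}", "--port", "$PORT", "$@"],
    {"PORT": "8080"},
    ["--verbose"],
))
```

When a referenced variable is unknown at analysis time, a real analyser would mark the word as tainted rather than substitute an empty string, which is what lowers the confidence of that branch.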

6) Reduction algorithm

  1. Compose argv as Entrypoint ++ Cmd (§1.2).
  2. Collapse wrappers; append ReductionEdge entries documenting each step.
  3. Resolve argv0 to an absolute file and classify (ELF/PE/script).
  4. If script → run ShellFlow; replace current command with highest-confidence exec target while preserving alternates as evidence.
  5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.).
  6. Emit EntryTraceResult with candidate terminals ranked by confidence.
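Step 5 can be illustrated with a sketch that maps a few common VM hosts to the artefact they load. The heuristics (for example "first non-option argument") are simplifying assumptions, and `resolve_artefact` is a hypothetical name:

```python
import posixpath
from typing import Optional


def _first_positional(args: list[str]) -> Optional[str]:
    """Return the first argument that does not look like an option."""
    for arg in args:
        if not arg.startswith("-"):
            return arg
    return None


def resolve_artefact(argv: list[str]) -> Optional[tuple[str, str]]:
    """Return (runtime, artefact) for a known VM host, else None."""
    if not argv:
        return None
    host = posixpath.basename(argv[0])
    args = argv[1:]
    if host == "java":
        if "-jar" in args and args.index("-jar") + 1 < len(args):
            return ("java", args[args.index("-jar") + 1])
    elif host == "dotnet":
        for arg in args:
            if arg.endswith(".dll"):
                return ("dotnet", arg)
    elif host in ("node", "nodejs"):
        script = _first_positional(args)
        if script:
            return ("node", script)
    elif host.startswith("python"):
        if "-m" in args and args.index("-m") + 1 < len(args):
            return ("python", args[args.index("-m") + 1])
        script = _first_positional(args)
        if script:
            return ("python", script)
    return None


print(resolve_artefact(["/usr/bin/java", "-Xmx512m", "-jar", "/app/app.jar"]))
print(resolve_artefact(["python3", "-m", "gunicorn", "app:wsgi"]))
```

Options that take a separate value (such as `node --require x app.js`) defeat the first-positional heuristic, so a production resolver would need per-runtime option tables.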

7) Confidence scoring

Use a simple logistic model with feature contributions captured for the evidence trail. Example features:

| Id  | Signal                                      | Weight |
|-----|---------------------------------------------|--------|
| f1  | Entrypoint already an executable (ELF/PE)   | +0.18  |
| f2  | Observed chain ends in non-wrapper binary   | +0.22  |
| f3  | VM host + resolvable artefact               | +0.20  |
| f4  | Exposed ports align with runtime            | +0.06  |
| f5  | Shebang interpreter matches runtime family  | +0.05  |
| f6  | Language artefact validation succeeded      | +0.15  |
| f8  | Multi-branch script unresolved ($@ taint)   | -0.20  |
| f9  | Target missing execute bit                  | -0.25  |
| f10 | Shell with no exec                          | -0.18  |

Persist per-feature evidence strings so UI/CLI users can see why the scanner picked a given entry point.
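One way to read "simple logistic model" is a sigmoid over the sum of active feature weights. The sketch below takes that reading and treats f8, f9, and f10 as penalties (negative weights). The `bias` and `scale` parameters are assumptions added for illustration, not values from the scanner:

```python
import math

# Weights from the feature table; the negative signs on f8-f10 are an assumption.
WEIGHTS = {
    "f1": 0.18, "f2": 0.22, "f3": 0.20, "f4": 0.06, "f5": 0.05, "f6": 0.15,
    "f8": -0.20, "f9": -0.25, "f10": -0.18,
}


def score(active: list[str], bias: float = 0.0, scale: float = 1.0) -> tuple[float, list[str]]:
    """Return (confidence, evidence) for the features that fired."""
    z = bias + scale * sum(WEIGHTS[feature] for feature in active)
    confidence = 1.0 / (1.0 + math.exp(-z))
    # One evidence string per feature, suitable for an explanation trail.
    evidence = [f"{feature}: {WEIGHTS[feature]:+.2f}" for feature in active]
    return confidence, evidence


confidence, evidence = score(["f1", "f2", "f9"])
print(round(confidence, 3), evidence)
```

With `bias=0` and `scale=1` the output stays close to 0.5 because the weights are small; a deployed model would presumably calibrate both so that strong evidence maps to a clearly high confidence.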

8) Outputs

Return a populated EntryTraceResult:

  • Terminals contains the best candidate(s) and their runtime classification.
  • Evidence aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
  • Chain shows the explainable path from initial Docker argv to the final binary.

Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.
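The shared shape might look roughly like the following. Only the names `EntryTraceResult`, `Terminals`, `Evidence`, and `Chain` come from this guide; the field types, the `Terminal` record, and the `best` helper are assumptions made for the sketch:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Terminal:
    """Hypothetical candidate entry point with its runtime classification."""
    path: str
    runtime: str
    confidence: float


@dataclass
class EntryTraceResult:
    terminals: list[Terminal] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)
    chain: list[list[str]] = field(default_factory=list)  # argv after each reduction

    def best(self) -> Optional[Terminal]:
        """Return the highest-confidence terminal, or None when there is none."""
        return max(self.terminals, key=lambda t: t.confidence, default=None)


result = EntryTraceResult(
    terminals=[
        Terminal("/usr/bin/redis-server", "native", 0.82),
        Terminal("/usr/bin/redis-sentinel", "native", 0.31),
    ],
    evidence=["f2: +0.22", "wrapper tini collapsed"],
    chain=[["tini", "--", "redis-server"], ["redis-server"]],
)
print(result.best())
```

A consumer that only calls something like `best()` and reads `evidence` does not need to know whether the result came from the static or the dynamic reducer.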