# Entry-Point Static Analysis

This guide captures the static half of Stella Ops’ entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.

## 1) Loading OCI images

### 1.1 Supported inputs
- Registry references (`repo:tag@sha256:digest`) using the existing content store.
- Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`).
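
To make the registry reference shape concrete, here is a minimal sketch that splits `repo:tag@sha256:digest` into its parts. `ImageReference` and `ImageReferenceParser` are hypothetical names for illustration, not the scanner's own types, and the sketch does no validation of the digest or tag grammar.

```csharp
#nullable enable
using System;

// Hypothetical value type; names are illustrative only.
sealed record ImageReference(string Repository, string? Tag, string? Digest);

static class ImageReferenceParser
{
    public static ImageReference Parse(string reference)
    {
        // Everything after '@' is the digest (for example "sha256:...").
        string? digest = null;
        int at = reference.IndexOf('@');
        if (at >= 0)
        {
            digest = reference.Substring(at + 1);
            reference = reference.Substring(0, at);
        }

        // A colon before the last '/' belongs to a registry port, not a tag.
        string? tag = null;
        int colon = reference.LastIndexOf(':');
        if (colon > reference.LastIndexOf('/'))
        {
            tag = reference.Substring(colon + 1);
            reference = reference.Substring(0, colon);
        }

        return new ImageReference(reference, tag, digest);
    }
}
```

Comparing the colon position against the last `/` keeps `registry.example:5000/app` from being misread as a repository named `registry.example` with tag `5000/app`.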

### 1.2 Normalised model

```csharp
sealed class OciImage {
  public required string Os;
  public required string Arch;
  public required string[] Entrypoint;
  public required string[] Cmd;
  public required string[] Shell;      // Windows / powershell overrides
  public required string WorkingDir;
  public required string[] Env;
  public required string[] ExposedPorts;
  public required LabelMap Labels;
  public required LayerRef[] Layers;   // ordered, compressed blobs
}
```

Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-form (see §2.3).
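
A minimal sketch of the composition step, assuming a hypothetical `ArgvComposer` helper and treating a missing `Entrypoint` or `Cmd` as an empty array:

```csharp
#nullable enable
using System;
using System.Linq;

static class ArgvComposer
{
    // Concatenates Entrypoint and Cmd into the runtime argv.
    // Null arrays (field absent from the image config) are treated as empty.
    public static string[] Compose(string[]? entrypoint, string[]? cmd) =>
        (entrypoint ?? Array.Empty<string>())
            .Concat(cmd ?? Array.Empty<string>())
            .ToArray();
}
```

For example, `Compose(new[] { "/docker-entrypoint.sh" }, new[] { "nginx", "-g", "daemon off;" })` yields a four-element argv whose first element is the entrypoint script.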

## 2) Overlay virtual filesystem

### 2.1 Whiteouts
- Regular whiteout: `path/.wh.<name>` removes `<name>` from lower layers.
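- Opaque directory: `path/.wh..wh..opq` hides everything that lower layers contribute under `path/`; the directory itself and entries from the same layer remain visible.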
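
The two marker forms can be told apart from the entry name alone. The sketch below is one way to classify a tar entry path; `WhiteoutKind` and `Whiteouts.Classify` are hypothetical names, and applying the result to the merged view is left to the caller.

```csharp
#nullable enable
using System;

enum WhiteoutKind { None, Regular, Opaque }

static class Whiteouts
{
    const string Prefix = ".wh.";
    const string OpaqueMarker = ".wh..wh..opq";

    // Returns the marker kind and the path it affects: the hidden sibling for a
    // regular whiteout, or the containing directory for an opaque marker.
    public static (WhiteoutKind Kind, string Target) Classify(string entryPath)
    {
        string path = entryPath.TrimEnd('/');
        int slash = path.LastIndexOf('/');
        string dir = slash < 0 ? "" : path.Substring(0, slash);
        string name = path.Substring(slash + 1);

        // Check the opaque marker first: it also starts with ".wh.".
        if (name == OpaqueMarker)
            return (WhiteoutKind.Opaque, dir);

        if (name.StartsWith(Prefix, StringComparison.Ordinal) && name.Length > Prefix.Length)
        {
            string hidden = name.Substring(Prefix.Length);
            return (WhiteoutKind.Regular, dir.Length == 0 ? hidden : dir + "/" + hidden);
        }

        return (WhiteoutKind.None, path);
    }
}
```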

### 2.2 Lazy extraction
- First pass: build a tar index `(path → layer, offset, size, mode, isWhiteout, isDir)`.
- Decompress only when reading a file; optionally support eStargz TOC to accelerate random access.
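
As an illustration of the first pass, the following sketch indexes an already-decompressed, seekable tar stream by reading only the 512-byte headers. `TarIndexEntry` and `TarIndexer` are hypothetical names. It is deliberately simplified: only plain ustar headers are understood, so PAX extended headers, GNU long-name records, and base-256 size fields would need extra handling in a real indexer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record mirroring the tuple described above.
sealed record TarIndexEntry(int Layer, long Offset, long Size, int Mode, bool IsWhiteout, bool IsDir);

static class TarIndexer
{
    const int BlockSize = 512;

    public static Dictionary<string, TarIndexEntry> Build(Stream tar, int layer)
    {
        var index = new Dictionary<string, TarIndexEntry>(StringComparer.Ordinal);
        var header = new byte[BlockSize];
        // An all-zero block (name starts with NUL) marks the end of the archive.
        while (ReadBlock(tar, header) && header[0] != 0)
        {
            string name = Text(header, 0, 100);
            // The prefix field is only meaningful for POSIX ustar headers.
            if (Text(header, 257, 6) == "ustar" && Text(header, 345, 155) is { Length: > 0 } prefix)
                name = prefix + "/" + name;
            name = name.TrimEnd('/');

            long size = Octal(header, 124, 12);
            string leaf = name.Substring(name.LastIndexOf('/') + 1);
            index[name] = new TarIndexEntry(
                layer, tar.Position, size, (int)Octal(header, 100, 8),
                IsWhiteout: leaf.StartsWith(".wh.", StringComparison.Ordinal),
                IsDir: header[156] == (byte)'5');

            // Payloads are padded to a multiple of 512 bytes; skip without reading them.
            tar.Seek((size + BlockSize - 1) / BlockSize * BlockSize, SeekOrigin.Current);
        }
        return index;
    }

    static bool ReadBlock(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) return false;
            total += n;
        }
        return true;
    }

    // Reads a NUL-terminated (or full-width) text field from the header.
    static string Text(byte[] h, int offset, int length)
    {
        int end = Array.IndexOf(h, (byte)0, offset, length);
        return Encoding.UTF8.GetString(h, offset, (end < 0 ? offset + length : end) - offset);
    }

    static long Octal(byte[] h, int offset, int length)
    {
        string digits = Text(h, offset, length).Trim();
        return digits.Length == 0 ? 0 : Convert.ToInt64(digits, 8);
    }
}
```

Because the index stores the payload offset, a later read can seek straight to the file body instead of re-scanning the layer.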

### 2.3 Shell-form composition
- Dockerfile shell form is serialised as `["/bin/sh","-c","…"]` (or `Shell[]` override on Windows).
- Treat the image configuration (`config.json`) as the source of truth; the Dockerfile does not need to be inspected.
- Working directory defaults to `/` if unspecified.
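
A small sketch of how shell form might be recognised from the serialised argv. `ShellForm.TryGetScript` is a hypothetical helper; the `shellOverride` parameter stands in for the image's `Shell[]` value, and any argv elements after the script string are left to the caller.

```csharp
#nullable enable
using System;
using System.Linq;

static class ShellForm
{
    static readonly string[] DefaultShell = { "/bin/sh", "-c" };

    // Returns the script text when argv starts with the shell prefix and has at
    // least one more element; returns null for exec-form commands.
    public static string? TryGetScript(string[] argv, string[]? shellOverride = null)
    {
        string[] shell = shellOverride is { Length: > 0 } ? shellOverride : DefaultShell;
        if (argv.Length <= shell.Length) return null;
        return argv.Take(shell.Length).SequenceEqual(shell, StringComparer.Ordinal)
            ? argv[shell.Length]
            : null;
    }
}
```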

## 3) Low-level primitives

### 3.1 PATH resolution
- Extract `PATH` from environment (fallback `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`).
- If `argv[0]` contains no `/`, walk the PATH entries in order to resolve an absolute file; if it contains a `/` but is relative, resolve it against the working directory instead.
- Verify execute bit (or Windows ACL) before accepting.
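
The lookup can be sketched as below. It follows the usual `execvp` convention: only names without a slash are searched on PATH, while names containing a slash are resolved directly (relative ones against the working directory). `PathResolver` and the `isExecutable` callback are hypothetical; the callback stands in for a query against the overlay filesystem, and path normalisation (`.`/`..`, symlinks) is omitted.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

static class PathResolver
{
    public const string DefaultPath =
        "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";

    // env holds "NAME=value" strings as found in the image config.
    public static string? Resolve(
        string argv0, IEnumerable<string> env, string workingDir, Func<string, bool> isExecutable)
    {
        if (argv0.Contains('/'))
        {
            // A name containing a slash is never searched on PATH.
            string direct = argv0.StartsWith('/')
                ? argv0
                : workingDir.TrimEnd('/') + "/" + argv0;
            return isExecutable(direct) ? direct : null;
        }

        // The last PATH assignment wins; fall back to the default otherwise.
        string path = DefaultPath;
        foreach (string entry in env)
            if (entry.StartsWith("PATH=", StringComparison.Ordinal))
                path = entry.Substring("PATH=".Length);

        foreach (string dir in path.Split(':', StringSplitOptions.RemoveEmptyEntries))
        {
            string candidate = dir.TrimEnd('/') + "/" + argv0;
            if (isExecutable(candidate)) return candidate;
        }
        return null;
    }
}
```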

### 3.2 Shebang handling
- For non-ELF/PE files: read the first line and interpret `#!interpreter [arg]`.
- Rebuild argv per kernel semantics as `[interpreter, arg?, script path, original argv[1..]]`; Linux passes everything after the interpreter as a single argument.
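
A minimal sketch of the rewrite, assuming Linux behaviour where the text after the interpreter is passed as one argument rather than split on whitespace. `Shebang.Rewrite` is a hypothetical helper that returns `null` when the line is not a shebang.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

static class Shebang
{
    // firstLine: first line of the script; scriptPath: resolved absolute path;
    // argv: the original command line, whose argv[0] is replaced.
    public static string[]? Rewrite(string firstLine, string scriptPath, string[] argv)
    {
        if (!firstLine.StartsWith("#!", StringComparison.Ordinal)) return null;
        string rest = firstLine.Substring(2).Trim();
        if (rest.Length == 0) return null;

        var result = new List<string>();
        int split = rest.IndexOfAny(new[] { ' ', '\t' });
        if (split < 0)
        {
            result.Add(rest);                              // interpreter only
        }
        else
        {
            result.Add(rest.Substring(0, split));          // interpreter
            result.Add(rest.Substring(split + 1).Trim());  // single optional argument
        }

        result.Add(scriptPath);
        result.AddRange(argv.Skip(1));
        return result.ToArray();
    }
}
```

With `#!/usr/bin/env python3` and an original argv of `run.py --port 80`, the rewritten argv starts with `/usr/bin/env`, `python3`, and the script path, followed by the original arguments.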

### 3.3 Binary probes
- Identify ELF via the magic `\x7FELF`; parse `.interp`, `.dynamic` (including `DT_NEEDED` libraries), `.note.go.buildid`, and the DWARF producer.
- Identify PE (Windows) and detect .NET single-file bundles via CLI header.
- Record features for runtime scoring (Go vs Rust vs glibc vs musl).
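
The first classification step only needs a few leading bytes. The sketch below, with hypothetical `FileKind` and `BinaryProbe` names, distinguishes ELF, PE candidates, and shebang scripts; section parsing and runtime fingerprinting are out of scope here.

```csharp
#nullable enable
using System;

enum FileKind { Unknown, Elf, Pe, Script }

static class BinaryProbe
{
    // head: the first bytes of the file (four bytes are enough for this check).
    public static FileKind Classify(ReadOnlySpan<byte> head)
    {
        if (head.Length >= 4 && head[0] == 0x7F && head[1] == (byte)'E'
            && head[2] == (byte)'L' && head[3] == (byte)'F')
            return FileKind.Elf;

        // "MZ" is the DOS stub signature; a full probe would follow e_lfanew
        // and confirm the "PE\0\0" signature before trusting this.
        if (head.Length >= 2 && head[0] == (byte)'M' && head[1] == (byte)'Z')
            return FileKind.Pe;

        if (head.Length >= 2 && head[0] == (byte)'#' && head[1] == (byte)'!')
            return FileKind.Script;

        return FileKind.Unknown;
    }
}
```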

## 4) Wrapper catalogue

Collapse known wrappers before analysing the target command:

- Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`.
- Privilege droppers: `gosu`, `su-exec`, `chpst`.
- Shells: `sh`, `bash`, `dash`, BusyBox variants.
- Package runners: `npm`, `yarn`, `pnpm`, `pip`, `pipenv`, `poetry`, `bundle`, `rake`.

Rules:
- If the wrapper invocation contains a `--` sentinel (`tini -- app …`), drop the wrapper and record a reduction edge.
- `gosu user cmd …` → collapse to `cmd …`.
- For shell wrappers, delegate to the ShellFlow analyser (see separate guide).
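
The first two rules can be sketched as a loop that peels wrappers until none applies. This is an illustration under assumptions: the `ReductionEdge` fields are hypothetical, the sentinel rule is applied only to `tini` and `dumb-init`, the user-drop rule only to `gosu` and `su-exec`, and shell wrappers are not handled here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical shape; the guide only names the type.
sealed record ReductionEdge(string Wrapper, string Reason);

static class WrapperReducer
{
    static readonly HashSet<string> SentinelWrappers = new() { "tini", "dumb-init" };
    static readonly HashSet<string> UserWrappers = new() { "gosu", "su-exec" };

    // Returns the collapsed argv and appends one edge per removed wrapper.
    public static string[] Collapse(string[] argv, List<ReductionEdge> edges)
    {
        while (argv.Length > 1)
        {
            string name = argv[0].Substring(argv[0].LastIndexOf('/') + 1);
            int sentinel = Array.IndexOf(argv, "--");

            if (SentinelWrappers.Contains(name) && sentinel > 0 && sentinel < argv.Length - 1)
            {
                // tini [options] -- app args...  =>  app args...
                edges.Add(new ReductionEdge(name, "dropped wrapper up to '--' sentinel"));
                argv = argv[(sentinel + 1)..];
            }
            else if (UserWrappers.Contains(name) && argv.Length > 2)
            {
                // gosu user cmd args...  =>  cmd args...
                edges.Add(new ReductionEdge(name, "dropped privilege change to '" + argv[1] + "'"));
                argv = argv[2..];
            }
            else
            {
                break;
            }
        }
        return argv;
    }
}
```

Running it on `/sbin/tini -- gosu redis redis-server` leaves `redis-server` and records two edges, one for each wrapper.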

## 5) ShellFlow integration

When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual `exec` target. Key capabilities:

- Parses POSIX sh (and common Bash extensions).
- Tracks environment and positional-parameter mutations (`export`, `set`, `set --`).
- Resolves `$@`, `$1..9`, `${VAR:-default}`.
- Recognises idioms from official Docker images (`if [ "$1" = "server" ]; then …`).
- Emits multiple branches when predicates depend on unknown data, but tags them with lower confidence.

The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine.
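
To illustrate one of the capabilities listed above, the sketch below expands `${VAR}` and `${VAR:-default}` against a known environment. It is not the ShellFlow implementation: `ParameterExpansion` is a hypothetical helper, nested expansions and `$1`/`$@` are not handled, and unset variables collapse to an empty string where a real analyser would more likely keep them symbolic and lower the confidence.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ParameterExpansion
{
    // Matches ${NAME} and ${NAME:-default}; group 1 is the name, group 2 the default.
    static readonly Regex Pattern =
        new(@"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}");

    public static string Expand(string word, IReadOnlyDictionary<string, string> env) =>
        Pattern.Replace(word, m =>
        {
            // ':-' applies the default when the variable is unset or empty.
            bool hasValue = env.TryGetValue(m.Groups[1].Value, out string? value)
                            && value.Length > 0;
            if (hasValue) return value!;
            return m.Groups[2].Success ? m.Groups[2].Value : "";
        });
}
```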

## 6) Reduction algorithm

1. Compose argv `ENTRYPOINT ++ CMD`.
2. Collapse wrappers; append `ReductionEdge` entries documenting each step.
3. Resolve argv0 to an absolute file and classify (ELF/PE/script).
4. If script → run ShellFlow; replace current command with highest-confidence `exec` target while preserving alternates as evidence.
5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.).
6. Emit `EntryTraceResult` with candidate terminals ranked by confidence.

## 7) Confidence scoring

Use a simple logistic model with feature contributions captured for the evidence trail. Example features:

| Id | Signal | Weight |
| --- | --- | --- |
| `f1` | Entrypoint already an executable (ELF/PE) | +0.18 |
| `f2` | Observed chain ends in non-wrapper binary | +0.22 |
| `f3` | VM host + resolvable artefact | +0.20 |
| `f4` | Exposed ports align with runtime | +0.06 |
| `f5` | Shebang interpreter matches runtime family | +0.05 |
| `f6` | Language artefact validation succeeded | +0.15 |
| `f8` | Multi-branch script unresolved (`$@` taint) | −0.20 |
| `f9` | Target missing execute bit | −0.25 |
| `f10` | Shell with no `exec` | −0.18 |

Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point.
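
One way to read "logistic model" is a sigmoid over the summed weights of the features that fired. The sketch below takes that reading; the weights come from the table, while `ConfidenceModel`, the zero default bias, and the evidence string format are assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class ConfidenceModel
{
    // Weights copied from the feature table.
    static readonly Dictionary<string, double> Weights = new()
    {
        ["f1"] = 0.18, ["f2"] = 0.22, ["f3"] = 0.20, ["f4"] = 0.06, ["f5"] = 0.05,
        ["f6"] = 0.15, ["f8"] = -0.20, ["f9"] = -0.25, ["f10"] = -0.18,
    };

    // Returns the score and one evidence string per contributing feature.
    // Unknown feature ids are ignored; the bias term is an assumption.
    public static (double Score, List<string> Evidence) Score(
        IEnumerable<string> firedFeatures, double bias = 0.0)
    {
        double z = bias;
        var evidence = new List<string>();
        foreach (string id in firedFeatures.Distinct())
        {
            if (!Weights.TryGetValue(id, out double w)) continue;
            z += w;
            evidence.Add(id + ": " + w.ToString("+0.00;-0.00", CultureInfo.InvariantCulture));
        }
        return (1.0 / (1.0 + Math.Exp(-z)), evidence);
    }
}
```

Keeping the per-feature contribution next to the score is what lets the UI or CLI explain a ranking without re-running the model.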

## 8) Outputs

Return a populated `EntryTraceResult`:

- `Terminals` contains the best candidate(s) and their runtime classification.
- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
- `Chain` shows the explainable path from initial Docker argv to the final binary.

Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.