# Entry-Point Static Analysis
This guide captures the static half of the StellaOps entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.
## 1) Loading OCI images
### 1.1 Supported inputs
- Registry references (`repo:tag@sha256:digest`) using the existing content store.
- Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`).
### 1.2 Normalised model
```csharp
sealed class OciImage {
    public required string Os;
    public required string Arch;
    public required string[] Entrypoint;
    public required string[] Cmd;
    public required string[] Shell;       // Windows / powershell overrides
    public required string WorkingDir;
    public required string[] Env;
    public required string[] ExposedPorts;
    public required LabelMap Labels;
    public required LayerRef[] Layers;    // ordered, compressed blobs
}
```
Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-form (see §2.3).
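
A minimal sketch of the composition rule, assuming a hypothetical helper named `ArgvComposer.Compose` (not part of the StellaOps codebase). Because the image config already serialises shell-form instructions in exec form, the sketch only concatenates the two arrays and tolerates missing values.

```csharp
#nullable enable
using System;
using System.Linq;

static class ArgvComposer
{
    // Hypothetical helper: concatenates Entrypoint and Cmd from the image config.
    // Shell-form instructions arrive already wrapped as ["/bin/sh", "-c", "..."],
    // so no quoting or re-parsing happens here.
    public static string[] Compose(string[]? entrypoint, string[]? cmd)
    {
        var head = entrypoint ?? Array.Empty<string>();
        var tail = cmd ?? Array.Empty<string>();
        return head.Concat(tail).ToArray();
    }
}
```

For example, `ArgvComposer.Compose(new[] { "/usr/bin/tini", "--" }, new[] { "node", "server.js" })` yields the four-element argv that the wrapper catalogue in §4 would then collapse.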
## 2) Overlay virtual filesystem
### 2.1 Whiteouts
- Regular whiteout: `path/.wh.<name>` removes `<name>` from lower layers.
- Opaque directory: `path/.wh..wh..opq` hides everything under `path/` that comes from lower layers; entries added to `path/` in the same layer remain visible.
### 2.2 Lazy extraction
- First pass: build a tar index `(path → layer, offset, size, mode, isWhiteout, isDir)`.
- Decompress only when reading a file; optionally support eStargz TOC to accelerate random access.
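
The sketch below shows one way the whiteout rules from §2.1 and the first-pass index could combine into a merged view without decompressing any file content. `IndexEntry` and `OverlayIndex.Merge` are hypothetical names, the record keeps only a subset of the index fields listed above, and eStargz support is not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical index record: where a file's bytes live, without extracting them.
record IndexEntry(string Path, int Layer, long Offset, long Size, bool IsDir);

static class OverlayIndex
{
    // Merges per-layer tar listings (lowest layer first) into one path -> entry view.
    public static Dictionary<string, IndexEntry> Merge(IReadOnlyList<IReadOnlyList<IndexEntry>> layers)
    {
        var view = new Dictionary<string, IndexEntry>(StringComparer.Ordinal);
        foreach (var layer in layers)
        {
            // Whiteouts only affect lower layers, so apply them before
            // adding this layer's own entries.
            foreach (var e in layer)
            {
                var slash = e.Path.LastIndexOf('/');
                var dir = slash < 0 ? "" : e.Path.Substring(0, slash);
                var name = e.Path.Substring(slash + 1);
                if (name == ".wh..wh..opq")
                    RemoveTree(view, dir, keepRoot: true);
                else if (name.StartsWith(".wh.", StringComparison.Ordinal) && name.Length > 4)
                    RemoveTree(view, Join(dir, name.Substring(4)), keepRoot: false);
            }
            foreach (var e in layer)
            {
                var name = e.Path.Substring(e.Path.LastIndexOf('/') + 1);
                if (!name.StartsWith(".wh.", StringComparison.Ordinal))
                    view[e.Path] = e; // upper layers shadow lower ones
            }
        }
        return view;
    }

    static string Join(string dir, string name) => dir.Length == 0 ? name : dir + "/" + name;

    // Removes everything below root, and root itself unless keepRoot is set.
    static void RemoveTree(Dictionary<string, IndexEntry> view, string root, bool keepRoot)
    {
        var prefix = root.Length == 0 ? "" : root + "/";
        var doomed = view.Keys
            .Where(p => p.StartsWith(prefix, StringComparison.Ordinal) || (!keepRoot && p == root))
            .ToList();
        foreach (var p in doomed) view.Remove(p);
    }
}
```

A reader would then use the surviving entry's `Layer`, `Offset`, and `Size` to decompress just that file on demand.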
### 2.3 Shell-form composition
- Dockerfile shell form is serialised as `["/bin/sh","-c","…"]` (or `Shell[]` override on Windows).
- Always trust `config.json`; no need to inspect the Dockerfile.
- Working directory defaults to `/` if unspecified.
## 3) Low-level primitives
### 3.1 PATH resolution
- Extract `PATH` from environment (fallback `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`).
- If `argv[0]` contains no `/`, walk the PATH to resolve an absolute file; if it is a relative path that contains `/`, resolve it against the working directory instead.
- Verify execute bit (or Windows ACL) before accepting.
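
A sketch of the lookup under stated assumptions: `PathResolver` is a hypothetical name, and `isExecutable` stands in for whatever check the overlay filesystem exposes for the execute bit. The sketch does not normalise `.` or `..` segments and skips empty PATH entries.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

static class PathResolver
{
    public const string DefaultPath =
        "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";

    // Returns the absolute path of argv0, or null when nothing executable matches.
    public static string? Resolve(
        string argv0, IEnumerable<string> env, string workingDir, Func<string, bool> isExecutable)
    {
        // Absolute path: accept as-is if executable.
        if (argv0.StartsWith("/", StringComparison.Ordinal))
            return isExecutable(argv0) ? argv0 : null;

        // Relative path with a slash: resolve against the working directory.
        if (argv0.Contains('/'))
        {
            var relative = workingDir.TrimEnd('/') + "/" + argv0;
            return isExecutable(relative) ? relative : null;
        }

        // Bare name: walk PATH from the image environment, or the fallback.
        var path = DefaultPath;
        foreach (var kv in env)
            if (kv.StartsWith("PATH=", StringComparison.Ordinal))
                path = kv.Substring(5);

        foreach (var dir in path.Split(':', StringSplitOptions.RemoveEmptyEntries))
        {
            var candidate = dir.TrimEnd('/') + "/" + argv0;
            if (isExecutable(candidate)) return candidate;
        }
        return null;
    }
}
```

Passing the check as a delegate keeps the resolver independent of how layers are stored, so the same code can run against a lazily extracted index or a plain directory.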
### 3.2 Shebang handling
- For non-ELF/PE files: read first line; interpret `#!interpreter args`.
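- Rewrite argv per kernel semantics: the interpreter replaces `argv[0]`, the optional shebang argument follows it, then the script path, then the original arguments. Linux passes everything after the interpreter on the shebang line as a single argument.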
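
The rewrite can be sketched as follows. `Shebang.Rewrite` is a hypothetical helper that takes the first bytes of the file; it follows the Linux convention of passing the remainder of the shebang line as one argument, which other kernels may not share.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class Shebang
{
    // Returns the rewritten argv, or null when the file does not start with "#!".
    public static string[]? Rewrite(byte[] head, string scriptPath, string[] argv)
    {
        if (head.Length < 2 || head[0] != (byte)'#' || head[1] != (byte)'!')
            return null;

        // Only the first line matters.
        var end = Array.IndexOf(head, (byte)'\n');
        if (end < 0) end = head.Length;
        var line = Encoding.UTF8.GetString(head, 2, end - 2).Trim();
        if (line.Length == 0) return null;

        // Split once: the interpreter path, then the rest as a single argument.
        var parts = line.Split(new[] { ' ', '\t' }, 2, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string> { parts[0] };
        if (parts.Length == 2) result.Add(parts[1].Trim());

        result.Add(scriptPath);          // the script path follows the interpreter
        result.AddRange(argv.Skip(1));   // original arguments, minus the old argv[0]
        return result.ToArray();
    }
}
```

With a script at `/app/run.py` starting with `#!/usr/bin/env python3` and an original argv of `run.py --port 80`, the sketch produces `/usr/bin/env python3 /app/run.py --port 80`.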
### 3.3 Binary probes
- Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer.
- Identify PE (Windows) and detect .NET single-file bundles via CLI header.
- Record features for runtime scoring (Go vs Rust vs glibc vs musl).
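
As a rough illustration of the first probing step only, the sketch below classifies a file from its leading bytes. `BinaryProbe` and `BinaryKind` are hypothetical names; section parsing (`.interp`, `.dynamic`, build IDs) and the .NET bundle check are out of scope here.

```csharp
using System;

enum BinaryKind { Unknown, Elf, Pe, Script }

static class BinaryProbe
{
    // Classifies a file by magic number; callers pass at least the first four bytes.
    public static BinaryKind Classify(ReadOnlySpan<byte> head)
    {
        // ELF: 0x7F 'E' 'L' 'F'
        if (head.Length >= 4 && head[0] == 0x7F && head[1] == (byte)'E'
            && head[2] == (byte)'L' && head[3] == (byte)'F')
            return BinaryKind.Elf;

        // PE images start with the DOS "MZ" stub; a full probe would follow
        // the e_lfanew offset and confirm the "PE\0\0" signature.
        if (head.Length >= 2 && head[0] == (byte)'M' && head[1] == (byte)'Z')
            return BinaryKind.Pe;

        // Anything starting with "#!" goes to shebang handling (§3.2).
        if (head.Length >= 2 && head[0] == (byte)'#' && head[1] == (byte)'!')
            return BinaryKind.Script;

        return BinaryKind.Unknown;
    }
}
```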
## 4) Wrapper catalogue
Collapse known wrappers before analysing the target command:
- Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`.
- Privilege droppers: `gosu`, `su-exec`, `chpst`.
- Shells: `sh`, `bash`, `dash`, BusyBox variants.
- Package runners: `npm`, `yarn`, `pnpm`, `pip`, `pipenv`, `poetry`, `bundle`, `rake`.
Rules:
- If the wrapper invocation contains a `--` sentinel (`tini -- app …`), drop the wrapper and record a reduction edge.
- `gosu user cmd …` → collapse to `cmd …`.
- For shell wrappers, delegate to the ShellFlow analyser (see separate guide).
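
The first two rules can be sketched as a small loop. `WrapperCollapser` is a hypothetical name, the shape of `ReductionEdge` is assumed, and the sketch covers only `tini`, `dumb-init`, `gosu`, and `su-exec`; wrappers that take their own options (such as `chpst`) would need per-tool argument parsing.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape: the source names ReductionEdge but does not define its fields.
record ReductionEdge(string Wrapper, string Reason);

static class WrapperCollapser
{
    static readonly HashSet<string> SentinelWrappers = new() { "tini", "dumb-init" };
    static readonly HashSet<string> UserSwitchers = new() { "gosu", "su-exec" };

    public static (string[] Argv, List<ReductionEdge> Edges) Collapse(string[] argv)
    {
        var edges = new List<ReductionEdge>();
        var current = argv;
        while (current.Length > 1)
        {
            // Compare on the basename so "/sbin/tini" matches "tini".
            var name = current[0].Substring(current[0].LastIndexOf('/') + 1);
            var sentinel = Array.IndexOf(current, "--");

            if (SentinelWrappers.Contains(name) && sentinel > 0 && sentinel < current.Length - 1)
            {
                edges.Add(new ReductionEdge(name, "dropped everything up to the '--' sentinel"));
                current = current[(sentinel + 1)..];
            }
            else if (UserSwitchers.Contains(name) && current.Length > 2)
            {
                edges.Add(new ReductionEdge(name, $"dropped user argument '{current[1]}'"));
                current = current[2..];
            }
            else
            {
                break; // not a known wrapper: this is the command to analyse
            }
        }
        return (current, edges);
    }
}
```

Given `/sbin/tini -- gosu redis redis-server /etc/redis.conf`, the loop records two edges and leaves `redis-server /etc/redis.conf` as the command to resolve.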
## 5) ShellFlow integration
When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual `exec` target. Key capabilities:
- Parses POSIX sh (and common Bash extensions).
- Tracks environment mutations (`set`, `export`, `set --`).
- Resolves `$@`, `$1..9`, `${VAR:-default}`.
- Recognises idioms from official Docker images (`if [ "$1" = "server" ]; then …`).
- Emits multiple branches when predicates depend on unknown data, but tags them with lower confidence.
The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine.
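
To make the parameter-resolution capability concrete, here is a deliberately small sketch of expanding one shell word. It is not the ShellFlow implementation, which is described in its own guide; `ShellWords.Expand` is a hypothetical helper that handles only `$@`, `$1` to `$9`, `$VAR`, `${VAR}`, and `${VAR:-default}`, and it ignores quoting rules.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ShellWords
{
    static readonly Regex Param = new(
        @"\$\{(?<name>[A-Za-z_][A-Za-z0-9_]*)(:-(?<def>[^}]*))?\}" +
        @"|\$(?<pos>[1-9])" +
        @"|\$(?<bare>[A-Za-z_][A-Za-z0-9_]*)");

    // Expands one word. Returns null when a referenced variable is unknown,
    // so the caller can emit a lower-confidence branch instead of guessing.
    public static string[]? Expand(
        string word, IReadOnlyList<string> args, IReadOnlyDictionary<string, string> env)
    {
        // "$@" expands to all positional parameters as separate words.
        if (word == "$@" || word == "\"$@\"")
            return new List<string>(args).ToArray();

        var unknown = false;
        var text = Param.Replace(word, m =>
        {
            if (m.Groups["pos"].Success)
            {
                var i = m.Groups["pos"].Value[0] - '1';
                return i < args.Count ? args[i] : ""; // unset positionals expand to empty
            }
            var name = m.Groups["name"].Success ? m.Groups["name"].Value : m.Groups["bare"].Value;
            var has = env.TryGetValue(name, out var value);
            if (m.Groups["def"].Success) // ${VAR:-default}: default when unset or empty
                return has && value!.Length > 0 ? value! : m.Groups["def"].Value;
            if (has) return value!;
            unknown = true;
            return "";
        });
        return unknown ? null : new[] { text };
    }
}
```

Returning `null` for unknown variables mirrors the idea of tagging branches that depend on unknown data rather than silently substituting an empty string.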
## 6) Reduction algorithm
1. Compose argv `ENTRYPOINT ++ CMD`.
2. Collapse wrappers; append `ReductionEdge` entries documenting each step.
3. Resolve `argv[0]` to an absolute file and classify (ELF/PE/script).
4. If script → run ShellFlow; replace current command with highest-confidence `exec` target while preserving alternates as evidence.
5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.).
6. Emit `EntryTraceResult` with candidate terminals ranked by confidence.
## 7) Confidence scoring
Use a simple logistic model with feature contributions captured for the evidence trail. Example features:
| Id | Signal | Weight |
| --- | --- | --- |
| `f1` | Entrypoint already an executable (ELF/PE) | +0.18 |
| `f2` | Observed chain ends in non-wrapper binary | +0.22 |
| `f3` | VM host + resolvable artefact | +0.20 |
| `f4` | Exposed ports align with runtime | +0.06 |
| `f5` | Shebang interpreter matches runtime family | +0.05 |
| `f6` | Language artefact validation succeeded | +0.15 |
| `f8` | Multi-branch script unresolved (`$@` taint) | -0.20 |
| `f9` | Target missing execute bit | -0.25 |
| `f10` | Shell with no `exec` | -0.18 |
Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point.
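
A sketch of how such a model could be evaluated while keeping per-feature evidence. The weights come from the table, with `f8` to `f10` treated as penalties; the zero default bias, the `Confidence.Evaluate` name, and the evidence string format are assumptions, since the guide does not specify them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class Confidence
{
    // Weights from the table above; f8 to f10 are assumed to be negative.
    static readonly Dictionary<string, double> Weights = new()
    {
        ["f1"] = 0.18, ["f2"] = 0.22, ["f3"] = 0.20, ["f4"] = 0.06, ["f5"] = 0.05,
        ["f6"] = 0.15, ["f8"] = -0.20, ["f9"] = -0.25, ["f10"] = -0.18,
    };

    // Sums the weights of the features that fired and squashes the total
    // through the logistic function. The bias term is an assumption.
    public static (double Score, List<string> Evidence) Evaluate(
        IEnumerable<string> firedFeatures, double bias = 0.0)
    {
        var evidence = new List<string>();
        var z = bias;
        foreach (var id in firedFeatures)
        {
            if (!Weights.TryGetValue(id, out var w)) continue; // ignore unknown ids
            z += w;
            evidence.Add(id + ": " + w.ToString("+0.00;-0.00", CultureInfo.InvariantCulture));
        }
        return (1.0 / (1.0 + Math.Exp(-z)), evidence);
    }
}
```

With no features fired and a zero bias the score is exactly 0.5; positive signals push it above that midpoint and penalties pull it below, and the evidence list records each contribution in order.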
## 8) Outputs
Return a populated `EntryTraceResult`:
- `Terminals` contains the best candidate(s) and their runtime classification.
- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
- `Chain` shows the explainable path from initial Docker argv to the final binary.
Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.