# Entry-Point Static Analysis

This guide captures the static half of Stella Ops’ entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.

## 1) Loading OCI images

### 1.1 Supported inputs
- Registry references (`repo:tag@sha256:digest`) using the existing content store.
- Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`).

### 1.2 Normalised model

```csharp
sealed class OciImage {
  public required string Os;
  public required string Arch;
  public required string[] Entrypoint;
  public required string[] Cmd;
  public required string[] Shell;      // Windows / powershell overrides
  public required string WorkingDir;
  public required string[] Env;
  public required string[] ExposedPorts;
  public required LabelMap Labels;
  public required LayerRef[] Layers;   // ordered, compressed blobs
}
```

Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-form (see §2.3).
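
As a minimal sketch of that composition, the helper below concatenates the two arrays in order. `ArgvComposer` is a hypothetical name, and treating a missing array as empty is an assumption rather than documented behaviour.

```csharp
#nullable enable
using System;
using System.Linq;

static class ArgvComposer
{
    // Concatenates Entrypoint and Cmd in order.
    // Assumption: a null array is treated the same as an empty one.
    public static string[] Compose(string[]? entrypoint, string[]? cmd) =>
        (entrypoint ?? Array.Empty<string>())
            .Concat(cmd ?? Array.Empty<string>())
            .ToArray();
}
```

For example, `Compose(new[] { "/docker-entrypoint.sh" }, new[] { "nginx", "-g", "daemon off;" })` yields a four-element argv that starts with the entrypoint script.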

## 2) Overlay virtual filesystem

### 2.1 Whiteouts
- Regular whiteout: `path/.wh.<name>` removes `path/<name>` (and, for a directory, everything beneath it) from lower layers.
- Opaque directory: `path/.wh..wh..opq` hides everything that lower layers placed under `path/`; entries added by the same or later layers stay visible.
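
The naming convention can be recognised from the entry path alone. The sketch below is illustrative only: `WhiteoutKind` and `Whiteouts.Classify` are hypothetical names, and the returned target is the sibling to hide for a regular whiteout or the directory whose lower-layer contents are hidden for an opaque marker.

```csharp
using System;

enum WhiteoutKind { None, Regular, Opaque }

static class Whiteouts
{
    private const string Prefix = ".wh.";
    private const string OpaqueMarker = ".wh..wh..opq";

    // Classifies a tar entry path and returns the path it affects.
    public static (WhiteoutKind Kind, string Target) Classify(string entryPath)
    {
        string trimmed = entryPath.TrimEnd('/');
        int slash = trimmed.LastIndexOf('/');
        string dir = slash < 0 ? "" : trimmed.Substring(0, slash);
        string name = trimmed.Substring(slash + 1);

        // Check the opaque marker first: it also starts with ".wh.".
        if (name == OpaqueMarker)
            return (WhiteoutKind.Opaque, dir);

        if (name.StartsWith(Prefix, StringComparison.Ordinal) && name.Length > Prefix.Length)
        {
            string hidden = name.Substring(Prefix.Length);
            return (WhiteoutKind.Regular, dir.Length == 0 ? hidden : dir + "/" + hidden);
        }

        return (WhiteoutKind.None, trimmed);
    }
}
```

With this sketch, `usr/bin/.wh.foo` classifies as a regular whiteout targeting `usr/bin/foo`, and `etc/.wh..wh..opq` as an opaque marker for `etc`.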

### 2.2 Lazy extraction
- First pass: build a tar index `(path → layer, offset, size, mode, isWhiteout, isDir)`.
- Decompress only when reading a file; optionally support eStargz TOC to accelerate random access.
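
One way to picture the first pass is as a merge of per-layer index entries into the visible view, with no file contents read. This is a sketch under assumptions: `TarIndexEntry` and `OverlayIndex` are hypothetical names, the record fields mirror the tuple above, and layers are supplied lowest first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical index record; fields mirror the tuple described in the text.
sealed record TarIndexEntry(string Path, int Layer, long Offset, long Size, int Mode, bool IsWhiteout, bool IsDir);

static class OverlayIndex
{
    // Merges per-layer entry lists (lowest layer first) into the visible view.
    public static Dictionary<string, TarIndexEntry> Merge(IEnumerable<IReadOnlyList<TarIndexEntry>> layers)
    {
        var view = new Dictionary<string, TarIndexEntry>(StringComparer.Ordinal);
        foreach (var layer in layers)
        {
            // Apply whiteouts before additions so they only affect lower layers.
            foreach (var marker in layer.Where(x => x.IsWhiteout))
            {
                int slash = marker.Path.LastIndexOf('/');
                string dir = slash < 0 ? "" : marker.Path.Substring(0, slash);
                string name = marker.Path.Substring(slash + 1);
                if (name == ".wh..wh..opq")
                    RemoveTree(view, dir, keepRoot: true);
                else
                    RemoveTree(view, (dir.Length == 0 ? "" : dir + "/") + name.Substring(".wh.".Length), keepRoot: false);
            }
            foreach (var entry in layer.Where(x => !x.IsWhiteout))
                view[entry.Path] = entry;
        }
        return view;
    }

    // Removes everything beneath root, and root itself unless keepRoot is set.
    private static void RemoveTree(Dictionary<string, TarIndexEntry> view, string root, bool keepRoot)
    {
        string prefix = root.Length == 0 ? "" : root + "/";
        var doomed = view.Keys
            .Where(k => k.StartsWith(prefix, StringComparison.Ordinal) || (!keepRoot && k == root))
            .ToList();
        foreach (string key in doomed)
            view.Remove(key);
    }
}
```

The surviving entries still carry their `Layer` and `Offset`, so a later read can decompress just the blob region it needs.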

### 2.3 Shell-form composition
- Dockerfile shell form is serialised as `["/bin/sh","-c","…"]` (or with the `Shell[]` override as the prefix, typically on Windows).
- Always trust the image `config.json`: it already records the serialised form, so there is no need to inspect the Dockerfile.
- Working directory defaults to `/` if unspecified.
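
Because the serialised form is all the analyser sees, shell form can be detected by matching the shell prefix. The helper below is a hedged sketch: `ShellForm.TryGetScript` is a hypothetical name, and comparing the prefix with ordinal string equality is an assumption.

```csharp
#nullable enable
using System;

static class ShellForm
{
    private static readonly string[] DefaultShell = { "/bin/sh", "-c" };

    // Returns the script text when argv starts with the shell prefix
    // followed by at least one more element; otherwise returns null.
    public static string? TryGetScript(string[] argv, string[]? shellOverride = null)
    {
        string[] shell = shellOverride is { Length: > 0 } ? shellOverride : DefaultShell;
        if (argv.Length <= shell.Length) return null;

        for (int i = 0; i < shell.Length; i++)
        {
            if (!string.Equals(argv[i], shell[i], StringComparison.Ordinal)) return null;
        }
        return argv[shell.Length];
    }
}
```

An exec-form argv such as `["java","-jar","app.jar"]` returns `null`, which signals that no shell parsing is needed.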

## 3) Low-level primitives

### 3.1 PATH resolution
- Extract `PATH` from environment (fallback `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`).
- If `argv[0]` contains no `/`, walk the PATH to resolve an absolute file; if it is a relative path containing `/`, resolve it against the working directory instead.
- Verify execute bit (or Windows ACL) before accepting.
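
The lookup can be sketched against any filesystem abstraction. In the example below, `PathResolver` and the `isExecutable` callback are hypothetical; the callback stands in for the overlay lookup plus execute-bit check. The sketch does not normalise `.` or `..` segments and skips empty PATH entries, both simplifications.

```csharp
#nullable enable
using System;

static class PathResolver
{
    public const string DefaultPath = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";

    // Resolves argv0 to an absolute path, or returns null when nothing executable matches.
    public static string? Resolve(string argv0, string[] env, string workingDir, Func<string, bool> isExecutable)
    {
        if (argv0.Contains('/'))
        {
            // A name containing '/' is never searched on PATH.
            string direct = argv0.StartsWith('/') ? argv0 : workingDir.TrimEnd('/') + "/" + argv0;
            return isExecutable(direct) ? direct : null;
        }

        string path = DefaultPath;
        foreach (string kv in env)
        {
            if (kv.StartsWith("PATH=", StringComparison.Ordinal))
                path = kv.Substring("PATH=".Length);
        }

        foreach (string dir in path.Split(':', StringSplitOptions.RemoveEmptyEntries))
        {
            string candidate = dir.TrimEnd('/') + "/" + argv0;
            if (isExecutable(candidate)) return candidate;
        }
        return null;
    }
}
```

Passing a set lookup such as `files.Contains` as `isExecutable` is enough to exercise the logic without a real image.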

### 3.2 Shebang handling
- For non-ELF/PE files: read first line; interpret `#!interpreter args`.
- Rewrite argv per kernel semantics: the interpreter becomes `argv[0]`, followed by the shebang argument (Linux passes everything after the interpreter as a single argument), then the script path, then the original arguments.
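
The rewrite can be sketched as follows, assuming Linux behaviour where the text after the interpreter is passed as one argument. `Shebang.Rewrite` is a hypothetical helper, and the caller is assumed to supply the first line without its trailing newline.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

static class Shebang
{
    // Rewrites argv for a script whose first line is "#!interpreter [arg]".
    // Returns null when the line is not a usable shebang.
    public static string[]? Rewrite(string firstLine, string scriptPath, string[] argv)
    {
        if (!firstLine.StartsWith("#!", StringComparison.Ordinal)) return null;
        string rest = firstLine.Substring(2).Trim();
        if (rest.Length == 0) return null;

        var result = new List<string>();
        int split = rest.IndexOfAny(new[] { ' ', '\t' });
        if (split < 0)
        {
            result.Add(rest);
        }
        else
        {
            result.Add(rest.Substring(0, split));
            // Everything after the interpreter stays a single argument.
            result.Add(rest.Substring(split + 1).Trim());
        }

        result.Add(scriptPath);
        // The original argv[0] is replaced by the script path; keep the remaining arguments.
        for (int i = 1; i < argv.Length; i++) result.Add(argv[i]);
        return result.ToArray();
    }
}
```

For `#!/usr/bin/env python3` and a script at `/app/run.py` invoked with `--port 80`, the sketch produces `/usr/bin/env python3 /app/run.py --port 80`.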

### 3.3 Binary probes
- Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer.
- Identify PE (Windows) and detect .NET single-file bundles via CLI header.
- Record features for runtime scoring (Go vs Rust vs glibc vs musl).
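
The first step of any probe is classifying the file from its leading bytes. The sketch below covers only that step, not section parsing; `BinaryKind` and `BinaryProbe` are hypothetical names. The PE check follows the `MZ` header to the `PE\0\0` signature via the offset stored at `0x3C`, so the caller must pass enough bytes to include it.

```csharp
using System;
using System.Buffers.Binary;

enum BinaryKind { Unknown, Elf, Pe, Script }

static class BinaryProbe
{
    private static ReadOnlySpan<byte> PeSignature => new byte[] { (byte)'P', (byte)'E', 0, 0 };

    // Classifies a file from its leading bytes.
    public static BinaryKind Classify(ReadOnlySpan<byte> header)
    {
        if (header.Length >= 4 && header[0] == 0x7F && header[1] == (byte)'E'
            && header[2] == (byte)'L' && header[3] == (byte)'F')
            return BinaryKind.Elf;

        if (header.Length >= 0x40 && header[0] == (byte)'M' && header[1] == (byte)'Z')
        {
            // e_lfanew at 0x3C holds the file offset of the PE signature.
            int peOffset = BinaryPrimitives.ReadInt32LittleEndian(header.Slice(0x3C, 4));
            if (peOffset >= 0 && peOffset <= header.Length - 4
                && header.Slice(peOffset, 4).SequenceEqual(PeSignature))
                return BinaryKind.Pe;
        }

        if (header.Length >= 2 && header[0] == (byte)'#' && header[1] == (byte)'!')
            return BinaryKind.Script;

        return BinaryKind.Unknown;
    }
}
```

A `Script` result hands the file to shebang handling (§3.2); `Unknown` leaves the decision to other detectors.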

## 4) Wrapper catalogue

Collapse known wrappers before analysing the target command:

- Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`.
- Privilege droppers: `gosu`, `su-exec`, `chpst`.
- Shells: `sh`, `bash`, `dash`, BusyBox variants.
- Package runners: `npm`, `yarn`, `pnpm`, `pip`, `pipenv`, `poetry`, `bundle`, `rake`.

Rules:
- If the wrapper invocation contains a `--` sentinel (`tini -- app …`), drop the wrapper and record a reduction edge.
- `gosu user cmd …` → collapse to `cmd …`.
- For shell wrappers, delegate to the ShellFlow analyser (see separate guide).
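
The first two rules can be sketched as a loop that strips one wrapper per step. This is an illustration, not the catalogue itself: the wrapper sets are deliberately a subset, the `ReductionEdge` shape is assumed, and `chpst` is left out because its options precede the command and need their own parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for a recorded reduction step.
sealed record ReductionEdge(string Wrapper, string Reason);

static class WrapperReducer
{
    private static readonly HashSet<string> InitShims = new() { "tini", "dumb-init" };
    private static readonly HashSet<string> PrivilegeDroppers = new() { "gosu", "su-exec" };

    // Applies one reduction step; returns the same array instance when no rule matches.
    public static string[] CollapseOnce(string[] argv, List<ReductionEdge> edges)
    {
        if (argv.Length == 0) return argv;
        string name = argv[0].Substring(argv[0].LastIndexOf('/') + 1);

        if (InitShims.Contains(name))
        {
            int sentinel = Array.IndexOf(argv, "--");
            if (sentinel > 0 && sentinel < argv.Length - 1)
            {
                edges.Add(new ReductionEdge(name, "dropped init shim at '--' sentinel"));
                return argv.Skip(sentinel + 1).ToArray();
            }
        }
        else if (PrivilegeDroppers.Contains(name) && argv.Length >= 3)
        {
            edges.Add(new ReductionEdge(name, $"dropped privilege change to '{argv[1]}'"));
            return argv.Skip(2).ToArray();
        }
        return argv;
    }

    // Repeats CollapseOnce until no rule applies; every step shortens argv, so this terminates.
    public static string[] Collapse(string[] argv, List<ReductionEdge> edges)
    {
        while (true)
        {
            string[] next = CollapseOnce(argv, edges);
            if (ReferenceEquals(next, argv)) return argv;
            argv = next;
        }
    }
}
```

Collapsing `/sbin/tini -- gosu app /usr/bin/java -jar /app.jar` leaves `/usr/bin/java -jar /app.jar` and records two edges, one per wrapper removed.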

## 5) ShellFlow integration

When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual `exec` target. Key capabilities:

- Parses POSIX sh (and common Bash extensions).
- Tracks environment mutations (`set`, `export`, `set --`).
- Resolves `$@`, `$1..9`, `${VAR:-default}`.
- Recognises idioms from official Docker images (`if [ "$1" = "server" ]; then …`).
- Emits multiple branches when predicates depend on unknown data, but tags them with lower confidence.

The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine.
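
To make the parameter-resolution capability concrete, here is a deliberately small sketch of expanding `$@`, `$1..9`, and `${VAR:-default}` in words that have already been tokenised and unquoted. It is not the ShellFlow analyser: `ShellExpansion` is a hypothetical name, and branching, quoting, and command substitution are out of scope.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ShellExpansion
{
    // Matches $1..$9 and ${NAME} / ${NAME:-default}.
    private static readonly Regex Param = new(
        @"\$(?:(?<pos>[1-9])|\{(?<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?<def>[^}]*))?\})");

    public static List<string> ExpandWords(
        IEnumerable<string> words,
        IReadOnlyList<string> positional,
        IReadOnlyDictionary<string, string> env)
    {
        var result = new List<string>();
        foreach (string word in words)
        {
            // A standalone $@ expands to every positional parameter as separate words.
            if (word == "$@") { result.AddRange(positional); continue; }

            result.Add(Param.Replace(word, m =>
            {
                if (m.Groups["pos"].Success)
                {
                    int index = m.Groups["pos"].Value[0] - '1';
                    return index < positional.Count ? positional[index] : "";
                }
                // ":-" substitutes the default when the variable is unset or empty.
                return env.TryGetValue(m.Groups["name"].Value, out var value) && value.Length > 0
                    ? value
                    : m.Groups["def"].Value;
            }));
        }
        return result;
    }
}
```

Given the words `exec ${APP_BIN:-/usr/bin/server} --mode=$1 $@`, positional parameters `server --verbose`, and an empty environment, the sketch yields `exec /usr/bin/server --mode=server server --verbose`.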

## 6) Reduction algorithm

1. Compose argv `ENTRYPOINT ++ CMD`.
2. Collapse wrappers; append `ReductionEdge` entries documenting each step.
3. Resolve argv0 to an absolute file and classify (ELF/PE/script).
4. If script → run ShellFlow; replace current command with highest-confidence `exec` target while preserving alternates as evidence.
5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.).
6. Emit `EntryTraceResult` with candidate terminals ranked by confidence.
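
The ordering of these steps can be sketched as a small driver that takes each stage as a delegate. Everything here is hypothetical scaffolding: `Candidate`, `Reducer`, the delegate parameters, and the placeholder confidence of `1.0` for a directly resolved binary are assumptions. Step 5 is omitted.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical candidate terminal: the command and its confidence.
sealed record Candidate(string[] Argv, double Confidence);

static class Reducer
{
    public static List<Candidate> Reduce(
        string[] entrypoint,
        string[] cmd,
        Func<string[], string[]> collapseWrappers,
        Func<string, string?> resolve,
        Func<string, bool> isScript,
        Func<string[], IEnumerable<Candidate>> shellFlow)
    {
        // Steps 1-2: compose argv, then strip known wrappers.
        string[] argv = collapseWrappers(entrypoint.Concat(cmd).ToArray());
        if (argv.Length == 0) return new List<Candidate>();

        // Step 3: resolve argv[0] to an absolute path.
        string? path = resolve(argv[0]);
        if (path is null) return new List<Candidate>();
        argv = new[] { path }.Concat(argv.Skip(1)).ToArray();

        // Step 4: scripts are handed to the shell analyser; binaries pass through.
        List<Candidate> candidates = isScript(path)
            ? shellFlow(argv).ToList()
            : new List<Candidate> { new(argv, 1.0) };

        // Step 6: rank by confidence, keeping alternates as evidence.
        return candidates.OrderByDescending(c => c.Confidence).ToList();
    }
}
```

Passing the stages as delegates keeps the sketch testable with stubs; a real pipeline would wire in the wrapper catalogue, PATH resolver, and ShellFlow analyser.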

## 7) Confidence scoring

Use a simple logistic model with feature contributions captured for the evidence trail. Example features:

| Id | Signal | Weight |
| --- | --- | --- |
| `f1` | Entrypoint already an executable (ELF/PE) | +0.18 |
| `f2` | Observed chain ends in non-wrapper binary | +0.22 |
| `f3` | VM host + resolvable artefact | +0.20 |
| `f4` | Exposed ports align with runtime | +0.06 |
| `f5` | Shebang interpreter matches runtime family | +0.05 |
| `f6` | Language artefact validation succeeded | +0.15 |
| `f8` | Multi-branch script unresolved (`$@` taint) | −0.20 |
| `f9` | Target missing execute bit | −0.25 |
| `f10` | Shell with no `exec` | −0.18 |

Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point.
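
A logistic model of this kind sums the weights of the features that fired and squashes the total through a sigmoid. The sketch below copies the weights from the table; the bias term (defaulting to `0.0`), the treatment of weights as direct logit contributions, and the evidence string format are assumptions, since the guide does not specify them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class ConfidenceModel
{
    // Weights copied from the feature table; keys are the feature ids.
    public static readonly IReadOnlyDictionary<string, double> Weights = new Dictionary<string, double>
    {
        ["f1"] = 0.18, ["f2"] = 0.22, ["f3"] = 0.20, ["f4"] = 0.06, ["f5"] = 0.05,
        ["f6"] = 0.15, ["f8"] = -0.20, ["f9"] = -0.25, ["f10"] = -0.18,
    };

    // Returns sigmoid(bias + sum of fired weights) plus one evidence string per feature.
    public static (double Score, List<string> Evidence) Score(IEnumerable<string> firedFeatures, double bias = 0.0)
    {
        double z = bias;
        var evidence = new List<string>();
        foreach (string id in firedFeatures.Distinct())
        {
            if (!Weights.TryGetValue(id, out double weight)) continue; // unknown ids are ignored
            z += weight;
            evidence.Add(id + ": " + weight.ToString("+0.00;-0.00", CultureInfo.InvariantCulture));
        }
        return (1.0 / (1.0 + Math.Exp(-z)), evidence);
    }
}
```

With no features fired and a zero bias the score is exactly 0.5; positive signals push it above that and negative ones below.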

## 8) Outputs

Return a populated `EntryTraceResult`:

- `Terminals` contains the best candidate(s) and their runtime classification.
- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
- `Chain` shows the explainable path from initial Docker argv to the final binary.

Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.
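
For orientation, the shared shape might look like the records below. Only the three property names come from this guide; the element types `Terminal` and `ChainStep` and all of their fields are hypothetical placeholders.

```csharp
using System.Collections.Generic;

// Hypothetical element types; only the EntryTraceResult property names are from the guide.
sealed record Terminal(string Path, string Runtime, double Confidence);
sealed record ChainStep(string[] Argv, string Reason);

sealed record EntryTraceResult(
    IReadOnlyList<Terminal> Terminals,   // best candidate(s) with runtime classification
    IReadOnlyList<string> Evidence,      // feature messages, ShellFlow reasoning, reductions
    IReadOnlyList<ChainStep> Chain);     // path from the initial argv to the final binary
```

Because every field is a read-only list, consumers can treat a result as an immutable snapshot regardless of which reducer produced it.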