# Entry-Point Static Analysis

This guide captures the static half of Stella Ops’ entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.

## 1) Loading OCI images

### 1.1 Supported inputs
- Registry references (`repo:tag@sha256:digest`) using the existing content store.
- Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`).

### 1.2 Normalised model

```csharp
sealed class OciImage {
  public required string Os { get; init; }
  public required string Arch { get; init; }
  public required string[] Entrypoint { get; init; }
  public required string[] Cmd { get; init; }
  public required string[] Shell { get; init; }      // SHELL override (Windows / PowerShell)
  public required string WorkingDir { get; init; }
  public required string[] Env { get; init; }        // "KEY=value" entries
  public required string[] ExposedPorts { get; init; }
  public required LabelMap Labels { get; init; }
  public required LayerRef[] Layers { get; init; }   // ordered, compressed blobs
}
```

Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-form (see §2.3).
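
A minimal sketch of that composition, assuming the image config has already been parsed into string arrays. `ArgvComposer` is a hypothetical helper name, not part of the scanner. Because shell form is already serialised into the config as an exec-style array, plain concatenation is enough here; runtime overrides are out of scope.

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper: concatenates Entrypoint and Cmd from the image config.
static class ArgvComposer
{
    // A missing field in the config (null) is treated as an empty array.
    public static string[] Compose(string[]? entrypoint, string[]? cmd) =>
        (entrypoint ?? Array.Empty<string>())
            .Concat(cmd ?? Array.Empty<string>())
            .ToArray();
}
```

For example, an entrypoint of `/docker-entrypoint.sh` with a command of `nginx -g "daemon off;"` yields a four-element argv whose first element is the entrypoint script.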

## 2) Overlay virtual filesystem

### 2.1 Whiteouts
- Regular whiteout: `path/.wh.<name>` removes `path/<name>` from lower layers.
- Opaque directory: `path/.wh..wh..opq` hides everything that lower layers placed under `path/`; the directory itself and entries added by the same layer stay visible.
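
The two whiteout rules can be sketched as a function that applies one layer on top of the paths visible from the layers beneath it. This is an illustration only: `Whiteouts.ApplyLayer` is a hypothetical name, and paths are assumed to be normalised, relative, `/`-separated, and without trailing slashes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating whiteout semantics on a set of paths.
static class Whiteouts
{
    const string Prefix = ".wh.";
    const string Opaque = ".wh..wh..opq";

    public static SortedSet<string> ApplyLayer(IEnumerable<string> lower, IEnumerable<string> layerEntries)
    {
        var visible = new SortedSet<string>(lower, StringComparer.Ordinal);
        var entries = layerEntries.ToList();

        // Whiteouts only affect lower layers, so apply them before this layer's own files.
        foreach (string entry in entries)
        {
            int slash = entry.LastIndexOf('/');
            string dir = entry.Substring(0, slash + 1);   // "" at the root, otherwise ends with '/'
            string name = entry.Substring(slash + 1);

            if (name == Opaque)
            {
                // Hide everything below the directory, but keep the directory itself.
                visible.RemoveWhere(p => p.StartsWith(dir, StringComparison.Ordinal));
            }
            else if (name.StartsWith(Prefix, StringComparison.Ordinal))
            {
                string target = dir + name.Substring(Prefix.Length);
                visible.RemoveWhere(p => p == target || p.StartsWith(target + "/", StringComparison.Ordinal));
            }
        }

        // Add this layer's real entries; whiteout markers never become visible files.
        foreach (string entry in entries)
        {
            string name = entry.Substring(entry.LastIndexOf('/') + 1);
            if (!name.StartsWith(Prefix, StringComparison.Ordinal))
                visible.Add(entry);
        }
        return visible;
    }
}
```

Folding `ApplyLayer` over the layers from base to top gives the final visible path set.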

### 2.2 Lazy extraction
- First pass: build a tar index `(path → layer, offset, size, mode, isWhiteout, isDir)`.
- Decompress only when reading a file; optionally support eStargz TOC to accelerate random access.

### 2.3 Shell-form composition
- Dockerfile shell form is serialised as `["/bin/sh","-c","…"]` (or with the `Shell[]` override as the prefix, typically on Windows).
- Always trust the image config (`config.json`); there is no need to inspect the Dockerfile.
- Working directory defaults to `/` if unspecified.

## 3) Low-level primitives

### 3.1 PATH resolution
- Extract `PATH` from the environment (fallback `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`).
- If `argv[0]` contains no `/`, walk the `PATH` entries in order to resolve an absolute file; if it is a relative path that contains `/`, resolve it against the working directory instead.
- Verify the execute bit (or Windows ACL) before accepting a candidate.
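
The lookup can be sketched as follows. `PathResolver` and the `isExecutable` callback are hypothetical: the callback stands in for a probe into the overlay filesystem that returns true for a regular file with an execute bit. Path normalisation (`.` and `..` segments, symlinks) is left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: resolves argv[0] against Env, WorkingDir, and a VFS probe.
static class PathResolver
{
    const string DefaultPath = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";

    // env holds "KEY=value" entries as in the image config. Returns null when nothing matches.
    public static string? Resolve(string argv0, IEnumerable<string> env, string workingDir,
                                  Func<string, bool> isExecutable)
    {
        // Anything containing '/' bypasses PATH: absolute as-is, relative against the working dir.
        if (argv0.Contains('/'))
        {
            string direct = argv0.StartsWith('/') ? argv0 : Join(workingDir, argv0);
            return isExecutable(direct) ? direct : null;
        }

        string path = env.LastOrDefault(e => e.StartsWith("PATH=", StringComparison.Ordinal))
                         ?.Substring("PATH=".Length) ?? DefaultPath;

        foreach (string dir in path.Split(':'))
        {
            // An empty PATH element conventionally means the current directory.
            string candidate = Join(dir.Length == 0 ? workingDir : dir, argv0);
            if (isExecutable(candidate)) return candidate;
        }
        return null;
    }

    static string Join(string dir, string name) => dir.TrimEnd('/') + "/" + name;
}
```

Passing the probe as a delegate keeps the resolver independent of how the overlay index is stored.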

### 3.2 Shebang handling
- For non-ELF/PE files: read the first line and interpret `#!interpreter [args]`.
- Rewrite argv per kernel semantics: the interpreter becomes `argv[0]`, followed by the shebang argument (Linux passes everything after the interpreter as a single argument), then the script path, then the original arguments.
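
A sketch of the rewrite under Linux semantics, where the text after the interpreter is passed as one argument rather than being split on whitespace. `Shebang.Rewrite` is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: rewrites argv for a script that starts with a shebang line.
static class Shebang
{
    // Returns the rewritten argv, or null when firstLine is not a usable shebang.
    public static string[]? Rewrite(string firstLine, string scriptPath, string[] argv)
    {
        if (!firstLine.StartsWith("#!", StringComparison.Ordinal)) return null;
        string rest = firstLine.Substring(2).Trim();
        if (rest.Length == 0) return null;

        // Split once: the interpreter, then the remainder as ONE optional argument.
        string[] parts = rest.Split(new[] { ' ', '\t' }, 2, StringSplitOptions.RemoveEmptyEntries);

        var result = new List<string> { parts[0] };
        if (parts.Length == 2) result.Add(parts[1].Trim());
        result.Add(scriptPath);            // replaces the original argv[0]
        result.AddRange(argv.Skip(1));     // original arguments follow unchanged
        return result.ToArray();
    }
}
```

With a first line of `#!/usr/bin/env python3 -u`, the second element of the result is the single string `python3 -u`, which is why such shebangs often fail at runtime and are worth flagging in evidence.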

### 3.3 Binary probes
- Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer.
- Identify PE (Windows) and detect .NET single-file bundles via CLI header.
- Record features for runtime scoring (Go vs Rust vs glibc vs musl).
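
The first classification step (ELF, PE, or script) only needs the leading bytes of the file. The sketch below covers that step alone; `FileKind` and `BinaryProbe` are hypothetical names, and section parsing (`.interp`, `.dynamic`, build IDs) is not shown.

```csharp
#nullable enable
using System;

enum FileKind { Elf, Pe, Script, Unknown }

// Hypothetical helper: classifies a file from its first few bytes.
static class BinaryProbe
{
    public static FileKind Classify(ReadOnlySpan<byte> header)
    {
        if (header.Length >= 4 && header[0] == 0x7F && header[1] == (byte)'E'
            && header[2] == (byte)'L' && header[3] == (byte)'F')
            return FileKind.Elf;

        // "MZ" is the DOS stub magic; a full probe would follow e_lfanew to the "PE\0\0" signature.
        if (header.Length >= 2 && header[0] == (byte)'M' && header[1] == (byte)'Z')
            return FileKind.Pe;

        if (header.Length >= 2 && header[0] == (byte)'#' && header[1] == (byte)'!')
            return FileKind.Script;

        return FileKind.Unknown;
    }
}
```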

## 4) Wrapper catalogue

Collapse known wrappers before analysing the target command:

- Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`.
- Privilege droppers: `gosu`, `su-exec`, `chpst`.
- Shells: `sh`, `bash`, `dash`, BusyBox variants.
- Package runners: `npm`, `yarn`, `pnpm`, `pip`, `pipenv`, `poetry`, `bundle`, `rake`.

Rules:
- If the wrapper invocation contains a `--` sentinel (`tini -- app …`), drop the wrapper and record a reduction edge.
- `gosu user cmd …` → collapse to `cmd …`.
- For shell wrappers, delegate to the ShellFlow analyser (see §5 and the separate guide).
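
The first two rules can be sketched as a loop that peels wrappers off the front of argv and records one edge per step. This covers only a subset of the catalogue: `chpst`, shells, and package runners need their own argument parsing and are omitted. The shape of `ReductionEdge` here (`From`, `To`, `Reason`) is an assumption for illustration, as is the `WrapperCollapser` name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumed shape; the real ReductionEdge type may differ.
record ReductionEdge(string From, string To, string Reason);

static class WrapperCollapser
{
    static readonly HashSet<string> InitShims = new() { "tini", "dumb-init" };
    static readonly HashSet<string> PrivilegeDroppers = new() { "gosu", "su-exec" };

    public static (string[] Argv, List<ReductionEdge> Edges) Collapse(string[] argv)
    {
        var edges = new List<ReductionEdge>();
        string[] current = argv;

        while (current.Length > 1)
        {
            string name = current[0].Substring(current[0].LastIndexOf('/') + 1);
            int sentinel = Array.IndexOf(current, "--");
            string[] next;
            string reason;

            if (InitShims.Contains(name) && sentinel > 0 && sentinel < current.Length - 1)
            {
                next = current[(sentinel + 1)..];   // keep what follows the sentinel
                reason = $"{name}: dropped everything up to the `--` sentinel";
            }
            else if (PrivilegeDroppers.Contains(name) && current.Length > 2)
            {
                next = current[2..];                // skip the wrapper and its user spec
                reason = $"{name}: dropped wrapper and user spec";
            }
            else break;                             // not a wrapper this sketch knows about

            edges.Add(new ReductionEdge(string.Join(" ", current), string.Join(" ", next), reason));
            current = next;
        }
        return (current, edges);
    }
}
```

Given `/sbin/tini -- gosu redis redis-server /etc/redis.conf`, the loop runs twice and leaves `redis-server /etc/redis.conf` with two recorded edges.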

## 5) ShellFlow integration

When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual `exec` target. Key capabilities:

- Parses POSIX sh (and common Bash extensions).
- Tracks environment and positional-parameter mutations (`export`, `set`, `set --`).
- Resolves `$@`, `$1` to `$9`, `${VAR:-default}`.
- Recognises idioms from official Docker images (`if [ "$1" = "server" ]; then …`).
- Emits multiple branches when predicates depend on unknown data, tagging each with lower confidence.

The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine.
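
To illustrate one of these capabilities, here is a small sketch of `${VAR}` and `${VAR:-default}` expansion that reports variables it cannot resolve, so a caller could lower confidence for that branch. It is not the ShellFlow implementation: `ParamExpander` is a hypothetical name, and nested expansions, quoting, and positional parameters are not handled.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: expands ${VAR} and ${VAR:-default} against a known environment.
static class ParamExpander
{
    static readonly Regex Pattern =
        new(@"\$\{(?<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?<def>[^}]*))?\}");

    // Names that are unset and have no default are added to `unknown` and left in place.
    public static string Expand(string word, IReadOnlyDictionary<string, string> env, ISet<string> unknown) =>
        Pattern.Replace(word, m =>
        {
            string name = m.Groups["name"].Value;
            Group def = m.Groups["def"];

            // ":-" substitutes the default when the variable is unset OR empty.
            if (env.TryGetValue(name, out string? value) && (value.Length > 0 || !def.Success))
                return value;
            if (def.Success) return def.Value;

            unknown.Add(name);
            return m.Value;
        });
}
```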

## 6) Reduction algorithm

1. Compose argv `ENTRYPOINT ++ CMD`.
2. Collapse wrappers; append `ReductionEdge` entries documenting each step.
3. Resolve `argv[0]` to an absolute file and classify it (ELF/PE/script).
4. If it is a script, run ShellFlow; replace the current command with the highest-confidence `exec` target while preserving alternates as evidence.
5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.).
6. Emit `EntryTraceResult` with candidate terminals ranked by confidence.

## 7) Confidence scoring

Use a simple logistic model with feature contributions captured for the evidence trail. Example features:

| Id | Signal | Weight |
| --- | --- | --- |
| `f1` | Entrypoint already an executable (ELF/PE) | +0.18 |
| `f2` | Observed chain ends in non-wrapper binary | +0.22 |
| `f3` | VM host + resolvable artefact | +0.20 |
| `f4` | Exposed ports align with runtime | +0.06 |
| `f5` | Shebang interpreter matches runtime family | +0.05 |
| `f6` | Language artefact validation succeeded | +0.15 |
| `f8` | Multi-branch script unresolved (`$@` taint) | −0.20 |
| `f9` | Target missing execute bit | −0.25 |
| `f10` | Shell with no `exec` | −0.18 |

Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point.
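
A minimal sketch of such a model, assuming the table weights are summed in logit space and passed through the standard sigmoid. The guide does not state a bias term, so `bias` is a parameter that defaults to zero here as an assumption; the class name and the evidence string format are also hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical scorer: sigmoid(bias + sum of weights for the features that fired).
static class Confidence
{
    // Weights copied from the feature table.
    static readonly IReadOnlyDictionary<string, double> Weights = new Dictionary<string, double>
    {
        ["f1"] = 0.18, ["f2"] = 0.22, ["f3"] = 0.20, ["f4"] = 0.06, ["f5"] = 0.05,
        ["f6"] = 0.15, ["f8"] = -0.20, ["f9"] = -0.25, ["f10"] = -0.18,
    };

    public static (double Score, List<string> Evidence) Score(IEnumerable<string> firedFeatures, double bias = 0.0)
    {
        double z = bias;                       // assumption: no bias is documented
        var evidence = new List<string>();

        foreach (string id in firedFeatures)
        {
            if (!Weights.TryGetValue(id, out double w)) continue;   // ignore unknown ids
            z += w;
            evidence.Add(id + ": " + w.ToString("+0.00;-0.00", CultureInfo.InvariantCulture));
        }

        return (1.0 / (1.0 + Math.Exp(-z)), evidence);
    }
}
```

Returning the evidence list alongside the score keeps each contribution visible to the UI and CLI without recomputing it.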

## 8) Outputs

Return a populated `EntryTraceResult`:

- `Terminals` contains the best candidate(s) and their runtime classification.
- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
- `Chain` shows the explainable path from initial Docker argv to the final binary.

Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.
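
One possible shape for the result, shown only to make the three members concrete. The member names `Terminals`, `Evidence`, and `Chain` come from this guide; the element types, the `Terminal` record, and the `Best` helper are assumptions.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Assumed element types; the real contracts may carry more detail.
record Terminal(string Path, string Runtime, double Confidence);
record ReductionEdge(string From, string To, string Reason);

record EntryTraceResult(
    IReadOnlyList<Terminal> Terminals,
    IReadOnlyList<string> Evidence,
    IReadOnlyList<ReductionEdge> Chain)
{
    // Highest-confidence candidate, or null when no terminal was found.
    public Terminal? Best => Terminals.OrderByDescending(t => t.Confidence).FirstOrDefault();
}
```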