feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
										122
									
								
								docs/modules/scanner/operations/entrypoint-static-analysis.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @@ -0,0 +1,122 @@ | ||||
| # Entry-Point Static Analysis | ||||
|  | ||||
| This guide captures the static half of Stella Ops’ entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score. | ||||
|  | ||||
| ## 1) Loading OCI images | ||||
|  | ||||
| ### 1.1 Supported inputs | ||||
| - Registry references (`repo:tag@sha256:digest`) using the existing content store. | ||||
| - Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`). | ||||
|  | ||||
| ### 1.2 Normalised model | ||||
|  | ||||
| ```csharp | ||||
| sealed class OciImage { | ||||
|   public required string Os; | ||||
|   public required string Arch; | ||||
|   public required string[] Entrypoint; | ||||
|   public required string[] Cmd; | ||||
|   public required string[] Shell;      // Windows / powershell overrides | ||||
|   public required string WorkingDir; | ||||
|   public required string[] Env; | ||||
|   public required string[] ExposedPorts; | ||||
|   public required LabelMap Labels; | ||||
|   public required LayerRef[] Layers;   // ordered, compressed blobs | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-form (see §2.3). | ||||
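
The composition step can be sketched as follows. This is a minimal illustration rather than the scanner's implementation; `compose_argv` is a hypothetical helper name, and it assumes that a null or absent `Entrypoint`/`Cmd` in the image config is treated as an empty list.

```python
from typing import List, Optional

def compose_argv(entrypoint: Optional[List[str]], cmd: Optional[List[str]]) -> List[str]:
    """Concatenate Entrypoint and Cmd into the argv the container would start with."""
    # Either field may be null or absent in the image config; treat that as empty.
    return list(entrypoint or []) + list(cmd or [])

# Exec form: Entrypoint followed by the default arguments from Cmd.
exec_form = compose_argv(["/docker-entrypoint.sh"], ["nginx", "-g", "daemon off;"])

# Shell form: the image config already carries the /bin/sh -c wrapper.
shell_form = compose_argv(None, ["/bin/sh", "-c", "exec app --port 8080"])
```

Because shell form is already serialised into the config, the composition itself needs no special case; the wrapper is dealt with later during reduction.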
|  | ||||
| ## 2) Overlay virtual filesystem | ||||
|  | ||||
| ### 2.1 Whiteouts | ||||
| - Regular whiteout: `path/.wh.<name>` removes `<name>` from lower layers. | ||||
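| - Opaque directory: `path/.wh..wh..opq` hides every entry that lower layers contributed under `path/`; the directory itself and entries from the same or upper layers remain visible. | ||||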
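
A minimal sketch of how the two whiteout kinds might be applied when merging layers. It assumes each layer is given as a flat list of tar entry paths, and that an opaque marker hides only what lower layers placed under that directory (as in the OCI image layer specification). `merge_layers` is a hypothetical name used for illustration.

```python
import posixpath
from typing import Dict, Iterable, List

OPAQUE = ".wh..wh..opq"
WHITEOUT_PREFIX = ".wh."

def merge_layers(layers: Iterable[List[str]]) -> Dict[str, int]:
    """Return {path: index of the layer providing it} after applying whiteouts."""
    merged: Dict[str, int] = {}
    for idx, entries in enumerate(layers):
        # First apply this layer's whiteouts to what lower layers contributed.
        for entry in entries:
            parent, name = posixpath.split(entry)
            if name == OPAQUE:
                prefix = parent + "/" if parent else ""
                for path in [p for p in merged if p.startswith(prefix)]:
                    del merged[path]
            elif name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
                for path in [p for p in merged if p == target or p.startswith(target + "/")]:
                    del merged[path]
        # Then add the layer's real entries (whiteout markers are never exposed).
        for entry in entries:
            if not posixpath.basename(entry).startswith(WHITEOUT_PREFIX):
                merged[entry] = idx
    return merged

view = merge_layers([
    ["etc/app/a.conf", "etc/app/b.conf", "usr/bin/tool"],
    ["etc/app/.wh.a.conf", "usr/bin/.wh..wh..opq", "usr/bin/new"],
])
```

Processing whiteouts before additions within a layer keeps files that the same layer adds next to an opaque marker.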
|  | ||||
| ### 2.2 Lazy extraction | ||||
| - First pass: build a tar index `(path → layer, offset, size, mode, isWhiteout, isDir)`. | ||||
| - Decompress only when reading a file; optionally support eStargz TOC to accelerate random access. | ||||
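
The first-pass index could look like the sketch below, which uses Python's standard `tarfile` module. The `IndexEntry` and `index_layer` names are assumptions made for illustration, and the demo layer is built in memory only so the example runs on its own; eStargz TOC support is not shown.

```python
import io
import posixpath
import tarfile
from typing import Dict, NamedTuple

class IndexEntry(NamedTuple):
    layer: int
    offset: int       # where the file data starts in the uncompressed tar stream
    size: int
    mode: int
    is_whiteout: bool
    is_dir: bool

def index_layer(layer_no: int, tar_bytes: bytes) -> Dict[str, IndexEntry]:
    """First pass: record where each entry lives without extracting file data."""
    index: Dict[str, IndexEntry] = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            path = posixpath.normpath(member.name)
            index[path] = IndexEntry(
                layer=layer_no,
                offset=member.offset_data,
                size=member.size,
                mode=member.mode,
                is_whiteout=posixpath.basename(path).startswith(".wh."),
                is_dir=member.isdir(),
            )
    return index

def build_demo_layer() -> bytes:
    """Build a tiny uncompressed layer in memory so the sketch can be run as-is."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in {"usr/bin/app": b"\x7fELF", "etc/.wh.old.conf": b""}.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o755
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

layer = build_demo_layer()
index = index_layer(0, layer)
entry = index["usr/bin/app"]
# A later read can slice the data straight out of the (decompressed) stream.
assert layer[entry.offset:entry.offset + entry.size] == b"\x7fELF"
```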
|  | ||||
| ### 2.3 Shell-form composition | ||||
| - Dockerfile shell form is serialised as `["/bin/sh","-c","…"]` (or `Shell[]` override on Windows). | ||||
| - Always trust `config.json`; no need to inspect the Dockerfile. | ||||
| - Working directory defaults to `/` if unspecified. | ||||
|  | ||||
| ## 3) Low-level primitives | ||||
|  | ||||
| ### 3.1 PATH resolution | ||||
| - Extract `PATH` from environment (fallback `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`). | ||||
| - If `argv[0]` contains no `/`, walk the PATH to resolve an absolute file; if it is a relative path that contains `/`, resolve it against the working directory instead. | ||||
| - Verify execute bit (or Windows ACL) before accepting. | ||||
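
A sketch of the resolution rules, under two assumptions: the lookup follows `execvp`-style semantics (names without `/` are searched on PATH, anything containing `/` is resolved against the working directory), and the overlay filesystem is stood in for by a plain dict of absolute path to permission bits. `resolve_argv0` is a hypothetical name; Windows ACL checks are not covered.

```python
import posixpath
from typing import Dict, List, Optional

DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

def resolve_argv0(argv0: str, env: List[str], cwd: str, modes: Dict[str, int]) -> Optional[str]:
    """Resolve argv0 to an absolute, executable path, or None if nothing matches."""
    def executable(path: str) -> bool:
        # Any of the user/group/other execute bits counts for this sketch.
        return (modes.get(path, 0) & 0o111) != 0

    base = cwd or "/"
    if "/" in argv0:
        candidate = posixpath.normpath(posixpath.join(base, argv0))
        return candidate if executable(candidate) else None

    path_value = DEFAULT_PATH
    for item in env:
        key, _, value = item.partition("=")
        if key == "PATH":
            path_value = value

    for directory in path_value.split(":"):
        # An empty PATH element conventionally means the current directory.
        candidate = posixpath.normpath(posixpath.join(directory or base, argv0))
        if executable(candidate):
            return candidate
    return None

vfs = {"/usr/local/bin/node": 0o755, "/app/run.sh": 0o755, "/usr/bin/data": 0o644}
resolved = resolve_argv0("node", [], "/", vfs)
```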
|  | ||||
| ### 3.2 Shebang handling | ||||
| - For non-ELF/PE files: read the first line and interpret `#!interpreter [arg]`. | ||||
| - Replace `argv[0]` with the interpreter, followed by the shebang argument (Linux passes everything after the interpreter as a single argument), then the script path, then the original arguments. | ||||
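
The rewrite can be sketched as below. It assumes Linux semantics, where the text after the interpreter is passed as one argument without word splitting; `apply_shebang` is a hypothetical helper name.

```python
from typing import List, Optional

def apply_shebang(first_line: bytes, argv: List[str], script_path: str) -> Optional[List[str]]:
    """Rewrite argv for a `#!` script; return None when there is no usable shebang."""
    if not first_line.startswith(b"#!"):
        return None
    body = first_line[2:].split(b"\n", 1)[0].decode("utf-8", "replace").strip()
    if not body:
        return None
    parts = body.split(None, 1)
    interpreter = parts[0]
    # Everything after the interpreter stays a single argument (no word splitting).
    optional = [parts[1].strip()] if len(parts) > 1 else []
    return [interpreter, *optional, script_path, *argv[1:]]

rewritten = apply_shebang(
    b"#!/usr/bin/env python3\nimport sys\n",
    ["/app/run.py", "--serve"],
    "/app/run.py",
)
```

Here `rewritten` starts with `/usr/bin/env`, so a further PATH lookup for `python3` is still needed before the chain reaches a binary.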
|  | ||||
| ### 3.3 Binary probes | ||||
| - Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer. | ||||
| - Identify PE (Windows) and detect .NET single-file bundles via CLI header. | ||||
| - Record features for runtime scoring (Go vs Rust vs glibc vs musl). | ||||
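
A sketch of the cheapest part of the probe: classifying a file from its first bytes and reading the ELF class and machine from the fixed-layout header. Section parsing (`.interp`, `.dynamic`, Go build ID) and .NET bundle detection are out of scope here, and `probe_header` and the machine label strings are assumptions for illustration.

```python
import struct
from typing import Dict

# A few e_machine values from the ELF specification; labels are illustrative.
ELF_MACHINES = {0x03: "386", 0x28: "arm", 0x3E: "amd64", 0xB7: "arm64"}

def probe_header(header: bytes) -> Dict[str, str]:
    """Classify a file from its leading bytes (at least 20 bytes for ELF details)."""
    if header[:4] == b"\x7fELF" and len(header) >= 20:
        bits = {1: "32", 2: "64"}.get(header[4], "?")   # EI_CLASS
        endian = "<" if header[5] == 1 else ">"          # EI_DATA
        (machine,) = struct.unpack_from(endian + "H", header, 18)  # e_machine
        return {"kind": "elf", "bits": bits,
                "machine": ELF_MACHINES.get(machine, hex(machine))}
    if header[:2] == b"MZ":
        return {"kind": "pe"}
    if header[:2] == b"#!":
        return {"kind": "script"}
    return {"kind": "unknown"}

# Synthetic 64-bit little-endian x86-64 header: e_ident (16 bytes), e_type, e_machine.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 0x3E)
probe = probe_header(sample)
```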
|  | ||||
| ## 4) Wrapper catalogue | ||||
|  | ||||
| Collapse known wrappers before analysing the target command: | ||||
|  | ||||
| - Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`. | ||||
| - Privilege droppers: `gosu`, `su-exec`, `chpst`. | ||||
| - Shells: `sh`, `bash`, `dash`, BusyBox variants. | ||||
| - Package runners: `npm`, `yarn`, `pnpm`, `pip`, `pipenv`, `poetry`, `bundle`, `rake`. | ||||
|  | ||||
| Rules: | ||||
| - If the wrapper's arguments contain a `--` sentinel (`tini -- app …`), drop the wrapper up to and including the sentinel and record a reduction edge. | ||||
| - `gosu user cmd …` → collapse to `cmd …`. | ||||
| - For shell wrappers, delegate to the ShellFlow analyser (see §5 and the separate guide). | ||||
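
The first two rules can be sketched as a small loop. This is a simplification under stated assumptions: only a subset of the catalogue is covered, wrapper options without a `--` sentinel are skipped naively (options that take a value are not handled), and reduction edges are plain tuples. `collapse_wrappers` is a hypothetical name.

```python
import posixpath
from typing import List, Tuple

INIT_SHIMS = {"tini", "dumb-init"}
PRIVILEGE_DROPPERS = {"gosu", "su-exec"}

Edge = Tuple[str, List[str], List[str]]  # (wrapper name, argv before, argv after)

def collapse_wrappers(argv: List[str]) -> Tuple[List[str], List[Edge]]:
    """Strip known wrappers from the front of argv, recording one edge per step."""
    edges: List[Edge] = []
    current = list(argv)
    while current:
        name = posixpath.basename(current[0])
        if name in INIT_SHIMS:
            if "--" in current:
                reduced = current[current.index("--") + 1:]
            else:
                rest = current[1:]
                while rest and rest[0].startswith("-"):
                    rest = rest[1:]
                reduced = rest
        elif name in PRIVILEGE_DROPPERS and len(current) >= 3:
            reduced = current[2:]  # `gosu user cmd ...` becomes `cmd ...`
        else:
            break
        if not reduced:
            break  # nothing left to run; keep the wrapper as the terminal
        edges.append((name, current, reduced))
        current = reduced
    return current, edges

final, steps = collapse_wrappers(
    ["/sbin/tini", "--", "gosu", "redis", "redis-server", "/etc/redis.conf"]
)
```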
|  | ||||
| ## 5) ShellFlow integration | ||||
|  | ||||
| When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual `exec` target. Key capabilities: | ||||
|  | ||||
| - Parses POSIX sh (and common Bash extensions). | ||||
| - Tracks shell state mutations: environment (`export`), shell options (`set`), and positional parameters (`set --`). | ||||
| - Resolves `$@`, `$1..9`, `${VAR:-default}`. | ||||
| - Recognises idioms from official Docker images (`if [ "$1" = "server" ]; then …`). | ||||
| - Emits multiple branches when predicates depend on unknown data, but tags them with lower confidence. | ||||
|  | ||||
| The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine. | ||||
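
To make the parameter-resolution capability concrete, here is a deliberately small sketch of expanding the words of an `exec` line. It is not the ShellFlow analyser: it handles only `"$@"`, `$1`..`$9`, `$VAR`, `${VAR}` and `${VAR:-default}` on pre-split words, and the function names are hypothetical.

```python
import re
from typing import Dict, List

_PARAM = re.compile(r"\$\{(\w+):-([^}]*)\}|\$(\d)|\$\{(\w+)\}|\$(\w+)")

def expand_word(word: str, env: Dict[str, str], positional: List[str]) -> List[str]:
    """Expand one shell word; "$@" yields one output word per positional argument."""
    if len(word) >= 2 and word[0] == word[-1] == '"':
        word = word[1:-1]
    if word == "$@":
        return list(positional)

    def repl(match):
        name_default, default, digit, braced, bare = match.groups()
        if name_default is not None:
            # ${VAR:-default}: use the default when VAR is unset or empty.
            return env.get(name_default) or default
        if digit is not None:
            i = int(digit)
            return positional[i - 1] if 1 <= i <= len(positional) else ""
        return env.get(braced or bare, "")

    return [_PARAM.sub(repl, word)]

def expand_command(words: List[str], env: Dict[str, str], positional: List[str]) -> List[str]:
    out: List[str] = []
    for word in words:
        out.extend(expand_word(word, env, positional))
    return out

target = expand_command(
    ["redis-server", "--port", "${PORT:-6379}", '"$@"'],
    env={},
    positional=["--appendonly", "yes"],
)
```

When a value such as `PORT` is unknown at scan time rather than unset, a real analyser would have to branch or taint the result, which is where the lower-confidence alternates come from.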
|  | ||||
| ## 6) Reduction algorithm | ||||
|  | ||||
| 1. Compose argv `ENTRYPOINT ++ CMD`. | ||||
| 2. Collapse wrappers; append `ReductionEdge` entries documenting each step. | ||||
| 3. Resolve `argv[0]` to an absolute file and classify (ELF/PE/script). | ||||
| 4. If script → run ShellFlow; replace current command with highest-confidence `exec` target while preserving alternates as evidence. | ||||
| 5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.). | ||||
| 6. Emit `EntryTraceResult` with candidate terminals ranked by confidence. | ||||
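
The steps above can be sketched as a driver that takes the individual analysers as callables. Everything here is an assumption made for illustration: the `ReductionEdge` and `Candidate` field names, the callable signatures, and the choice to return a plain tuple instead of `EntryTraceResult`. Step 5 (artefact resolution for VM hosts) is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class ReductionEdge:
    kind: str
    before: List[str]
    after: List[str]

@dataclass
class Candidate:
    argv: List[str]
    confidence: float

def reduce_entry(
    entrypoint: List[str],
    cmd: List[str],
    strip_wrapper: Callable[[List[str]], Optional[List[str]]],
    classify: Callable[[str], str],
    shellflow: Callable[[List[str]], List[Candidate]],
) -> Tuple[List[Candidate], List[ReductionEdge]]:
    argv = list(entrypoint) + list(cmd)                        # step 1
    edges: List[ReductionEdge] = []
    while True:                                                # step 2
        stripped = strip_wrapper(argv)
        if stripped is None:
            break
        edges.append(ReductionEdge("wrapper", argv, stripped))
        argv = stripped
    candidates = [Candidate(argv, 1.0)]
    if argv and classify(argv[0]) == "script":                 # steps 3-4
        found = shellflow(argv)
        if found:
            best = max(found, key=lambda c: c.confidence)
            edges.append(ReductionEdge("shellflow", argv, best.argv))
            candidates = found                                 # alternates kept as evidence
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)  # step 6
    return ranked, edges

# Stub analysers, standing in for the real components.
ranked, edges = reduce_entry(
    ["tini", "--", "/entry.sh"], ["server"],
    strip_wrapper=lambda a: a[2:] if a[:2] == ["tini", "--"] else None,
    classify=lambda path: "script" if path.endswith(".sh") else "elf",
    shellflow=lambda a: [Candidate(["redis-sentinel"], 0.3), Candidate(["redis-server"], 0.6)],
)
```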
|  | ||||
| ## 7) Confidence scoring | ||||
|  | ||||
| Use a simple logistic model and record each feature's contribution for the evidence trail. Example features: | ||||
|  | ||||
| | Id | Signal | Weight | | ||||
| | --- | --- | --- | | ||||
| | `f1` | Entrypoint already an executable (ELF/PE) | +0.18 | | ||||
| | `f2` | Observed chain ends in non-wrapper binary | +0.22 | | ||||
| | `f3` | VM host + resolvable artefact | +0.20 | | ||||
| | `f4` | Exposed ports align with runtime | +0.06 | | ||||
| | `f5` | Shebang interpreter matches runtime family | +0.05 | | ||||
| | `f6` | Language artefact validation succeeded | +0.15 | | ||||
| | `f8` | Multi-branch script unresolved (`$@` taint) | −0.20 | | ||||
| | `f9` | Target missing execute bit | −0.25 | | ||||
| | `f10` | Shell with no `exec` | −0.18 | | ||||
|  | ||||
| Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point. | ||||
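
One way such a model might be evaluated is sketched below. The source does not say how the weights are combined, so this assumes they are summed into a logit with a bias of zero and passed through the logistic function; the evidence string format is also an assumption.

```python
import math
from typing import List, Tuple

# Weights copied from the table above (the table lists no `f7`).
WEIGHTS = {
    "f1": 0.18, "f2": 0.22, "f3": 0.20, "f4": 0.06, "f5": 0.05,
    "f6": 0.15, "f8": -0.20, "f9": -0.25, "f10": -0.18,
}

def score(active: List[str], bias: float = 0.0) -> Tuple[float, List[str]]:
    """Return (confidence, evidence) for the features that fired."""
    logit = bias  # assumed starting point; the real model may use a different bias
    evidence: List[str] = []
    for feature_id in active:
        weight = WEIGHTS[feature_id]
        logit += weight
        evidence.append(f"{feature_id}: {weight:+.2f}")
    confidence = 1.0 / (1.0 + math.exp(-logit))
    return confidence, evidence

confidence, evidence = score(["f1", "f2"])
```

With no active features and a zero bias the confidence is 0.5, so under these assumptions the weights shift the score only modestly around the midpoint.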
|  | ||||
| ## 8) Outputs | ||||
|  | ||||
| Return a populated `EntryTraceResult`: | ||||
|  | ||||
| - `Terminals` contains the best candidate(s) and their runtime classification. | ||||
| - `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints. | ||||
| - `Chain` shows the explainable path from initial Docker argv to the final binary. | ||||
|  | ||||
| Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode. | ||||