Align AOC tasks for Excititor and Concelier
# Entry-Point Static Analysis

This guide captures the static half of Stella Ops’ entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.

## 1) Loading OCI images

### 1.1 Supported inputs

- Registry references (`repo:tag@sha256:digest`) using the existing content store.
- Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`).

### 1.2 Normalised model

```csharp
sealed class OciImage {
    public required string Os;
    public required string Arch;
    public required string[] Entrypoint;
    public required string[] Cmd;
    public required string[] Shell; // Windows / powershell overrides
    public required string WorkingDir;
    public required string[] Env;
    public required string[] ExposedPorts;
    public required LabelMap Labels;
    public required LayerRef[] Layers; // ordered, compressed blobs
}
```

Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-form (see §2.3).

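
A minimal sketch of that composition, assuming the image config has already been parsed. `ArgvComposer` and `Compose` are hypothetical names, not part of the codebase; the only behaviour taken from this guide is `Entrypoint ++ Cmd`. Shell-form instructions are already serialised in the config as `["/bin/sh","-c","…"]`, so the sketch applies no extra wrapping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
static class ArgvComposer
{
    // Returns Entrypoint followed by Cmd. Null arrays are treated as empty,
    // which is an assumption about how missing config fields are normalised.
    public static string[] Compose(string[]? entrypoint, string[]? cmd)
    {
        var argv = new List<string>();
        if (entrypoint is not null) argv.AddRange(entrypoint);
        if (cmd is not null) argv.AddRange(cmd);
        return argv.ToArray();
    }
}
```

For example, `ArgvComposer.Compose(new[] { "docker-entrypoint.sh" }, new[] { "postgres" })` yields `["docker-entrypoint.sh", "postgres"]`.
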
## 2) Overlay virtual filesystem

### 2.1 Whiteouts

- Regular whiteout: `path/.wh.<name>` removes `path/<name>` (and anything beneath it) from lower layers.
- Opaque directory: `path/.wh..wh..opq` hides everything that lower layers placed under `path/`; the directory itself and entries added by the same layer remain visible.

### 2.2 Lazy extraction

- First pass: build a tar index `(path → layer, offset, size, mode, isWhiteout, isDir)`.
- Decompress only when reading a file; optionally support eStargz TOC to accelerate random access.

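
The whiteout rules in §2.1 and the index from §2.2 can be combined into one merged view. The sketch below is illustrative only: `TarEntryInfo` and `OverlayIndex.Merge` are hypothetical names, layers are assumed to arrive lowest first, and paths are assumed to be normalised (no leading `/` or `./`, no trailing `/`). It does not read tar streams; it only merges indexes that were already built.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical index row; the fields mirror the tuple described in §2.2.
sealed record TarEntryInfo(string Path, int Layer, long Offset, long Size, int Mode, bool IsDir);

static class OverlayIndex
{
    const string WhiteoutPrefix = ".wh.";
    const string OpaqueMarker = ".wh..wh..opq";

    // Merges per-layer entries (lowest layer first) into one path -> entry map.
    public static Dictionary<string, TarEntryInfo> Merge(IEnumerable<IReadOnlyList<TarEntryInfo>> layers)
    {
        var merged = new Dictionary<string, TarEntryInfo>(StringComparer.Ordinal);
        foreach (var layer in layers)
        {
            // Apply whiteouts first so they only affect lower layers,
            // regardless of where they appear in the tar stream.
            foreach (var entry in layer)
            {
                var (dir, name) = Split(entry.Path);
                if (name == OpaqueMarker)
                    RemoveUnder(merged, dir, includeSelf: false);
                else if (name.StartsWith(WhiteoutPrefix, StringComparison.Ordinal))
                    RemoveUnder(merged, Join(dir, name.Substring(WhiteoutPrefix.Length)), includeSelf: true);
            }
            // Then add this layer's real entries (whiteout markers are not files).
            foreach (var entry in layer)
            {
                var (_, name) = Split(entry.Path);
                if (!name.StartsWith(WhiteoutPrefix, StringComparison.Ordinal))
                    merged[entry.Path] = entry;
            }
        }
        return merged;
    }

    static (string Dir, string Name) Split(string path)
    {
        int i = path.LastIndexOf('/');
        return i < 0 ? ("", path) : (path.Substring(0, i), path.Substring(i + 1));
    }

    static string Join(string dir, string name) => dir.Length == 0 ? name : dir + "/" + name;

    // Removes every key below root, and root itself when includeSelf is set.
    static void RemoveUnder(Dictionary<string, TarEntryInfo> map, string root, bool includeSelf)
    {
        string prefix = root.Length == 0 ? "" : root + "/";
        var doomed = map.Keys
            .Where(k => (includeSelf && k == root) || (k != root && k.StartsWith(prefix, StringComparison.Ordinal)))
            .ToList();
        foreach (var key in doomed) map.Remove(key);
    }
}
```

Scanning each layer twice keeps the result independent of where a whiteout marker sits in the tar stream; a production index would more likely use a trie or sorted keys than a linear scan per whiteout.
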
### 2.3 Shell-form composition

- Dockerfile shell form is serialised as `["/bin/sh","-c","…"]` (or `Shell[]` override on Windows).
- Always trust the image `config.json`; there is no need to inspect the Dockerfile.
- Working directory defaults to `/` if unspecified.

## 3) Low-level primitives

### 3.1 PATH resolution

- Extract `PATH` from the image environment (fallback `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`).
- If `argv[0]` contains no `/`, walk the PATH entries in order to resolve an absolute file; if it is a relative path that contains `/`, resolve it against the working directory instead.
- Verify the execute bit (or Windows ACL) before accepting.

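
The lookup can be sketched as follows. `PathResolver` and the `isExecutable` callback are hypothetical; the callback stands in for whatever the overlay filesystem exposes to check that a path is a regular file with an execute bit. The sketch skips empty PATH entries and does not normalise `.` or `..` segments, both of which are simplifications.

```csharp
using System;

static class PathResolver
{
    const string DefaultPath = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";

    // Returns the resolved absolute path, or null when nothing executable matches.
    // env holds "KEY=value" strings as found in the image config.
    public static string? Resolve(string argv0, string[] env, string workingDir, Func<string, bool> isExecutable)
    {
        if (argv0.Contains('/'))
        {
            // Contains a slash: no PATH search. Relative paths are resolved
            // against the working directory.
            string direct = argv0.StartsWith('/') ? argv0 : Combine(workingDir, argv0);
            return isExecutable(direct) ? direct : null;
        }

        string path = DefaultPath;
        foreach (var kv in env)
            if (kv.StartsWith("PATH=", StringComparison.Ordinal)) path = kv.Substring(5);

        // First executable match in PATH order wins.
        foreach (var dir in path.Split(':', StringSplitOptions.RemoveEmptyEntries))
        {
            string candidate = Combine(dir, argv0);
            if (isExecutable(candidate)) return candidate;
        }
        return null;
    }

    static string Combine(string dir, string name) => dir.TrimEnd('/') + "/" + name;
}
```

With a file set containing `/usr/bin/node`, `Resolve("node", new[] { "PATH=/usr/local/bin:/usr/bin" }, "/", files.Contains)` returns `/usr/bin/node`.
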
### 3.2 Shebang handling

- For non-ELF/PE files: read the first line and interpret it as `#!interpreter args`.
- Replace `argv[0]` with the interpreter, then add the shebang args, then the script path, followed by the original arguments, per kernel semantics (Linux passes everything after the interpreter as a single argument).

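
A sketch of the rewrite, assuming Linux semantics where the text after the interpreter is passed as one optional argument. `Shebang.Rewrite` is a hypothetical helper; `header` is assumed to hold the first bytes of the file as read from the overlay.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class Shebang
{
    // Rewrites argv for a script whose first line is "#!interpreter [arg]".
    // Returns null when the header does not start with a shebang.
    public static string[]? Rewrite(byte[] header, string scriptPath, string[] argv)
    {
        if (header.Length < 2 || header[0] != (byte)'#' || header[1] != (byte)'!') return null;

        int end = Array.IndexOf(header, (byte)'\n');
        if (end < 0) end = header.Length;
        string line = Encoding.UTF8.GetString(header, 2, end - 2).Trim();
        if (line.Length == 0) return null;

        // Everything after the interpreter is treated as ONE optional argument.
        int split = line.IndexOfAny(new[] { ' ', '\t' });
        string interpreter = split < 0 ? line : line.Substring(0, split);
        string? arg = split < 0 ? null : line.Substring(split + 1).Trim();

        var result = new List<string> { interpreter };
        if (!string.IsNullOrEmpty(arg)) result.Add(arg);
        result.Add(scriptPath);
        // Original arguments after argv[0] are preserved in order.
        for (int i = 1; i < argv.Length; i++) result.Add(argv[i]);
        return result.ToArray();
    }
}
```

A script at `/app/main.py` starting with `#!/usr/bin/env python3` and invoked as `main.py --port 80` becomes `/usr/bin/env python3 /app/main.py --port 80`.
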
### 3.3 Binary probes

- Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer.
- Identify PE (Windows) and detect .NET single-file bundles via CLI header.
- Record features for runtime scoring (Go vs Rust vs glibc vs musl).

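
The first classification step can be sketched with signature checks alone. `FileKind` and `MagicProbe` are hypothetical names; section parsing (`.interp`, `.dynamic`, build IDs) and .NET bundle detection are deliberately left out.

```csharp
using System;
using System.Buffers.Binary;

enum FileKind { Elf, Pe, Script, Unknown }

static class MagicProbe
{
    // Classifies a file from its leading bytes using cheap signature checks.
    public static FileKind Classify(ReadOnlySpan<byte> head)
    {
        if (head.Length >= 4 && head[0] == 0x7F && head[1] == (byte)'E'
            && head[2] == (byte)'L' && head[3] == (byte)'F')
            return FileKind.Elf;

        // PE images start with the DOS "MZ" stub; the little-endian 32-bit
        // value at offset 0x3C points to the "PE\0\0" signature.
        if (head.Length >= 0x40 && head[0] == (byte)'M' && head[1] == (byte)'Z')
        {
            int peOffset = BinaryPrimitives.ReadInt32LittleEndian(head.Slice(0x3C, 4));
            if (peOffset >= 0 && peOffset <= head.Length - 4
                && head[peOffset] == (byte)'P' && head[peOffset + 1] == (byte)'E'
                && head[peOffset + 2] == 0 && head[peOffset + 3] == 0)
                return FileKind.Pe;
        }

        if (head.Length >= 2 && head[0] == (byte)'#' && head[1] == (byte)'!')
            return FileKind.Script;

        return FileKind.Unknown;
    }
}
```

The caller needs to pass enough leading bytes for the PE signature to fall inside the buffer; otherwise an `MZ` file is reported as `Unknown` rather than guessed.
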
## 4) Wrapper catalogue

Collapse known wrappers before analysing the target command:

- Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`.
- Privilege droppers: `gosu`, `su-exec`, `chpst`.
- Shells: `sh`, `bash`, `dash`, BusyBox variants.
- Package runners: `npm`, `yarn`, `pnpm`, `pip`, `pipenv`, `poetry`, `bundle`, `rake`.

Rules:

- If the wrapper invocation contains a `--` sentinel (`tini -- app …`), drop the wrapper and everything up to the sentinel, and record a reduction edge.
- `gosu user cmd …` → collapse to `cmd …`.
- For shell wrappers, delegate to the ShellFlow analyser (see §5 and the separate guide).

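
The first two rules can be sketched as a loop that strips leading wrappers and records each step. The `ReductionEdge` shape and `WrapperCollapser` are assumptions made for illustration, and only a subset of the catalogue is covered: `chpst`, shells, and package runners have different argument conventions and are not handled here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical edge type recording one collapse step.
sealed record ReductionEdge(string Wrapper, string[] Before, string[] After);

static class WrapperCollapser
{
    static readonly HashSet<string> InitShims = new() { "tini", "dumb-init" };
    static readonly HashSet<string> PrivilegeDroppers = new() { "gosu", "su-exec" };

    // Repeatedly strips leading wrappers, appending an edge for each step.
    public static string[] Collapse(string[] argv, List<ReductionEdge> edges)
    {
        while (argv.Length > 1)
        {
            string name = Path.GetFileName(argv[0]);
            string[]? next = null;

            if (InitShims.Contains(name))
            {
                // `tini -- app …`: the target starts after the sentinel.
                // Invocations without a sentinel are left untouched.
                int sentinel = Array.IndexOf(argv, "--");
                if (sentinel >= 0) next = argv.Skip(sentinel + 1).ToArray();
            }
            else if (PrivilegeDroppers.Contains(name) && argv.Length > 2)
            {
                // `gosu user cmd …` -> `cmd …`
                next = argv.Skip(2).ToArray();
            }

            if (next is null || next.Length == 0) break;
            edges.Add(new ReductionEdge(name, argv, next));
            argv = next;
        }
        return argv;
    }
}
```

Given `/sbin/tini -- gosu postgres postgres -D /data`, the loop records two edges (`tini`, then `gosu`) and returns `postgres -D /data`.
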
## 5) ShellFlow integration

When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual `exec` target. Key capabilities:

- Parses POSIX sh (and common Bash extensions).
- Tracks environment and positional-parameter mutations (`set`, `export`, `set --`).
- Resolves `$@`, `$1`..`$9`, `${VAR:-default}`.
- Recognises idioms from official Docker images (`if [ "$1" = "server" ]; then …`).
- Emits multiple branches when predicates depend on unknown data, but tags them with lower confidence.

The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine.

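
To make the `${VAR:-default}` case concrete, here is a small sketch of static expansion against the image environment. It is not the ShellFlow implementation: `ParamExpander` is a hypothetical name, nested expansions and quoting are ignored, and returning `null` for an unknown variable is an assumed way of signalling that the value is tainted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ParamExpander
{
    // Matches ${NAME} and ${NAME:-default}; nested expansions are not handled.
    static readonly Regex Pattern = new(@"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}");

    // Returns the expanded word, or null when a variable without a default is
    // not present in env, so the caller can treat the result as unknown.
    public static string? Expand(string word, IReadOnlyDictionary<string, string> env)
    {
        bool unknown = false;
        string result = Pattern.Replace(word, m =>
        {
            string name = m.Groups[1].Value;
            bool hasDefault = m.Groups[2].Success;
            if (!env.TryGetValue(name, out var value)) value = null;

            // ":-" falls back when the variable is unset OR empty.
            if (value is not null && (value.Length > 0 || !hasDefault)) return value;
            if (hasDefault) return m.Groups[2].Value;

            unknown = true;
            return "";
        });
        return unknown ? null : result;
    }
}
```

With `PORT=8080` in the environment, `--port=${PORT:-80}` expands to `--port=8080`, while `${HOST:-0.0.0.0}` falls back to `0.0.0.0` when `HOST` is absent.
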
## 6) Reduction algorithm

1. Compose argv `ENTRYPOINT ++ CMD`.
2. Collapse wrappers; append `ReductionEdge` entries documenting each step.
3. Resolve `argv[0]` to an absolute file and classify it (ELF/PE/script).
4. If script → run ShellFlow; replace the current command with the highest-confidence `exec` target while preserving alternates as evidence.
5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.).
6. Emit `EntryTraceResult` with candidate terminals ranked by confidence.

## 7) Confidence scoring

Use a simple logistic model with feature contributions captured for the evidence trail. Example features:

| Id | Signal | Weight |
| --- | --- | --- |
| `f1` | Entrypoint already an executable (ELF/PE) | +0.18 |
| `f2` | Observed chain ends in non-wrapper binary | +0.22 |
| `f3` | VM host + resolvable artefact | +0.20 |
| `f4` | Exposed ports align with runtime | +0.06 |
| `f5` | Shebang interpreter matches runtime family | +0.05 |
| `f6` | Language artefact validation succeeded | +0.15 |
| `f8` | Multi-branch script unresolved (`$@` taint) | −0.20 |
| `f9` | Target missing execute bit | −0.25 |
| `f10` | Shell with no `exec` | −0.18 |

Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point.

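
One way to read the table is as terms of a logit that is squashed through a sigmoid. The sketch below does that under two loudly stated assumptions that the guide does not confirm: the bias is zero, and the weights are used directly as logit terms. `ConfidenceModel` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class ConfidenceModel
{
    // Weights copied from the table above.
    static readonly Dictionary<string, double> Weights = new()
    {
        ["f1"] = 0.18, ["f2"] = 0.22, ["f3"] = 0.20, ["f4"] = 0.06, ["f5"] = 0.05,
        ["f6"] = 0.15, ["f8"] = -0.20, ["f9"] = -0.25, ["f10"] = -0.18,
    };

    // Assumption: zero bias; the real baseline is not stated in this guide.
    const double Bias = 0.0;

    // Returns sigmoid(bias + sum of fired weights) plus one evidence string per feature.
    public static (double Score, List<string> Evidence) Score(IEnumerable<string> firedFeatures)
    {
        double z = Bias;
        var evidence = new List<string>();
        foreach (var id in firedFeatures)
        {
            if (!Weights.TryGetValue(id, out double w)) continue; // ignore unknown ids
            z += w;
            evidence.Add(id + ": " + w.ToString("+0.00;-0.00", CultureInfo.InvariantCulture));
        }
        return (1.0 / (1.0 + Math.Exp(-z)), evidence);
    }
}
```

With no features fired the score is exactly 0.5, and with `f1` and `f2` fired it is about 0.60. Under these assumptions the output can only range from roughly 0.35 to 0.70, so a production model presumably applies a bias or scaling that is not described here.
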
## 8) Outputs

Return a populated `EntryTraceResult`:

- `Terminals` contains the best candidate(s) and their runtime classification.
- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
- `Chain` shows the explainable path from initial Docker argv to the final binary.

Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.