84 lines
		
	
	
		
			3.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			84 lines
		
	
	
		
			3.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
# ShellFlow — Script Reduction Playbook
 | 
						|
 | 
						|
Most container entry points eventually execute a shell script. The ShellFlow analyser resolves these scripts without executing user code, providing deterministic, explainable reductions.
 | 
						|
 | 
						|
## 1) Scope
 | 
						|
 | 
						|
- POSIX `sh` subset with common Bash extensions (control flow, functions, parameter expansion).
 | 
						|
- Handles idioms from official Docker images (`if [ "$1" = "server" ]; then …`, `exec gosu "$USER" "$@"`, `set -- java -jar …`).
 | 
						|
- Tracks positional parameters (`$@`, `$1..$9`), environment variables, and `set --` mutations.
 | 
						|
- Produces one or more candidate commands with supporting evidence.
 | 
						|
 | 
						|
## 2) Architecture
 | 
						|
 | 
						|
```
 | 
						|
ShellFlow/
 | 
						|
  Parser/           // POSIX sh lexer + recursive descent parser
 | 
						|
  Ast/              // nodes for lists, pipelines, conditionals, functions
 | 
						|
  Evaluator/        // partial evaluation & taint tracking
 | 
						|
  Idioms/           // pattern library for common Docker entrypoints
 | 
						|
  Planner/          // emits CommandPlan[]
 | 
						|
```
 | 
						|
 | 
						|
### 2.1 CommandPlan
 | 
						|
 | 
						|
```csharp
 | 
						|
public sealed record CommandPlan(
 | 
						|
  string[] Argv,
 | 
						|
  double   HeuristicScore,
 | 
						|
  IReadOnlyList<string> Evidence,
 | 
						|
  IReadOnlyList<ReductionEdge> Chain,
 | 
						|
  bool     IsFallback = false
 | 
						|
);
 | 
						|
```
 | 
						|
 | 
						|
Plans feed directly into the static reducer, which selects the highest-confidence plan but keeps alternates as evidence.
 | 
						|
 | 
						|
## 3) Parsing & AST
 | 
						|
 | 
						|
- Tokenise words, assignments, pipelines (`|`), lists (`;`, `&&`, `||`), conditionals (`if`, `case`), loops (`for`, `while`, `until`), functions, and redirections.
 | 
						|
- Preserve heredocs and subshells as opaque nodes (evaluated conservatively).
 | 
						|
- Record source spans to surface meaningful evidence (`"line 12: exec java -jar $APP_JAR"`).
 | 
						|
 | 
						|
## 4) Partial evaluation
 | 
						|
 | 
						|
- Initialise symbol table from image environment plus caller-supplied args.
 | 
						|
- Treat `$@`, `$*`, `$1..$9` as tainted; propagate taint through assignments.
 | 
						|
- Resolve `${VAR:-default}` and `${VAR:+alt}` when `VAR` known; otherwise branch.
 | 
						|
- Support `set -- …` (resets positional parameters) and `shift`.
 | 
						|
- `source`/`.` commands are parsed recursively when files are available; else fallback to low-confidence branch.
 | 
						|
 | 
						|
## 5) Exec sink detection
 | 
						|
 | 
						|
- `exec <cmd>` dominates the remainder of the script.
 | 
						|
- Chains such as `exec gosu "$USER" "$@"` feed into wrapper collapse.
 | 
						|
- When no `exec` is present, pick the last reachable simple command in the main path.
 | 
						|
- Multi-branch scripts yield multiple plans with adjusted scores; unresolved branches are marked `IsFallback`.
 | 
						|
 | 
						|
## 6) Idiom library
 | 
						|
 | 
						|
| Pattern | Action |
 | 
						|
| --- | --- |
 | 
						|
| `if [ "${1:0:1}" = '-' ]; then set -- server "$@"; fi` | Rewrite argv to prepend default command. |
 | 
						|
| `if [ "$1" = "bash" ]; then exec "$@"; fi` | Pass-through for manual shells. |
 | 
						|
| `exec "$@"` + non-empty CMD | Substitute CMD vector into plan. |
 | 
						|
| `exec java -jar "$APP_JAR" "$@"` | Resolve JAR via env or filesystem. |
 | 
						|
| `set -- gosu "$APP_USER" "$@"` | Collapse into wrapper plan. |
 | 
						|
 | 
						|
Idioms are implemented as AST visitors; each adds evidence strings when triggered.
 | 
						|
 | 
						|
## 7) Confidence scoring
 | 
						|
 | 
						|
- Base score from plan heuristics (`HeuristicScore`).
 | 
						|
- Penalties for unresolved taint (`$@` unknown), missing files, nested subshells, or fallbacks.
 | 
						|
- Bonus when idioms match, artefacts exist, or env values resolve cleanly.
 | 
						|
- Final confidence is combined with the outer static scoring model.
 | 
						|
 | 
						|
## 8) Failure modes
 | 
						|
 | 
						|
- Missing script (`ENTRYPOINT` points to deleted file): emit fallback plan with low confidence.
 | 
						|
- Self-modifying scripts or heavy dynamic features (`eval`, backticks): mark plan as low-confidence and surface warning evidence.
 | 
						|
- Commands that spawn supervisors without `exec`: return both the supervisor and inferred children (if configuration files are present).
 | 
						|
 | 
						|
ShellFlow keeps the static reducer explainable: every inferred command is accompanied by the script span and reasoning used to reach it.
 |