ShellFlow — Script Reduction Playbook
Most container entry points eventually execute a shell script. The ShellFlow analyser resolves these scripts without executing user code, providing deterministic, explainable reductions.
1) Scope
- POSIX `sh` subset with common Bash extensions (control flow, functions, parameter expansion).
- Handles idioms from official Docker images (`if [ "$1" = "server" ]; then …`, `exec gosu "$USER" "$@"`, `set -- java -jar …`).
- Tracks positional parameters (`$@`, `$1..$9`), environment variables, and `set --` mutations.
- Produces one or more candidate commands with supporting evidence.
 
2) Architecture
ShellFlow/
  Parser/           // POSIX sh lexer + recursive descent parser
  Ast/              // nodes for lists, pipelines, conditionals, functions
  Evaluator/        // partial evaluation & taint tracking
  Idioms/           // pattern library for common Docker entrypoints
  Planner/          // emits CommandPlan[]
2.1 CommandPlan
public sealed record CommandPlan(
  string[] Argv,
  double   HeuristicScore,
  IReadOnlyList<string> Evidence,
  IReadOnlyList<ReductionEdge> Chain,
  bool     IsFallback = false
);
Plans feed directly into the static reducer, which selects the highest-confidence plan but keeps alternates as evidence.
3) Parsing & AST
- Tokenise words, assignments, pipelines (`|`), lists (`;`, `&&`, `||`), conditionals (`if`, `case`), loops (`for`, `while`, `until`), functions, and redirections.
- Preserve heredocs and subshells as opaque nodes (evaluated conservatively).
- Record source spans to surface meaningful evidence (`"line 12: exec java -jar $APP_JAR"`).
4) Partial evaluation
- Initialise the symbol table from the image environment plus caller-supplied args.
- Treat `$@`, `$*`, `$1..$9` as tainted; propagate taint through assignments.
- Resolve `${VAR:-default}` and `${VAR:+alt}` when `VAR` is known; otherwise branch.
- Support `set -- …` (resets positional parameters) and `shift`.
- `source`/`.` commands are parsed recursively when the files are available; otherwise fall back to a low-confidence branch.
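
The shell semantics that the evaluator has to model can be seen in a small runnable sketch. The function names and the `APP_MODE` variable below are hypothetical and exist only for illustration; the snippet demonstrates plain shell behaviour, not ShellFlow's own code.

```shell
#!/bin/sh
# Illustrative only: these helpers are hypothetical, not part of ShellFlow.

demo_expansion() {
  unset APP_MODE
  # Unset variable: ":-" yields the default, ":+" yields nothing.
  echo "unset: ${APP_MODE:-server} [${APP_MODE:+--mode}]"
  APP_MODE=worker
  # Known, non-empty variable: ":-" yields the value, ":+" yields the alternative.
  echo "known: ${APP_MODE:-server} [${APP_MODE:+--mode}]"
}

demo_positional() {
  set -- java -jar app.jar   # replaces whatever arguments the function received
  echo "after set --: $# args, first=$1"
  shift                      # drops $1; the remaining parameters move down
  echo "after shift: $# args, first=$1"
}

demo_expansion
demo_positional ignored args
```

When `APP_MODE` is not known statically, both outcomes of each expansion remain possible, which is why the analyser has to branch rather than pick one.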
5) Exec sink detection
- `exec <cmd>` dominates the remainder of the script.
- Chains such as `exec gosu "$USER" "$@"` feed into wrapper collapse.
- When no `exec` is present, pick the last reachable simple command in the main path.
- Multi-branch scripts yield multiple plans with adjusted scores; unresolved branches are marked `IsFallback`.
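
Why `exec` dominates the rest of a script can be checked directly in a shell. The two functions below are hypothetical stand-ins for real entrypoint scripts, and `echo` stands in for the actual service binary.

```shell
#!/bin/sh
# Hypothetical stand-ins for entrypoint scripts; echo plays the role of the service.

entrypoint_with_exec() {
  sh -c '
    echo "preparing"
    exec echo "server started"
    # exec replaced the shell process, so the next line is dead code.
    echo "never reached"
  '
}

entrypoint_without_exec() {
  sh -c '
    echo "preparing"
    # No exec: this is the last reachable simple command on the main path.
    echo "server started"
  '
}

entrypoint_with_exec
entrypoint_without_exec
```

Both functions print the same two lines, but only the first one replaces the shell with the final command; in the second the shell stays alive as the parent process.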
6) Idiom library
| Pattern | Action |
|---|---|
| `if [ "${1:0:1}" = '-' ]; then set -- server "$@"; fi` | Rewrite argv to prepend default command. |
| `if [ "$1" = "bash" ]; then exec "$@"; fi` | Pass-through for manual shells. |
| `exec "$@"` + non-empty `CMD` | Substitute `CMD` vector into plan. |
| `exec java -jar "$APP_JAR" "$@"` | Resolve JAR via env or filesystem. |
| `set -- gosu "$APP_USER" "$@"` | Collapse into wrapper plan. |
Idioms are implemented as AST visitors; each adds evidence strings when triggered.
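
As a rough illustration of what several of these patterns do at run time, the sketch below combines the flag-prepend and `gosu` wrapper idioms in one hypothetical function, `resolve_argv`, which prints the final argv instead of exec-ing it. It uses the POSIX `case` form of the leading-dash check so it runs under plain `sh`; `${1:0:1}` in the table is a Bash substring expansion. The default user `app` is an assumption made for the example.

```shell
#!/bin/sh
# Hypothetical helper: mirrors two entrypoint idioms and prints the resulting argv.

resolve_argv() {
  # Flag-first invocation: prepend the default command.
  case "${1:-}" in
    -*) set -- server "$@" ;;
  esac
  # Wrap only the server command in the privilege-dropping wrapper.
  if [ "${1:-}" = "server" ]; then
    set -- gosu "${APP_USER:-app}" "$@"
  fi
  # A real entrypoint would run: exec "$@"
  printf '%s\n' "$*"
}

unset APP_USER
resolve_argv --port 8080    # prints: gosu app server --port 8080
resolve_argv bash -c id     # prints: bash -c id
```

The second call shows the pass-through case: any other command, such as a manual shell, reaches the final `exec "$@"` unchanged.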
7) Confidence scoring
- Base score from plan heuristics (`HeuristicScore`).
- Penalties for unresolved taint (`$@` unknown), missing files, nested subshells, or fallbacks.
- Bonus when idioms match, artefacts exist, or env values resolve cleanly.
- Final confidence is combined with the outer static scoring model.
 
8) Failure modes
- Missing script (`ENTRYPOINT` points to deleted file): emit fallback plan with low confidence.
- Self-modifying scripts or heavy dynamic features (`eval`, backticks): mark plan as low-confidence and surface warning evidence.
- Commands that spawn supervisors without `exec`: return both the supervisor and inferred children (if configuration files are present).
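
A short example suggests why `eval` is hard to resolve statically. In the hypothetical function below, the command name is assembled from string fragments, so it never appears as a literal word in the source text.

```shell
#!/bin/sh
# Hypothetical example of a command that only exists after run-time expansion.

pick_command() {
  # The command name is built from data, so a parser never sees it as a word.
  runner="ec"
  runner="${runner}ho"
  # eval re-parses the assembled string; "$1" is expanded in that second pass.
  eval "$runner \"starting \$1\""
}

pick_command worker    # prints: starting worker
```

A parser working on the source text sees only an `eval` of a variable here, which is consistent with treating such plans as low-confidence.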
ShellFlow keeps the static reducer explainable: every inferred command is accompanied by the script span and reasoning used to reach it.