# ShellFlow — Script Reduction Playbook Most container entry points eventually execute a shell script. The ShellFlow analyser resolves these scripts without executing user code, providing deterministic, explainable reductions. ## 1) Scope - POSIX `sh` subset with common Bash extensions (control flow, functions, parameter expansion). - Handles idioms from official Docker images (`if [ "$1" = "server" ]; then …`, `exec gosu "$USER" "$@"`, `set -- java -jar …`). - Tracks positional parameters (`$@`, `$1..$9`), environment variables, and `set --` mutations. - Produces one or more candidate commands with supporting evidence. ## 2) Architecture ``` ShellFlow/ Parser/ // POSIX sh lexer + recursive descent parser Ast/ // nodes for lists, pipelines, conditionals, functions Evaluator/ // partial evaluation & taint tracking Idioms/ // pattern library for common Docker entrypoints Planner/ // emits CommandPlan[] ``` ### 2.1 CommandPlan ```csharp public sealed record CommandPlan( string[] Argv, double HeuristicScore, IReadOnlyList Evidence, IReadOnlyList Chain, bool IsFallback = false ); ``` Plans feed directly into the static reducer, which selects the highest-confidence plan but keeps alternates as evidence. ## 3) Parsing & AST - Tokenise words, assignments, pipelines (`|`), lists (`;`, `&&`, `||`), conditionals (`if`, `case`), loops (`for`, `while`, `until`), functions, and redirections. - Preserve heredocs and subshells as opaque nodes (evaluated conservatively). - Record source spans to surface meaningful evidence (`"line 12: exec java -jar $APP_JAR"`). ## 4) Partial evaluation - Initialise symbol table from image environment plus caller-supplied args. - Treat `$@`, `$*`, `$1..$9` as tainted; propagate taint through assignments. - Resolve `${VAR:-default}` and `${VAR:+alt}` when `VAR` known; otherwise branch. - Support `set -- …` (resets positional parameters) and `shift`. - `source`/`.` commands are parsed recursively when files are available; else fallback to low-confidence branch. ## 5) Exec sink detection - `exec ` dominates the remainder of the script. - Chains such as `exec gosu "$USER" "$@"` feed into wrapper collapse. - When no `exec` is present, pick the last reachable simple command in the main path. - Multi-branch scripts yield multiple plans with adjusted scores; unresolved branches are marked `IsFallback`. ## 6) Idiom library | Pattern | Action | | --- | --- | | `if [ "${1:0:1}" = '-' ]; then set -- server "$@"; fi` | Rewrite argv to prepend default command. | | `if [ "$1" = "bash" ]; then exec "$@"; fi` | Pass-through for manual shells. | | `exec "$@"` + non-empty CMD | Substitute CMD vector into plan. | | `exec java -jar "$APP_JAR" "$@"` | Resolve JAR via env or filesystem. | | `set -- gosu "$APP_USER" "$@"` | Collapse into wrapper plan. | Idioms are implemented as AST visitors; each adds evidence strings when triggered. ## 7) Confidence scoring - Base score from plan heuristics (`HeuristicScore`). - Penalties for unresolved taint (`$@` unknown), missing files, nested subshells, or fallbacks. - Bonus when idioms match, artefacts exist, or env values resolve cleanly. - Final confidence is combined with the outer static scoring model. ## 8) Failure modes - Missing script (`ENTRYPOINT` points to deleted file): emit fallback plan with low confidence. - Self-modifying scripts or heavy dynamic features (`eval`, backticks): mark plan as low-confidence and surface warning evidence. - Commands that spawn supervisors without `exec`: return both the supervisor and inferred children (if configuration files are present). ShellFlow keeps the static reducer explainable: every inferred command is accompanied by the script span and reasoning used to reach it.