ShellFlow — Script Reduction Playbook
Most container entry points eventually execute a shell script. The ShellFlow analyser resolves these scripts without executing user code, providing deterministic, explainable reductions.
1) Scope
- POSIX `sh` subset with common Bash extensions (control flow, functions, parameter expansion).
- Handles idioms from official Docker images (`if [ "$1" = "server" ]; then …`, `exec gosu "$USER" "$@"`, `set -- java -jar …`).
- Tracks positional parameters (`$@`, `$1..$9`), environment variables, and `set --` mutations.
- Produces one or more candidate commands with supporting evidence.
2) Architecture
ShellFlow/
  Parser/      // POSIX sh lexer + recursive descent parser
  Ast/         // nodes for lists, pipelines, conditionals, functions
  Evaluator/   // partial evaluation & taint tracking
  Idioms/      // pattern library for common Docker entrypoints
  Planner/     // emits CommandPlan[]
2.1 CommandPlan
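public sealed record CommandPlan(
    string[] Argv,                        // resolved command vector
    double HeuristicScore,                // base score before penalties and bonuses (see 7)
    IReadOnlyList<string> Evidence,       // e.g. "line 12: exec java -jar $APP_JAR"
    IReadOnlyList<ReductionEdge> Chain,   // reduction edges recorded for this plan
    bool IsFallback = false               // set for unresolved branches (see 5)
);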
Plans feed directly into the static reducer, which selects the highest-confidence plan but keeps alternates as evidence.
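The selection step can be sketched as follows. This is an illustrative Python sketch for brevity, not the module's C# implementation: the `select_plan` helper, the tie-breaking rule, and the sample plans are assumptions, and `Chain` is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandPlan:
    """Python stand-in for the C# record (Chain omitted for brevity)."""
    argv: tuple
    heuristic_score: float
    evidence: tuple = ()
    is_fallback: bool = False


def select_plan(plans):
    """Return (best, alternates); alternates are retained as evidence."""
    if not plans:
        raise ValueError("at least one plan is required")
    # Highest score first; argv breaks ties so the result is deterministic
    # (the tie-break rule is an assumption, not taken from the source).
    ranked = sorted(plans, key=lambda p: (-p.heuristic_score, p.argv))
    return ranked[0], ranked[1:]


# Hypothetical plans for an entrypoint with three possible outcomes.
plans = [
    CommandPlan(("bash",), 0.40, ('line 3: exec "$@"',)),
    CommandPlan(("java", "-jar", "/app/app.jar"), 0.85,
                ("line 12: exec java -jar $APP_JAR",)),
    CommandPlan(("/entrypoint.sh",), 0.10, ("script missing",), is_fallback=True),
]
best, alternates = select_plan(plans)
print(best.argv)
print([p.argv for p in alternates])
```

Sorting on a stable key rather than taking an arbitrary maximum keeps the output reproducible when two plans share a score.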
3) Parsing & AST
- Tokenise words, assignments, pipelines (`|`), lists (`;`, `&&`, `||`), conditionals (`if`, `case`), loops (`for`, `while`, `until`), functions, and redirections.
- Preserve heredocs and subshells as opaque nodes (evaluated conservatively).
- Record source spans to surface meaningful evidence (`"line 12: exec java -jar $APP_JAR"`).
4) Partial evaluation
- Initialise symbol table from image environment plus caller-supplied args.
- Treat `$@`, `$*`, `$1..$9` as tainted; propagate taint through assignments.
- Resolve `${VAR:-default}` and `${VAR:+alt}` when `VAR` is known; otherwise branch.
- Support `set -- …` (resets positional parameters) and `shift`.
- Parse `source`/`.` commands recursively when the files are available; otherwise fall back to a low-confidence branch.
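The rules above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the evaluator itself: the helper names (`expand`, `apply_builtin`, `assign`), the explicit `unknown` set used to model variables whose value is only known at run time, and the sample values are all hypothetical.

```python
import re

# ${VAR:-default} and ${VAR:+alt}; other expansion forms are out of scope here.
PARAM = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*):([-+])([^}]*)\}$")
POSITIONAL = re.compile(r"\$(@|\*|[1-9])")
VAR_REF = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")


def expand(word, env, unknown):
    """Return every value `word` may take; two entries mean the evaluator branches."""
    m = PARAM.match(word)
    if not m:
        return [word]
    name, op, alt = m.groups()
    if name in unknown:
        # Only known at run time: keep both outcomes as separate branches.
        return ["$" + name, alt] if op == "-" else [alt, ""]
    value = env.get(name, "")
    if op == "-":
        return [value or alt]          # default applies when unset or empty
    return [alt if value else ""]      # alt applies when set and non-empty


def apply_builtin(argv, positional):
    """Model `set -- ...` and `shift`; any other command leaves $1..$n alone."""
    if argv[:2] == ["set", "--"]:
        return argv[2:]
    if argv[:1] == ["shift"]:
        count = int(argv[1]) if len(argv) > 1 else 1
        return positional[count:]
    return positional


def assign(name, rhs, tainted):
    """Mark `name` tainted when `rhs` mentions a positional or tainted variable."""
    refs = VAR_REF.findall(rhs)
    if POSITIONAL.search(rhs) or any(ref in tainted for ref in refs):
        tainted.add(name)
    else:
        tainted.discard(name)


env = {"APP_JAR": "/app/app.jar"}
print(expand("${APP_JAR:-/opt/default.jar}", env, unknown=set()))
print(expand("${JAVA_OPTS:+-server}", env, unknown={"JAVA_OPTS"}))
print(apply_builtin(["set", "--", "java", "-jar", "app.jar"], ["server"]))

tainted = set()
assign("CMD", "$1", tainted)               # direct use of a positional parameter
assign("RUN", "$CMD --verbose", tainted)   # taint propagates through CMD
assign("DATA_DIR", "/srv/data", tainted)   # literal value stays clean
print(sorted(tainted))
```

Returning a list from `expand` keeps the "resolve when known, otherwise branch" rule in one place: a single entry is a resolved value, two entries are the branches to explore.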
5) Exec sink detection
- `exec <cmd>` dominates the remainder of the script.
- Chains such as `exec gosu "$USER" "$@"` feed into wrapper collapse.
- When no `exec` is present, pick the last reachable simple command in the main path.
- Multi-branch scripts yield multiple plans with adjusted scores; unresolved branches are marked `IsFallback`.
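A minimal Python sketch of sink detection, assuming each reachable path has already been flattened into an ordered list of `(line, argv)` pairs. That input shape, the helper names, and the dictionary used for plans are hypothetical, and score adjustment is left out.

```python
def find_sink(commands):
    """Pick the command that one reachable path ends in.

    `commands` is an ordered list of (line, argv) pairs for the simple
    commands on that path (a hypothetical, simplified shape).
    """
    for line, argv in commands:
        # `exec cmd ...` replaces the shell, so later commands never run.
        # Redirection-only forms such as `exec 3>&1` are not modelled here.
        if len(argv) > 1 and argv[0] == "exec":
            return line, argv[1:], "exec"
    if not commands:
        return None
    line, argv = commands[-1]
    return line, argv, "last reachable command"


def plans_for(branches):
    """Build one plan per branch; unresolved branches become fallbacks."""
    plans = []
    for commands, resolved in branches:
        sink = find_sink(commands)
        if sink is None:
            continue
        line, argv, reason = sink
        plans.append({
            "argv": argv,
            "evidence": f"line {line}: {reason}",
            "is_fallback": not resolved,
        })
    return plans


branches = [
    # Resolved branch that ends in an explicit exec.
    ([(4, ["set", "-e"]),
      (7, ["exec", "java", "-jar", "/app/app.jar"]),
      (8, ["echo", "unreachable"])], True),
    # Branch whose condition could not be resolved and has no exec.
    ([(4, ["set", "-e"]),
      (11, ["nginx", "-g", "daemon off;"])], False),
]
plans = plans_for(branches)
for plan in plans:
    print(plan)
```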
6) Idiom library
| Pattern | Action |
|---|---|
| `if [ "${1:0:1}" = '-' ]; then set -- server "$@"; fi` | Rewrite argv to prepend default command. |
| `if [ "$1" = "bash" ]; then exec "$@"; fi` | Pass-through for manual shells. |
| `exec "$@"` + non-empty CMD | Substitute CMD vector into plan. |
| `exec java -jar "$APP_JAR" "$@"` | Resolve JAR via env or filesystem. |
| `set -- gosu "$APP_USER" "$@"` | Collapse into wrapper plan. |
Idioms are implemented as AST visitors; each adds evidence strings when triggered.
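Two of the table rows can be sketched in Python as follows. For brevity the sketch rewrites plain argv lists instead of visiting AST nodes, and it reads "collapse" as dropping the wrapper while noting it in evidence; both choices, along with the function names and evidence wording, are assumptions.

```python
def substitute_cmd(argv, image_cmd, evidence):
    """`exec "$@"` with a non-empty CMD: splice the CMD vector into argv."""
    if "$@" not in argv or not image_cmd:
        return argv
    out = []
    for word in argv:
        out.extend(image_cmd if word == "$@" else [word])
    evidence.append('idiom: "$@" replaced by image CMD')
    return out


def collapse_gosu(argv, evidence):
    """`gosu USER cmd ...`: keep the wrapped command, note the wrapper."""
    if len(argv) < 3 or argv[0] != "gosu":
        return argv
    evidence.append(f"idiom: gosu wrapper collapsed (user {argv[1]})")
    return argv[2:]


# Hypothetical input: the script ran `set -- gosu "$APP_USER" "$@"` and the
# image declares CMD ["redis-server", "/etc/redis.conf"].
evidence = []
argv = ["gosu", "$APP_USER", "$@"]
argv = substitute_cmd(argv, ["redis-server", "/etc/redis.conf"], evidence)
argv = collapse_gosu(argv, evidence)
print(argv)
print(evidence)
```

Each rule appends to the shared evidence list only when it fires, so the final plan records which idioms shaped it.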
7) Confidence scoring
- Base score from plan heuristics (`HeuristicScore`).
- Penalties for unresolved taint (`$@` unknown), missing files, nested subshells, or fallbacks.
- Bonus when idioms match, artefacts exist, or env values resolve cleanly.
- Final confidence is combined with the outer static scoring model.
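As a rough illustration, the adjustment could look like the Python sketch below. The signal names and every weight are invented for the example; the source does not specify them, and the combination with the outer static scoring model is not shown.

```python
# Hypothetical weights: the real values are not given in this document.
PENALTIES = {
    "unresolved_taint": 0.30,
    "missing_file": 0.25,
    "nested_subshell": 0.10,
    "fallback": 0.40,
}
BONUSES = {
    "idiom_match": 0.10,
    "artefact_exists": 0.10,
    "env_resolved": 0.05,
}


def confidence(heuristic_score, signals):
    """Apply bonuses and penalties to the base score and clamp to [0, 1]."""
    score = heuristic_score
    # Sorting the de-duplicated signals fixes the order of the float
    # additions, so the same inputs always give the same result.
    for signal in sorted(set(signals)):
        score += BONUSES.get(signal, 0.0) - PENALTIES.get(signal, 0.0)
    return round(min(1.0, max(0.0, score)), 2)


print(confidence(0.6, ["idiom_match", "artefact_exists"]))
print(confidence(0.6, ["unresolved_taint", "fallback"]))
```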
8) Failure modes
- Missing script (`ENTRYPOINT` points to deleted file): emit fallback plan with low confidence.
- Self-modifying scripts or heavy dynamic features (`eval`, backticks): mark plan as low-confidence and surface warning evidence.
- Commands that spawn supervisors without `exec`: return both the supervisor and inferred children (if configuration files are present).
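The first two cases can be sketched in Python as a pre-check. The `triage` helper, the `files` mapping standing in for the image filesystem, and the evidence wording are hypothetical; the line-based regular expression is a simplification of what an AST-based check would do, and supervisor detection is not covered.

```python
import re

# `eval` used as a command word, or any backtick substitution.
DYNAMIC = re.compile(r"(^|[;&|\s])eval(\s|$)|`")


def triage(entrypoint, files):
    """Return (is_fallback, evidence) for an entrypoint script.

    `files` maps image paths to script text (a hypothetical shape).
    """
    script = files.get(entrypoint)
    if script is None:
        return True, [f"{entrypoint}: not found in image, emitting fallback plan"]
    evidence = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # comment lines (including the shebang) are skipped
        if DYNAMIC.search(stripped):
            evidence.append(f"line {line_no}: dynamic construct: {stripped}")
    return False, evidence


files = {
    "/docker-entrypoint.sh": (
        "#!/bin/sh\n"
        "CMD=`cat /etc/app/cmd`\n"
        'eval "$CMD"\n'
        'exec "$@"\n'
    ),
}
is_fallback, warnings = triage("/docker-entrypoint.sh", files)
print(is_fallback, warnings)
print(triage("/missing.sh", files))
```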
ShellFlow keeps the static reducer explainable: every inferred command is accompanied by the script span and reasoning used to reach it.