ShellFlow — Script Reduction Playbook

Most container entry points eventually execute a shell script. The ShellFlow analyser resolves these scripts without executing user code, providing deterministic, explainable reductions.

1) Scope

  • POSIX sh subset with common Bash extensions (control flow, functions, parameter expansion).
  • Handles idioms from official Docker images (if [ "$1" = "server" ]; then …, exec gosu "$USER" "$@", set -- java -jar …).
  • Tracks positional parameters ($@, $1..$9), environment variables, and set -- mutations.
  • Produces one or more candidate commands with supporting evidence.

2) Architecture

```
ShellFlow/
  Parser/           // POSIX sh lexer + recursive descent parser
  Ast/              // nodes for lists, pipelines, conditionals, functions
  Evaluator/        // partial evaluation & taint tracking
  Idioms/           // pattern library for common Docker entrypoints
  Planner/          // emits CommandPlan[]
```

2.1 CommandPlan

```csharp
public sealed record CommandPlan(
  string[] Argv,
  double   HeuristicScore,
  IReadOnlyList<string> Evidence,
  IReadOnlyList<ReductionEdge> Chain,
  bool     IsFallback = false
);
```

Plans feed directly into the static reducer, which selects the highest-confidence plan but keeps alternates as evidence.
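As a rough illustration of that selection step, the sketch below ranks plans and keeps the rest as alternates. It is not the reducer's actual code: `PlanSelection` and `PlanSelector` are hypothetical names, `ReductionEdge` is a stand-in because its real shape is not shown here, and `HeuristicScore` is used as a proxy for the final confidence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the real ReductionEdge type, whose shape is not shown in this document.
public sealed record ReductionEdge(string From, string To);

public sealed record CommandPlan(
  string[] Argv,
  double   HeuristicScore,
  IReadOnlyList<string> Evidence,
  IReadOnlyList<ReductionEdge> Chain,
  bool     IsFallback = false
);

// Hypothetical result type: the chosen plan plus the alternates retained as evidence.
public sealed record PlanSelection(CommandPlan Selected, IReadOnlyList<CommandPlan> Alternates);

public static class PlanSelector
{
  // Assumption: HeuristicScore stands in for the final confidence.
  // OrderByDescending is a stable sort, so tied plans keep the planner's
  // emission order and the result stays deterministic.
  public static PlanSelection Select(IReadOnlyList<CommandPlan> plans)
  {
    if (plans.Count == 0)
      throw new ArgumentException("At least one plan is required.", nameof(plans));

    var ranked = plans.OrderByDescending(p => p.HeuristicScore).ToList();
    return new PlanSelection(ranked[0], ranked.Skip(1).ToList());
  }
}
```

Returning the alternates alongside the winner, rather than discarding them, mirrors the requirement that lower-ranked plans remain available as evidence.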

3) Parsing & AST

  • Tokenise words, assignments, pipelines (|), lists (;, &&, ||), conditionals (if, case), loops (for, while, until), functions, and redirections.
  • Preserve heredocs and subshells as opaque nodes (evaluated conservatively).
  • Record source spans to surface meaningful evidence ("line 12: exec java -jar $APP_JAR").

4) Partial evaluation

  • Initialise symbol table from image environment plus caller-supplied args.
  • Treat $@, $*, $1..$9 as tainted; propagate taint through assignments.
  • Resolve ${VAR:-default} and ${VAR:+alt} when VAR is known; otherwise branch on both possible outcomes.
  • Support set -- … (resets positional parameters) and shift.
  • source/. commands are parsed recursively when the referenced files are available; otherwise the evaluator falls back to a low-confidence branch.
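The resolve-or-branch rule for ${VAR:-default} and ${VAR:+alt} can be sketched as follows. This is a minimal illustration, not the Evaluator's real API: `ParameterExpansion.Resolve` is a hypothetical helper, and the symbol-table convention (missing key means unknown, null means known to be unset) and the "${NAME}" placeholder for an unknown runtime value are assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ParameterExpansion
{
  // Resolves ${name:-word} (op ":-") or ${name:+word} (op ":+").
  // Assumed symbol-table convention (hypothetical):
  //   key absent     -> value unknown at analysis time, so branch
  //   value null     -> variable known to be unset
  //   value non-null -> variable known to hold that string
  // Returns one candidate when resolved, two when the analysis must branch.
  public static IReadOnlyList<string> Resolve(
    string name, string op, string word, IReadOnlyDictionary<string, string?> symbols)
  {
    if (op != ":-" && op != ":+")
      throw new ArgumentException($"Unsupported operator '{op}'.", nameof(op));

    if (!symbols.TryGetValue(name, out var value))
    {
      // Unknown variable: keep both outcomes. "${name}" is a placeholder
      // for the runtime value that cannot be determined statically.
      return op == ":-"
        ? new[] { "${" + name + "}", word }
        : new[] { word, "" };
    }

    // POSIX semantics: the colon forms treat an empty value like an unset one.
    bool setAndNonEmpty = !string.IsNullOrEmpty(value);
    if (op == ":-")
      return new[] { setAndNonEmpty ? value! : word };
    return new[] { setAndNonEmpty ? word : "" };
  }
}
```

For example, with APP_JAR known as /app/app.jar, ${APP_JAR:-/default.jar} yields the single candidate /app/app.jar, whereas an unknown APP_JAR yields two candidates that downstream planning would treat as separate branches.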

5) Exec sink detection

  • exec <cmd> dominates the remainder of the script: a successful exec replaces the shell process, so later commands are never reached.
  • Chains such as exec gosu "$USER" "$@" feed into wrapper collapse.
  • When no exec is present, pick the last reachable simple command in the main path.
  • Multi-branch scripts yield multiple plans with adjusted scores; unresolved branches are marked IsFallback.
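The first three rules can be sketched over a flattened main path. This is a simplification under stated assumptions: `SimpleCommand`, `ExecSink`, and `ExecSinkDetector` are hypothetical names, the main path is assumed to be already linearised into reachable simple commands, and exec options such as -a are not handled.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of the reachable main path: one entry per simple command.
public sealed record SimpleCommand(int Line, string[] Argv);

// Hypothetical result: the command the script ends up running, plus an evidence string.
public sealed record ExecSink(string[] Argv, string Evidence);

public static class ExecSinkDetector
{
  public static ExecSink? Find(IReadOnlyList<SimpleCommand> mainPath)
  {
    foreach (var cmd in mainPath)
    {
      // A bare `exec` (redirections only) does not replace the shell,
      // so a command word after `exec` is required.
      if (cmd.Argv.Length > 1 && cmd.Argv[0] == "exec")
      {
        var target = cmd.Argv.Skip(1).ToArray();
        var joined = string.Join(" ", target);
        return new ExecSink(target, $"line {cmd.Line}: exec {joined}");
      }
    }

    if (mainPath.Count == 0)
      return null; // nothing reachable: the caller would emit a fallback plan

    // No exec found: fall back to the last reachable simple command.
    var last = mainPath[mainPath.Count - 1];
    var lastJoined = string.Join(" ", last.Argv);
    return new ExecSink(last.Argv, $"line {last.Line}: last reachable command {lastJoined}");
  }
}
```

The first exec wins because nothing after it on the same path can run; wrapper collapse (for example stripping gosu) would then operate on the returned Argv.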

6) Idiom library

| Pattern | Action |
|---|---|
| `if [ "${1:0:1}" = '-' ]; then set -- server "$@"; fi` | Rewrite argv to prepend default command. |
| `if [ "$1" = "bash" ]; then exec "$@"; fi` | Pass-through for manual shells. |
| `exec "$@"` + non-empty `CMD` | Substitute `CMD` vector into plan. |
| `exec java -jar "$APP_JAR" "$@"` | Resolve JAR via env or filesystem. |
| `set -- gosu "$APP_USER" "$@"` | Collapse into wrapper plan. |
Idioms are implemented as AST visitors; each adds evidence strings when triggered.
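To make the first idiom concrete, the sketch below models its effect directly on an argument vector rather than as an AST visitor, which is a deliberate simplification. `LeadingDashIdiom` is a hypothetical name, and the evidence wording is invented for the example.

```csharp
#nullable enable
using System;

public static class LeadingDashIdiom
{
  // Models the effect of:
  //   if [ "${1:0:1}" = '-' ]; then set -- <defaultCommand> "$@"; fi
  // Returns the (possibly rewritten) argv and an evidence string,
  // or null evidence when the idiom does not trigger.
  public static (string[] Argv, string? Evidence) Apply(string[] args, string defaultCommand)
  {
    // ${1:0:1} is the first character of $1; it is empty when $1 is missing or empty.
    if (args.Length > 0 && args[0].StartsWith("-", StringComparison.Ordinal))
    {
      var rewritten = new string[args.Length + 1];
      rewritten[0] = defaultCommand;
      Array.Copy(args, 0, rewritten, 1, args.Length);
      return (rewritten, $"idiom: $1 starts with '-', prepended '{defaultCommand}'");
    }

    return (args, null);
  }
}
```

Calling `LeadingDashIdiom.Apply(new[] { "--port", "8080" }, "server")` returns the argv `server --port 8080` together with an evidence string, while `Apply(new[] { "bash" }, "server")` leaves the arguments untouched.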

7) Confidence scoring

  • Base score from plan heuristics (HeuristicScore).
  • Penalties for unresolved taint ($@ unknown), missing files, nested subshells, or fallbacks.
  • Bonuses when idioms match, artefacts exist, or env values resolve cleanly.
  • Final confidence is combined with the outer static scoring model.
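One way to picture how these adjustments combine is the sketch below. Every weight in it is an illustrative placeholder rather than a value taken from ShellFlow, the [0, 1] range is an assumption, and `ScoreInputs` and `ConfidenceScorer` are hypothetical names.

```csharp
using System;

// Hypothetical bundle of the signals named in the list above.
public sealed record ScoreInputs(
  double BaseScore,
  bool UnresolvedTaint,
  bool MissingFiles,
  int  NestedSubshells,
  bool IsFallback,
  bool IdiomMatched,
  bool ArtefactExists,
  bool EnvResolved);

public static class ConfidenceScorer
{
  // All weights are placeholders chosen for illustration only.
  public static double Score(ScoreInputs s)
  {
    double score = s.BaseScore;

    // Penalties
    if (s.UnresolvedTaint) score -= 0.20;
    if (s.MissingFiles)    score -= 0.15;
    score -= 0.05 * s.NestedSubshells;
    if (s.IsFallback)      score -= 0.30;

    // Bonuses
    if (s.IdiomMatched)    score += 0.10;
    if (s.ArtefactExists)  score += 0.10;
    if (s.EnvResolved)     score += 0.05;

    // Assumed range: keep the result within [0, 1] before it is handed
    // to the outer static scoring model.
    return Math.Clamp(score, 0.0, 1.0);
  }
}
```

Additive adjustments with a final clamp keep each contribution easy to report as evidence; a real scorer might instead combine factors multiplicatively.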

8) Failure modes

  • Missing script (ENTRYPOINT points to deleted file): emit fallback plan with low confidence.
  • Self-modifying scripts or heavy dynamic features (eval, backticks): mark plan as low-confidence and surface warning evidence.
  • Commands that spawn supervisors without exec: return both the supervisor and inferred children (if configuration files are present).

ShellFlow keeps the static reducer explainable: every inferred command is accompanied by the script span and reasoning used to reach it.