feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules

- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
2025-10-30 00:09:39 +02:00
parent 3154c67978
commit 7b5bdcf4d3
503 changed files with 16136 additions and 54638 deletions
--- a/docs/modules/scanner/operations/entrypoint-shell-analysis.md
+++ b/docs/modules/scanner/operations/entrypoint-shell-analysis.md
@@ -0,0 +1,83 @@
+# ShellFlow — Script Reduction Playbook
+
+Most container entry points eventually execute a shell script. The ShellFlow analyser resolves these scripts without executing user code, providing deterministic, explainable reductions.
+
+## 1) Scope
+
+- Subset of POSIX `sh` (control flow, functions, parameter expansion) plus common Bash extensions.
+- Handles idioms from official Docker images (`if [ "$1" = "server" ]; then …`, `exec gosu "$USER" "$@"`, `set -- java -jar …`).
+- Tracks positional parameters (`$@`, `$1..$9`), environment variables, and `set --` mutations.
+- Produces one or more candidate commands with supporting evidence.
+
+## 2) Architecture
+
+```
+ShellFlow/
+  Parser/           // POSIX sh lexer + recursive descent parser
+  Ast/              // nodes for lists, pipelines, conditionals, functions
+  Evaluator/        // partial evaluation & taint tracking
+  Idioms/           // pattern library for common Docker entrypoints
+  Planner/          // emits CommandPlan[]
+```
+
+### 2.1 CommandPlan
+
+```csharp
+public sealed record CommandPlan(
+  string[] Argv,
+  double   HeuristicScore,
+  IReadOnlyList<string> Evidence,
+  IReadOnlyList<ReductionEdge> Chain,
+  bool     IsFallback = false
+);
+```
+
+Plans feed directly into the static reducer, which selects the highest-confidence plan but keeps alternates as evidence.
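
As a rough illustration of that selection step, the sketch below mirrors `CommandPlan` as a Python dataclass (Python is used only for brevity; the record above is C#). The `select_plan` helper and its ordering rule (non-fallback plans outrank fallbacks, then higher `HeuristicScore` wins) are assumptions for illustration, not the reducer's documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CommandPlan:
    # Field names follow the C# record; `chain` is simplified to plain strings.
    argv: Tuple[str, ...]
    heuristic_score: float
    evidence: Tuple[str, ...] = ()
    chain: Tuple[str, ...] = ()
    is_fallback: bool = False


def select_plan(plans: List[CommandPlan]) -> Tuple[CommandPlan, List[CommandPlan]]:
    """Return (winner, alternates). Hypothetical helper, not part of ShellFlow."""
    if not plans:
        raise ValueError("at least one plan is required")
    # Assumed ordering: real plans before fallbacks, then by descending score.
    ranked = sorted(plans, key=lambda p: (p.is_fallback, -p.heuristic_score))
    return ranked[0], ranked[1:]


plans = [
    CommandPlan(("sh",), 0.9, ("missing script",), is_fallback=True),
    CommandPlan(("java", "-jar", "/app/app.jar"), 0.7,
                ("line 12: exec java -jar $APP_JAR",)),
    CommandPlan(("bash",), 0.2, ('line 4: exec "$@"',)),
]
winner, alternates = select_plan(plans)
print(winner.argv, [p.argv for p in alternates])
```

The alternates are returned rather than discarded so that a caller can attach them to the result as evidence.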
+
+## 3) Parsing & AST
+
+- Tokenise words, assignments, pipelines (`|`), lists (`;`, `&&`, `||`), conditionals (`if`, `case`), loops (`for`, `while`, `until`), functions, and redirections.
+- Preserve heredocs and subshells as opaque nodes (evaluated conservatively).
+- Record source spans to surface meaningful evidence (`"line 12: exec java -jar $APP_JAR"`).
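
The tokenising and span-recording steps above can be sketched as follows. This is a deliberately small, line-based lexer written for illustration: the `Token` type, `tokenise`, and `evidence` are hypothetical names, and the sketch ignores backslash escapes, heredocs, redirections, and quotes that span lines, all of which a real lexer would need to handle.

```python
from dataclasses import dataclass
from typing import List

OPERATORS = ("&&", "||", ";", "|")  # two-character operators are checked first


@dataclass(frozen=True)
class Token:
    kind: str  # "word" or "op"
    text: str
    line: int  # 1-based source line, kept for evidence strings


def tokenise(script: str) -> List[Token]:
    tokens: List[Token] = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        i, word, quote = 0, "", None
        while i < len(line):
            ch = line[i]
            if quote:  # inside '...' or "...": copy verbatim
                word += ch
                if ch == quote:
                    quote = None
                i += 1
                continue
            if ch in "'\"":
                quote = ch
                word += ch
                i += 1
                continue
            if ch == "#" and not word:  # comment runs to end of line
                break
            op = next((o for o in OPERATORS if line.startswith(o, i)), None)
            if op or ch.isspace():
                if word:
                    tokens.append(Token("word", word, line_no))
                    word = ""
                if op:
                    tokens.append(Token("op", op, line_no))
                    i += len(op)
                else:
                    i += 1
                continue
            word += ch
            i += 1
        if word:
            tokens.append(Token("word", word, line_no))
    return tokens


def evidence(script: str, line_no: int) -> str:
    """Format a source line the way the evidence example above does."""
    return f"line {line_no}: {script.splitlines()[line_no - 1].strip()}"
```

Quoted words keep their quotes in this sketch so that a later evaluation stage can still tell `"$@"` from an unquoted `$@`.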
+
+## 4) Partial evaluation
+
+- Initialise symbol table from image environment plus caller-supplied args.
+- Treat `$@`, `$*`, `$1..$9` as tainted; propagate taint through assignments.
+- Resolve `${VAR:-default}` and `${VAR:+alt}` when `VAR` is known; otherwise branch on both outcomes.
+- Support `set -- …` (resets positional parameters) and `shift`.
+- `source`/`.` commands are parsed recursively when the referenced files are available; otherwise fall back to a low-confidence branch.
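
A minimal sketch of the parameter-expansion rule, assuming a three-state symbol table: a string for a known value, `None` for a variable known to be unset, and absence from the table for anything that cannot be determined statically. The `UNKNOWN` sentinel and the `expand` function are hypothetical names; only the `:-` and `:+` forms mentioned above are modelled.

```python
import re
from typing import Dict, List, Optional

UNKNOWN = object()  # placeholder for a value that is only known at run time

_PARAM = re.compile(r"^\$\{(\w+)(?::([-+])(.*))?\}$")


def expand(word: str, env: Dict[str, Optional[str]]) -> List[object]:
    """Return every value `word` may take; more than one entry means a branch."""
    m = _PARAM.match(word)
    if not m:
        return [word]  # literal word, nothing to resolve
    name, op, arg = m.group(1), m.group(2), m.group(3)
    value = env.get(name, UNKNOWN)  # names absent from the table are unknown
    if value is UNKNOWN:
        if op == "-":
            return [UNKNOWN, arg]  # runtime value, or the default
        if op == "+":
            return [arg, ""]  # alt if set and non-empty, else empty
        return [UNKNOWN]
    set_and_non_empty = value not in (None, "")
    if op == "-":
        return [value if set_and_non_empty else arg]
    if op == "+":
        return [arg if set_and_non_empty else ""]
    return ["" if value is None else value]


env = {"APP_JAR": "/app/app.jar", "EMPTY": "", "UNSET": None}
print(expand("${APP_JAR:-/default.jar}", env))
print(expand("${MODE:-server}", env))
```

Returning a list keeps the known and unknown cases uniform: a single entry is a resolved value, and two entries correspond to the two branches a planner would need to explore.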
+
+## 5) Exec sink detection
+
+- `exec <cmd>` replaces the shell process, so it dominates the remainder of the script: nothing after it on that path runs.
+- Chains such as `exec gosu "$USER" "$@"` feed into wrapper collapse.
+- When no `exec` is present, pick the last reachable simple command in the main path.
+- Multi-branch scripts yield multiple plans with adjusted scores; unresolved branches are marked `IsFallback`.
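
The first and third rules above can be sketched over an already-linearised main path. The `(line, argv)` input shape and the `find_sink` name are assumptions for illustration; redirections are assumed to have been stripped from `argv`, so a bare `exec` (which only applies redirections to the current shell) is not treated as a sink.

```python
from typing import List, Optional, Tuple

Command = Tuple[int, List[str]]  # (source line, argv with redirections removed)


def find_sink(main_path: List[Command]) -> Optional[Tuple[List[str], str]]:
    """Return (argv, evidence) for the command the script ends up running."""
    commands = [(line, argv) for line, argv in main_path if argv]
    for line, argv in commands:
        # `exec` followed by a command replaces the shell; later lines never run.
        if argv[0] == "exec" and len(argv) > 1:
            return argv[1:], f"line {line}: exec sink"
    if commands:
        line, argv = commands[-1]
        return argv, f"line {line}: last reachable command (no exec)"
    return None


with_exec = [
    (3, ["set", "-e"]),
    (5, ["exec", "gosu", "app", "nginx"]),
    (6, ["echo", "unreachable"]),
]
without_exec = [(2, ["mkdir", "-p", "/data"]), (3, ["nginx", "-g", "daemon off;"])]
print(find_sink(with_exec))
print(find_sink(without_exec))
```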
+
+## 6) Idiom library
+
+| Pattern | Action |
+| --- | --- |
+| `if [ "${1:0:1}" = '-' ]; then set -- server "$@"; fi` | Rewrite argv to prepend default command. |
+| `if [ "$1" = "bash" ]; then exec "$@"; fi` | Pass-through for manual shells. |
+| `exec "$@"` + non-empty CMD | Substitute CMD vector into plan. |
+| `exec java -jar "$APP_JAR" "$@"` | Resolve JAR via env or filesystem. |
+| `set -- gosu "$APP_USER" "$@"` | Collapse into wrapper plan. |
+
+Idioms are implemented as AST visitors; each adds evidence strings when triggered.
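
To make two rows of the table concrete, here is a simplified sketch that works on plain argument lists rather than AST nodes. The function names and evidence wording are hypothetical; the real idioms are described above as AST visitors, which this does not attempt to reproduce.

```python
from typing import List, Tuple

Result = Tuple[List[str], List[str]]  # (rewritten argv, evidence strings)


def prepend_default(args: List[str], default: str) -> Result:
    """Model `if [ "${1:0:1}" = '-' ]; then set -- <default> "$@"; fi`."""
    if args and args[0].startswith("-"):
        return [default] + args, [f"idiom: prepended default command '{default}'"]
    return list(args), []


def substitute_cmd(argv: List[str], cmd: List[str]) -> Result:
    """Model `exec "$@"` with a non-empty image CMD: splice CMD in for "$@"."""
    if '"$@"' in argv and cmd:
        out: List[str] = []
        for word in argv:
            out.extend(cmd if word == '"$@"' else [word])
        return out, ['idiom: substituted CMD for "$@"']
    return list(argv), []


print(prepend_default(["--port", "8080"], "server"))
print(substitute_cmd(["gosu", "app", '"$@"'], ["nginx", "-g", "daemon off;"]))
```

Each function returns evidence only when its pattern fires, which matches the note above that idioms add evidence strings when triggered.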
+
+## 7) Confidence scoring
+
+- Base score from plan heuristics (`HeuristicScore`).
+- Penalties for unresolved taint (`$@` unknown), missing files, nested subshells, or fallbacks.
+- Bonuses when idioms match, artefacts exist, or env values resolve cleanly.
+- Final confidence is combined with the outer static scoring model.
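
One way the penalties and bonuses could combine is shown below. Every weight, the `Signals` type, and the clamping to the range 0 to 1 are invented for illustration; the playbook does not specify numeric values, and the combination with the outer static scoring model is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    # Hypothetical inputs, one per penalty or bonus named in the list above.
    unresolved_taint: bool = False
    missing_files: int = 0
    nested_subshells: int = 0
    is_fallback: bool = False
    idiom_matches: int = 0
    artefact_exists: bool = False
    env_resolved: bool = False


def shell_confidence(base: float, s: Signals) -> float:
    """Adjust the plan's base HeuristicScore; all weights are assumed values."""
    score = base
    score -= 0.20 if s.unresolved_taint else 0.0
    score -= 0.10 * s.missing_files
    score -= 0.05 * s.nested_subshells
    score -= 0.30 if s.is_fallback else 0.0
    score += 0.05 * s.idiom_matches
    score += 0.10 if s.artefact_exists else 0.0
    score += 0.05 if s.env_resolved else 0.0
    return max(0.0, min(1.0, score))  # clamp so downstream models see 0..1


clean = Signals(idiom_matches=1, artefact_exists=True, env_resolved=True)
broken = Signals(is_fallback=True, missing_files=1)
print(round(shell_confidence(0.6, clean), 2), shell_confidence(0.3, broken))
```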
+
+## 8) Failure modes
+
+- Missing script (`ENTRYPOINT` points to deleted file): emit fallback plan with low confidence.
+- Self-modifying scripts or heavy dynamic features (`eval`, backticks): mark plan as low-confidence and surface warning evidence.
+- Commands that spawn supervisors without `exec`: return both the supervisor and inferred children (if configuration files are present).
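
As a stand-in for the dynamic-feature check, the sketch below scans raw script text for `eval` and backtick substitution and emits warning evidence. It is a plain regex scan, not the AST-based analysis this playbook describes: the names are hypothetical, the comment stripping ignores quoting, and a real implementation would inspect parsed nodes instead.

```python
import re
from typing import List

# Hypothetical detector table: label -> pattern applied to each source line.
_DYNAMIC = {
    "eval": re.compile(r"(^|[;&|\s])eval\s"),
    "backtick substitution": re.compile(r"`"),
}


def dynamic_warnings(script: str) -> List[str]:
    warnings: List[str] = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude comment strip; ignores quoting
        for label, pattern in _DYNAMIC.items():
            if pattern.search(code):
                warnings.append(f"line {line_no}: dynamic feature ({label})")
    return warnings


script = '#!/bin/sh\neval "$INIT_HOOK"\nVERSION=`cat /version`\nexec app\n'
print(dynamic_warnings(script))
```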
+
+ShellFlow keeps the static reducer explainable: every inferred command is accompanied by the script span and reasoning used to reach it.