Add Authority Advisory AI and API Lifecycle Configuration

- Introduced AuthorityAdvisoryAiOptions and related classes for managing advisory AI configurations, including remote inference options and tenant-specific settings.
- Added AuthorityApiLifecycleOptions to control API lifecycle settings, including legacy OAuth endpoint configurations.
- Implemented validation and normalization methods for both advisory AI and API lifecycle options to ensure proper configuration.
- Created AuthorityNotificationsOptions and its related classes for managing notification settings, including ack tokens, webhooks, and escalation options.
- Developed IssuerDirectoryClient and related models for interacting with the issuer directory service, including caching mechanisms and HTTP client configurations.
- Added support for dependency injection through ServiceCollectionExtensions for the Issuer Directory Client.
- Updated project file to include necessary package references for the new Issuer Directory Client library.
2025-11-02 13:40:38 +02:00
parent 66cb6c4b8a
commit f98cea3bcf
516 changed files with 68157 additions and 24754 deletions
--- a/docs/modules/scanner/operations/entrypoint-static-analysis.md
+++ b/docs/modules/scanner/operations/entrypoint-static-analysis.md
@@ -1,11 +1,89 @@
-# Entry-Point Static Analysis
-
-This guide captures the static half of Stella Ops’ entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.
-
-## 1) Loading OCI images
-
-### 1.1 Supported inputs
-- Registry references (`repo:tag@sha256:digest`) using the existing content store.
+# Entry-Point Static Analysis
+
+This guide captures the static half of Stella Ops’ entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.
+
+## 0) Implementation snapshot — Sprint 130.A (2025-11-02)
+
+The `StellaOps.Scanner.EntryTrace` stack (analyzer + worker + surfaces) currently provides:
+
+- **OCI config + layered FS context**: `EntryTraceImageContextFactory` normalises the environment (including the `PATH` fallback), user, and working directory, while `LayeredRootFileSystem` handles whiteouts, symlinks, and bounded byte reads (`TryReadBytes`) so ELF/PE probing stays offline-friendly.
+- **Wrapper-aware exec expansion**: the analyzer unwraps init/user-switch/environment/supervisor wrappers (`tini`, `dumb-init`, `gosu`, `su-exec`, `chpst`, `env`, `supervisord`, `s6-supervise`, `runsv*`) and records guard metadata plus environment/user deltas on nodes and edges.
+- **Script + interpreter resolution**: POSIX shell parsing (AST-driven) covers `source`, `run-parts`, `exec`, and supervisor service directories, with Windows `cmd /c` support. Python `-m`, Node script, and Java `-jar` lookups add evidence when targets are located.
+- **Terminal classification & scoring**: `ClassifyTerminal` fingerprints ELF (`PT_INTERP`, Go build ID, Rust notes), PE/CLR, and JAR manifests, pairs them with shebang/runtime heuristics (`python`, `node`, `java`, `.NET`, `php-fpm`, `nginx`, `ruby`), and emits `EntryTracePlan/EntryTraceTerminal` records capped at 95-point confidence.
+- **NDJSON + capability stream**: `EntryTraceNdjsonWriter` produces deterministic `entrytrace.entry/node/edge/target/warning/capability` lines consumed by AOC, CLI, and policy surfaces.
+- **Runtime reconciliation**: `ProcFileSystemSnapshot` + `ProcGraphBuilder` replay `/proc`, `EntryTraceRuntimeReconciler` merges runtime terminals with static predictions, and diagnostics note matches/mismatches.
+- **Surface integration**: Scanner Worker caches graphs (`SurfaceCache`), persists `EntryTraceResult` via the shared store, exposes NDJSON + graph through `ScanAnalysisKeys`, and the WebService/CLI (`scan entrytrace`) return the stored result.
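The whiteout handling mentioned for `LayeredRootFileSystem` follows the OCI image-spec convention: a `.wh.<name>` entry deletes `<name>` from lower layers, and a `.wh..wh..opq` entry makes its directory opaque. A minimal sketch of that merge, assuming each layer is supplied as a flat list of file paths ordered base layer first; `MergeLayers` is a hypothetical helper, not the `LayeredRootFileSystem` API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: merges layers (base first) into the set of visible file paths,
// applying OCI whiteout rules. Each layer is a list of absolute file paths.
static SortedSet<string> MergeLayers(IEnumerable<IReadOnlyList<string>> layers)
{
    const string WhiteoutPrefix = ".wh.";
    const string OpaqueMarker = ".wh..wh..opq";
    var visible = new SortedSet<string>(StringComparer.Ordinal);

    foreach (var layer in layers)
    {
        // Pass 1: whiteouts only hide content contributed by *lower* layers.
        foreach (var path in layer)
        {
            var slash = path.LastIndexOf('/');
            var dir = path[..(slash + 1)];   // keeps the trailing '/'
            var name = path[(slash + 1)..];

            if (name == OpaqueMarker)
            {
                // Opaque directory: drop every lower-layer entry beneath `dir`.
                visible.RemoveWhere(p => p.StartsWith(dir, StringComparison.Ordinal));
            }
            else if (name.StartsWith(WhiteoutPrefix, StringComparison.Ordinal))
            {
                // Regular whiteout: remove the named file or the whole directory subtree.
                var target = dir + name[WhiteoutPrefix.Length..];
                visible.RemoveWhere(p => p == target || p.StartsWith(target + "/", StringComparison.Ordinal));
            }
        }

        // Pass 2: add this layer's real entries; whiteout markers never become visible.
        foreach (var path in layer)
        {
            var name = path[(path.LastIndexOf('/') + 1)..];
            if (!name.StartsWith(WhiteoutPrefix, StringComparison.Ordinal))
            {
                visible.Add(path);
            }
        }
    }

    return visible;
}

var merged = MergeLayers(new[]
{
    new[] { "/usr/bin/python3", "/etc/app/a.conf", "/etc/app/b.conf", "/tmp/cache" },
    new[] { "/usr/bin/.wh.python3", "/etc/app/.wh..wh..opq", "/etc/app/c.conf" },
});

Console.WriteLine(string.Join(", ", merged)); // /etc/app/c.conf, /tmp/cache
```

Processing whiteouts before adding a layer's own entries is what keeps a layer from hiding files it ships itself.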
+
+Open follow-ups tracked for this wave:
+
+- **SCANNER-ENTRYTRACE-18-507** – fallback candidate discovery (Docker history, `/etc/services/**`, `/usr/local/bin/*-entrypoint`) when ENTRYPOINT/CMD are empty.
+- **SCANNER-ENTRYTRACE-18-508** – broaden wrapper catalogue (package/tool runners such as `bundle exec`, `npm`, `yarn node`, `docker-php-entrypoint`, `pipenv`, `poetry run`).
+- **ENTRYTRACE-SURFACE-01** (DOING) / **ENTRYTRACE-SURFACE-02** (TODO) – finish wiring Surface.Validation/FS/Secrets to gate prerequisites and remove direct env/secret reads.
+
+_Sections §4–§7 below capture the long-term reduction design; features not yet implemented are tracked on the task board._
+
+### Probing the analyzer today
+
+1. **Load the image config**  
+   ```csharp
+   using var stream = File.OpenRead("config.json");
+   var config = OciImageConfigLoader.Load(stream);
+   ```
+2. **Create a layered filesystem** from extracted layer directories or tar archives:  
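+   ```csharp
+   // `layers`: layer tar paths, ordered base layer first as in the image manifest.
+   var fs = LayeredRootFileSystem.FromArchives(layers);
+   ```
+3. **Build the image context** (normalises env, PATH, user, working dir):  
+   ```csharp
+   // `imageDigest` and `scanId` identify the scan being analysed.
+   var imageCtx = EntryTraceImageContextFactory.Create(
+       config, fs, new EntryTraceAnalyzerOptions(), imageDigest, scanId);
+   ```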
+4. **Resolve the entry trace**:  
+   ```csharp
+   var analyzer = serviceProvider.GetRequiredService<IEntryTraceAnalyzer>();
+   var graph = await analyzer.ResolveAsync(imageCtx.Entrypoint, imageCtx.Context, cancellationToken);
+   ```
+5. **Inspect results** – `graph.Terminals` lists classified candidates (path, runtime, confidence, evidence), `graph.Nodes/Edges` capture the explainable chain, and `graph.Diagnostics` highlight unresolved steps. Emit metrics/telemetry via `EntryTraceMetrics`.
+6. **Serialize if needed** – pass the graph through `EntryTraceNdjsonWriter.Serialize` to obtain deterministic NDJSON lines; the helper already computes capability summaries.
+
+For one-off investigations, snapshot the `EntryTraceResult` so the graph and NDJSON stay aligned. Avoid hand-rolled JSON writers; they do not preserve the ordering guarantees.
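Step 3's PATH normalisation matters because the analyzer has to resolve `argv[0]` inside the image, not on the host. The sketch below shows execvp-style lookup against an image filesystem modelled as a predicate; `ResolveCommand` is hypothetical, and the fallback PATH is an assumption borrowed from a common container default rather than the value `EntryTraceImageContextFactory` actually uses:

```csharp
using System;
using System.Collections.Generic;

#nullable enable

// Hypothetical helper: execvp-style lookup of argv[0] against the image filesystem.
// `isExecutable` stands in for a lookup in the layered root filesystem.
static string? ResolveCommand(string argv0, string? pathVar, string workingDir, Func<string, bool> isExecutable)
{
    // Assumed fallback when the image config sets no PATH (a common container default).
    const string DefaultPath = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";

    if (argv0.Contains('/'))
    {
        // Names containing '/' bypass PATH; relative ones are anchored at the working dir.
        var full = argv0.StartsWith('/') ? argv0 : workingDir.TrimEnd('/') + "/" + argv0;
        return isExecutable(full) ? full : null;
    }

    var search = string.IsNullOrEmpty(pathVar) ? DefaultPath : pathVar;
    foreach (var dir in search.Split(':', StringSplitOptions.RemoveEmptyEntries))
    {
        var candidate = dir.TrimEnd('/') + "/" + argv0;
        if (isExecutable(candidate))
        {
            return candidate; // first match wins, as with a shell lookup
        }
    }

    return null; // unresolved; the analyzer reports such steps as diagnostics
}

var imageFiles = new HashSet<string> { "/usr/bin/python3", "/app/bin/start" };
Console.WriteLine(ResolveCommand("python3", null, "/app", imageFiles.Contains));        // /usr/bin/python3
Console.WriteLine(ResolveCommand("bin/start", "/usr/bin", "/app", imageFiles.Contains)); // /app/bin/start
```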
+
+#### Probing through Scanner.Worker
+
+EntryTrace runs automatically inside the worker when these metadata keys exist on the lease:
+
+| Key | Purpose |
+| --- | --- |
+| `ScanMetadataKeys.ImageConfigPath` (default `scanner.analyzers.entrytrace.configMetadataKey`) | Absolute path to the OCI `config.json`. |
+| `ScanMetadataKeys.LayerDirectories` or `ScanMetadataKeys.LayerArchives` | Semicolon-delimited list of extracted layer folders or tar archives. |
+| `ScanMetadataKeys.RuntimeProcRoot` *(optional)* | Path to a captured `/proc` tree for runtime reconciliation (air-gapped runs can mount a snapshot). |
+
+Worker output lands in `context.Analysis` (`EntryTraceGraph`, `EntryTraceNdjson`) and is persisted via `IEntryTraceResultStore`. Ensure Surface Validation prerequisites pass before dispatching the analyzer.
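A minimal sketch of how the lease metadata could be turned into analyzer inputs. The key strings are placeholders (the real values live in `ScanMetadataKeys`), and preferring layer directories over archives when both are present is an assumption of this sketch, not documented worker behaviour:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder key names; substitute the ScanMetadataKeys constants.
const string ConfigKey = "entrytrace.config";
const string LayerDirsKey = "entrytrace.layerDirectories";
const string LayerArchivesKey = "entrytrace.layerArchives";

// Splits a semicolon-delimited value, trimming whitespace and dropping empty entries.
static List<string> ReadList(IReadOnlyDictionary<string, string> metadata, string key) =>
    metadata.TryGetValue(key, out var raw)
        ? raw.Split(';').Select(s => s.Trim()).Where(s => s.Length > 0).ToList()
        : new List<string>();

// Returns false when the lease lacks what EntryTrace needs, so the analyzer is skipped.
bool TryGetEntryTraceInputs(
    IReadOnlyDictionary<string, string> metadata,
    out string configPath,
    out List<string> layers,
    out bool layersAreArchives)
{
    configPath = metadata.TryGetValue(ConfigKey, out var cfg) ? cfg.Trim() : string.Empty;
    var directories = ReadList(metadata, LayerDirsKey);
    layersAreArchives = directories.Count == 0; // assumption: directories win when both are set
    layers = layersAreArchives ? ReadList(metadata, LayerArchivesKey) : directories;
    return configPath.Length > 0 && layers.Count > 0;
}

var lease = new Dictionary<string, string>
{
    [ConfigKey] = "/work/scan-42/config.json",
    [LayerArchivesKey] = "/work/scan-42/layer0.tar; /work/scan-42/layer1.tar;",
};

if (TryGetEntryTraceInputs(lease, out var cfgPath, out var layerPaths, out var isArchive))
{
    var layerKind = isArchive ? "archives" : "directories";
    Console.WriteLine(cfgPath + ": " + layerPaths.Count + " " + layerKind); // /work/scan-42/config.json: 2 archives
}
```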
+
+#### Probing via WebService & CLI
+
+- **REST**: `GET /api/scans/{scanId}/entrytrace` returns `EntryTraceResponse` (`graph + ndjson + metadata`). Requires scan ownership/authz.
+- **CLI**: `stella scan entrytrace <scan-id> [--ndjson] [--verbose]` renders a confidence-sorted terminal table, diagnostics, and optionally the NDJSON payload.
+
+Both surfaces consume the persisted result; rerunning the worker updates the stored document atomically.
+
+### NDJSON reference
+
+`EntryTraceNdjsonWriter.Serialize` emits newline-delimited JSON in the following order so AOC consumers can stream without buffering:
+
+- `entrytrace.entry` — scan metadata (scan id, image digest, outcome, counts).
+- `entrytrace.node` — every node in the graph with arguments, interpreter, evidence, and metadata.
+- `entrytrace.edge` — directed relationships between nodes with optional wrapper metadata.
+- `entrytrace.target` — resolved terminal programmes (`EntryTracePlan`), including runtime, confidence, arguments, environment, and evidence.
+- `entrytrace.warning` — diagnostics (severity, reason, span, related path).
+- `entrytrace.capability` — aggregated wrapper capabilities discovered during traversal.
+
+Output is deterministic (IDs ascending, keys sorted lexicographically) and every line is newline-terminated, so downstream tooling can hash or diff outputs reproducibly.
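The ordering rules above are what make the stream hashable. A minimal sketch with `System.Text.Json`, assuming records arrive as (kind, id, fields) tuples; the writer and its `kind`/`id` field names are illustrative, not the `EntryTraceNdjsonWriter` schema:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json;

// Section order from the NDJSON reference.
string[] kindOrder =
{
    "entrytrace.entry", "entrytrace.node", "entrytrace.edge",
    "entrytrace.target", "entrytrace.warning", "entrytrace.capability",
};

// Hypothetical writer; field names are illustrative.
string WriteNdjson(IEnumerable<(string Kind, int Id, Dictionary<string, object> Fields)> records)
{
    var sb = new StringBuilder();
    var ordered = records
        .OrderBy(x => Array.IndexOf(kindOrder, x.Kind)) // fixed section order
        .ThenBy(x => x.Id);                             // IDs ascending within a section
    foreach (var rec in ordered)
    {
        // An ordinal SortedDictionary serialises its keys in lexicographic order.
        var line = new SortedDictionary<string, object>(rec.Fields, StringComparer.Ordinal)
        {
            ["id"] = rec.Id,
            ["kind"] = rec.Kind,
        };
        sb.Append(JsonSerializer.Serialize(line)).Append('\n'); // always '\n', never CRLF
    }
    return sb.ToString();
}

var ndjson = WriteNdjson(new[]
{
    ("entrytrace.node", 2, new Dictionary<string, object> { ["path"] = "/usr/bin/python3" }),
    ("entrytrace.entry", 0, new Dictionary<string, object> { ["outcome"] = "resolved" }),
    ("entrytrace.node", 1, new Dictionary<string, object> { ["path"] = "/bin/sh" }),
});
Console.Write(ndjson);
```

Sorting on both the record level and the key level means the same graph always produces byte-identical output, independent of insertion order.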
+
+## 1) Loading OCI images
+
+### 1.1 Supported inputs
+- Registry references (`repo:tag@sha256:digest`) using the existing content store.
 - Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`).
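For the OCI layout input, the OCI Image Layout specification places `index.json` at the root and stores every blob under `blobs/<algorithm>/<encoded digest>`. A minimal sketch of the first resolution hop, from `index.json` descriptors to manifest blob paths; `ManifestBlobPaths` is hypothetical, and a real loader would go on to read each manifest's config and layer descriptors the same way:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical helper: lists the manifest blob paths referenced by an OCI layout's index.json.
static List<string> ManifestBlobPaths(string layoutRoot, string indexJson)
{
    var paths = new List<string>();
    using var doc = JsonDocument.Parse(indexJson);
    foreach (var manifest in doc.RootElement.GetProperty("manifests").EnumerateArray())
    {
        // Descriptor digests look like "<algorithm>:<encoded>", e.g. "sha256:<hex>".
        var digest = manifest.GetProperty("digest").GetString() ?? string.Empty;
        var sep = digest.IndexOf(':');
        if (sep <= 0)
        {
            throw new FormatException($"Malformed digest '{digest}'.");
        }
        paths.Add(Path.Combine(layoutRoot, "blobs", digest[..sep], digest[(sep + 1)..]));
    }
    return paths;
}

var digestHex = new string('a', 64); // placeholder sha256 value for illustration
var indexJson = "{\"schemaVersion\":2,\"manifests\":[{\"mediaType\":\"application/vnd.oci.image.manifest.v1+json\",\"digest\":\"sha256:"
    + digestHex + "\",\"size\":7}]}";

foreach (var blob in ManifestBlobPaths("/images/app", indexJson))
{
    Console.WriteLine(blob);
}
```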

 ### 1.2 Normalised model
@@ -53,14 +131,18 @@ Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-fo
 - For non-ELF/PE files: read first line; interpret `#!interpreter args`.
 - Replace `argv[0]` with the interpreter, prepend shebang args, append script path per kernel semantics.
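The argv rewrite above can be sketched as follows, assuming Linux `binfmt_script` behaviour: everything after the interpreter path is passed as a single optional argument rather than split on whitespace (so `#!/usr/bin/env python3 -u` hands `env` the one argument `python3 -u`). `RewriteShebang` is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;

#nullable enable

// Hypothetical helper mirroring Linux binfmt_script: "#!interp [optional-arg]" turns
// argv into [interp, optional-arg?, scriptPath, argv[1..]].
static List<string>? RewriteShebang(string firstLine, string scriptPath, IReadOnlyList<string> argv)
{
    if (!firstLine.StartsWith("#!", StringComparison.Ordinal))
    {
        return null; // not a script; leave argv untouched
    }

    var rest = firstLine[2..].Trim(' ', '\t');
    if (rest.Length == 0)
    {
        return null; // "#!" without an interpreter cannot be executed
    }

    var split = rest.IndexOfAny(new[] { ' ', '\t' });
    var interpreter = split < 0 ? rest : rest[..split];
    // The remainder stays ONE argument, internal spaces included.
    var optionalArg = split < 0 ? string.Empty : rest[(split + 1)..].Trim(' ', '\t');

    var rewritten = new List<string> { interpreter };
    if (optionalArg.Length > 0)
    {
        rewritten.Add(optionalArg);
    }
    rewritten.Add(scriptPath); // the script path replaces the original argv[0]
    for (var i = 1; i < argv.Count; i++)
    {
        rewritten.Add(argv[i]);
    }
    return rewritten;
}

var newArgv = RewriteShebang("#!/usr/bin/env python3 -u", "/app/run.py", new[] { "run.py", "--port", "8080" });
Console.WriteLine(string.Join(" | ", newArgv!)); // /usr/bin/env | python3 -u | /app/run.py | --port | 8080
```

The `env` case is worth recording as evidence: on Linux it looks for a program literally named `python3 -u` unless the image uses `env -S`.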

-### 3.3 Binary probes
-- Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer.
-- Identify PE (Windows) and detect .NET single-file bundles via CLI header.
-- Record features for runtime scoring (Go vs Rust vs glibc vs musl).
+### 3.3 Binary probes
+- Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer, `.rustc` notes, and musl/glibc fingerprints.
+- Identify PE (Windows) and detect .NET single-file bundles via CLI header / metadata tables; capture ready-to-run vs IL-only markers.
+- Inspect archives (JAR/WAR/EAR) for `META-INF/MANIFEST.MF` `Main-Class`/`Main-Module` and signed entries.
+- Detect PHP-FPM, Apache, and nginx launchers (`php-fpm`, `apache2-foreground`, `nginx -g 'daemon off;'`) via binary names plus nearby config (`php.ini`, `nginx.conf`).
+- Record evidence tuples for runtime scoring (interpreter, build ID, runtime note) so downstream components can explain the classification.
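The first dispatch in these probes is usually a magic-byte check over a bounded read of the file head. A minimal sketch covering ELF (`\x7FELF`), PE (`MZ` plus the `PE\0\0` signature at the offset stored at `0x3C`), ZIP-based archives such as JAR (`PK\x03\x04`), and shebang scripts; `ClassifyMagic` and its string labels are hypothetical, and the deeper parsing listed above (`.interp`, CLI header, manifest) would follow each branch:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical first-pass classifier over the first bytes of a file.
static string ClassifyMagic(ReadOnlySpan<byte> head)
{
    if (head.Length >= 4 && head[0] == 0x7F && head[1] == 'E' && head[2] == 'L' && head[3] == 'F')
    {
        return "elf"; // next: program headers (PT_INTERP), notes, dynamic section
    }

    if (head.Length >= 0x40 && head[0] == 'M' && head[1] == 'Z')
    {
        // e_lfanew (little-endian int32 at 0x3C) locates the "PE\0\0" signature.
        var peOffset = BinaryPrimitives.ReadInt32LittleEndian(head.Slice(0x3C, 4));
        if (peOffset > 0 && peOffset <= head.Length - 4 &&
            head[peOffset] == 'P' && head[peOffset + 1] == 'E' &&
            head[peOffset + 2] == 0 && head[peOffset + 3] == 0)
        {
            return "pe"; // next: CLI header to spot .NET assemblies
        }
        return "mz"; // DOS stub only, or the PE header lies beyond the bytes read
    }

    if (head.Length >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4)
    {
        return "zip"; // JAR/WAR/EAR: read META-INF/MANIFEST.MF next
    }

    if (head.Length >= 2 && head[0] == '#' && head[1] == '!')
    {
        return "script"; // hand off to shebang handling
    }

    return "unknown";
}

var pe = new byte[0x84];
pe[0] = (byte)'M'; pe[1] = (byte)'Z';
pe[0x3C] = 0x80; // e_lfanew = 0x80
pe[0x80] = (byte)'P'; pe[0x81] = (byte)'E';

Console.WriteLine(ClassifyMagic(new byte[] { 0x7F, (byte)'E', (byte)'L', (byte)'F', 2, 1, 1, 0 })); // elf
Console.WriteLine(ClassifyMagic(pe));                                                             // pe
Console.WriteLine(ClassifyMagic(Encoding.ASCII.GetBytes("#!/bin/sh\nexec app\n")));               // script
```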

-## 4) Wrapper catalogue
-
-Collapse known wrappers before analysing the target command:
+## 4) Wrapper catalogue
+
+> _Roadmap note_: extended package/tool runners land with **SCANNER-ENTRYTRACE-18-508**; today the catalogue covers the init, user-switch, environment, and supervisor wrappers listed in §0.
+
+Collapse known wrappers before analysing the target command:

 - Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`.
 - Privilege droppers: `gosu`, `su-exec`, `chpst`.
@@ -111,12 +193,31 @@ Use a simple logistic model with feature contributions captured for the evidence

 Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point.
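A minimal sketch of logistic scoring with per-feature evidence, assuming features arrive as (name, weight, present) triples. The feature names, weights, and bias are invented for illustration, and the 95-point cap mirrors the limit noted in §0; `ScoreTerminal` is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical scorer: sums the weights of present features, applies the logistic
// function, scales to 0..100, and caps the result at 95 so nothing claims certainty.
static (double Score, List<string> Evidence) ScoreTerminal(
    double bias, IEnumerable<(string Feature, double Weight, bool Present)> features)
{
    var evidence = new List<string>();
    var z = bias;
    foreach (var (feature, weight, present) in features)
    {
        if (!present)
        {
            continue;
        }
        z += weight;
        // Keep each contribution so CLI/UI users can see why the score moved.
        var contribution = weight.ToString("+0.00;-0.00", CultureInfo.InvariantCulture);
        evidence.Add(feature + ": " + contribution);
    }

    var probability = 1.0 / (1.0 + Math.Exp(-z));
    var score = Math.Min(95.0, Math.Round(100.0 * probability, 1));
    return (score, evidence);
}

var (conf, why) = ScoreTerminal(-2.0, new[]
{
    ("elf-interp-matches-runtime", 1.5, true),
    ("shebang-names-interpreter", 2.0, true),
    ("wrapper-left-unresolved", -1.0, false),
});
Console.WriteLine(conf.ToString(CultureInfo.InvariantCulture)); // 81.8
Console.WriteLine(string.Join(", ", why));
```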

-## 8) Outputs
-
-Return a populated `EntryTraceResult`:
-
-- `Terminals` contains the best candidate(s) and their runtime classification.
-- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
-- `Chain` shows the explainable path from initial Docker argv to the final binary.
-
-Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.
+## 8) Outputs
+
+Return a populated `EntryTraceResult`:
+
+- `Terminals` contains the best candidate(s) and their runtime classification.
+- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
+- `Chain` shows the explainable path from initial Docker argv to the final binary.
+
+Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.
+
+## 9) ProcGraph replay (runtime parity)
+
+Static resolution must be reconciled with live observations when a workload is running under the Stella Ops runtime agent:
+
+1. Read `/proc/1/{cmdline,exe}` and walk descendants via `/proc/*/stat` to construct the initial exec chain (ascending PID order).
+2. Collapse known wrappers (`tini`, `dumb-init`, `gosu`, `su-exec`, `s6-supervise`, `runsv`, `supervisord`) and privilege switches, mirroring the static wrapper catalogue.
+3. Materialise a `ProcGraph` object that records each transition and the resolved executable path (via `/proc/<pid>/exe` symlinks).
+4. Compare `ProcGraph.Terminal` with `EntryTraceResult.Terminals[0]`, emitting `confidence=high` when they match and downgrading the confidence when they diverge.
+5. Persist the merged view so the CLI/UI can highlight static vs runtime discrepancies and feed drift detection in Zastava.
+
+This replay is optional for offline scans, but required whenever runtime evidence is available, so policy decisions can rely on high-confidence matches.
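Steps 1, 2, and 4 can be sketched over captured `/proc` text. The parser splits `/proc/<pid>/stat` on the last `)` because the command name may contain spaces or parentheses. Following the lowest-PID child and treating the first non-wrapper process as the runtime terminal are assumptions of this sketch, and `RuntimeTerminal` is a hypothetical helper rather than the `ProcGraphBuilder` API:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

#nullable enable

// Parses "pid (comm) state ppid ...". comm may contain spaces or ')', so split on the LAST ')'.
static (int Pid, int PPid) ParseStat(string stat)
{
    var open = stat.IndexOf('(');
    var close = stat.LastIndexOf(')');
    var pid = int.Parse(stat[..open].Trim(), CultureInfo.InvariantCulture);
    var fields = stat[(close + 2)..].Split(' '); // fields[0] = state, fields[1] = ppid
    return (pid, int.Parse(fields[1], CultureInfo.InvariantCulture));
}

// Wrappers collapsed in step 2 (matched on the executable's file name).
var wrappers = new HashSet<string> { "tini", "dumb-init", "gosu", "su-exec", "s6-supervise", "runsv", "supervisord" };

// Walks down from PID 1, skipping wrappers; the first non-wrapper is the runtime terminal.
string? RuntimeTerminal(IEnumerable<string> statLines, IReadOnlyDictionary<int, string> exeByPid)
{
    var firstChild = statLines.Select(ParseStat)
        .GroupBy(p => p.PPid)
        .ToDictionary(g => g.Key, g => g.Min(p => p.Pid)); // ascending PID order
    var current = 1;
    while (true)
    {
        var exe = exeByPid[current]; // resolved from /proc/<pid>/exe
        if (!wrappers.Contains(exe[(exe.LastIndexOf('/') + 1)..]))
        {
            return exe;
        }
        if (!firstChild.TryGetValue(current, out var next))
        {
            return null; // only wrappers observed
        }
        current = next;
    }
}

var stats = new[] { "1 (tini) S 0 1 1", "7 (python3) S 1 1 1", "12 (python3) S 7 1 1" };
var exes = new Dictionary<int, string> { [1] = "/sbin/tini", [7] = "/usr/bin/python3", [12] = "/usr/bin/python3" };
var staticTerminal = "/usr/bin/python3"; // EntryTraceResult.Terminals[0] in the real flow
var runtimeTerminal = RuntimeTerminal(stats, exes);
var confidence = runtimeTerminal == staticTerminal ? "high" : "downgraded";
Console.WriteLine($"{runtimeTerminal} -> confidence={confidence}"); // /usr/bin/python3 -> confidence=high
```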
+
+## 10) Service & CLI surfaces
+
+- **Scanner.WebService** must expose `GET /api/scans/{scanId}/entrytrace`, returning the chain, terminal classification, evidence, and runtime agreement markers.
+- **CLI** gains `stella scan entrytrace <scan-id>` (plus `--ndjson` to emit the NDJSON payload) for air-gapped review.
+- **Policy / Export** payloads include `entrytrace_terminal`, `entrytrace_confidence`, and evidence arrays so downstream consumers retain provenance.
+- All outputs reuse the same `EntryTraceResult` schema and the NDJSON stream defined in the NDJSON reference (§0), keeping the Offline Kit and DSSE attestations deterministic.