feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules

- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
2025-10-30 00:09:39 +02:00
parent 86f606a115
commit e8537460a3
503 changed files with 16136 additions and 54638 deletions


@@ -0,0 +1,22 @@
# Scanner agent guide
## Mission
Scanner analyses container images layer-by-layer, producing deterministic SBOM fragments, diffs, and signed reports.
## Key docs
- [Module README](./README.md)
- [Architecture](./architecture.md)
- [Implementation plan](./implementation_plan.md)
- [Task board](./TASKS.md)
## How to get started
1. Open ../../implplan/SPRINTS.md and locate the stories referencing this module.
2. Review ./TASKS.md for local follow-ups and confirm status transitions (TODO → DOING → DONE/BLOCKED).
3. Read the architecture and README for domain context before editing code or docs.
4. Coordinate cross-module changes in the main /AGENTS.md description and through the sprint plan.
## Guardrails
- Honour the Aggregation-Only Contract where applicable (see ../../ingestion/aggregation-only-contract.md).
- Preserve determinism: sort outputs, normalise timestamps (UTC ISO-8601), and avoid machine-specific artefacts.
- Keep Offline Kit parity in mind—document air-gapped workflows for any new feature.
- Update runbooks/observability assets when operational characteristics change.


@@ -0,0 +1,38 @@
# StellaOps Scanner
Scanner analyses container images layer-by-layer, producing deterministic SBOM fragments, diffs, and signed reports.
## Responsibilities
- Expose APIs (WebService) for scan orchestration, diffing, and artifact retrieval.
- Run Worker analyzers for OS, language, and native ecosystems with restart-only plug-ins.
- Store SBOM fragments and artifacts in RustFS/object storage.
- Publish DSSE-ready metadata for Signer/Attestor and downstream policy evaluation.
## Key components
- `StellaOps.Scanner.WebService` – minimal API host.
- `StellaOps.Scanner.Worker` – analyzer executor.
- Analyzer libraries under `StellaOps.Scanner.Analyzers.*`.
## Integrations & dependencies
- Scheduler for job intake and retries.
- Policy Engine for evidence handoff.
- Export Center / Offline Kit for artifact packaging.
## Operational notes
- CAS caches, bounded retries, DSSE integration.
- Monitoring dashboards (see ./operations/analyzers-grafana-dashboard.json).
- RustFS migration playbook.
## Related resources
- ./operations/analyzers.md
- ./operations/analyzers-grafana-dashboard.json
- ./operations/rustfs-migration.md
- ./operations/entrypoint.md
## Backlog references
- DOCS-SCANNER updates tracked in ../../TASKS.md.
- Analyzer parity work in src/Scanner/**/TASKS.md.
## Epic alignment
- **Epic 6 Vulnerability Explorer:** provide policy-aware scan outputs, explain traces, and findings ledger hooks for triage workflows.
- **Epic 10 Export Center:** generate export-ready artefacts, manifests, and DSSE metadata for bundles.


@@ -0,0 +1,9 @@
# Task board — Scanner
> Local tasks should link back to ./AGENTS.md and mirror status updates into ../../TASKS.md when applicable.
| ID | Status | Owner(s) | Description | Notes |
|----|--------|----------|-------------|-------|
| SCANNER-DOCS-0001 | DOING (2025-10-29) | Docs Guild | Validate that ./README.md aligns with the latest release notes. | See ./AGENTS.md |
| SCANNER-OPS-0001 | TODO | Ops Guild | Review runbooks/observability assets after next sprint demo. | Sync outcomes back to ../../TASKS.md |
| SCANNER-ENG-0001 | TODO | Module Team | Cross-check implementation plan milestones against ../../implplan/SPRINTS.md. | Update status via ./AGENTS.md workflow |


@@ -0,0 +1,489 @@
# component_architecture_scanner.md — **StellaOps Scanner** (2025Q4)
> Aligned with Epic 6 Vulnerability Explorer and Epic 10 Export Center.
> **Scope.** Implementation-ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per-layer caching, three-way diffs, artifact catalog (RustFS default + Mongo, S3-compatible fallback), attestation handoff, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
---
## 0) Mission & boundaries
**Mission.** Produce **deterministic**, **explainable** SBOMs and diffs for container images and filesystems, quickly and repeatedly, without guessing. Emit two views: **Inventory** (everything present) and **Usage** (entrypoint closure + actually linked libs). Attach attestations through **Signer→Attestor→Rekor v2**.
**Boundaries.**
* Scanner **does not** produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
* Scanner **does not** keep third-party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plugins (e.g., patch-presence) run under explicit flags and never contaminate the core SBOM.
---
## 1) Solution & project layout
```
src/
 ├─ StellaOps.Scanner.WebService/      # REST control plane, catalog, diff, exports
 ├─ StellaOps.Scanner.Worker/          # queue consumer; executes analyzers
 ├─ StellaOps.Scanner.Models/          # DTOs, evidence, graph nodes, CDX/SPDX adapters
 ├─ StellaOps.Scanner.Storage/         # Mongo repositories; RustFS object client (default) + S3 fallback; ILM/GC
 ├─ StellaOps.Scanner.Queue/           # queue abstraction (Redis/NATS/RabbitMQ)
 ├─ StellaOps.Scanner.Cache/           # layer cache; file CAS; bloom/bitmap indexes
 ├─ StellaOps.Scanner.EntryTrace/      # ENTRYPOINT/CMD → terminal program resolver (shell AST)
 ├─ StellaOps.Scanner.Analyzers.OS.[Apk|Dpkg|Rpm]/
 ├─ StellaOps.Scanner.Analyzers.Lang.[Java|Node|Python|Go|DotNet|Rust]/
 ├─ StellaOps.Scanner.Analyzers.Native.[ELF|PE|MachO]/   # PE/Mach-O planned (M2)
 ├─ StellaOps.Scanner.Emit.CDX/        # CycloneDX (JSON + Protobuf)
 ├─ StellaOps.Scanner.Emit.SPDX/       # SPDX 3.0.1 JSON
 ├─ StellaOps.Scanner.Diff/            # image→layer→component three-way diff
 ├─ StellaOps.Scanner.Index/           # BOMIndex sidecar (purls + roaring bitmaps)
 ├─ StellaOps.Scanner.Tests.*          # unit/integration/e2e fixtures
 └─ Tools/
    ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/   # BuildKit generator (image referrer SBOMs)
    └─ StellaOps.Scanner.Sbomer.DockerImage/    # CLI-driven scanner container
```
Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
### 1.1 Queue backbone (Redis / NATS)
`StellaOps.Scanner.Queue` exposes a transport-agnostic contract (`IScanQueue`/`IScanQueueLease`) used by the WebService producer and Worker consumers. Sprint 9 introduces two first-party transports:
- **Redis Streams** (default). Uses consumer groups, deterministic idempotency keys (`scanner:jobs:idemp:*`), and supports lease claim (`XCLAIM`), renewal, exponential-backoff retries, and a `scanner:jobs:dead` stream for exhausted attempts.
- **NATS JetStream**. Provisions the `SCANNER_JOBS` work-queue stream + durable consumer `scanner-workers`, publishes with `MsgId` for dedupe, applies backoff via `NAK` delays, and routes dead-lettered jobs to `SCANNER_JOBS_DEAD`.
Metrics are emitted via `Meter` counters (`scanner_queue_enqueued_total`, `scanner_queue_retry_total`, `scanner_queue_deadletter_total`), and `ScannerQueueHealthCheck` pings the active backend (Redis `PING`, NATS `PING`). Configuration is bound from `scanner.queue`:
```yaml
scanner:
  queue:
    kind: redis            # or nats
    redis:
      connectionString: "redis://queue:6379/0"
      streamName: "scanner:jobs"
    nats:
      url: "nats://queue:4222"
      stream: "SCANNER_JOBS"
      subject: "scanner.jobs"
      durableConsumer: "scanner-workers"
      deadLetterSubject: "scanner.jobs.dead"
    maxDeliveryAttempts: 5
    retryInitialBackoff: 00:00:05
    retryMaxBackoff: 00:02:00
```
The DI extension (`AddScannerQueue`) wires the selected transport, so future additions (e.g., RabbitMQ) only implement the same contract and register.
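For orientation, a minimal C# sketch of what that transport-agnostic contract could look like. Only the `IScanQueue`/`IScanQueueLease` names come from this document; the member signatures and the `ScanJob` shape are illustrative assumptions:
```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical member shapes for the documented IScanQueue/IScanQueueLease contract.
public sealed record ScanJob(string ScanId, string ImageDigest, bool Force);

public interface IScanQueue
{
    // Enqueue with a deterministic idempotency key so duplicate submissions
    // collapse onto one job (Redis: scanner:jobs:idemp:*; NATS: MsgId).
    ValueTask EnqueueAsync(ScanJob job, string idempotencyKey, CancellationToken ct = default);

    // Claim up to batchSize jobs for leaseDuration; unacknowledged jobs are
    // redelivered once the lease expires.
    ValueTask<IReadOnlyList<IScanQueueLease>> LeaseAsync(
        int batchSize, TimeSpan leaseDuration, CancellationToken ct = default);
}

public interface IScanQueueLease
{
    ScanJob Job { get; }
    int Attempt { get; }

    ValueTask RenewAsync(TimeSpan extension, CancellationToken ct = default);
    ValueTask AckAsync(CancellationToken ct = default);                       // success
    ValueTask RetryAsync(TimeSpan backoff, CancellationToken ct = default);   // exponential-backoff retry
    ValueTask DeadLetterAsync(string reason, CancellationToken ct = default); // route to the dead-letter stream
}
```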
**Runtime form-factor:** two deployables
* **Scanner.WebService** (stateless REST)
* **Scanner.Worker** (N replicas; queuedriven)
---
## 2) External dependencies
* **OCI registry** with **Referrers API** (discover attached SBOMs/signatures).
* **RustFS** (default, offline-first) for SBOM artifacts; optional S3/MinIO compatibility retained for migration; **Object Lock** semantics emulated via retention headers; **ILM** for TTL.
* **MongoDB** for catalog, job state, diffs, ILM rules.
* **Queue** (Redis Streams/NATS/RabbitMQ).
* **Authority** (on-prem OIDC) for **OpToks** (DPoP/mTLS).
* **Signer** + **Attestor** (+ **Fulcio/KMS** + **Rekor v2**) for DSSE + transparency.
---
## 3) Contracts & data model
### 3.1 Evidence-first component model
**Nodes**
* `Image`, `Layer`, `File`
* `Component` (`purl?`, `name`, `version?`, `type`, `id` — may be `bin:{sha256}`)
* `Executable` (ELF/PE/Mach-O), `Library` (native or managed), `EntryScript` (shell/launcher)
**Edges** (all carry **Evidence**)
* `contains(Image|Layer → File)`
* `installs(PackageDB → Component)` (OS database row)
* `declares(InstalledMetadata → Component)` (dist-info, pom.properties, deps.json…)
* `links_to(Executable → Library)` (ELF `DT_NEEDED`, PE imports)
* `calls(EntryScript → Program)` (file:line from shell AST)
* `attests(Rekor → Component|Image)` (SBOM/predicate binding)
* `bound_from_attestation(Component_attested → Component_observed)` (hash equality proof)
**Evidence**
```
{ source: enum, locator: (path|offset|line), sha256?, method: enum, timestamp }
```
No confidences. Either a fact is proven with listed mechanisms, or it is not claimed.
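Rendered as C# for concreteness (a sketch; the enum members shown are examples, not the exhaustive set):
```csharp
using System;

// Illustrative C# shape of the evidence tuple above; enum members are examples.
public enum EvidenceSource { OsPackageDb, InstalledMetadata, Linker, ShellAst, Attestation }
public enum EvidenceMethod { ExactParse, HashMatch, LinkResolution, AstTrace }

public sealed record Evidence(
    EvidenceSource Source,
    string Locator,           // path, byte offset, or file:line
    string? Sha256,           // content hash when the fact is hash-backed
    EvidenceMethod Method,
    DateTimeOffset Timestamp  // normalised to UTC on emit
);
```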
### 3.2 Catalog schema (Mongo)
* `artifacts`
```
{ _id, type: layer-bom|image-bom|diff|index,
  format: cdx-json|cdx-pb|spdx-json,
  bytesSha256, size, rekor: { uuid, index, url }?,
  ttlClass, immutable, refCount, createdAt }
```
* `images { imageDigest, repo, tag?, arch, createdAt, lastSeen }`
* `layers { layerDigest, mediaType, size, createdAt, lastSeen }`
* `links { fromType, fromDigest, artifactId }` // image/layer -> artifact
* `jobs { _id, kind, args, state, startedAt, heartbeatAt, endedAt, error }`
* `lifecycleRules { ruleId, scope, ttlDays, retainIfReferenced, immutable }`
### 3.3 Object store layout (RustFS)
```
layers/<sha256>/sbom.cdx.json.zst
layers/<sha256>/sbom.spdx.json.zst
images/<imgDigest>/inventory.cdx.pb # CycloneDX Protobuf
images/<imgDigest>/usage.cdx.pb
indexes/<imgDigest>/bom-index.bin # purls + roaring bitmaps
diffs/<old>_<new>/diff.json.zst
attest/<artifactSha256>.dsse.json # DSSE bundle (cert chain + Rekor proof)
```
RustFS exposes a deterministic HTTP API (`PUT|GET|DELETE /api/v1/buckets/{bucket}/objects/{key}`).
Scanner clients tag immutable uploads with `X-RustFS-Immutable: true` and, when retention applies,
`X-RustFS-Retain-Seconds: <ttlSeconds>`. Additional headers can be injected via
`scanner.artifactStore.headers` to support custom auth or proxy requirements. Legacy MinIO/S3
deployments remain supported by setting `scanner.artifactStore.driver = "s3"` during phased
migrations.
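A minimal upload sketch against the documented endpoint and headers (bucket/key values and error handling are illustrative):
```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch: immutable RustFS upload via the documented HTTP API and headers.
public static class RustFsClientSketch
{
    public static async Task PutImmutableAsync(
        HttpClient http, string bucket, string key, byte[] payload, TimeSpan? retain = null)
    {
        // Keys may contain path separators (e.g. layers/<sha256>/sbom.cdx.json.zst).
        using var request = new HttpRequestMessage(
            HttpMethod.Put, $"/api/v1/buckets/{bucket}/objects/{key}")
        {
            Content = new ByteArrayContent(payload)
        };
        request.Headers.Add("X-RustFS-Immutable", "true");
        if (retain is { } ttl)
            request.Headers.Add("X-RustFS-Retain-Seconds", ((long)ttl.TotalSeconds).ToString());

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
    }
}
```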
---
## 4) REST API (Scanner.WebService)
All under `/api/v1/scanner`. Auth: **OpTok** (DPoP/mTLS); RBAC scopes.
```
POST /scans { imageRef|digest, force?:bool } → { scanId }
GET /scans/{id} → { status, imageDigest, artifacts[], rekor? }
GET /sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage → bytes
GET /diff?old=<digest>&new=<digest>&view=inventory|usage → diff.json
POST /exports { imageDigest, format, view, attest?:bool } → { artifactId, rekor? }
POST /reports { imageDigest, policyRevision? } → { reportId, rekor? } # delegates to backend policy+vex
GET /catalog/artifacts/{id} → { meta }
GET /healthz | /readyz | /metrics
```
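For illustration, a hypothetical client flow against these endpoints: submit a scan by digest, then poll until it settles. The DTO shapes and status vocabulary are assumptions, not the published contract:
```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Assumed DTOs mirroring the endpoint summaries above.
public sealed record ScanRequest(string Digest, bool Force = false);
public sealed record ScanSubmitted(string ScanId);
public sealed record ScanStatus(string Status, string ImageDigest, string[] Artifacts);

public static class ScannerApiSketch
{
    public static async Task<ScanStatus> ScanAsync(HttpClient http, string digest)
    {
        var submit = await http.PostAsJsonAsync("/api/v1/scanner/scans", new ScanRequest(digest));
        submit.EnsureSuccessStatusCode();
        var scan = await submit.Content.ReadFromJsonAsync<ScanSubmitted>();

        ScanStatus? status;
        do
        {
            await Task.Delay(TimeSpan.FromSeconds(2));
            status = await http.GetFromJsonAsync<ScanStatus>($"/api/v1/scanner/scans/{scan!.ScanId}");
        } while (status!.Status is "pending" or "running"); // status values assumed

        return status;
    }
}
```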
### Report events
When `scanner.events.enabled = true`, the WebService serialises the signed report (canonical JSON + DSSE envelope) with `NotifyCanonicalJsonSerializer` and publishes two Redis Stream entries (`scanner.report.ready`, `scanner.scan.completed`) to the configured stream (default `stella.events`). The stream fields carry the whole envelope plus lightweight headers (`kind`, `tenant`, `ts`) so Notify and UI timelines can consume the event bus without recomputing signatures. Publish timeouts and bounded stream length are controlled via `scanner:events:publishTimeoutSeconds` and `scanner:events:maxStreamLength`. If the queue driver is already Redis and no explicit events DSN is provided, the host reuses the queue connection and auto-enables event emission so deployments get live envelopes without extra wiring. Compose/Helm bundles expose the same knobs via the `SCANNER__EVENTS__*` environment variables for quick tuning.
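A consumer tailing those stream entries might look like the following sketch (StackExchange.Redis; the envelope field name is not specified here, so only the documented `kind`/`tenant`/`ts` headers are read):
```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

// Sketch of a consumer tailing the scanner event stream at its default name.
public static class ReportEventTailSketch
{
    public static async Task TailAsync(IConnectionMultiplexer redis)
    {
        var db = redis.GetDatabase();
        RedisValue lastId = "0-0"; // replay from the beginning, then tail
        while (true)
        {
            var entries = await db.StreamReadAsync("stella.events", lastId, count: 32);
            foreach (var entry in entries)
            {
                lastId = entry.Id;
                // Lightweight headers let consumers filter without re-verifying DSSE.
                var kind = entry["kind"];     // scanner.report.ready | scanner.scan.completed
                var tenant = entry["tenant"];
                var ts = entry["ts"];
                Console.WriteLine($"{kind} tenant={tenant} ts={ts}");
            }
            if (entries.Length == 0)
                await Task.Delay(TimeSpan.FromMilliseconds(500)); // XREAD here is non-blocking
        }
    }
}
```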
---
## 5) Execution flow (Worker)
### 5.1 Acquire & verify
1. **Resolve image** (prefer `repo@sha256:`).
2. **(Optional) verify image signature** per policy (cosign).
3. **Pull blobs**, compute layer digests; record metadata.
### 5.2 Layer union FS
* Apply whiteouts; materialize final filesystem; map **file → first introducing layer**.
* Windows layers (MSI/SxS/GAC) planned in **M2**.
### 5.3 Evidence harvest (parallel analyzers; deterministic only)
**A) OS packages**
* **apk**: `/lib/apk/db/installed`
* **dpkg**: `/var/lib/dpkg/status`, `/var/lib/dpkg/info/*.list`
* **rpm**: `/var/lib/rpm/Packages` (via librpm or parser)
* Record `name`, `version` (epoch/revision), `arch`, source package where present, and **declared file lists**.
> **Data flow note:** Each OS analyzer now writes its canonical output into the shared `ScanAnalysisStore` under
> `analysis.os.packages` (raw results), `analysis.os.fragments` (per-analyzer layer fragments), and contributes to
> `analysis.layers.fragments` (the aggregated view consumed by emit/diff pipelines). Helpers in
> `ScanAnalysisCompositionBuilder` convert these fragments into SBOM composition requests and component graphs so the
> diff/emit stages no longer reach back into individual analyzer implementations.
**B) Language ecosystems (installed state only)**
* **Java**: `META-INF/maven/*/pom.properties`, MANIFEST → `pkg:maven/...`
* **Node**: `node_modules/**/package.json` → `pkg:npm/...`
* **Python**: `*.dist-info/{METADATA,RECORD}` → `pkg:pypi/...`
* **Go**: Go **buildinfo** in binaries → `pkg:golang/...`
* **.NET**: `*.deps.json` + assembly metadata → `pkg:nuget/...`
* **Rust**: crates only when **explicitly present** (embedded metadata or cargo/registry traces); otherwise binaries reported as `bin:{sha256}`.
> **Rule:** We only report components proven **on disk** with authoritative metadata. Lockfiles are evidence only.
**C) Native link graph**
* **ELF**: parse `PT_INTERP`, `DT_NEEDED`, RPATH/RUNPATH, **GNU symbol versions**; map **SONAMEs** to file paths; link executables → libs.
* **PE/Mach-O** (planned M2): import table, delay-load imports; version resources; code signatures.
* Map libs back to **OS packages** if possible (via file lists); else emit `bin:{sha256}` components.
* The exported metadata (`stellaops.os.*` properties, license list, source package) feeds policy scoring and export pipelines directly: Policy evaluates quiet rules against package provenance, while Exporters forward the enriched fields into downstream JSON/Trivy payloads.
**D) EntryTrace (ENTRYPOINT/CMD → terminal program)**
* Read image config; parse shell (POSIX/Bash subset) with AST: `source`/`.` includes; `case/if`; `exec`/`command`; `run-parts`.
* Resolve commands via **PATH** within the **built rootfs**; follow language launchers (Java/Node/Python) to identify the terminal program (ELF/JAR/venv script).
* Record **file:line** and choices for each hop; output chain graph.
* Unresolvable dynamic constructs are recorded as **unknown** edges with reasons (e.g., `$FOO` unresolved).
**E) Attestation & SBOM bind (optional)**
* For each **file hash** or **binary hash**, query local cache of **Rekor v2** indices; if an SBOM attestation is found for **exact hash**, bind it to the component (origin=`attested`).
* For the **image** digest, likewise bind SBOM attestations (build-time referrers).
### 5.4 Component normalization (exact only)
* Create `Component` nodes only with deterministic identities: purl, or **`bin:{sha256}`** for unlabeled binaries.
* Record **origin** (OS DB, installed metadata, linker, attestation).
### 5.5 SBOM assembly & emit
* **Per-layer SBOM fragments**: components introduced by the layer (+ relationships).
* **Image SBOMs**: merge fragments; refer back to them via **CycloneDX BOMLink** (or SPDX ExternalRef).
* Emit both **Inventory** & **Usage** views.
* When the native analyzer reports an ELF `buildId`, attach it to component metadata and surface it as `stellaops:buildId` in CycloneDX properties (and diff metadata). This keeps SBOM/diff output in lockstep with runtime events and the debug-store manifest.
* Serialize **CycloneDX JSON** and **CycloneDX Protobuf**; optionally **SPDX 3.0.1 JSON**.
* Build **BOMIndex** sidecar: purl table + roaring bitmap; flag `usedByEntrypoint` components for fast backend joins.
The emitted `buildId` metadata is preserved in component hashes, diff payloads, and `/policy/runtime` responses so operators can pivot from SBOM entries → runtime events → `debug/.build-id/<aa>/<rest>.debug` within the Offline Kit or release bundle.
### 5.6 DSSE attestation (via Signer/Attestor)
* WebService constructs **predicate** with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (when emitting **final reports**), timestamps.
* Calls **Signer** (requires **OpTok + PoE**); Signer verifies **entitlement + scanner image integrity** and returns **DSSE bundle**.
* **Attestor** logs to **Rekor v2**; returns `{uuid,index,proof}` → stored in `artifacts.rekor`.
---
## 6) Threeway diff (image → layer → component)
### 6.1 Keys & classification
* Component key: **purl** when present; else `bin:{sha256}`.
* Diff classes: `added`, `removed`, `version_changed` (`upgraded|downgraded`), `metadata_changed` (e.g., origin from attestation vs observed).
* Layer attribution: for each change, resolve the **introducing/removing layer**.
### 6.2 Algorithm (outline)
```
A = components(imageOld, key)
B = components(imageNew, key)
added    = B \ A
removed  = A \ B
changed  = { k in A∩B : version(A[k]) != version(B[k]) || origin changed }
for each item in added/removed/changed:
    layer     = attribute_to_layer(item, imageOld|imageNew)
    usageFlag = usedByEntrypoint(item, imageNew)
emit diff.json (grouped by layer with badges)
```
Diffs are stored as artifacts and feed **UI** and **CLI**.
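A direct C# rendering of the outline, with placeholder model types; the deterministic ordering at the end mirrors the scanner's byte-stable artifact requirement:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ComponentRecord is a placeholder for the real component model.
public sealed record ComponentRecord(string Key, string? Version, string Origin);
public sealed record DiffEntry(string Key, string Class, string? OldVersion, string? NewVersion);

public static class ComponentDiffSketch
{
    public static IReadOnlyList<DiffEntry> Diff(
        IReadOnlyDictionary<string, ComponentRecord> a,   // old image
        IReadOnlyDictionary<string, ComponentRecord> b)   // new image
    {
        var entries = new List<DiffEntry>();

        foreach (var key in b.Keys.Except(a.Keys))
            entries.Add(new DiffEntry(key, "added", null, b[key].Version));

        foreach (var key in a.Keys.Except(b.Keys))
            entries.Add(new DiffEntry(key, "removed", a[key].Version, null));

        foreach (var key in a.Keys.Intersect(b.Keys))
        {
            var (oldC, newC) = (a[key], b[key]);
            if (oldC.Version != newC.Version)
                entries.Add(new DiffEntry(key, "version_changed", oldC.Version, newC.Version));
            else if (oldC.Origin != newC.Origin)
                entries.Add(new DiffEntry(key, "metadata_changed", oldC.Version, newC.Version));
        }

        // Deterministic ordering keeps diff artifacts byte-stable across runs.
        return entries.OrderBy(e => e.Key, StringComparer.Ordinal).ToList();
    }
}
```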
---
## 7) Build-time SBOMs (fast CI path)
**Scanner.Sbomer.BuildXPlugin** can act as a BuildKit **generator**:
* During `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, run analyzers on the build context/output; attach SBOMs as OCI **referrers** to the built image.
* Optionally request **Signer/Attestor** to produce a **StellaOps-verified** attestation immediately; else, Scanner.WebService can verify and re-attest post-push.
* Scanner.WebService trusts build-time SBOMs per policy, enabling **no-rescan** for unchanged bases.
---
## 8) Configuration (YAML)
```yaml
scanner:
  queue:
    kind: redis
    url: "redis://queue:6379/0"
  mongo:
    uri: "mongodb://mongo/scanner"
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops"
    objectLock: "governance"   # or 'compliance'
  analyzers:
    os:   { apk: true, dpkg: true, rpm: true }
    lang: { java: true, node: true, python: true, go: true, dotnet: true, rust: true }
    native: { elf: true, pe: false, macho: false }   # PE/Mach-O in M2
    entryTrace: { enabled: true, shellMaxDepth: 64, followRunParts: true }
  emit:
    cdx:  { json: true, protobuf: true }
    spdx: { json: true }
    compress: "zstd"
  rekor:
    url: "https://rekor-v2.internal"
  signer:
    url: "https://signer.internal"
  limits:
    maxParallel: 8
    perRegistryConcurrency: 2
  policyHints:
    verifyImageSignature: false
    trustBuildTimeSboms: true
```
---
## 9) Scale & performance
* **Parallelism**: per-analyzer concurrency; bounded directory walkers; file CAS dedupe by sha256.
* **Distributed locks** per **layer digest** to prevent duplicate work across Workers.
* **Registry throttles**: per-host concurrency budgets; exponential backoff on 429/5xx.
* **Targets**:
* **Build-time**: P95 ≤3–5s on warmed bases (CI generator).
* **Post-build delta**: P95 ≤10s for 200MB images with cache hit.
* **Emit**: CycloneDX Protobuf ≤150ms for 5k components; JSON ≤500ms.
* **Diff**: ≤200ms for 5k vs 5k components.
---
## 10) Security posture
* **AuthN**: Authority-issued short-lived OpToks (DPoP/mTLS).
* **AuthZ**: scopes (`scanner.scan`, `scanner.export`, `scanner.catalog.read`).
* **mTLS** to **Signer**/**Attestor**; only **Signer** can sign.
* **No network fetches** during analysis (except registry pulls and optional Rekor index reads).
* **Sandboxing**: non-root containers; read-only FS; seccomp profiles; disable execution of scanned content.
* **Release integrity**: all first-party images are **cosign-signed**; Workers/WebService self-verify at startup.
---
## 11) Observability & audit
* **Metrics**:
* `scanner.jobs_inflight`, `scanner.scan_latency_seconds`
* `scanner.layer_cache_hits_total`, `scanner.file_cas_hits_total`
* `scanner.artifact_bytes_total{format}`
* `scanner.attestation_latency_seconds`, `scanner.rekor_failures_total`
* `scanner_analyzer_golang_heuristic_total{indicator,version_hint}` — increments whenever the Go analyzer falls back to heuristics (build-id or runtime markers). Grafana panel: `sum by (indicator) (rate(scanner_analyzer_golang_heuristic_total[5m]))`; alert when the rate is ≥1 for 15 minutes to highlight unexpected stripped binaries.
* **Tracing**: spans for acquire→union→analyzers→compose→emit→sign→log.
* **Audit logs**: DSSE requests log `license_id`, `image_digest`, `artifactSha256`, `policy_digest?`, Rekor UUID on success.
---
## 12) Testing matrix
* **Determinism:** given same image + analyzers → byte-identical **CDX Protobuf**; JSON normalized.
* **OS packages:** ground-truth images per distro; compare to package DB.
* **Lang ecosystems:** sample images per ecosystem (Java/Node/Python/Go/.NET/Rust) with installed metadata; negative tests w/ lockfile-only.
* **Native & EntryTrace:** ELF graph correctness; shell AST cases (includes, run-parts, exec, case/if).
* **Diff:** layer attribution against synthetic two-image sequences.
* **Performance:** cold vs warm cache; large `node_modules` and `site-packages`.
* **Security:** ensure no code execution from image; fuzz parser inputs; path traversal resistance on layer extract.
---
## 13) Failure modes & degradations
* **Missing OS DB** (files exist, DB removed): record **files**; do **not** fabricate package components; emit `bin:{sha256}` where unavoidable; flag in evidence.
* **Unreadable metadata** (corrupt dist-info): record file evidence; skip component creation; annotate.
* **Dynamic shell constructs**: mark unresolved edges with reasons (env var unknown) and continue; **Usage** view may be partial.
* **Registry rate limits**: honor backoff; queue job retries with jitter.
* **Signer refusal** (license/plan/version): scan completes; artifact produced; **no attestation**; WebService marks result as **unverified**.
---
## 14) Optional plugins (off by default)
* **Patch-presence detector** (signature-based backport checks). Reads curated function-level signatures from advisories; inspects binaries for patched code snippets to lower false positives for backported fixes. Runs as a sidecar analyzer that **annotates** components; never overrides core identities.
* **Runtime probes** (with Zastava): when allowed, compare **/proc/<pid>/maps** (DSOs actually loaded) with static **Usage** view for precision.
---
## 15) DevOps & operations
* **HA**: WebService horizontal scale; Workers autoscale by queue depth & CPU; distributed locks on layers.
* **Retention**: ILM rules per artifact class (`short`, `default`, `compliance`); **Object Lock** for compliance artifacts (reports, signed SBOMs).
* **Upgrades**: bump **cache schema** when analyzer outputs change; WebService triggers refresh of dependent artifacts.
* **Backups**: Mongo (daily dumps); RustFS snapshots (filesystem-level rsync/ZFS) or S3 versioning when legacy driver enabled; Rekor v2 DB snapshots.
---
## 16) CLI & UI touch points
* **CLI**: `stellaops scan <ref>`, `stellaops diff --old --new`, `stellaops export`, `stellaops verify attestation <bundle|url>`.
* **UI**: Scan detail shows **Inventory/Usage** toggles, **Diff by Layer**, **Attestation badge** (verified/unverified), Rekor link, and **EntryTrace** chain with file:line breadcrumbs.
---
## 17) Roadmap (Scanner)
* **M2**: Windows containers (MSI/SxS/GAC analyzers), PE/MachO native analyzer, deeper Rust metadata.
* **M2**: Buildx generator GA (certified external registries), cross-registry trust policies.
* **M3**: Patch-presence plugin GA (opt-in), cross-image corpus clustering (evidence-only; not identity).
* **M3**: Advanced EntryTrace (POSIX shell features breadth, busybox detection).
---
### Appendix A — EntryTrace resolution (pseudo)
```
ResolveEntrypoint(ImageConfig cfg, RootFs fs):
    cmd = Normalize(cfg.ENTRYPOINT, cfg.CMD)
    stack = [ Script(cmd, path=FindOnPath(cmd[0], fs)) ]
    visited = set()
    while stack not empty and depth < MAX:
        cur = stack.pop()
        if cur in visited: continue
        visited.add(cur)
        if IsShellScript(cur.path):
            ast = ParseShell(cur.path)
            foreach directive in ast:
                if directive is Source include:
                    p = ResolveInclude(include.path, cur.env, fs)
                    stack.push(Script(p))
                if directive is Exec call:
                    p = ResolveExec(call.argv[0], cur.env, fs)
                    stack.push(Program(p, argv=call.argv))
                if directive is Interpreter (python -m / node / java -jar):
                    term = ResolveInterpreterTarget(call, fs)
                    stack.push(Program(term))
        else:
            return Terminal(cur.path)
    return Unknown(reason)
```
### Appendix A.1 — EntryTrace Explainability
EntryTrace emits structured diagnostics and metrics so operators can quickly understand why resolution succeeded or degraded:
| Reason | Description | Typical Mitigation |
|--------|-------------|--------------------|
| `CommandNotFound` | A command referenced in the script cannot be located in the layered root filesystem or `PATH`. | Ensure binaries exist in the image or extend `PATH` hints. |
| `MissingFile` | `source`/`.`/`run-parts` targets are missing. | Bundle the script or guard the include. |
| `DynamicEnvironmentReference` | Path depends on `$VARS` that are unknown at scan time. | Provide defaults via scan metadata or accept partial usage. |
| `RecursionLimitReached` | Nested includes exceeded the analyzer depth limit (default 64). | Flatten indirection or increase the limit in options. |
| `RunPartsEmpty` | `run-parts` directory contained no executable entries. | Remove empty directories or ignore if intentional. |
| `JarNotFound` / `ModuleNotFound` | Java/Python targets missing, preventing interpreter tracing. | Ship the jar/module with the image or adjust the launcher. |
Diagnostics drive two metrics published by `EntryTraceMetrics`:
- `entrytrace_resolutions_total{outcome}` — resolution attempts segmented by outcome (`resolved`, `partially-resolved`, `unresolved`).
- `entrytrace_unresolved_total{reason}` — diagnostic counts keyed by reason.
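A sketch of how those counters could be emitted with `System.Diagnostics.Metrics`; the meter name and method shapes are assumptions:
```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Illustrative shape of the EntryTraceMetrics counters named above.
public static class EntryTraceMetricsSketch
{
    private static readonly Meter Meter = new("StellaOps.Scanner.EntryTrace");

    private static readonly Counter<long> Resolutions =
        Meter.CreateCounter<long>("entrytrace_resolutions_total");
    private static readonly Counter<long> Unresolved =
        Meter.CreateCounter<long>("entrytrace_unresolved_total");

    public static void RecordResolution(string outcome) =>
        Resolutions.Add(1, new KeyValuePair<string, object?>("outcome", outcome));

    public static void RecordUnresolved(string reason) =>
        Unresolved.Add(1, new KeyValuePair<string, object?>("reason", reason));
}
```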
Structured logs include `entrytrace.path`, `entrytrace.command`, `entrytrace.reason`, and `entrytrace.depth`, all correlated with scan/job IDs. Timestamps are normalized to UTC (microsecond precision) to keep DSSE attestations and UI traces explainable.
### Appendix B — BOMIndex sidecar
```
struct Header { magic, version, imageDigest, createdAt }
vector<string> purls
map<purlIndex, roaring_bitmap> components
optional map<purlIndex, roaring_bitmap> usedByEntrypoint
```
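Conceptually, a lookup resolves the purl to its table index and then probes the corresponding bitmap (e.g., to answer whether any matching component is in the entrypoint closure). A sketch, with the roaring bitmap type abstracted since the concrete implementation is not specified here:
```csharp
using System.Collections.Generic;

// IBitmap stands in for the roaring bitmap implementation.
public interface IBitmap { bool IsEmpty { get; } }

public sealed class BomIndexSketch
{
    public required IReadOnlyList<string> Purls { get; init; }
    public required IReadOnlyDictionary<int, IBitmap> Components { get; init; }
    public IReadOnlyDictionary<int, IBitmap>? UsedByEntrypoint { get; init; }

    public bool IsUsedByEntrypoint(string purl)
    {
        for (var i = 0; i < Purls.Count; i++)
        {
            if (Purls[i] != purl) continue;
            return UsedByEntrypoint is not null
                && UsedByEntrypoint.TryGetValue(i, out var bits)
                && !bits.IsEmpty;
        }
        return false;
    }
}
```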


@@ -0,0 +1,64 @@
# Implementation plan — Scanner
## Delivery phases
- **Phase 1 – Control plane & job queue**
  Finalise Scanner WebService, queue abstraction (Redis/NATS), job leasing, CAS layer cache, artifact catalog, and API endpoints.
- **Phase 2 – Analyzer parity & SBOM assembly**
  Implement OS/Lang/Native analyzers, inventory/usage SBOM views, entry trace resolution, deterministic component identity.
- **Phase 3 – Diff & attestations**
  Deliver three-way diff engine, DSSE SBOM/report signing pipeline, attestation hand-off (Signer→Attestor), metadata for Export Center.
- **Phase 4 – Integrations & exports**
  Integrate with Policy Engine, Vuln Explorer, Export Center, CLI/Console; provide buildx plugin, CLI commands, and offline scanning support.
- **Phase 5 – Observability & resilience**
  Metrics/logs/traces, queue backpressure handling, cache eviction, runbooks, smoke tests, SLO dashboards.
## Work breakdown
- **Control plane**
- REST API for scan requests, diff, catalog listing, artifact retrieval.
- Queue service with idempotency, retries, dead-letter handling; worker scaling.
- CAS storage (RustFS + S3 fallback), GC, ILM policies, offline mode.
- **Analyzers**
- OS (apk/dpkg/rpm), language (Java/Node/Python/Go/DotNet/Rust), native (ELF/PE/MachO).
- Deterministic metadata (purl, version, source location), heuristics optional under flags.
- Entry trace/usage analysis, dependency resolution, license detection.
- **SBOM & diff**
- Inventory/usage SBOM assembly, CycloneDX/SPDX emitters, schema validation.
- Three-way diff (base, target, runtime), evidence linking, JSON export.
- **Attestation & export**
- DSSE bundle signing, attestation metadata for Signer/Attestor, provenance summary.
- Export Center integration (SBOM/diff artifacts, manifests), CLI builder plugin (buildx).
- **CLI/Console**
- CLI commands `stella scan`, `stella sbom diff`, `stella sbom export`, offline caching.
- Console flows for scan requests, diff viewer, SBOM downloads, attestation status.
- **Observability & ops**
- Metrics (queue depth, scan latency, cache hit/miss, analyzer timing), logs/traces with job IDs.
- Alerts for backlog, failed scans, attestation issues, storage pressure.
- Runbooks for stuck jobs, cache corruption, analyzer regressions, offline mode.
## Acceptance criteria
- Scans produce deterministic SBOM inventory/usage views with component identity stability and reproducible diffs.
- Queue/worker pipeline handles retries, backpressure, offline kits, and exports DSSE attestations for Signer/Attestor.
- Export Center consumes SBOM/diff artifacts; Vuln Explorer receives metadata and explain traces.
- CLI/Console parity for scan submission, diffing, exports, attestation verification.
- Observability dashboards cover queue health, analyzer success rates, performance; alerts fire on SLO breaches.
- Offline scanning (air-gapped) supported with local caches and manifest verification.
## Risks & mitigations
- **Analyzer drift/determinism:** golden fixtures, hash-based regression tests, deterministic sorting, strict identity rules.
- **Queue overload:** adaptive backpressure, scaling workers, dead-letter review, priority lanes.
- **Storage growth:** CAS dedupe, ILM policies, offline bundle pruning.
- **Attestation failures:** retry with backoff, attestation health checks, Notify integration.
- **Offline divergence:** packaging of analyzers/configs, manifest signatures, parity tests.
## Test strategy
- **Unit:** analyzer parsers, component identity, diff calculations, API validation.
- **Integration:** end-to-end scan/diff/attestation flows, Export Center integration, CLI automation.
- **Performance:** large images, concurrent scans, cache stress, queue throughput.
- **Determinism:** repeated scans/diffs across systems, hash comparisons, property tests.
- **Security:** RBAC, tenant isolation, attestation key handling, path sanitisation.
- **Offline:** air-gap scanning, manifest verification, CLI offline mode.
## Definition of done
- Scanner services, analyzers, diffing, attestation pipeline, exports, and observability delivered with runbooks and Offline Kit parity.
- Documentation (architecture, analyzer guides, CLI, offline mode, operations) updated with imposed rule statements.
- ./TASKS.md and ../../TASKS.md updated with progress; regression fixtures maintained in repo.


@@ -0,0 +1,155 @@
{
  "title": "StellaOps Scanner Analyzer Benchmarks",
  "uid": "scanner-analyzer-bench",
  "schemaVersion": 38,
  "version": 1,
  "editable": true,
  "timezone": "",
  "graphTooltip": 0,
  "time": { "from": "now-24h", "to": "now" },
  "templating": {
    "list": [
      {
        "name": "datasource",
        "type": "datasource",
        "query": "prometheus",
        "refresh": 1,
        "hide": 0,
        "current": {}
      }
    ]
  },
  "annotations": { "list": [] },
  "panels": [
    {
      "id": 1,
      "title": "Max Duration (ms)",
      "type": "timeseries",
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": { "unit": "ms", "displayName": "{{scenario}}" },
        "overrides": []
      },
      "options": {
        "legend": { "displayMode": "table", "placement": "bottom" },
        "tooltip": { "mode": "single", "sort": "none" }
      },
      "targets": [
        { "expr": "scanner_analyzer_bench_max_ms", "legendFormat": "{{scenario}}", "refId": "A" },
        { "expr": "scanner_analyzer_bench_baseline_max_ms", "legendFormat": "{{scenario}} baseline", "refId": "B" }
      ]
    },
    {
      "id": 2,
      "title": "Regression Ratio vs Limit",
      "type": "timeseries",
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "displayName": "{{scenario}}",
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "red", "value": 20 }
            ]
          }
        },
        "overrides": []
      },
      "options": {
        "legend": { "displayMode": "table", "placement": "bottom" },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        { "expr": "(scanner_analyzer_bench_regression_ratio - 1) * 100", "legendFormat": "{{scenario}} regression %", "refId": "A" },
        { "expr": "(scanner_analyzer_bench_regression_limit - 1) * 100", "legendFormat": "{{scenario}} limit %", "refId": "B" }
      ]
    },
    {
      "id": 3,
      "title": "Breached Scenarios",
      "type": "stat",
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "fieldConfig": {
        "defaults": { "displayName": "{{scenario}}", "unit": "short" },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": { "calcs": ["last"], "fields": "", "values": false }
      },
      "targets": [
        { "expr": "scanner_analyzer_bench_regression_breached", "legendFormat": "{{scenario}}", "refId": "A" }
      ]
    }
  ]
}


@@ -0,0 +1,48 @@
# Scanner Analyzer Benchmarks Operations Guide
## Purpose
Keep the language analyzer microbench under the <5s SBOM pledge. CI emits Prometheus metrics and JSON fixtures so trend dashboards and alerts stay in lockstep with the repository baseline.
> **Grafana note:** Import `docs/modules/scanner/operations/analyzers-grafana-dashboard.json` into your Prometheus-backed Grafana stack to monitor `scanner_analyzer_bench_*` metrics and alert on regressions.
## Publishing workflow
1. CI (or engineers running locally) execute:
```bash
dotnet run \
  --project src/Bench/StellaOps.Bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
  -- \
  --repo-root . \
  --out src/Bench/StellaOps.Bench/Scanner.Analyzers/baseline.csv \
  --json out/bench/scanner-analyzers/latest.json \
  --prom out/bench/scanner-analyzers/latest.prom \
  --commit "$(git rev-parse HEAD)" \
  --environment "${CI_ENVIRONMENT_NAME:-local}"
```
2. Publish the artefacts (`baseline.csv`, `latest.json`, `latest.prom`) to `bench-artifacts/<date>/`.
3. Promtail (or the CI job) pushes `latest.prom` into Prometheus; JSON lands in long-term storage for workbook snapshots.
4. The harness exits non-zero if:
- `max_ms` for any scenario breaches its configured threshold; or
- `max_ms` regresses ≥20% versus `baseline.csv`.
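The gate reduces to a simple predicate per scenario; a minimal C# sketch under the thresholds described above (types and names are illustrative):
```csharp
// Fail when a scenario exceeds its configured threshold or regresses >= 20%
// against the checked-in baseline.
public sealed record BenchResult(string Scenario, double MaxMs, double? ThresholdMs);

public static class RegressionGateSketch
{
    public static bool ShouldFail(BenchResult result, double baselineMaxMs, double limit = 1.20)
    {
        var breachesThreshold = result.ThresholdMs is { } threshold && result.MaxMs > threshold;
        var ratio = result.MaxMs / baselineMaxMs; // scanner_analyzer_bench_regression_ratio
        return breachesThreshold || ratio >= limit;
    }
}
```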
## Grafana dashboard
- Import `docs/modules/scanner/operations/analyzers-grafana-dashboard.json`.
- Point the template variable `datasource` to the Prometheus instance ingesting `scanner_analyzer_bench_*` metrics.
- Panels:
- **Max Duration (ms)** compares live runs vs baseline.
- **Regression Ratio vs Limit** plots `(max / baseline_max - 1) * 100`.
- **Breached Scenarios** stat panel sourced from `scanner_analyzer_bench_regression_breached`.
## Alerting & on-call response
- **Primary alert**: fire when `scanner_analyzer_bench_regression_ratio{scenario=~".+"} >= 1.20` for 2 consecutive samples (10 min default). Suggested PromQL:
```
max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20
```
- Suppress duplicates using the `scenario` label.
- Pager payload should include `scenario`, `max_ms`, `baseline_max_ms`, and `commit`.
- Immediate triage steps:
1. Check the `latest.json` artefact for the failing scenario; confirm commit and environment.
2. Re-run the harness with `--captured-at` and `--baseline` pointing at the last known good CSV to verify determinism.
3. If regression persists, open an incident ticket tagged `scanner-analyzer-perf` and page the owning language guild.
4. Roll back the offending change or update the baseline after sign-off from the guild lead and Perf captain.
Document the outcome in `docs/12_PERFORMANCE_WORKBOOK.md` (section 8) so trendlines reflect any accepted regressions.


@@ -0,0 +1,72 @@
# Entry-Point Dynamic Analysis
When we have access to a running container (e.g., during runtime posture checks), StellaOps augments the static inference with live signals. This document describes the Observational Exec Graph (OEG) that powers the dynamic mode.
## 1) Goals
- Capture the *actual* process tree and exec lineage after the container starts.
- Identify steady-state processes (long-lived, listening, non-wrapper) even when supervision stacks are present.
- Feed the same reduction and runtime-classification pipeline as the static analyser.
## 2) Observational Exec Graph (OEG)
### 2.1 Data sources
- **Tracepoints / eBPF**: `sched_process_exec`, `sched_process_fork/clone`, and corresponding exit events give us pid, ppid, namespace, binary path, and argv snapshots with minimal overhead.
- **/proc sampling**: for each tracked PID, capture `/proc/<pid>/{exe,cmdline,cwd}` and file descriptors (especially listening sockets).
- **Namespace mapping**: normalise host PIDs to container PIDs (`NStgid`) so the graph is stable across runtimes.
### 2.2 Graph model
```csharp
public sealed record ExecNode(int HostPid, int NsPid, int Ppid, string Exe, string[] Argv, long StartTicks);
public sealed record ExecEdge(int ParentHostPid, int ChildHostPid, string Kind); // "clone" | "exec"
```
- Nodes represent `exec()` events (post-exec image) and contain the final argv.
- Edges labelled `clone` capture forks; `exec` edges show program replacements.
### 2.3 Steady-state candidate selection
For each node compute features:
| Feature | Rationale |
| --- | --- |
| Lifetime (until sampling end) | Long-lived processes are more likely to be the real workload. |
| Additional execs downstream | Zero execs after start implies terminal. |
| Listening sockets | Owning `LISTEN` sockets strongly suggests a server. |
| Wrapper catalogue hit | Mark nodes that match known shims (`tini`, `gosu`, `supervisord`, etc.). |
| Children fan-out | Supervisors spawn multiple children and remain parents. |
Feed these into a scoring function; retain Top-K candidates (usually 1–3) along with evidence.
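One possible scoring function over those features (the weights are illustrative, not the shipped values; `ExecNode` is the record defined in §2.2):
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Feature vector per exec node; field names mirror the table above.
public sealed record NodeFeatures(
    TimeSpan Lifetime, int DownstreamExecs, int ListeningSockets,
    bool KnownWrapper, int ChildFanOut);

public static class SteadyStateSketch
{
    public static double Score(NodeFeatures f) =>
        Math.Min(f.Lifetime.TotalSeconds, 10) * 0.5   // long-lived processes
        + (f.DownstreamExecs == 0 ? 3.0 : 0.0)        // terminal: no further execs
        + f.ListeningSockets * 4.0                    // owns LISTEN sockets
        + (f.KnownWrapper ? -5.0 : 0.0)               // shims are collapsed, not terminals
        + (f.ChildFanOut > 1 ? -2.0 : 0.0);           // supervisors stay parents

    public static IEnumerable<(ExecNode Node, double Score)> TopCandidates(
        IEnumerable<(ExecNode Node, NodeFeatures Features)> nodes, int k = 3) =>
        nodes.Select(n => (n.Node, Score: Score(n.Features)))
             .OrderByDescending(n => n.Score)
             .Take(k);
}
```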
## 3) Integration with static pipeline
1. For each steady-state candidate, snapshot the command/argv and normalise via `ResolvedCommand` (as in static mode).
2. Run wrapper reduction and ShellFlow analysis if the candidate is a script.
3. Invoke runtime detectors to classify the binary.
4. Merge dynamic evidence with static evidence. Conflicts drop confidence or trigger the “supervisor” classification.
## 4) Supervisors & multi-service containers
Some images (e.g., `supervisord`, `s6`, `runit`) intentionally start multiple long-lived processes. Handle them as follows:
- Detect supervisor binaries from the wrapper catalogue.
- Analyse their configuration (`/etc/supervisord.conf`, `/etc/services.d/*`, etc.) to enumerate child services statically.
- Emit multiple `TerminalProcess` entries with individual confidence scores but mark the parent as `type = supervisor`.
## 5) Operational hints
- Sampling window: 1–3 seconds after start is usually sufficient; extend in debug mode.
- Overhead: prefer eBPF/tracepoints; fall back to periodic `/proc` walks when instrumentation isn't available.
- Security: honour namespace boundaries; never inspect processes outside the target container's cgroup/namespace.
- Failure mode: if dynamic capture fails, fall back to static mode and flag evidence accordingly (`"Dynamic capture unavailable"`).
## 6) Deliverables
The dynamic reducer returns an `EntryTraceResult` populated with:
- `ExecGraph` containing nodes and edges for audit/debug.
- `Terminals` listing steady-state processes (possibly multiple).
- `Evidence` strings referencing dynamic signals (`"pid 47 listening on 0.0.0.0:8080"`, `"wrapper tini collapsed into /usr/local/bin/python"`).
Downstream modules (Policy, Vuln Explorer, Export Center) treat the result identically to static scans, enabling easy comparison between build-time and runtime observations.


@@ -0,0 +1,24 @@
# Entry-Point Runtime — C / C++
## Signals to gather
- Dynamically linked ELF (`.dynamic`) with GLIBC references (`GLIBC`, `GLIBCXX`, `libstdc++`).
- Presence of `/lib64/ld-linux-*.so.*` loaders.
- Absence of Go/Rust-specific markers.
- Native supervisor binaries (`nginx`, `envoy`, custom C services).
- Config files adjacent to the binary (`/etc/app.conf`, YAML/INI).
## Implementation notes
- Treat this detector as the "native fallback": confirm no higher-priority language matched.
- Collect shared library list to attach as evidence; highlight unusual dependencies.
- Inspect `EXPOSE` ports and config directories to aid classification.
- Normalise busybox-style symlinks (actual binary often `/bin/busybox` with applet name).
## Evidence & scoring
- Boost for ELF dynamic dependencies and loader presence.
- Add evidence for config files, service managers, or env variables.
- Penalise extremely small binaries without metadata (may be wrappers).
## Edge cases
- Static C binaries may look like Go; rely on build ID absence and library fingerprints.
- When binary is part of a supervisor stack (e.g., `s6-svscan`), delegate classification to `Supervisor`.
- Windows native services should be handled by PE analysis (`entrypoint-runtime-overview.md`).


@@ -0,0 +1,22 @@
# Entry-Point Runtime — Deno
## Signals to gather
- `argv0` equals `deno` or path ends with `/bin/deno`.
- Arguments include `run`, `task`, `serve`, or `compile` outputs.
- Presence of `deno.json` / `deno.jsonc`, `import_map.json`, or cached modules (`/deno-dir`).
- Environment (`DENO_DIR`, `DENO_AUTH_TOKENS`).
## Implementation notes
- Resolve script URLs or local files; for remote sources record the URL as evidence.
- Distinguish between `deno compile` executables and the Deno runtime invoking a script.
- Recognise `deno task <name>` by reading tasks from `deno.json`.
- ShellFlow should already collapse Docker official entrypoint (`/usr/bin/env deno task start`).
## Evidence & scoring
- Boost for confirmed script/URL and config file presence.
- Add evidence for permissions flags (`--allow-net`, `--allow-env`) to aid policy decisions.
- Penalise when only the binary is present without scripts.
## Edge cases
- Deno deploy shims or adapters may further wrap the runtime; rely on wrapper catalogue.
- When `deno compile` emits a standalone binary, treat it as C/C++ unless metadata persists.


@@ -0,0 +1,25 @@
# Entry-Point Runtime — .NET / C#
## Signals to gather
- Framework-dependent: `dotnet <app.dll>` invocation.
- Adjacent `*.runtimeconfig.json` (parse `tfm`, framework references, roll-forward).
- Self-contained or single-file apps: ELF/PE with `DOTNET_BUNDLE`, `System.Private.CoreLib`, or `coreclr` markers.
- ASP.NET hints: `ASPNETCORE_URLS`, `appsettings.json`, presence of `wwwroot`.
- Windows builds: PE with CLI header (managed assembly) or native host embedding a bundle.
## Implementation notes
- Resolve DLL paths relative to the working directory after env expansion.
- When `dotnet` is invoked without a DLL, treat as low-confidence and record evidence.
- For single-file executables, read the first few MB looking for bundle markers rather than full PE/ELF parsing.
- Capture runtimeconfig metadata when available; store TFM in `LanguageHit.MainModule`.
- Treat `dotnet exec` wrappers the same as `dotnet <dll>`.
## Evidence & scoring
- Large confidence boost when both host (`dotnet`) and DLL artefact are present.
- Add evidence for runtimeconfig parsing (`"runtimeconfig TFM=net8.0"`), bundle markers, or ASP.NET env vars.
- Penalise detections lacking artefact confirmation.
## Edge cases
- Native AOT (`dotnet publish -p:PublishAot=true`) emits native binaries without managed markers—should fall back to C/C++ detector.
- PowerShell-launched apps: ShellFlow should rewrite before the detector runs.
- Side-by-side deployment where multiple DLLs exist—prefer the one passed to `dotnet` or specified via `DOTNET_STARTUP_HOOKS`.


@@ -0,0 +1,22 @@
# Entry-Point Runtime — Elixir / Erlang (BEAM)
## Signals to gather
- `argv0` equals `elixir`, `iex`, `mix`, `erl`, `beam.smp`, or release scripts (`bin/app start`).
- Release layouts: `_build/prod/rel/<app>/bin/<app>`, `releases/<version>/vm.args`, `sys.config`.
- Environment variables (`MIX_ENV`, `RELEASE_COOKIE`, `RELEASE_NODE`).
- Config files (`config/config.exs`, `config/prod.exs`).
## Implementation notes
- Recognise Distillery / mix release scripts that `exec` the real BEAM VM.
- When release script is invoked with `eval`, treat the wrapper as part of the chain but classify runtime as `Elixir`.
- Inspect `vm.args` for node name, cookie, and distributed settings.
- For pure Erlang services (no Elixir), the same detector should fire using `erl` hints.
## Evidence & scoring
- Boost for release directories and BEAM VM binaries (`beam.smp`).
- Add evidence for config files and env vars.
- Penalise minimal images lacking release artefacts (could be generic shell wrappers).
## Edge cases
- Phoenix apps often rely on `bin/server` wrapper—ShellFlow must collapse to release script.
- Multi-node clusters may start multiple BEAM instances; treat as `Supervisor` if several nodes stay active.


@@ -0,0 +1,24 @@
# Entry-Point Runtime — Go
## Signals to gather
- Statically linked ELF with `.note.go.buildid`.
- `.gopclntab` section (function name table) or `Go build ID` strings.
- Minimal dynamic dependencies (often none) and musl/glibc loader differences.
- `GODEBUG`, `GOMAXPROCS`, `GOENV` environment variables.
- Go module artefacts: `go.mod`, `go.sum`.
## Implementation notes
- Use ELF parsing to locate `.note.go.buildid`; fall back to scanning the first few MB for `Go build ID` (see the sketch after this list).
- Distinguish from Rust/C by checking `.dynsym` count, presence of Go-specific section names, and the absence of `GLIBCXX`.
- For distroless images, rely solely on ELF traits since no package metadata is present.
- Record binary path and module files as evidence.
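A sketch of that string-scan fallback (window size and error handling are illustrative):
```csharp
using System;
using System.IO;
using System.Text;

// Scan the first few MB of a binary for the ASCII marker "Go build ID",
// used when ELF section parsing is unavailable (e.g., stripped binaries).
public static class GoBuildIdSketch
{
    public static bool LooksLikeGo(string path, int windowBytes = 4 * 1024 * 1024)
    {
        var marker = Encoding.ASCII.GetBytes("Go build ID");
        using var stream = File.OpenRead(path);
        var length = (int)Math.Min(stream.Length, windowBytes);
        var buffer = new byte[length];
        var read = stream.Read(buffer, 0, buffer.Length);
        return buffer.AsSpan(0, read).IndexOf(marker) >= 0;
    }
}
```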
## Evidence & scoring
- Strong boost for `.note.go.buildid` or `.gopclntab`.
- Add evidence for module files or env variables.
- Penalise binaries with high numbers of shared libraries (likely C/C++).
## Edge cases
- TinyGo or stripped binaries may lack build IDs—fall back to heuristics (symbol patterns, text section).
- CGO-enabled binaries include glibc dependencies; still treat as Go but mention CGO in evidence if detected.
- Supervisors wrapping Go services (e.g., `envoy`) should be handled upstream by wrapper detection.


@@ -0,0 +1,29 @@
# Entry-Point Runtime — Java
## Signals to gather
- `argv0` equals `java` / `javaw` or resides under `*/bin/java`.
- `-jar <app.jar>` argument with the jar present in the VFS.
- Manifest metadata (`META-INF/MANIFEST.MF`) containing `Main-Class` or `Start-Class`.
- Spring Boot layout (`BOOT-INF/**`).
- Classpath form (`-cp/-classpath`) followed by a main class token.
- Presence of an embedded JRE (`lib/modules`, `jre/bin/java`).
- `JAVA_OPTS`, `JAVA_TOOL_OPTIONS`, or `JAVA_HOME` environment hints.
- `EXPOSE` ports often associated with Java servers (`8080`, `8443`).
## Implementation notes
- Expand env variables before resolving jar/class paths (supports `${VAR}`, `${VAR:-default}`; see the sketch after this list).
- For classpath mode, open a subset of jars to corroborate `Main-Class`.
- Track when the app is started through shell wrappers (`exec java -jar "$APP_JAR"`); ShellFlow should already collapse these.
- Distinguish between installers (e.g., `java -version`) and actual app launches by checking for jar/class arguments.
- When multiple jars/classes are possible, prefer manifest-backed artefacts but record alternates in evidence.
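A sketch of the `${VAR}` / `${VAR:-default}` expansion referenced above (the real resolver also has to handle nesting and quoting):
```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Expand ${VAR} and ${VAR:-default} against a captured environment map.
public static class EnvExpandSketch
{
    private static readonly Regex Pattern =
        new(@"\$\{(?<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?<default>[^}]*))?\}");

    public static string Expand(string input, IReadOnlyDictionary<string, string> env) =>
        Pattern.Replace(input, m =>
            env.TryGetValue(m.Groups["name"].Value, out var value)
                ? value
                : m.Groups["default"].Value); // empty string when no default is given
}
```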
## Evidence & scoring
- Reward concrete artefacts (jar exists, manifest resolved).
- Add evidence entries for each heuristic (`"MANIFEST Main-Class=com.example.Main"`, `"Spring Boot BOOT-INF detected"`).
- Penalise missing artefacts or ambiguous classpaths.
- Surface runtime-specific env/ports as supplementary clues, but keep their weight low to avoid false positives.
## Edge cases
- Launcher scripts that eventually run `java` — ensure ShellFlow surfaces the final command.
- Multi-module fat jars: only expose the main entry jar in evidence; keep supporting jars as context.
- Native image (`native-image` / GraalVM) should fall through to Go/Rust/C++ detectors when `java` binary is absent.


@@ -0,0 +1,24 @@
# Entry-Point Runtime — Nginx
## Signals to gather
- `argv0` equals `nginx`.
- Config files: `/etc/nginx/nginx.conf`, `conf.d/*.conf`, `/usr/share/nginx/html`.
- Environment (`NGINX_ENTRYPOINT_QUIET_LOGS`, `NGINX_PORT`, `NGINX_ENVSUBST_TEMPLATE`).
- Listening sockets on 80/443 (dynamic mode) or `EXPOSE 80` (static).
- Modules or scripts shipped with the official Docker entrypoint (`docker-entrypoint.sh` collapsing to `nginx -g "daemon off;"`).
## Implementation notes
- Parse `nginx.conf` (basic directive traversal) to extract worker processes, include chains, upstream definitions.
- Handle official entrypoint idioms (`envsubst` templating) via ShellFlow.
- Distinguish pure reverse proxies from PHP-FPM combos; when both `nginx` and `php-fpm` run, classify container as `Supervisor`.
- Record static web content presence (`/usr/share/nginx/html/index.html`).
## Evidence & scoring
- Boost for confirmed config and workers.
- Add evidence for templating features, env substitution, or modules.
- Penalise if binary exists without config (likely not the entry point).
## Edge cases
- Alpine images may place configs under `/etc/nginx/conf.d`; include both.
- Custom builds might rename binary (`openresty`, `tengine`); consider aliases if common.
- Windows Nginx not supported; fall back to `Other`.


@@ -0,0 +1,24 @@
# Entry-Point Runtime — Node.js
## Signals to gather
- `argv0` equals `node`, `nodejs`, or path ends with `/bin/node`.
- Scripts launched via package runners (`npm`, `yarn`, `pnpm node …`, `npx`).
- Presence of `package.json` with `"main"` or `"scripts":{"start":…}` entries.
- `NODE_ENV`, `NODE_OPTIONS`, or `NPM_PACKAGE_NAME` environment hints.
- Bundler/PM2 scenarios: `pm2-runtime`, `pm2-docker`, `forever`, `nodemon`.
## Implementation notes
- Resolve script arguments (e.g., `node server.js`) relative to the working dir.
- If invoked through `npm start`/`yarn run`, parse `package.json` to expand the actual script.
- Support TypeScript loaders (`ts-node`, `node --loader`, `.mjs`) by inspecting extensions and flags.
- Normalise shebang-based Node scripts (ShellFlow ensures `#!/usr/bin/env node` collapses to Node).
## Evidence & scoring
- Boost confidence when a concrete JS/TS entry file exists.
- Add evidence for `package.json` metadata, PM2 ecosystem files, or `NODE_ENV` values.
- Penalise when the entry file is missing or only package runners are present without scripts.
## Edge cases
- Multi-service supervisors (e.g., `pm2` managing multiple apps): treat as `Supervisor` and list programmes as children.
- Serverless shims (e.g., Google Functions) wrap Node; prefer the user-provided handler script if detectable.
- Distroless snapshots may omit package managers; rely on Node binary + script presence.


@@ -0,0 +1,24 @@
# Entry-Point Runtime — PHP-FPM
## Signals to gather
- `argv0` equals `php-fpm` or `php-fpm8*` variants; master process often invoked with `-F` or `--nodaemonize`.
- Configuration files: `/usr/local/etc/php-fpm.conf`, `www.conf`, pool definitions under `php-fpm.d`.
- PHP runtime artefacts: `composer.json`, `public/index.php`, `artisan`, `wp-config.php`.
- Environment variables such as `PHP_FPM_CONFIG`, `PHP_INI_DIR`, `APP_ENV`.
- Socket or port exposure (`listen = 9000`, `/run/php-fpm.sock`).
## Implementation notes
- Verify master process vs worker processes (master stays PID 1, workers forked).
- Inspect pool configuration to extract listening endpoint and process manager mode.
- If `docker-php-entrypoint` is involved, ShellFlow must expand to `php-fpm`.
- Distinguish FPM from CLI invocations (`php script.php`) to avoid misclassification.
## Evidence & scoring
- Reward confirmed config files and listening sockets.
- Add evidence for application artefacts (Composer lockfile, framework directories).
- Penalise when only the binary is present without config (could be CLI usage).
## Edge cases
- Images bundling Apache/Nginx front-ends should end up as `Supervisor` with PHP-FPM as a child service.
- Some Alpine packages install `php-fpm7` naming—include aliases in detector.
- When `php-fpm` is launched via `s6` or supervisor, rely on child detection to avoid double counting.


@@ -0,0 +1,25 @@
# Entry-Point Runtime — Python
## Signals to gather
- `argv0` equals `python`, `python3`, `pypy`, or an interpreter symlink.
- WSGI/ASGI servers: `gunicorn`, `uvicorn`, `hypercorn`, `daphne`.
- Task runners: `celery -A app worker`, `rq worker`, `pytest`.
- Presence of `requirements.txt`, `pyproject.toml`, `setup.cfg`, or `Pipfile`.
- `PYTHONPATH`, `PYTHONUNBUFFERED`, `DJANGO_SETTINGS_MODULE`, `FLASK_APP`, or application-specific env vars.
- Virtualenv detection (`/venv/bin/python`, `pyvenv.cfg`).
## Implementation notes
- When invoked as `python -m module`, resolve the module to a path if possible.
- For WSGI/ASGI servers, inspect command arguments (`app:app`, `module:create_app`) and config files.
- Recognise wrapper scripts such as `docker-entrypoint.py` that eventually `exec "$@"`.
- Support zipped apps or single-file bundles by checking `zipapp` signatures.
## Evidence & scoring
- Increase confidence when module or script exists and dependencies are present.
- Capture evidence for env variables, config files, or known server arguments.
- Penalise ambiguous invocations (e.g., `python -c "..."` without persistent service).
## Edge cases
- Supervisors launching multiple Python workers fall back to `Supervisor` classification with Python listed as child.
- Conda environments use different directory structures; look for `conda-meta` directories.
- Alpine distroless images may ship `python` symlinks without standard libs—ensure script presence before final classification.


@@ -0,0 +1,24 @@
# Entry-Point Runtime — Ruby
## Signals to gather
- `argv0` equals `ruby`, `bundle`, `bundler`, `rackup`, `puma`, `unicorn`, `sidekiq`, or `resque`.
- Bundler scripts: `bundle exec <cmd>`; Gemfile and `Gemfile.lock`.
- Rails and Rack hints: `config.ru`, `bin/rails`, `bin/rake`.
- Background jobs: `sidekiq`, `delayed_job`, `resque`.
- Environment variables (`RAILS_ENV`, `RACK_ENV`, `BUNDLE_GEMFILE`).
## Implementation notes
- Normalise `bundle exec` by skipping the bundler wrapper and targeting the actual command.
- Resolve script paths relative to the working directory.
- For `puma`/`unicorn`, parse config files (`config/puma.rb`, `config/unicorn.rb`) to gather ports/workers.
- Recognise `foreman start` or `overmind` launching Procfile processes—may devolve to `Supervisor` classification.
## Evidence & scoring
- Boost confidence when `Gemfile.lock` exists and the requested server script is found.
- Add evidence for env variables and config files.
- Penalise ambiguous CLI invocations or missing artefacts.
## Edge cases
- Alpine distroless images may rely on `ruby` symlinks; confirm binary presence.
- JRuby (running on Java) may trigger both Ruby and Java signals—prefer Ruby if `ruby`/`jruby` interpreter is explicit.
- Supervisors launching multiple Ruby workers should produce a single `Supervisor` entry with Ruby children.

View File

@@ -0,0 +1,24 @@
# Entry-Point Runtime — Rust
## Signals to gather
- ELF binaries with DWARF producer strings containing `rustc`.
- Symbols prefixed with `_ZN` (mangled Rust) or section `.rustc`.
- Presence of `panic=abort` strings, `Rust` metadata, or Cargo artefacts (`Cargo.toml`, `Cargo.lock`).
- Often statically linked (no `.dynamic` entries) or built against the musl loader (`/lib/ld-musl-x86_64.so.1`).
- Environment such as `RUST_LOG`, `RUST_BACKTRACE`.
## Implementation notes
- Parse DWARF `.debug_info` when available; short-circuit by scanning `.comment` sections for `rustc`.
- Distinguish from Go by the absence of `.note.go.buildid`.
- When Cargo artefacts exist, include target name and profile in evidence.
- For binaries built with `--target x86_64-pc-windows-gnu`, treat them under the same detector (PE + Rust markers).
## Evidence & scoring
- Reward DWARF producer strings, Cargo files, and Rust-specific env vars.
- Penalise when only generic static binary traits are present (may defer to C/C++).
- Mention musl vs glibc loader differences for observability.
## Edge cases
- Rust compiled to WebAssembly or run inside Wasmtime falls outside this detector; leave as `Other`.
- Stripped binaries without DWARF or comments may be indistinguishable from C—fall back to C/C++ and add note.
- Supervisors launching multiple Rust binaries handled upstream.
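A sketch of the `.comment` short-circuit described above; `ElfFile` stands in for the binary probes from `entrypoint-static-analysis.md` and is an assumption, not a fixed API:
```csharp
// Sketch only: ElfFile is a hypothetical wrapper over the ELF probes.
static bool LooksLikeRust(ElfFile elf, List<string> evidence) {
    if (elf.HasSection(".note.go.buildid")) return false;      // Go, not Rust

    var comment = elf.ReadSectionText(".comment");
    if (comment is not null && comment.Contains("rustc", StringComparison.OrdinalIgnoreCase)) {
        evidence.Add($".comment producer: {comment.Trim()}");
        return true;
    }
    // Legacy mangled symbols (_ZN…17h<hash>E) are a weaker secondary signal.
    return elf.SymbolNames().Any(s => s.StartsWith("_ZN") && s.EndsWith("E"));
}
```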

View File

@@ -0,0 +1,25 @@
# Entry-Point Runtime — Supervisors
Some containers intentionally launch multiple long-lived services (sidecars, appliance images, `supervisord`, `s6`, `runit`, `pm2`). Instead of forcing a single runtime classification, the detector can emit a `Supervisor` entry with child services enumerated separately.
## Signals to gather
- Known supervisor binaries: `supervisord`, `s6-svscan`, `s6-supervise`, `runsvdir`, `pm2-runtime`, `forego`, `foreman`, `overmind`.
- Configuration files: `/etc/supervisord.conf`, `/etc/s6/*.conf`, `Procfile`, `ecosystem.config.js`.
- Multiple child processes that remain active after startup.
- Environment variables controlling supervisor behaviour (`SUPERVISOR_*`, `PM2_HOME`, `S6_CMD_WAIT_FOR_SERVICES`).
## Implementation notes
- Keep the supervisor as the primary terminal but query configuration to list child commands.
- For each child, run the usual reduction + runtime detection and attach results as derived evidence.
- When configuration is templated (e.g., `envsubst`), evaluate ShellFlow output to resolve final commands.
- Record scheduling details (autorestart, process limits) relevant for incident response.
## Evidence & scoring
- Supervisor detection sets `LanguageType.Supervisor` with mid-level confidence (0.6–0.7).
- Confidence increases when configuration explicitly lists services and child processes are observed (dynamic mode).
- Provide evidence for each child service (`"manages: php-fpm on /run/php-fpm.sock"`, `"manages: nginx listening on 0.0.0.0:80"`).
## Edge cases
- Docker Compose-style images using `bash` to run multiple services should also map here if ShellFlow detects multiple `&` background jobs.
- Ensure we do not classify minimal init shims (`tini`, `dumb-init`) as supervisors—they should be collapsed.
- When supervisor manages only one child, collapse to the child runtime and drop the supervisor evidence to avoid noise.
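A minimal sketch of child enumeration for `supervisord`, assuming the overlay VFS contract; the INI parsing is deliberately simplified:
```csharp
// Sketch only: enumerates [program:*] sections from supervisord.conf so each
// child command can be re-run through reduction + runtime detection.
static IEnumerable<string> SupervisordChildren(OverlayVfs vfs, string confPath = "/etc/supervisord.conf") {
    if (!vfs.Exists(confPath)) yield break;

    using var reader = new StreamReader(vfs.OpenRead(confPath));
    string? line, section = null;
    while ((line = reader.ReadLine()) is not null) {
        line = line.Trim();
        if (line.StartsWith('[') && line.EndsWith(']')) { section = line; continue; }
        // Only [program:x] sections define managed child services.
        if (section?.StartsWith("[program:") == true && line.StartsWith("command=")) {
            yield return line["command=".Length..];
        }
    }
}
```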

View File

@@ -0,0 +1,94 @@
# Entry-Point Detection — Problem & Architecture
## 1) Why this exists
Container images rarely expose their *real* workload directly. Shell wrappers, init shims, supervisors, or language launchers often sit between the Dockerfile `ENTRYPOINT`/`CMD` values and the program you actually care about. StellaOps needs a deterministic, explainable way to map any container image (or running container) to a single logical entry point that downstream systems can reason about.
We define the target artefact as the tuple below:
```jsonc
{
"type": "java|dotnet|go|python|node|ruby|php-fpm|c/c++|rust|nginx|supervisor|other",
"resolvedBinary": "/app/app.jar | /app/app.dll | /app/server | /usr/local/bin/node",
"args": ["..."],
"confidence": 0.00..1.00,
"evidence": [
"why we believe this"
],
"chain": [
{"from": "/bin/sh -c", "to": "/entrypoint.sh", "why": "ENTRYPOINT shell-form"},
{"from": "/entrypoint.sh", "to": "java -jar orders.jar", "why": "exec \"$@\" with java default"}
]
}
```
Constraints:
- Static first: no `/proc`, no `ptrace`, no customer code execution when scanning images.
- Honour Docker/OCI precedence (`ENTRYPOINT` vs `CMD`, shell- vs exec-form, Windows `Shell` overrides).
- Work on distroless and multi-arch images as well as traditional distro bases.
- Emit auditable evidence and reduction chains so policy decisions are explainable.
## 2) Dual-mode architecture
The scanner exposes a single façade but routes to two reducers:
```
Scanner.EntryTrace/
Common/
OciImageReader.cs
OverlayVfs.cs
Heuristics/
Models/
Dynamic/ProcReducer.cs // running container
Static/ImageReducer.cs // static image inference
```
Selection logic:
```csharp
IEntryReducer reducer = container.IsRunning
? new ProcReducer()
: new ImageReducer();
var result = reducer.TraceAndReduce(ct);
```
Both reducers publish a harmonised `EntryTraceResult`, allowing downstream modules (Policy Engine, Vuln Explorer, Export Center) to consume the same shape regardless of data source.
## 3) Pipeline overview
### 3.1 Static images
1. Pull or load OCI image.
2. Compose final argv (`ENTRYPOINT ++ CMD`), respecting shell overrides.
3. Overlay layers with whiteout support via a lazy virtual filesystem.
4. Resolve paths, shebangs, wrappers, and scripts until a terminal candidate emerges.
5. Classify runtime family, identify application artefact, score confidence, and emit evidence.
### 3.2 Running containers
1. Capture real exec / fork events and build an exec graph.
2. Locate steady-state processes (long-lived, owns listeners, not a shim).
3. Collapse wrappers using the same catalogue as static mode.
4. Cross-check with static heuristics to tighten confidence.
### 3.3 Shared components
- **ShellFlow static analyser** handles script idioms (`set --`, `exec "$@"`, branch rewrites).
- **Wrapper catalogue** recognises shells, init shims, supervisors, and package runners.
- **Runtime detectors** plug in per language/framework (Java, .NET, Node, Python, PHP-FPM, Ruby, Go, Rust, Nginx, C/C++).
- **Score calibrator** turns detector raw scores into a unified 0..1 confidence.
## 4) Document map
The entry-point playbook is now split into focused guides:
| Document | Purpose |
| --- | --- |
| `entrypoint-static-analysis.md` | Overlay VFS, argv composition, wrapper reduction, scoring. |
| `entrypoint-dynamic-analysis.md` | Observational Exec Graph for running containers. |
| `entrypoint-shell-analysis.md` | ShellFlow static analyser and script idioms. |
| `entrypoint-runtime-overview.md` | Detector contracts, helper utilities, calibration, integrations. |
| `entrypoint-lang-*.md` | Runtime-specific heuristics (Java, .NET, Node, Python, PHP-FPM, Ruby, Go, Rust, C/C++, Nginx, Deno, Elixir/BEAM, Supervisor). |
Use this file as the landing page; each guide can be read independently when implementing or updating a specific component.

View File

@@ -0,0 +1,152 @@
# Runtime Detector Overview
Runtime classification converts a reduced command into a concrete language or framework identity with supporting evidence. This document describes the shared contracts, helper utilities, calibration strategy, and integration points; language-specific heuristics live in the `entrypoint-lang-*.md` files.
## 1) Contracts
```csharp
public enum LanguageType {
Java, DotNet, Node, Python, PhpFpm, Ruby, Go, Rust, CCpp,
Nginx, Deno, Elixir, Supervisor, Other
}
public sealed record ResolvedCommand(
string[] Argv,
string Argv0,
string? AbsolutePath,
bool IsElf,
bool IsPe,
bool IsScript,
string? Shebang,
string WorkingDir
);
public sealed record LanguageHit(
LanguageType Type,
double RawScore,
string ResolvedBinary,
string[] Args,
List<string> Evidence,
string? AppArtifactPath = null,
string? MainModule = null,
Dictionary<string,string>? Extra = null
);
```
### Interface
```csharp
public interface ILanguageSubDetector {
LanguageHit? TryDetect(
ResolvedCommand cmd, OverlayVfs vfs, EnvBag env, ImageContext img, CancellationToken ct = default);
}
public sealed class LanguageDetector {
private readonly ILanguageSubDetector[] _detectors = {
new JavaDetector(),
new DotNetDetector(),
new NodeDetector(),
new PythonDetector(),
new PhpFpmDetector(),
new RubyDetector(),
new NginxDetector(),
new GoDetector(),
new RustDetector(),
new DenoDetector(),
new ElixirDetector(),
new CCppDetector(),
new SupervisorDetector()
};
private readonly ScoreCalibrator _cal = ScoreCalibrator.Default;
public LanguageHit Detect(ResolvedCommand cmd, OverlayVfs vfs, EnvBag env, ImageContext img, out double confidence) {
    var hits = _detectors.Select(d => d.TryDetect(cmd, vfs, env, img)).OfType<LanguageHit>().ToList();
LanguageHit best = hits.Count == 0
? new LanguageHit(LanguageType.Other, 0.10, cmd.AbsolutePath ?? cmd.Argv0, cmd.Argv.Skip(1).ToArray(),
new() { "No strong runtime family signals detected." })
: hits.OrderByDescending(_cal.Calibrate).First();
confidence = _cal.Calibrate(best);
    foreach (var alt in hits.Where(h => h != best).OrderByDescending(_cal.Calibrate))
      best.Evidence.Add($"Alternative: {alt.Type} (score={_cal.Calibrate(alt):0.00}) — {string.Join("; ", alt.Evidence.Take(2))}…");
return best;
}
}
```
## 2) Helpers
```csharp
static class VfsHelpers {
public static bool FileExists(OverlayVfs vfs, string path) => vfs.Exists(path);
public static bool TryOpen(OverlayVfs vfs, string path, out Stream? stream) {
if (!vfs.Exists(path)) { stream = null; return false; }
stream = vfs.OpenRead(path);
return true;
}
public static string Join(string cwd, string maybeRel) =>
Path.IsPathRooted(maybeRel) ? maybeRel : Path.GetFullPath(Path.Combine(cwd, maybeRel));
}
static class ArgvHelpers {
public static int IndexOf(this string[] argv, string flag) =>
Array.FindIndex(argv, a => a == flag);
public static string? Next(this string[] argv, int idx) =>
(idx >= 0 && idx + 1 < argv.Length) ? argv[idx + 1] : null;
public static bool AnyEndsWith(this IEnumerable<string> args, string suffix, bool ignoreCase = true) =>
args.Any(a => a.EndsWith(suffix, ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal));
public static bool Is(this string? candidate, params string[] names) =>
candidate is not null && names.Any(n => string.Equals(Path.GetFileName(candidate), n, StringComparison.OrdinalIgnoreCase));
}
```
## 3) Scoring & calibration
- Each sub-detector returns a `RawScore` (0..1) based on family-specific heuristics.
- Feed raw scores into a calibrator (Platt scaling or isotonic regression) trained on labelled corpora to get calibrated probabilities.
- Persist calibration metadata per detector to avoid drift.
- When no detector fires, return `LanguageType.Other` with low confidence and an evidence note.
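A minimal Platt-scaling sketch; the coefficients shown are placeholders for the persisted per-detector calibration metadata, not trained values:
```csharp
// Sketch only: logistic (Platt) calibration of detector raw scores.
// Real coefficients come from training on the labelled corpus and are
// persisted per detector to avoid drift.
public sealed class ScoreCalibrator {
    private readonly IReadOnlyDictionary<LanguageType, (double A, double B)> _coeffs;

    public ScoreCalibrator(IReadOnlyDictionary<LanguageType, (double A, double B)> coeffs)
        => _coeffs = coeffs;

    public static ScoreCalibrator Default { get; } = new(
        new Dictionary<LanguageType, (double, double)> { [LanguageType.Java] = (4.0, -2.0) });

    public double Calibrate(LanguageHit hit) {
        var (a, b) = _coeffs.TryGetValue(hit.Type, out var c) ? c : (4.0, -2.0);
        return 1.0 / (1.0 + Math.Exp(-(a * hit.RawScore + b)));   // sigmoid(A·raw + B)
    }
}
```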
## 4) Cross-checks
Enhance precision by combining detector results with filesystem and configuration signals:
- Compare declared `EXPOSE` ports with runtime defaults (e.g., `80/443` for Nginx, `8080` for Java app servers).
- Inspect service-specific configuration (`nginx.conf`, `php-fpm.conf`, `web.config`, `Gemfile`, `package.json`, `pyproject.toml`).
- For Java and .NET, verify artefact presence and manifest metadata; for Go/Rust check static binary traits.
- Re-run detectors after ShellFlow rewrites to ensure post-`exec` commands are analysed.
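As a sketch, port agreement can be expressed as a small confidence bonus mirroring feature `f4` from the static scoring table; the default-port map is illustrative:
```csharp
// Sketch only: nudges confidence when EXPOSE'd ports match a runtime's
// conventional defaults; the port table is illustrative, not exhaustive.
static double PortAgreementBonus(LanguageType type, string[] exposedPorts) {
    var defaults = type switch {
        LanguageType.Nginx  => new[] { "80/tcp", "443/tcp" },
        LanguageType.Java   => new[] { "8080/tcp" },
        LanguageType.PhpFpm => new[] { "9000/tcp" },
        _ => Array.Empty<string>()
    };
    return exposedPorts.Intersect(defaults).Any() ? 0.06 : 0.0;   // mirrors feature f4
}
```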
## 5) Windows nuances
- Use `config.Shell` to detect PowerShell vs CMD; adjust interpreter lookup accordingly.
- PE probing is mandatory—PowerShell scripts often front .NET or native binaries.
- Consider case-insensitive paths and `\` separators.
## 6) Integration points
- Static reducer passes `ResolvedCommand` → runtime detector.
- Dynamic reducer pipes steady-state commands through the same interface.
- Output `LanguageHit` populates the `TerminalProcess` along with `confidence`.
- Downstream consumers (Policy Engine, Vuln Explorer) merge runtime type into their evidence trail.
## 7) Next steps
Language-specific heuristics live in:
| Runtime | Document |
| --- | --- |
| Java | `entrypoint-lang-java.md` |
| .NET / C# | `entrypoint-lang-dotnet.md` |
| Node.js | `entrypoint-lang-node.md` |
| Python | `entrypoint-lang-python.md` |
| PHP-FPM | `entrypoint-lang-phpfpm.md` |
| Ruby | `entrypoint-lang-ruby.md` |
| Go | `entrypoint-lang-go.md` |
| Rust | `entrypoint-lang-rust.md` |
| C/C++ | `entrypoint-lang-ccpp.md` |
| Nginx | `entrypoint-lang-nginx.md` |
| Deno | `entrypoint-lang-deno.md` |
| Elixir/Erlang (BEAM) | `entrypoint-lang-elixir.md` |
| Supervisors | `entrypoint-lang-supervisor.md` |
Each runtime file documents the heuristics, artefacts, and edge cases specific to that family.

View File

@@ -0,0 +1,83 @@
# ShellFlow — Script Reduction Playbook
Most container entry points eventually execute a shell script. The ShellFlow analyser resolves these scripts without executing user code, providing deterministic, explainable reductions.
## 1) Scope
- POSIX `sh` subset with common Bash extensions (control flow, functions, parameter expansion).
- Handles idioms from official Docker images (`if [ "$1" = "server" ]; then …`, `exec gosu "$USER" "$@"`, `set -- java -jar …`).
- Tracks positional parameters (`$@`, `$1..$9`), environment variables, and `set --` mutations.
- Produces one or more candidate commands with supporting evidence.
## 2) Architecture
```
ShellFlow/
Parser/ // POSIX sh lexer + recursive descent parser
Ast/ // nodes for lists, pipelines, conditionals, functions
Evaluator/ // partial evaluation & taint tracking
Idioms/ // pattern library for common Docker entrypoints
Planner/ // emits CommandPlan[]
```
### 2.1 CommandPlan
```csharp
public sealed record CommandPlan(
string[] Argv,
double HeuristicScore,
IReadOnlyList<string> Evidence,
IReadOnlyList<ReductionEdge> Chain,
bool IsFallback = false
);
```
Plans feed directly into the static reducer, which selects the highest-confidence plan but keeps alternates as evidence.
## 3) Parsing & AST
- Tokenise words, assignments, pipelines (`|`), lists (`;`, `&&`, `||`), conditionals (`if`, `case`), loops (`for`, `while`, `until`), functions, and redirections.
- Preserve heredocs and subshells as opaque nodes (evaluated conservatively).
- Record source spans to surface meaningful evidence (`"line 12: exec java -jar $APP_JAR"`).
## 4) Partial evaluation
- Initialise symbol table from image environment plus caller-supplied args.
- Treat `$@`, `$*`, `$1..$9` as tainted; propagate taint through assignments.
- Resolve `${VAR:-default}` and `${VAR:+alt}` when `VAR` known; otherwise branch.
- Support `set -- …` (resets positional parameters) and `shift`.
- `source`/`.` commands are parsed recursively when files are available; otherwise fall back to a low-confidence branch.
## 5) Exec sink detection
- `exec <cmd>` dominates the remainder of the script.
- Chains such as `exec gosu "$USER" "$@"` feed into wrapper collapse.
- When no `exec` is present, pick the last reachable simple command in the main path.
- Multi-branch scripts yield multiple plans with adjusted scores; unresolved branches are marked `IsFallback`.
## 6) Idiom library
| Pattern | Action |
| --- | --- |
| `if [ "${1:0:1}" = '-' ]; then set -- server "$@"; fi` | Rewrite argv to prepend default command. |
| `if [ "$1" = "bash" ]; then exec "$@"; fi` | Pass-through for manual shells. |
| `exec "$@"` + non-empty CMD | Substitute CMD vector into plan. |
| `exec java -jar "$APP_JAR" "$@"` | Resolve JAR via env or filesystem. |
| `set -- gosu "$APP_USER" "$@"` | Collapse into wrapper plan. |
Idioms are implemented as AST visitors; each adds evidence strings when triggered.
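For example, a sketch of the `exec "$@"` pass-through idiom as a visitor; the visitor and context types are assumptions about the ShellFlow layout, not a fixed API:
```csharp
// Sketch only: IIdiomVisitor, ExecNode, and ShellFlowContext are assumed
// shapes; CommandPlan matches the record defined above.
public sealed class ExecPassthroughIdiom : IIdiomVisitor {
    public void Visit(ExecNode node, ShellFlowContext ctx) {
        // `exec "$@"` with a known CMD vector: substitute CMD into the plan.
        if (node.IsBareArgsExpansion && ctx.CallerArgs.Length > 0) {
            ctx.EmitPlan(new CommandPlan(
                Argv: ctx.CallerArgs,
                HeuristicScore: 0.85,
                Evidence: new[] { $"line {node.Line}: exec \"$@\" substituted with CMD" },
                Chain: ctx.ChainSoFar(node)));
        }
    }
}
```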
## 7) Confidence scoring
- Base score from plan heuristics (`HeuristicScore`).
- Penalties for unresolved taint (`$@` unknown), missing files, nested subshells, or fallbacks.
- Bonus when idioms match, artefacts exist, or env values resolve cleanly.
- Final confidence is combined with the outer static scoring model.
## 8) Failure modes
- Missing script (`ENTRYPOINT` points to deleted file): emit fallback plan with low confidence.
- Self-modifying scripts or heavy dynamic features (`eval`, backticks): mark plan as low-confidence and surface warning evidence.
- Commands that spawn supervisors without `exec`: return both the supervisor and inferred children (if configuration files are present).
ShellFlow keeps the static reducer explainable: every inferred command is accompanied by the script span and reasoning used to reach it.

View File

@@ -0,0 +1,122 @@
# Entry-Point Static Analysis
This guide captures the static half of the StellaOps entry-point detection pipeline: how we turn image metadata and filesystem contents into a resolved binary, an execution chain, and a confidence score.
## 1) Loading OCI images
### 1.1 Supported inputs
- Registry references (`repo:tag@sha256:digest`) using the existing content store.
- Local OCI/Docker v2 archives (`docker save` tarball, OCI layout directory with `index.json` + `blobs/sha256/*`).
### 1.2 Normalised model
```csharp
sealed class OciImage {
public required string Os;
public required string Arch;
public required string[] Entrypoint;
public required string[] Cmd;
public required string[] Shell; // Windows / powershell overrides
public required string WorkingDir;
public required string[] Env;
public required string[] ExposedPorts;
public required LabelMap Labels;
public required LayerRef[] Layers; // ordered, compressed blobs
}
```
Compose the runtime argv as `Entrypoint ++ Cmd`, honouring shell-form vs exec-form (see §2.3).
## 2) Overlay virtual filesystem
### 2.1 Whiteouts
- Regular whiteout: `path/.wh.<name>` removes `<name>` from lower layers.
- Opaque directory: `path/.wh..wh..opq` hides the directory entirely.
### 2.2 Lazy extraction
- First pass: build a tar index `(path → layer, offset, size, mode, isWhiteout, isDir)`.
- Decompress only when reading a file; optionally support eStargz TOC to accelerate random access.
### 2.3 Shell-form composition
- Dockerfile shell form is serialised as `["/bin/sh","-c","…"]` (or `Shell[]` override on Windows).
- Always trust `config.json`; no need to inspect the Dockerfile.
- Working directory defaults to `/` if unspecified.
## 3) Low-level primitives
### 3.1 PATH resolution
- Extract `PATH` from environment (fallback `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin`).
- If `argv[0]` contains no `/`, walk the PATH to resolve an absolute file; resolve other relative paths against the working directory.
- Verify execute bit (or Windows ACL) before accepting.
### 3.2 Shebang handling
- For non-ELF/PE files: read first line; interpret `#!interpreter args`.
- Replace `argv[0]` with the interpreter, prepend shebang args, append script path per kernel semantics.
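A sketch of the shebang rewrite under Linux kernel semantics (everything after the interpreter is passed as a single optional argument):
```csharp
// Sketch only: rewrites argv for a script per kernel shebang handling.
static string[] ResolveShebang(OverlayVfs vfs, string scriptPath, string[] argv) {
    using var reader = new StreamReader(vfs.OpenRead(scriptPath));
    var first = reader.ReadLine();
    if (first is null || !first.StartsWith("#!")) return argv;

    // "#!/usr/bin/env python3" → [env, python3, scriptPath, rest…];
    // the kernel passes at most one optional argument after the interpreter.
    var parts = first[2..].Trim().Split(' ', 2, StringSplitOptions.RemoveEmptyEntries);
    var rewritten = new List<string> { parts[0] };
    if (parts.Length > 1) rewritten.Add(parts[1]);
    rewritten.Add(scriptPath);
    rewritten.AddRange(argv.Skip(1));
    return rewritten.ToArray();
}
```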
### 3.3 Binary probes
- Identify ELF via magic `\x7FELF`, parse `.interp`, `.dynamic`, linked libs, `.note.go.buildid`, DWARF producer.
- Identify PE (Windows) and detect .NET single-file bundles via CLI header.
- Record features for runtime scoring (Go vs Rust vs glibc vs musl).
## 4) Wrapper catalogue
Collapse known wrappers before analysing the target command:
- Init shims: `tini`, `dumb-init`, `s6-svscan`, `runit`, `supervisord`.
- Privilege droppers: `gosu`, `su-exec`, `chpst`.
- Shells: `sh`, `bash`, `dash`, BusyBox variants.
- Package runners: `npm`, `yarn`, `pnpm`, `pip`, `pipenv`, `poetry`, `bundle`, `rake`.
Rules:
- If the wrapper takes a `--` sentinel (`tini -- app …`), drop the wrapper and record a reduction edge.
- `gosu user cmd …` → collapse to `cmd …`.
- For shell wrappers, delegate to the ShellFlow analyser (see separate guide).
## 5) ShellFlow integration
When the resolved command is a shell script, invoke the ShellFlow analyser to locate the eventual `exec` target. Key capabilities:
- Parses POSIX sh (and common Bash extensions).
- Tracks environment mutations (`set`, `export`, `set --`).
- Resolves `$@`, `$1..9`, `${VAR:-default}`.
- Recognises idioms from official Docker images (`if [ "$1" = "server" ]; then …`).
- Emits multiple branches when predicates depend on unknown data, but tags them with lower confidence.
The analyser returns one or more candidate commands along with reasons, which feed into the reduction engine.
## 6) Reduction algorithm
1. Compose argv `ENTRYPOINT ++ CMD`.
2. Collapse wrappers; append `ReductionEdge` entries documenting each step.
3. Resolve argv0 to an absolute file and classify (ELF/PE/script).
4. If script → run ShellFlow; replace current command with highest-confidence `exec` target while preserving alternates as evidence.
5. Attempt to resolve application artefacts for VM hosts (JARs, DLLs, JS entry, Python module, etc.).
6. Emit `EntryTraceResult` with candidate terminals ranked by confidence.
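The loop can be sketched as follows; `CollapseWrapper`, `ResolveCommand`, `ResolveArtifacts`, and `BuildResult` are stand-ins for the components described in this guide:
```csharp
// Sketch only: the reduction loop from the numbered steps above.
EntryTraceResult Reduce(OciImage img, OverlayVfs vfs) {
    var argv = img.Entrypoint.Concat(img.Cmd).ToArray();        // step 1
    var chain = new List<ReductionEdge>();

    while (CollapseWrapper(ref argv, chain)) { }                // step 2: tini/gosu/…

    var cmd = ResolveCommand(vfs, img, argv);                   // step 3: argv0 → file
    if (cmd.IsScript) {                                         // step 4: ShellFlow
        var plan = ShellFlow.Analyse(vfs, cmd).OrderByDescending(p => p.HeuristicScore).First();
        argv = plan.Argv;
        chain.AddRange(plan.Chain);
        cmd = ResolveCommand(vfs, img, argv);
    }
    return BuildResult(cmd, chain, ResolveArtifacts(vfs, cmd)); // steps 5–6
}
```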
## 7) Confidence scoring
Use a simple logistic model with feature contributions captured for the evidence trail. Example features:
| Id | Signal | Weight |
| --- | --- | --- |
| `f1` | Entrypoint already an executable (ELF/PE) | +0.18 |
| `f2` | Observed chain ends in non-wrapper binary | +0.22 |
| `f3` | VM host + resolvable artefact | +0.20 |
| `f4` | Exposed ports align with runtime | +0.06 |
| `f5` | Shebang interpreter matches runtime family | +0.05 |
| `f6` | Language artefact validation succeeded | +0.15 |
| `f8` | Multi-branch script unresolved (`$@` taint) | -0.20 |
| `f9` | Target missing execute bit | -0.25 |
| `f10` | Shell with no `exec` | -0.18 |
Persist per-feature evidence strings so UI/CLI users can see **why** the scanner picked a given entry point.
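A sketch of the logistic combination using the weights above; the bias term and the feature-extraction side are placeholders:
```csharp
// Sketch only: sums fired feature weights through a sigmoid and keeps
// per-feature evidence strings; the bias is a placeholder.
static (double Confidence, List<string> Evidence) ScoreTerminal(IReadOnlyDictionary<string, bool> features) {
    var weights = new Dictionary<string, double> {
        ["f1"] = 0.18, ["f2"] = 0.22, ["f3"] = 0.20, ["f4"] = 0.06,
        ["f5"] = 0.05, ["f6"] = 0.15, ["f8"] = -0.20, ["f9"] = -0.25, ["f10"] = -0.18
    };
    double z = -0.5;                                   // bias (placeholder)
    var evidence = new List<string>();
    foreach (var (id, fired) in features) {
        if (!fired || !weights.TryGetValue(id, out var w)) continue;
        z += w;
        evidence.Add($"{id}: {(w >= 0 ? "+" : "")}{w:0.00}");
    }
    return (1.0 / (1.0 + Math.Exp(-z)), evidence);
}
```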
## 8) Outputs
Return a populated `EntryTraceResult`:
- `Terminals` contains the best candidate(s) and their runtime classification.
- `Evidence` aggregates feature messages, ShellFlow reasoning, wrapper reductions, and runtime detector hints.
- `Chain` shows the explainable path from initial Docker argv to the final binary.
Static and dynamic reducers share this shape, enabling downstream modules to remain agnostic of the detection mode.

View File

@@ -0,0 +1,26 @@
# Entry-Point Documentation Index
The entry-point detection system is now split into focused guides. Use this index to navigate the individual topics.
| Topic | Document |
| --- | --- |
| Problem statement & architecture overview | `entrypoint-problem.md` |
| Static resolver (OCI layers, wrappers, scoring) | `entrypoint-static-analysis.md` |
| Dynamic resolver / Observational Exec Graph | `entrypoint-dynamic-analysis.md` |
| ShellFlow script analysis | `entrypoint-shell-analysis.md` |
| Runtime detector contracts & calibration | `entrypoint-runtime-overview.md` |
| Java heuristics | `entrypoint-lang-java.md` |
| .NET heuristics | `entrypoint-lang-dotnet.md` |
| Node.js heuristics | `entrypoint-lang-node.md` |
| Python heuristics | `entrypoint-lang-python.md` |
| PHP-FPM heuristics | `entrypoint-lang-phpfpm.md` |
| Ruby heuristics | `entrypoint-lang-ruby.md` |
| Go heuristics | `entrypoint-lang-go.md` |
| Rust heuristics | `entrypoint-lang-rust.md` |
| C/C++ heuristics | `entrypoint-lang-ccpp.md` |
| Nginx heuristics | `entrypoint-lang-nginx.md` |
| Deno heuristics | `entrypoint-lang-deno.md` |
| Elixir / Erlang (BEAM) heuristics | `entrypoint-lang-elixir.md` |
| Supervisor classification | `entrypoint-lang-supervisor.md` |
> Looking for historical context? The unified write-up previously in `entrypoint2.md` and `entrypoint-lang-detection.md` has been decomposed into the files above for easier maintenance.

View File

@@ -0,0 +1,88 @@
# Scanner Artifact Store Migration (MinIO → RustFS)
## Overview
Sprint 11 introduces **RustFS** as the default artifact store for the Scanner plane. Existing
deployments running MinIO (or any S3-compatible backend) must migrate stored SBOM artefacts to RustFS
before switching the Scanner hosts to `scanner.artifactStore.driver = "rustfs"`.
This runbook covers the recommended migration workflow and validation steps.
## Prerequisites
- RustFS service deployed and reachable from the Scanner control plane (`http(s)://rustfs:8080`).
- Existing MinIO/S3 credentials with read access to the current bucket.
- CLI environment with the StellaOps source tree (for the migration tool) and the .NET 10 SDK.
- Maintenance window sized to copy all artefacts (migration is read-only on the source bucket).
## 1. Snapshot source bucket (optional but recommended)
If the MinIO deployment offers versioning or snapshots, take one before migrating. For non-versioned
deployments, capture an external backup (e.g., `mc mirror` to offline storage).
## 2. Dry-run the migrator
```
dotnet run --project src/Tools/RustFsMigrator -- \
--s3-bucket scanner-artifacts \
--s3-endpoint http://stellaops-minio:9000 \
--s3-access-key stellaops \
--s3-secret-key dev-minio-secret \
--rustfs-endpoint http://stellaops-rustfs:8080 \
--rustfs-bucket scanner-artifacts \
--prefix scanner/ \
--dry-run
```
The dry-run enumerates keys and reports the object count without writing to RustFS. Use this to
estimate migration time.
## 3. Execute migration
Remove the `--dry-run` flag to copy data. Optional flags:
- `--immutable`: mark all migrated objects as immutable (`X-RustFS-Immutable`).
- `--retain-days 365`: request retention (in days) via `X-RustFS-Retain-Seconds`.
- `--rustfs-api-key-header` / `--rustfs-api-key`: provide auth headers when RustFS is protected.
The tool streams each object from S3 and performs an idempotent `PUT` to RustFS, preserving the key
structure (e.g., `scanner/layers/<sha256>/sbom.cdx.json.zst`).
## 4. Verify sample objects
Pick a handful of SBOM digests and confirm:
1. `GET /api/v1/buckets/<bucket>/objects/<key>` returns the expected payload (size + SHA-256).
2. A Scanner WebService configured with `scanner.artifactStore.driver = "rustfs"` can fetch the same
 artefacts (smoke test: `GET /api/v1/scanner/sboms/<digest>?format=cdx-json`).
## 5. Switch Scanner hosts
Update configuration (Helm/Compose/environment) to set:
```
scanner:
artifactStore:
driver: rustfs
endpoint: http://stellaops-rustfs:8080
bucket: scanner-artifacts
timeoutSeconds: 30
```
Redeploy Scanner WebService and Worker. Monitor logs for `RustFS` upload/download messages and
Prometheus scrape (`rustfs_requests_total`).
## 6. Cleanup legacy MinIO (optional)
After a complete migration and validation period, decommission the MinIO bucket or repurpose it for
other components (Concelier still supports S3). Ensure backups reference RustFS snapshots going
forward.
## Troubleshooting
- **Uploads fail (HTTP 4xx/5xx):** Check RustFS logs and confirm API key headers. Re-run the migrator
for the affected keys.
- **Missing objects post-cutover:** Re-run the migrator with the specific `--prefix`. The tool is
idempotent and safely overwrites existing objects.
- **Performance tuning:** Run multiple instances of the migrator with disjoint prefixes if needed; the
RustFS API is stateless and supports parallel PUTs.