component_architecture_scanner.md — Stella Ops Scanner (2025Q4)
Scope. Implementation‑ready architecture for the Scanner subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per‑layer caching, three‑way diffs, artifact catalog (MinIO+Mongo), attestation hand‑off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
0) Mission & boundaries
Mission. Produce deterministic, explainable SBOMs and diffs for container images and filesystems, quickly and repeatedly, without guessing. Emit two views: Inventory (everything present) and Usage (entrypoint closure + actually linked libs). Attach attestations through Signer→Attestor→Rekor v2.
Boundaries.
- Scanner does not produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
- Scanner does not keep third‑party SBOM warehouses. It may bind to existing attestations for exact hashes.
- Core analyzers are deterministic (no fuzzy identity). Optional heuristic plug‑ins (e.g., patch‑presence) run under explicit flags and never contaminate the core SBOM.
1) Solution & project layout
```
src/
├─ StellaOps.Scanner.WebService/                       # REST control plane, catalog, diff, exports
├─ StellaOps.Scanner.Worker/                           # queue consumer; executes analyzers
├─ StellaOps.Scanner.Models/                           # DTOs, evidence, graph nodes, CDX/SPDX adapters
├─ StellaOps.Scanner.Storage/                          # Mongo repositories; MinIO object client; ILM/GC
├─ StellaOps.Scanner.Queue/                            # queue abstraction (Redis/NATS/RabbitMQ)
├─ StellaOps.Scanner.Cache/                            # layer cache; file CAS; bloom/bitmap indexes
├─ StellaOps.Scanner.EntryTrace/                       # ENTRYPOINT/CMD → terminal program resolver (shell AST)
├─ StellaOps.Scanner.Analyzers.OS.[Apk|Dpkg|Rpm]/
├─ StellaOps.Scanner.Analyzers.Lang.[Java|Node|Python|Go|DotNet|Rust]/
├─ StellaOps.Scanner.Analyzers.Native.[ELF|PE|MachO]/  # PE/Mach-O planned (M2)
├─ StellaOps.Scanner.Emit.CDX/                         # CycloneDX (JSON + Protobuf)
├─ StellaOps.Scanner.Emit.SPDX/                        # SPDX 3.0.1 JSON
├─ StellaOps.Scanner.Diff/                             # image→layer→component three‑way diff
├─ StellaOps.Scanner.Index/                            # BOM‑Index sidecar (purls + roaring bitmaps)
├─ StellaOps.Scanner.Tests.*                           # unit/integration/e2e fixtures
└─ tools/
   ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/           # BuildKit generator (image referrer SBOMs)
   └─ StellaOps.Scanner.Sbomer.DockerImage/            # CLI‑driven scanner container
```
Analyzer assemblies and buildx generators are packaged as restart-time plug-ins under plugins/scanner/** with manifests; services must restart to activate new plug-ins.
1.1 Queue backbone (Redis / NATS)
StellaOps.Scanner.Queue exposes a transport-agnostic contract (IScanQueue/IScanQueueLease) used by the WebService producer and Worker consumers. Sprint 9 introduces two first-party transports:
- Redis Streams (default). Uses consumer groups, deterministic idempotency keys (`scanner:jobs:idemp:*`), and supports lease claim (`XCLAIM`), renewal, exponential-backoff retries, and a `scanner:jobs:dead` stream for exhausted attempts.
- NATS JetStream. Provisions the `SCANNER_JOBS` work-queue stream + durable consumer `scanner-workers`, publishes with `MsgId` for dedupe, applies backoff via `NAK` delays, and routes dead-lettered jobs to `SCANNER_JOBS_DEAD`.
Metrics are emitted via `Meter` counters (`scanner_queue_enqueued_total`, `scanner_queue_retry_total`, `scanner_queue_deadletter_total`), and `ScannerQueueHealthCheck` pings the active backend (Redis `PING`, NATS `PING`). Configuration is bound from `scanner.queue`:
```yaml
scanner:
  queue:
    kind: redis                     # or nats
    redis:
      connectionString: "redis://queue:6379/0"
      streamName: "scanner:jobs"
    nats:
      url: "nats://queue:4222"
      stream: "SCANNER_JOBS"
      subject: "scanner.jobs"
      durableConsumer: "scanner-workers"
      deadLetterSubject: "scanner.jobs.dead"
    maxDeliveryAttempts: 5
    retryInitialBackoff: 00:00:05
    retryMaxBackoff: 00:02:00
```
The DI extension (AddScannerQueue) wires the selected transport, so future additions (e.g., RabbitMQ) only implement the same contract and register.
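A minimal sketch of what such a transport-agnostic contract could look like, assuming illustrative member names (the real `IScanQueue`/`IScanQueueLease` surface in `StellaOps.Scanner.Queue` may differ):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes only; shown to make the lease/ack/retry lifecycle concrete.
public sealed record ScanJob(string JobId, string IdempotencyKey, string ImageDigest, int Attempt);

public interface IScanQueueLease
{
    ScanJob Job { get; }
    Task RenewAsync(TimeSpan extension, CancellationToken ct = default);   // keep the claim alive during long scans
    Task AcknowledgeAsync(CancellationToken ct = default);                 // success: remove from the stream
    Task ReleaseAsync(bool retry, CancellationToken ct = default);         // failure: retry with backoff or dead-letter
}

public interface IScanQueue
{
    // Returns false when the idempotency key dedupes an already-enqueued job.
    Task<bool> EnqueueAsync(ScanJob job, CancellationToken ct = default);

    // Workers consume leases in batches and must ack/release each one.
    IAsyncEnumerable<IScanQueueLease> LeaseAsync(int batchSize, TimeSpan leaseDuration, CancellationToken ct = default);
}
```

The WebService producer enqueues with a deterministic idempotency key; Workers lease, renew while analyzers run, then acknowledge on success or release for retry/dead-letter.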
Runtime form‑factor: two deployables
- Scanner.WebService (stateless REST)
- Scanner.Worker (N replicas; queue‑driven)
2) External dependencies
- OCI registry with Referrers API (discover attached SBOMs/signatures).
- MinIO (S3‑compatible) for SBOM artifacts; Object Lock for immutable classes; ILM for TTL.
- MongoDB for catalog, job state, diffs, ILM rules.
- Queue (Redis Streams/NATS/RabbitMQ).
- Authority (on‑prem OIDC) for OpToks (DPoP/mTLS).
- Signer + Attestor (+ Fulcio/KMS + Rekor v2) for DSSE + transparency.
3) Contracts & data model
3.1 Evidence‑first component model
Nodes
- `Image`, `Layer`, `File`
- `Component` (purl?, name, version?, type, id — may be `bin:{sha256}`)
- `Executable` (ELF/PE/Mach‑O), `Library` (native or managed), `EntryScript` (shell/launcher)
Edges (all carry Evidence)
- `contains` (Image|Layer → File)
- `installs` (PackageDB → Component) (OS database row)
- `declares` (InstalledMetadata → Component) (dist‑info, pom.properties, deps.json…)
- `links_to` (Executable → Library) (ELF `DT_NEEDED`, PE imports)
- `calls` (EntryScript → Program) (file:line from shell AST)
- `attests` (Rekor → Component|Image) (SBOM/predicate binding)
- `bound_from_attestation` (Component_attested → Component_observed) (hash equality proof)
Evidence
{ source: enum, locator: (path|offset|line), sha256?, method: enum, timestamp }
No confidences. Either a fact is proven with listed mechanisms, or it is not claimed.
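For orientation, a hedged sketch of how this evidence-first model might map onto DTOs; the concrete types live in `StellaOps.Scanner.Models` and may differ in naming and shape:

```csharp
using System;

// Illustrative shapes only, mirroring the node/edge/evidence description above.
public enum EvidenceSource { OsPackageDb, InstalledMetadata, Linker, EntryTrace, Attestation }
public enum EvidenceMethod { FileRead, DbRow, ElfDynamic, ShellAst, RekorLookup }

public sealed record Evidence(
    EvidenceSource Source,
    string Locator,            // path, byte offset, or file:line
    string? Sha256,
    EvidenceMethod Method,
    DateTimeOffset Timestamp);

public sealed record Component(
    string Id,                 // purl, or "bin:{sha256}" for unlabeled binaries
    string Name,
    string? Version,
    string Type,               // os-pkg | maven | npm | pypi | golang | nuget | crate | binary
    string? Purl);

// Every edge carries the evidence that proves it; no confidence scores.
public sealed record Edge(string Kind, string FromId, string ToId, Evidence Evidence);
```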
3.2 Catalog schema (Mongo)
- `artifacts` { _id, type: layer-bom|image-bom|diff|index, format: cdx-json|cdx-pb|spdx-json, bytesSha256, size, rekor: { uuid, index, url }?, ttlClass, immutable, refCount, createdAt }
- `images` { imageDigest, repo, tag?, arch, createdAt, lastSeen }
- `layers` { layerDigest, mediaType, size, createdAt, lastSeen }
- `links` { fromType, fromDigest, artifactId } // image/layer -> artifact
- `jobs` { _id, kind, args, state, startedAt, heartbeatAt, endedAt, error }
- `lifecycleRules` { ruleId, scope, ttlDays, retainIfReferenced, immutable }
3.3 Object store layout (MinIO)
```
layers/<sha256>/sbom.cdx.json.zst
layers/<sha256>/sbom.spdx.json.zst
images/<imgDigest>/inventory.cdx.pb     # CycloneDX Protobuf
images/<imgDigest>/usage.cdx.pb
indexes/<imgDigest>/bom-index.bin       # purls + roaring bitmaps
diffs/<old>_<new>/diff.json.zst
attest/<artifactSha256>.dsse.json       # DSSE bundle (cert chain + Rekor proof)
```
4) REST API (Scanner.WebService)
All under /api/v1/scanner. Auth: OpTok (DPoP/mTLS); RBAC scopes.
```
POST /scans                    { imageRef|digest, force?:bool } → { scanId }
GET  /scans/{id}               → { status, imageDigest, artifacts[], rekor? }
GET  /sboms/{imageDigest}?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage → bytes
GET  /diff?old=<digest>&new=<digest>&view=inventory|usage → diff.json
POST /exports                  { imageDigest, format, view, attest?:bool } → { artifactId, rekor? }
POST /reports                  { imageDigest, policyRevision? } → { reportId, rekor? }   # delegates to backend policy+vex
GET  /catalog/artifacts/{id}   → { meta }
GET  /healthz | /readyz | /metrics
```
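A hedged client sketch against the routes above. Auth details (DPoP proof header, token acquisition from Authority) are elided, and the request/response record shapes and terminal status strings are assumptions inferred from the route summaries:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Threading.Tasks;

public sealed record ScanRequest(string ImageRef, bool Force = false);
public sealed record ScanAccepted(string ScanId);
public sealed record ScanStatus(string Status, string? ImageDigest);

public static class ScannerClientExample
{
    // Submits a scan and polls for completion; production callers would use the event stream instead.
    public static async Task<ScanStatus> ScanAsync(HttpClient http, string imageRef, string opToken)
    {
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("DPoP", opToken);

        var resp = await http.PostAsJsonAsync("/api/v1/scanner/scans", new ScanRequest(imageRef));
        resp.EnsureSuccessStatusCode();
        var accepted = await resp.Content.ReadFromJsonAsync<ScanAccepted>();

        while (true)
        {
            var status = await http.GetFromJsonAsync<ScanStatus>($"/api/v1/scanner/scans/{accepted!.ScanId}");
            if (status!.Status is "succeeded" or "failed") return status;   // terminal states assumed
            await Task.Delay(TimeSpan.FromSeconds(2));
        }
    }
}
```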
Report events
When scanner.events.enabled = true, the WebService serialises the signed report (canonical JSON + DSSE envelope) with NotifyCanonicalJsonSerializer and publishes two Redis Stream entries (scanner.report.ready, scanner.scan.completed) to the configured stream (default stella.events). The stream fields carry the whole envelope plus lightweight headers (kind, tenant, ts) so Notify and UI timelines can consume the event bus without recomputing signatures. Publish timeouts and bounded stream length are controlled via scanner:events:publishTimeoutSeconds and scanner:events:maxStreamLength. If the queue driver is already Redis and no explicit events DSN is provided, the host reuses the queue connection and auto-enables event emission so deployments get live envelopes without extra wiring.
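A sketch of a consumer for these stream entries using StackExchange.Redis, assuming the default stream name `stella.events` and the lightweight header fields described above; the exact field layout of the envelope payload is an assumption:

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class ReportEventConsumerExample
{
    public static async Task ConsumeAsync(string connectionString)
    {
        var mux = await ConnectionMultiplexer.ConnectAsync(connectionString);
        var db = mux.GetDatabase();
        const string stream = "stella.events";       // default stream name from the text above
        const string group = "notify-ui", consumer = "worker-1";

        // Create the consumer group once; BUSYGROUP means it already exists.
        try { await db.StreamCreateConsumerGroupAsync(stream, group, StreamPosition.NewMessages); }
        catch (RedisServerException) { /* group already exists */ }

        while (true)
        {
            var entries = await db.StreamReadGroupAsync(stream, group, consumer, StreamPosition.NewMessages, count: 16);
            foreach (var entry in entries)
            {
                foreach (var field in entry.Values)
                    if (field.Name == "kind") Console.WriteLine($"{entry.Id}: {field.Value}");   // e.g. scanner.report.ready
                await db.StreamAcknowledgeAsync(stream, group, entry.Id);
            }
            if (entries.Length == 0) await Task.Delay(500);   // idle backoff
        }
    }
}
```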
5) Execution flow (Worker)
5.1 Acquire & verify
- Resolve image (prefer `repo@sha256:…`).
- (Optional) verify image signature per policy (cosign).
- Pull blobs, compute layer digests; record metadata.
5.2 Layer union FS
- Apply whiteouts; materialize final filesystem; map file → first introducing layer.
- Windows layers (MSI/SxS/GAC) planned in M2.
5.3 Evidence harvest (parallel analyzers; deterministic only)
A) OS packages
- apk: `/lib/apk/db/installed`
- dpkg: `/var/lib/dpkg/status`, `/var/lib/dpkg/info/*.list`
- rpm: `/var/lib/rpm/Packages` (via librpm or parser)
- Record `name`, `version` (epoch/revision), `arch`, source package where present, and declared file lists.
Data flow note: Each OS analyzer now writes its canonical output into the shared `ScanAnalysisStore` under `analysis.os.packages` (raw results) and `analysis.os.fragments` (per-analyzer layer fragments), and contributes to `analysis.layers.fragments` (the aggregated view consumed by emit/diff pipelines). Helpers in `ScanAnalysisCompositionBuilder` convert these fragments into SBOM composition requests and component graphs so the diff/emit stages no longer reach back into individual analyzer implementations.
B) Language ecosystems (installed state only)
- Java: `META-INF/maven/*/pom.properties`, MANIFEST → `pkg:maven/...`
- Node: `node_modules/**/package.json` → `pkg:npm/...`
- Python: `*.dist-info/{METADATA,RECORD}` → `pkg:pypi/...` (see the sketch after this list)
- Go: Go buildinfo in binaries → `pkg:golang/...`
- .NET: `*.deps.json` + assembly metadata → `pkg:nuget/...`
- Rust: crates only when explicitly present (embedded metadata or cargo/registry traces); otherwise binaries reported as `bin:{sha256}`.
Rule: We only report components proven on disk with authoritative metadata. Lockfiles are evidence only.
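As an illustration of that installed-state rule, a simplified sketch of the Python path: a component is emitted only when a `*.dist-info/METADATA` file is present on disk. The real analyzer also records `RECORD` hashes and evidence locators, and the purl normalization here is intentionally naive:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PythonDistInfoExample
{
    // Walks a materialized rootfs and yields (purl, evidence path) pairs for installed distributions.
    public static IEnumerable<(string Purl, string EvidencePath)> Enumerate(string rootFs)
    {
        var metadataFiles = Directory.EnumerateFiles(rootFs, "METADATA", SearchOption.AllDirectories)
            .Where(p => Path.GetDirectoryName(p)!.EndsWith(".dist-info", StringComparison.Ordinal));

        foreach (var metadata in metadataFiles)
        {
            string? name = null, version = null;
            foreach (var line in File.ReadLines(metadata).TakeWhile(l => l.Length > 0))   // headers end at the first blank line
            {
                if (line.StartsWith("Name: ", StringComparison.Ordinal)) name = line[6..].Trim();
                else if (line.StartsWith("Version: ", StringComparison.Ordinal)) version = line[9..].Trim();
            }

            if (name is not null && version is not null)
                yield return ($"pkg:pypi/{name.ToLowerInvariant()}@{version}", metadata);
        }
    }
}
```

Lockfiles found in the image would be attached only as evidence, never turned into components, which keeps the core SBOM deterministic.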
C) Native link graph
- ELF: parse `PT_INTERP`, `DT_NEEDED`, RPATH/RUNPATH, GNU symbol versions; map SONAMEs to file paths; link executables → libs.
- PE/Mach‑O (planned M2): import table, delay‑imports; version resources; code signatures.
- Map libs back to OS packages if possible (via file lists); else emit `bin:{sha256}` components (see the sketch after this list).
- The exported metadata (`stellaops.os.*` properties, license list, source package) feeds policy scoring and export pipelines directly: Policy evaluates quiet rules against package provenance while Exporters forward the enriched fields into downstream JSON/Trivy payloads.
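A minimal sketch of that ownership fallback: a resolved library path is attributed to the OS package whose declared file list contains it, otherwise the library becomes a deterministic `bin:{sha256}` component. Types and names are illustrative, not the real analyzer model:

```csharp
using System.Collections.Generic;

public static class NativeOwnerResolver
{
    // fileToPackage: declared file lists collected by the OS analyzers (absolute path -> package purl).
    public static string ResolveOwner(
        string resolvedLibPath,
        string libSha256,
        IReadOnlyDictionary<string, string> fileToPackage)
        => fileToPackage.TryGetValue(resolvedLibPath, out var purl)
            ? purl                      // proven by the package DB file list
            : $"bin:{libSha256}";       // no authoritative owner: fall back to binary identity
}
```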
D) EntryTrace (ENTRYPOINT/CMD → terminal program)
- Read image config; parse shell (POSIX/Bash subset) with AST: `source`/`.` includes; `case`/`if`; `exec`/`command`; run‑parts.
- Resolve commands via PATH within the built rootfs; follow language launchers (Java/Node/Python) to identify the terminal program (ELF/JAR/venv script).
- Record file:line and choices for each hop; output chain graph.
- Unresolvable dynamic constructs are recorded as unknown edges with reasons (e.g., `$FOO` unresolved).
E) Attestation & SBOM bind (optional)
- For each file hash or binary hash, query the local cache of Rekor v2 indices; if an SBOM attestation is found for the exact hash, bind it to the component (origin=`attested`).
- For the image digest, likewise bind SBOM attestations (build‑time referrers).
5.4 Component normalization (exact only)
- Create `Component` nodes only with deterministic identities: purl, or `bin:{sha256}` for unlabeled binaries.
- Record origin (OS DB, installed metadata, linker, attestation).
5.5 SBOM assembly & emit
- Per‑layer SBOM fragments: components introduced by the layer (+ relationships).
- Image SBOMs: merge fragments; refer back to them via CycloneDX BOM‑Link (or SPDX ExternalRef).
- Emit both Inventory & Usage views.
- Serialize CycloneDX JSON and CycloneDX Protobuf; optionally SPDX 3.0.1 JSON.
- Build BOM‑Index sidecar: purl table + roaring bitmap; flag `usedByEntrypoint` components for fast backend joins.
5.6 DSSE attestation (via Signer/Attestor)
- WebService constructs the predicate with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (when emitting final reports), and timestamps (see the sketch after this list).
- Calls Signer (requires OpTok + PoE); Signer verifies entitlement + scanner image integrity and returns the DSSE bundle.
- Attestor logs to Rekor v2; returns `{uuid, index, proof}` → stored in `artifacts.rekor`.
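A sketch of that predicate body; the field names follow the list above, while the JSON casing and exact predicate schema are assumptions:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record ScannerPredicate(
    [property: JsonPropertyName("image_digest")] string ImageDigest,
    [property: JsonPropertyName("stellaops_version")] string StellaOpsVersion,
    [property: JsonPropertyName("license_id")] string LicenseId,
    [property: JsonPropertyName("policy_digest")] string? PolicyDigest,   // only for final reports
    [property: JsonPropertyName("created_at")] DateTimeOffset CreatedAt);

public static class PredicateExample
{
    // Produces the JSON handed to Signer; null policy_digest is omitted entirely.
    public static string Serialize(string imageDigest, string version, string licenseId, string? policyDigest) =>
        JsonSerializer.Serialize(
            new ScannerPredicate(imageDigest, version, licenseId, policyDigest, DateTimeOffset.UtcNow),
            new JsonSerializerOptions { DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull });
}
```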
6) Three‑way diff (image → layer → component)
6.1 Keys & classification
- Component key: purl when present; else `bin:{sha256}`.
- Diff classes: `added`, `removed`, `version_changed` (upgraded|downgraded), `metadata_changed` (e.g., origin from attestation vs observed).
- Layer attribution: for each change, resolve the introducing/removing layer.
6.2 Algorithm (outline)
```
A = components(imageOld, key)
B = components(imageNew, key)
added   = B \ A
removed = A \ B
changed = { k in A∩B : version(A[k]) != version(B[k]) || origin changed }
for each item in added/removed/changed:
    layer     = attribute_to_layer(item, imageOld|imageNew)
    usageFlag = usedByEntrypoint(item, imageNew)
emit diff.json (grouped by layer with badges)
```
Diffs are stored as artifacts and feed UI and CLI.
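A direct translation of the outline into C#, with layer attribution and the `usedByEntrypoint` lookup stubbed as delegates (in practice the usage flag likely comes from the BOM‑Index sidecar):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record DiffEntry(string Key, string Kind, string? OldVersion, string? NewVersion,
                               string Layer, bool UsedByEntrypoint);

public static class ThreeWayDiff
{
    public static IReadOnlyList<DiffEntry> Compute(
        IReadOnlyDictionary<string, (string Version, string Origin)> a,   // old image components by key
        IReadOnlyDictionary<string, (string Version, string Origin)> b,   // new image components by key
        Func<string, string> attributeToLayer,
        Func<string, bool> usedByEntrypoint)
    {
        var diff = new List<DiffEntry>();

        foreach (var k in b.Keys.Except(a.Keys))   // added = B \ A
            diff.Add(new DiffEntry(k, "added", null, b[k].Version, attributeToLayer(k), usedByEntrypoint(k)));

        foreach (var k in a.Keys.Except(b.Keys))   // removed = A \ B
            diff.Add(new DiffEntry(k, "removed", a[k].Version, null, attributeToLayer(k), false));

        foreach (var k in a.Keys.Intersect(b.Keys))
        {
            if (a[k].Version != b[k].Version)
                diff.Add(new DiffEntry(k, "version_changed", a[k].Version, b[k].Version, attributeToLayer(k), usedByEntrypoint(k)));
            else if (a[k].Origin != b[k].Origin)
                diff.Add(new DiffEntry(k, "metadata_changed", a[k].Version, b[k].Version, attributeToLayer(k), usedByEntrypoint(k)));
        }

        return diff;
    }
}
```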
7) Build‑time SBOMs (fast CI path)
Scanner.Sbomer.BuildXPlugin can act as a BuildKit generator:
- During `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, run analyzers on the build context/output; attach SBOMs as OCI referrers to the built image.
- Optionally request Signer/Attestor to produce a Stella Ops‑verified attestation immediately; else, Scanner.WebService can verify and re‑attest post‑push.
- Scanner.WebService trusts build‑time SBOMs per policy, enabling no‑rescan for unchanged bases.
8) Configuration (YAML)
```yaml
scanner:
  queue:
    kind: redis
    url: "redis://queue:6379/0"
  mongo:
    uri: "mongodb://mongo/scanner"
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops"
    objectLock: "governance"        # or 'compliance'
  analyzers:
    os: { apk: true, dpkg: true, rpm: true }
    lang: { java: true, node: true, python: true, go: true, dotnet: true, rust: true }
    native: { elf: true, pe: false, macho: false }   # PE/Mach-O in M2
    entryTrace: { enabled: true, shellMaxDepth: 64, followRunParts: true }
  emit:
    cdx: { json: true, protobuf: true }
    spdx: { json: true }
    compress: "zstd"
  rekor:
    url: "https://rekor-v2.internal"
  signer:
    url: "https://signer.internal"
  limits:
    maxParallel: 8
    perRegistryConcurrency: 2
  policyHints:
    verifyImageSignature: false
    trustBuildTimeSboms: true
```
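A hedged sketch of binding this configuration into strongly typed options in the .NET hosts; the class and property names are illustrative, not the actual option types used by WebService/Worker:

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

// Only a slice of the YAML above is modeled here to show the binding pattern.
public sealed class ScannerOptions
{
    public QueueOptions Queue { get; set; } = new();
    public S3Options S3 { get; set; } = new();
    public LimitsOptions Limits { get; set; } = new();

    public sealed class QueueOptions
    {
        public string Kind { get; set; } = "redis";
        public string Url { get; set; } = "";
    }

    public sealed class S3Options
    {
        public string Endpoint { get; set; } = "";
        public string Bucket { get; set; } = "";
        public string ObjectLock { get; set; } = "governance";
    }

    public sealed class LimitsOptions
    {
        public int MaxParallel { get; set; } = 8;
        public int PerRegistryConcurrency { get; set; } = 2;
    }
}

public static class ScannerOptionsSetup
{
    // Binds the "scanner" section so components resolve IOptions<ScannerOptions>.
    public static IServiceCollection AddScannerOptions(this IServiceCollection services, IConfiguration config)
        => services.Configure<ScannerOptions>(config.GetSection("scanner"));
}
```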
9) Scale & performance
- Parallelism: per‑analyzer concurrency; bounded directory walkers; file CAS dedupe by sha256.
- Distributed locks per layer digest to prevent duplicate work across Workers.
- Registry throttles: per‑host concurrency budgets; exponential backoff on 429/5xx.
- Targets:
  - Build‑time: P95 ≤ 3–5 s on warmed bases (CI generator).
  - Post‑build delta: P95 ≤ 10 s for 200 MB images with cache hit.
  - Emit: CycloneDX Protobuf ≤ 150 ms for 5k components; JSON ≤ 500 ms.
  - Diff: ≤ 200 ms for 5k vs 5k components.
10) Security posture
- AuthN: Authority‑issued short OpToks (DPoP/mTLS).
- AuthZ: scopes (`scanner.scan`, `scanner.export`, `scanner.catalog.read`).
- mTLS to Signer/Attestor; only Signer can sign.
- No network fetches during analysis (except registry pulls and optional Rekor index reads).
- Sandboxing: non‑root containers; read‑only FS; seccomp profiles; disable execution of scanned content.
- Release integrity: all first‑party images are cosign‑signed; Workers/WebService self‑verify at startup.
11) Observability & audit
- Metrics:
  - `scanner.jobs_inflight`, `scanner.scan_latency_seconds`
  - `scanner.layer_cache_hits_total`, `scanner.file_cas_hits_total`
  - `scanner.artifact_bytes_total{format}`
  - `scanner.attestation_latency_seconds`, `scanner.rekor_failures_total`
- Tracing: spans for acquire→union→analyzers→compose→emit→sign→log.
- Audit logs: DSSE requests log `license_id`, `image_digest`, `artifactSha256`, `policy_digest?`, and the Rekor UUID on success.
12) Testing matrix
- Determinism: given same image + analyzers → byte‑identical CDX Protobuf; JSON normalized.
- OS packages: ground‑truth images per distro; compare to package DB.
- Lang ecosystems: sample images per ecosystem (Java/Node/Python/Go/.NET/Rust) with installed metadata; negative tests w/ lockfile‑only.
- Native & EntryTrace: ELF graph correctness; shell AST cases (includes, run‑parts, exec, case/if).
- Diff: layer attribution against synthetic two‑image sequences.
- Performance: cold vs warm cache; large `node_modules` and `site‑packages`.
- Security: ensure no code execution from image; fuzz parser inputs; path traversal resistance on layer extract.
13) Failure modes & degradations
- Missing OS DB (files exist, DB removed): record files; do not fabricate package components; emit `bin:{sha256}` where unavoidable; flag in evidence.
- Unreadable metadata (corrupt dist‑info): record file evidence; skip component creation; annotate.
- Dynamic shell constructs: mark unresolved edges with reasons (env var unknown) and continue; Usage view may be partial.
- Registry rate limits: honor backoff; queue job retries with jitter.
- Signer refusal (license/plan/version): scan completes; artifact produced; no attestation; WebService marks result as unverified.
14) Optional plug‑ins (off by default)
- Patch‑presence detector (signature‑based backport checks). Reads curated function‑level signatures from advisories; inspects binaries for patched code snippets to lower false‑positives for backported fixes. Runs as a sidecar analyzer that annotates components; never overrides core identities.
- Runtime probes (with Zastava): when allowed, compare `/proc/<pid>/maps` (DSOs actually loaded) with the static Usage view for precision.
15) DevOps & operations
- HA: WebService horizontal scale; Workers autoscale by queue depth & CPU; distributed locks on layers.
- Retention: ILM rules per artifact class (`short`, `default`, `compliance`); Object Lock for compliance artifacts (reports, signed SBOMs).
- Upgrades: bump cache schema when analyzer outputs change; WebService triggers refresh of dependent artifacts.
- Backups: Mongo (daily dumps); MinIO (versioned buckets, replication); Rekor v2 DB snapshots.
16) CLI & UI touch points
- CLI: `stellaops scan <ref>`, `stellaops diff --old --new`, `stellaops export`, `stellaops verify attestation <bundle|url>`.
- UI: Scan detail shows Inventory/Usage toggles, Diff by Layer, Attestation badge (verified/unverified), Rekor link, and EntryTrace chain with file:line breadcrumbs.
17) Roadmap (Scanner)
- M2: Windows containers (MSI/SxS/GAC analyzers), PE/Mach‑O native analyzer, deeper Rust metadata.
- M2: Buildx generator GA (certified external registries), cross‑registry trust policies.
- M3: Patch‑presence plug‑in GA (opt‑in), cross‑image corpus clustering (evidence‑only; not identity).
- M3: Advanced EntryTrace (POSIX shell features breadth, busybox detection).
Appendix A — EntryTrace resolution (pseudo)
```
ResolveEntrypoint(ImageConfig cfg, RootFs fs):
  cmd     = Normalize(cfg.ENTRYPOINT, cfg.CMD)
  stack   = [ Script(cmd, path=FindOnPath(cmd[0], fs)) ]
  visited = set()
  depth   = 0
  while stack not empty and depth < MAX_DEPTH:
    depth += 1
    cur = stack.pop()
    if cur in visited: continue
    visited.add(cur)
    if IsShellScript(cur.path):
      ast = ParseShell(cur.path)
      foreach directive in ast:
        if directive is SourceInclude:
          p = ResolveInclude(directive.path, cur.env, fs)
          stack.push(Script(p))
        if directive is ExecCall:
          p = ResolveExec(directive.argv[0], cur.env, fs)
          stack.push(Program(p, argv=directive.argv))
        if directive is InterpreterLaunch:       # python -m / node / java -jar
          term = ResolveInterpreterTarget(directive, fs)
          stack.push(Program(term))
    else:
      return Terminal(cur.path)
  return Unknown(reason)
```
Appendix A.1 — EntryTrace Explainability
EntryTrace emits structured diagnostics and metrics so operators can quickly understand why resolution succeeded or degraded:
| Reason | Description | Typical Mitigation |
|---|---|---|
| `CommandNotFound` | A command referenced in the script cannot be located in the layered root filesystem or PATH. | Ensure binaries exist in the image or extend PATH hints. |
| `MissingFile` | `source`/`.`/`run-parts` targets are missing. | Bundle the script or guard the include. |
| `DynamicEnvironmentReference` | Path depends on `$VARS` that are unknown at scan time. | Provide defaults via scan metadata or accept partial usage. |
| `RecursionLimitReached` | Nested includes exceeded the analyzer depth limit (default 64). | Flatten indirection or increase the limit in options. |
| `RunPartsEmpty` | `run-parts` directory contained no executable entries. | Remove empty directories or ignore if intentional. |
| `JarNotFound` / `ModuleNotFound` | Java/Python targets missing, preventing interpreter tracing. | Ship the jar/module with the image or adjust the launcher. |
Diagnostics drive two metrics published by EntryTraceMetrics:
- `entrytrace_resolutions_total{outcome}`: resolution attempts segmented by outcome (resolved, partially resolved, unresolved).
- `entrytrace_unresolved_total{reason}`: diagnostic counts keyed by reason.
Structured logs include entrytrace.path, entrytrace.command, entrytrace.reason, and entrytrace.depth, all correlated with scan/job IDs. Timestamps are normalized to UTC (microsecond precision) to keep DSSE attestations and UI traces explainable.
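A sketch of how these counters could be emitted with `System.Diagnostics.Metrics`; the real `EntryTraceMetrics` implementation may differ, but the instrument names follow the list above:

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class EntryTraceMetricsExample
{
    private static readonly Meter Meter = new("StellaOps.Scanner.EntryTrace");
    private static readonly Counter<long> Resolutions = Meter.CreateCounter<long>("entrytrace_resolutions_total");
    private static readonly Counter<long> Unresolved = Meter.CreateCounter<long>("entrytrace_unresolved_total");

    // outcome: resolved | partially resolved | unresolved
    public static void RecordResolution(string outcome) =>
        Resolutions.Add(1, new KeyValuePair<string, object?>("outcome", outcome));

    // reason: CommandNotFound, MissingFile, DynamicEnvironmentReference, ...
    public static void RecordUnresolved(string reason) =>
        Unresolved.Add(1, new KeyValuePair<string, object?>("reason", reason));
}
```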
Appendix B — BOM‑Index sidecar
```
struct Header { magic, version, imageDigest, createdAt }
vector<string> purls
map<purlIndex, roaring_bitmap> components
optional map<purlIndex, roaring_bitmap> usedByEntrypoint
```
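An illustrative writer for this layout; the magic value, field widths, and roaring-bitmap wire format are assumptions, and bitmaps are treated as opaque pre-serialized blobs:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BomIndexWriterExample
{
    public static void Write(Stream output, string imageDigest, IReadOnlyList<string> purls,
                             IReadOnlyDictionary<int, byte[]> components,
                             IReadOnlyDictionary<int, byte[]>? usedByEntrypoint = null)
    {
        using var w = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true);

        // Header { magic, version, imageDigest, createdAt }
        w.Write(Encoding.ASCII.GetBytes("BOMIDX\0\0"));         // magic (assumed 8 bytes)
        w.Write((ushort)1);                                     // version
        w.Write(imageDigest);                                   // length-prefixed UTF-8 string
        w.Write(DateTimeOffset.UtcNow.ToUnixTimeSeconds());     // createdAt (Unix seconds)

        // vector<string> purls
        w.Write(purls.Count);
        foreach (var purl in purls) w.Write(purl);

        WriteBitmapMap(w, components);                          // map<purlIndex, roaring_bitmap>
        w.Write(usedByEntrypoint is not null);                  // presence flag for the optional map
        if (usedByEntrypoint is not null) WriteBitmapMap(w, usedByEntrypoint);
    }

    private static void WriteBitmapMap(BinaryWriter w, IReadOnlyDictionary<int, byte[]> map)
    {
        w.Write(map.Count);
        foreach (var (purlIndex, bitmap) in map)
        {
            w.Write(purlIndex);
            w.Write(bitmap.Length);
            w.Write(bitmap);
        }
    }
}
```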