component_architecture_scanner.md — StellaOps Scanner (2025Q4)

Scope. Implementation-ready architecture for the Scanner subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per-layer caching, three-way diffs, artifact catalog (MinIO+Mongo), attestation handoff, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).


0) Mission & boundaries

Mission. Produce deterministic, explainable SBOMs and diffs for container images and filesystems, quickly and repeatedly, without guessing. Emit two views: Inventory (everything present) and Usage (entrypoint closure + actually linked libs). Attach attestations through Signer→Attestor→Rekor v2.

Boundaries.

  • Scanner does not produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
  • Scanner does not keep third-party SBOM warehouses. It may bind to existing attestations for exact hashes.
  • Core analyzers are deterministic (no fuzzy identity). Optional heuristic plugins (e.g., patch-presence) run under explicit flags and never contaminate the core SBOM.

1) Solution & project layout

src/
 ├─ StellaOps.Scanner.WebService/            # REST control plane, catalog, diff, exports
 ├─ StellaOps.Scanner.Worker/                # queue consumer; executes analyzers
 ├─ StellaOps.Scanner.Models/                # DTOs, evidence, graph nodes, CDX/SPDX adapters
 ├─ StellaOps.Scanner.Storage/               # Mongo repositories; MinIO object client; ILM/GC
 ├─ StellaOps.Scanner.Queue/                 # queue abstraction (Redis/NATS/RabbitMQ)
 ├─ StellaOps.Scanner.Cache/                 # layer cache; file CAS; bloom/bitmap indexes
 ├─ StellaOps.Scanner.EntryTrace/            # ENTRYPOINT/CMD → terminal program resolver (shell AST)
 ├─ StellaOps.Scanner.Analyzers.OS.[Apk|Dpkg|Rpm]/
 ├─ StellaOps.Scanner.Analyzers.Lang.[Java|Node|Python|Go|DotNet|Rust]/
 ├─ StellaOps.Scanner.Analyzers.Native.[ELF|PE|MachO]/   # PE/Mach-O planned (M2)
 ├─ StellaOps.Scanner.Emit.CDX/              # CycloneDX (JSON + Protobuf)
 ├─ StellaOps.Scanner.Emit.SPDX/             # SPDX 3.0.1 JSON
 ├─ StellaOps.Scanner.Diff/                  # image→layer→component three-way diff
 ├─ StellaOps.Scanner.Index/                 # BOMIndex sidecar (purls + roaring bitmaps)
 ├─ StellaOps.Scanner.Tests.*                # unit/integration/e2e fixtures
 └─ tools/
     ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/   # BuildKit generator (image referrer SBOMs)
     └─ StellaOps.Scanner.Sbomer.DockerImage/    # CLI-driven scanner container

Analyzer assemblies and buildx generators are packaged as restart-time plug-ins under plugins/scanner/** with manifests; services must restart to activate new plug-ins.

1.1 Queue backbone (Redis / NATS)

StellaOps.Scanner.Queue exposes a transport-agnostic contract (IScanQueue/IScanQueueLease) used by the WebService producer and Worker consumers. Sprint 9 introduces two first-party transports:

  • Redis Streams (default). Uses consumer groups, deterministic idempotency keys (scanner:jobs:idemp:*), and supports lease claim (XCLAIM), renewal, exponential-backoff retries, and a scanner:jobs:dead stream for exhausted attempts.
  • NATS JetStream. Provisions the SCANNER_JOBS work-queue stream + durable consumer scanner-workers, publishes with MsgId for dedupe, applies backoff via NAK delays, and routes dead-lettered jobs to SCANNER_JOBS_DEAD.

Metrics are emitted via Meter counters (scanner_queue_enqueued_total, scanner_queue_retry_total, scanner_queue_deadletter_total), and ScannerQueueHealthCheck pings the active backend (Redis PING, NATS PING). Configuration is bound from scanner.queue:

scanner:
  queue:
    kind: redis # or nats
    redis:
      connectionString: "redis://queue:6379/0"
      streamName: "scanner:jobs"
    nats:
      url: "nats://queue:4222"
      stream: "SCANNER_JOBS"
      subject: "scanner.jobs"
      durableConsumer: "scanner-workers"
      deadLetterSubject: "scanner.jobs.dead"
    maxDeliveryAttempts: 5
    retryInitialBackoff: 00:00:05
    retryMaxBackoff: 00:02:00

The DI extension (AddScannerQueue) wires the selected transport, so a future addition (e.g., RabbitMQ) need only implement the same contract and register itself.
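
For orientation, a minimal C# sketch of what that transport-agnostic contract could look like. The member names and the ScanJob record are illustrative assumptions, not the published interface:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed record ScanJob(string ScanId, string ImageDigest);   // hypothetical payload

public interface IScanQueue
{
    // Producers pass an idempotency key so duplicate submissions collapse (scanner:jobs:idemp:* on Redis).
    ValueTask EnqueueAsync(ScanJob job, string idempotencyKey, CancellationToken ct);
    IAsyncEnumerable<IScanQueueLease> ConsumeAsync(CancellationToken ct);
}

public interface IScanQueueLease
{
    ScanJob Job { get; }
    int Attempt { get; }                                              // feeds backoff; dead-letter after maxDeliveryAttempts
    ValueTask RenewAsync(TimeSpan extension, CancellationToken ct);   // keep the claim alive (XCLAIM / in-flight extension)
    ValueTask AcknowledgeAsync(CancellationToken ct);                 // success: remove from the stream
    ValueTask ReleaseAsync(bool retry, CancellationToken ct);         // failure: redeliver with backoff, or dead-letter
}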

Runtime form factor: two deployables

  • Scanner.WebService (stateless REST)
  • Scanner.Worker (N replicas; queue-driven)

2) External dependencies

  • OCI registry with Referrers API (discover attached SBOMs/signatures).
  • MinIO (S3-compatible) for SBOM artifacts; Object Lock for immutable classes; ILM for TTL.
  • MongoDB for catalog, job state, diffs, ILM rules.
  • Queue (Redis Streams/NATS/RabbitMQ).
  • Authority (on-prem OIDC) for OpToks (DPoP/mTLS).
  • Signer + Attestor (+ Fulcio/KMS + Rekor v2) for DSSE + transparency.

3) Contracts & data model

3.1 Evidence-first component model

Nodes

  • Image, Layer, File
  • Component (purl?, name, version?, type, id — may be bin:{sha256})
  • Executable (ELF/PE/Mach-O), Library (native or managed), EntryScript (shell/launcher)

Edges (all carry Evidence)

  • contains(Image|Layer → File)
  • installs(PackageDB → Component) (OS database row)
  • declares(InstalledMetadata → Component) (dist-info, pom.properties, deps.json…)
  • links_to(Executable → Library) (ELF DT_NEEDED, PE imports)
  • calls(EntryScript → Program) (file:line from shell AST)
  • attests(Rekor → Component|Image) (SBOM/predicate binding)
  • bound_from_attestation(Component_attested → Component_observed) (hash equality proof)

Evidence

{ source: enum, locator: (path|offset|line), sha256?, method: enum, timestamp }

No confidences. Either a fact is proven with listed mechanisms, or it is not claimed.
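
Modeled in C#, the envelope could reduce to a single record; the enum members below are assumptions for illustration:

using System;

public enum EvidenceSource { OsPackageDb, InstalledMetadata, Linker, ShellAst, Attestation }   // illustrative
public enum EvidenceMethod { ExactPath, HashMatch, StaticParse, AstResolution }                // illustrative

public sealed record Evidence(
    EvidenceSource Source,
    string Locator,            // path, byte offset, or file:line
    string? Sha256,            // set when the fact is hash-anchored
    EvidenceMethod Method,
    DateTimeOffset Timestamp);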

3.2 Catalog schema (Mongo)

  • artifacts

    { _id, type: layer-bom|image-bom|diff|index,
      format: cdx-json|cdx-pb|spdx-json,
      bytesSha256, size, rekor: { uuid,index,url }?,
      ttlClass, immutable, refCount, createdAt }
    
  • images { imageDigest, repo, tag?, arch, createdAt, lastSeen }

  • layers { layerDigest, mediaType, size, createdAt, lastSeen }

  • links { fromType, fromDigest, artifactId } // image/layer -> artifact

  • jobs { _id, kind, args, state, startedAt, heartbeatAt, endedAt, error }

  • lifecycleRules { ruleId, scope, ttlDays, retainIfReferenced, immutable }

3.3 Object store layout (MinIO)

layers/<sha256>/sbom.cdx.json.zst
layers/<sha256>/sbom.spdx.json.zst
images/<imgDigest>/inventory.cdx.pb            # CycloneDX Protobuf
images/<imgDigest>/usage.cdx.pb
indexes/<imgDigest>/bom-index.bin              # purls + roaring bitmaps
diffs/<old>_<new>/diff.json.zst
attest/<artifactSha256>.dsse.json              # DSSE bundle (cert chain + Rekor proof)

4) REST API (Scanner.WebService)

All under /api/v1/scanner. Auth: OpTok (DPoP/mTLS); RBAC scopes.

POST /scans                        { imageRef|digest, force?:bool } → { scanId }
GET  /scans/{id}                   → { status, imageDigest, artifacts[], rekor? }
GET  /sboms/{imageDigest}          ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage → bytes
GET  /diff?old=<digest>&new=<digest>&view=inventory|usage → diff.json
POST /exports                      { imageDigest, format, view, attest?:bool } → { artifactId, rekor? }
POST /reports                      { imageDigest, policyRevision? } → { reportId, rekor? }   # delegates to backend policy+vex
GET  /catalog/artifacts/{id}       → { meta }
GET  /healthz | /readyz | /metrics
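
A client-side sketch of the scan round-trip in C#; the DTO shapes and status values are assumptions read off the table above (and the DPoP proof header is elided):

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

string opTok = "…";   // obtained from Authority; acquisition elided
using var http = new HttpClient { BaseAddress = new Uri("https://scanner.internal/api/v1/scanner/") };
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("DPoP", opTok);

var start = await http.PostAsJsonAsync("scans", new { imageRef = "registry.local/app@sha256:…", force = false });
var scanId = (await start.Content.ReadFromJsonAsync<JsonElement>()).GetProperty("scanId").GetString();

JsonElement scan;
do
{
    await Task.Delay(TimeSpan.FromSeconds(2));
    scan = await http.GetFromJsonAsync<JsonElement>($"scans/{scanId}");
} while (scan.GetProperty("status").GetString() is "pending" or "running");   // status vocabulary assumed

// Pull the usage-view SBOM as CycloneDX JSON once the scan settles.
var digest = scan.GetProperty("imageDigest").GetString();
var sbom = await http.GetByteArrayAsync($"sboms/{digest}?format=cdx-json&view=usage");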

5) Execution flow (Worker)

5.1 Acquire & verify

  1. Resolve image (prefer repo@sha256:…).
  2. (Optional) verify image signature per policy (cosign).
  3. Pull blobs, compute layer digests; record metadata.

5.2 Layer union FS

  • Apply whiteouts; materialize final filesystem; map file → first introducing layer.
  • Windows layers (MSI/SxS/GAC) planned in M2.

5.3 Evidence harvest (parallel analyzers; deterministic only)

A) OS packages

  • apk: /lib/apk/db/installed
  • dpkg: /var/lib/dpkg/status, /var/lib/dpkg/info/*.list
  • rpm: /var/lib/rpm/Packages (via librpm or parser)
  • Record name, version (epoch/revision), arch, source package where present, and declared file lists.
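
To give the harvest some texture, a simplified sketch of walking dpkg's status stanzas (the real analyzer also filters on the Status: field, handles epochs/source packages, and joins the info/*.list file lists):

using System.Collections.Generic;
using System.IO;

static IEnumerable<(string Name, string Version, string Arch)> ReadDpkgStatus(string rootfs)
{
    string? name = null, version = null, arch = null;
    foreach (var line in File.ReadLines(Path.Combine(rootfs, "var/lib/dpkg/status")))
    {
        if (line.StartsWith("Package: "))           name = line["Package: ".Length..];
        else if (line.StartsWith("Version: "))      version = line["Version: ".Length..];
        else if (line.StartsWith("Architecture: ")) arch = line["Architecture: ".Length..];
        else if (line.Length == 0 && name is not null && version is not null)
        {
            yield return (name, version, arch ?? "all");   // blank line closes a stanza
            name = version = arch = null;
        }
    }
    if (name is not null && version is not null)
        yield return (name, version, arch ?? "all");       // final stanza may lack a trailing blank line
}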

B) Language ecosystems (installed state only)

  • Java: META-INF/maven/*/pom.properties, MANIFEST → pkg:maven/...
  • Node: node_modules/**/package.json → pkg:npm/...
  • Python: *.dist-info/{METADATA,RECORD} → pkg:pypi/...
  • Go: Go buildinfo in binaries → pkg:golang/...
  • .NET: *.deps.json + assembly metadata → pkg:nuget/...
  • Rust: crates only when explicitly present (embedded metadata or cargo/registry traces); otherwise binaries reported as bin:{sha256}.

Rule: We only report components proven on disk with authoritative metadata. Lockfiles are evidence only.

C) Native link graph

  • ELF: parse PT_INTERP, DT_NEEDED, RPATH/RUNPATH, GNU symbol versions; map SONAMEs to file paths; link executables → libs.
  • PE/Mach-O (planned M2): import table, delay-imports; version resources; code signatures.
  • Map libs back to OS packages if possible (via file lists); else emit bin:{sha256} components.
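
That back-mapping is essentially a dictionary join; a sketch assuming the SONAME→path and path→purl tables were built from the linker scan and the package file lists (SHA256.HashData(Stream) requires .NET 7+):

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static string ResolveLibrary(string soname,
                             IReadOnlyDictionary<string, string> sonameToPath,   // from RPATH/RUNPATH resolution
                             IReadOnlyDictionary<string, string> pathToPurl)     // from OS package file lists
{
    if (!sonameToPath.TryGetValue(soname, out var path))
        return $"unknown:{soname}";                        // recorded as an unresolved edge, not guessed
    if (pathToPurl.TryGetValue(path, out var purl))
        return purl;                                       // owned by an OS package
    using var stream = File.OpenRead(path);
    var sha = Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
    return $"bin:{sha}";                                   // unlabeled binary: deterministic fallback identity
}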

D) EntryTrace (ENTRYPOINT/CMD → terminal program)

  • Read image config; parse shell (POSIX/Bash subset) with AST: source/. includes; case/if; exec/command; run-parts.
  • Resolve commands via PATH within the built rootfs; follow language launchers (Java/Node/Python) to identify the terminal program (ELF/JAR/venv script).
  • Record file:line and choices for each hop; output chain graph.
  • Unresolvable dynamic constructs are recorded as unknown edges with reasons (e.g., $FOO unresolved).

E) Attestation & SBOM bind (optional)

  • For each file hash or binary hash, query local cache of Rekor v2 indices; if an SBOM attestation is found for exact hash, bind it to the component (origin=attested).
  • For the image digest, likewise bind SBOM attestations (build-time referrers).

5.4 Component normalization (exact only)

  • Create Component nodes only with deterministic identities: purl, or bin:{sha256} for unlabeled binaries.
  • Record origin (OS DB, installed metadata, linker, attestation).

5.5 SBOM assembly & emit

  • Per-layer SBOM fragments: components introduced by the layer (+ relationships).
  • Image SBOMs: merge fragments; refer back to them via CycloneDX BOMLink (or SPDX ExternalRef).
  • Emit both Inventory & Usage views.
  • Serialize CycloneDX JSON and CycloneDX Protobuf; optionally SPDX 3.0.1 JSON.
  • Build BOMIndex sidecar: purl table + roaring bitmap; flag usedByEntrypoint components for fast backend joins.
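
For the JSON flavor, the emit step boils down to the standard CycloneDX envelope; a hand-rolled sketch over a hypothetical component collection (the real emitter also produces the Protobuf form, and the specVersion here is an assumption):

using System;
using System.Linq;
using System.Text.Json;

// Stable ordering keeps the serialized document reproducible run to run.
var bom = new
{
    bomFormat = "CycloneDX",
    specVersion = "1.6",                                   // assumed target version
    version = 1,
    components = components                                // IEnumerable of a hypothetical Component type
        .OrderBy(c => c.Purl ?? c.BinId, StringComparer.Ordinal)
        .Select(c => new { type = "library", name = c.Name, version = c.Version, purl = c.Purl }),
};
var json = JsonSerializer.Serialize(bom);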

5.6 DSSE attestation (via Signer/Attestor)

  • WebService constructs predicate with image_digest, stellaops_version, license_id, policy_digest? (when emitting final reports), timestamps.
  • Calls Signer (requires OpTok + PoE); Signer verifies entitlement + scanner image integrity and returns DSSE bundle.
  • Attestor logs to Rekor v2; returns {uuid,index,proof} → stored in artifacts.rekor.
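
Instantiated, the predicate from the first bullet might look like the JSON below; field names follow the list above, values are placeholders, created_at stands in for the timestamps, and the envelope/predicateType URI is not specified here:

{
  "image_digest": "sha256:…",
  "stellaops_version": "…",
  "license_id": "…",
  "policy_digest": "sha256:…",
  "created_at": "2025-10-19T10:38:55Z"
}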

6) Three-way diff (image → layer → component)

6.1 Keys & classification

  • Component key: purl when present; else bin:{sha256}.
  • Diff classes: added, removed, version_changed (upgraded|downgraded), metadata_changed (e.g., origin from attestation vs observed).
  • Layer attribution: for each change, resolve the introducing/removing layer.

6.2 Algorithm (outline)

A = components(imageOld, key)
B = components(imageNew, key)

added   = B \ A
removed = A \ B
changed = { k in A∩B : version(A[k]) != version(B[k]) || origin changed }

for each item in added/removed/changed:
   layer = attribute_to_layer(item, imageOld|imageNew)
   usageFlag = usedByEntrypoint(item, imageNew)
emit diff.json (grouped by layer with badges)

Diffs are stored as artifacts and feed UI and CLI.
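
In C#, the outline collapses to dictionary set operations over the component key; a sketch with a hypothetical component shape:

using System.Collections.Generic;
using System.Linq;

// Keys are purl when present, else bin:{sha256}; layer attribution and usage flags are resolved afterwards.
var a = oldComponents.ToDictionary(c => c.Key);            // components(imageOld)
var b = newComponents.ToDictionary(c => c.Key);            // components(imageNew)

var added   = b.Keys.Except(a.Keys).Select(k => b[k]);
var removed = a.Keys.Except(b.Keys).Select(k => a[k]);
var changed = a.Keys.Intersect(b.Keys)
    .Where(k => a[k].Version != b[k].Version || a[k].Origin != b[k].Origin)
    .Select(k => (Old: a[k], New: b[k]));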


7) Build-time SBOMs (fast CI path)

Scanner.Sbomer.BuildXPlugin can act as a BuildKit generator:

  • During docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer, run analyzers on the build context/output; attach SBOMs as OCI referrers to the built image.
  • Optionally request Signer/Attestor to produce a StellaOps-verified attestation immediately; otherwise, Scanner.WebService can verify and re-attest post-push.
  • Scanner.WebService trusts build-time SBOMs per policy, enabling no-rescan for unchanged bases.

8) Configuration (YAML)

scanner:
  queue:
    kind: redis
    url: "redis://queue:6379/0"
  mongo:
    uri: "mongodb://mongo/scanner"
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops"
    objectLock: "governance"   # or 'compliance'
  analyzers:
    os: { apk: true, dpkg: true, rpm: true }
    lang: { java: true, node: true, python: true, go: true, dotnet: true, rust: true }
    native: { elf: true, pe: false, macho: false }    # PE/Mach-O in M2
    entryTrace: { enabled: true, shellMaxDepth: 64, followRunParts: true }
  emit:
    cdx: { json: true, protobuf: true }
    spdx: { json: true }
    compress: "zstd"
  rekor:
    url: "https://rekor-v2.internal"
  signer:
    url: "https://signer.internal"
  limits:
    maxParallel: 8
    perRegistryConcurrency: 2
  policyHints:
    verifyImageSignature: false
    trustBuildTimeSboms: true

9) Scale & performance

  • Parallelism: per-analyzer concurrency; bounded directory walkers; file CAS dedupe by sha256.

  • Distributed locks per layer digest prevent duplicate work across Workers (see the sketch after this list).

  • Registry throttles: per-host concurrency budgets; exponential backoff on 429/5xx.

  • Targets:

    • Build-time: P95 ≤ 3–5s on warmed bases (CI generator).
    • Post-build delta: P95 ≤ 10s for 200MB images with cache hit.
    • Emit: CycloneDX Protobuf ≤150ms for 5k components; JSON ≤500ms.
    • Diff: ≤200ms for 5k vs 5k components.
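
The per-layer lock called out above can ride on the same Redis backbone; a sketch using StackExchange.Redis lock primitives (the key naming and AnalyzeLayerAsync are assumptions):

using System;
using StackExchange.Redis;

var db = connection.GetDatabase();                          // ConnectionMultiplexer from app wiring
var key = $"scanner:locks:layer:{layerDigest}";
var holder = Environment.MachineName;                       // token identifying this Worker

if (await db.LockTakeAsync(key, holder, TimeSpan.FromMinutes(5)))
{
    try { await AnalyzeLayerAsync(layerDigest); }           // hypothetical analyzer entry point
    finally { await db.LockReleaseAsync(key, holder); }
}
else
{
    // Another Worker owns this layer; rely on its cached fragment instead of duplicating work.
}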

10) Security posture

  • AuthN: Authority-issued short-lived OpToks (DPoP/mTLS).
  • AuthZ: scopes (scanner.scan, scanner.export, scanner.catalog.read).
  • mTLS to Signer/Attestor; only Signer can sign.
  • No network fetches during analysis (except registry pulls and optional Rekor index reads).
  • Sandboxing: nonroot containers; readonly FS; seccomp profiles; disable execution of scanned content.
  • Release integrity: all first-party images are cosign-signed; Workers/WebService self-verify at startup.

11) Observability & audit

  • Metrics:

    • scanner.jobs_inflight, scanner.scan_latency_seconds
    • scanner.layer_cache_hits_total, scanner.file_cas_hits_total
    • scanner.artifact_bytes_total{format}
    • scanner.attestation_latency_seconds, scanner.rekor_failures_total
  • Tracing: spans for acquire→union→analyzers→compose→emit→sign→log.

  • Audit logs: DSSE requests log license_id, image_digest, artifactSha256, policy_digest?, Rekor UUID on success.


12) Testing matrix

  • Determinism: given same image + analyzers → byte-identical CDX Protobuf; JSON normalized.
  • OS packages: ground-truth images per distro; compare to package DB.
  • Lang ecosystems: sample images per ecosystem (Java/Node/Python/Go/.NET/Rust) with installed metadata; negative tests w/ lockfile-only.
  • Native & EntryTrace: ELF graph correctness; shell AST cases (includes, run-parts, exec, case/if).
  • Diff: layer attribution against synthetic two-image sequences.
  • Performance: cold vs warm cache; large node_modules and site-packages.
  • Security: ensure no code execution from image; fuzz parser inputs; path traversal resistance on layer extract.

13) Failure modes & degradations

  • Missing OS DB (files exist, DB removed): record files; do not fabricate package components; emit bin:{sha256} where unavoidable; flag in evidence.
  • Unreadable metadata (corrupt distinfo): record file evidence; skip component creation; annotate.
  • Dynamic shell constructs: mark unresolved edges with reasons (env var unknown) and continue; Usage view may be partial.
  • Registry rate limits: honor backoff; queue job retries with jitter.
  • Signer refusal (license/plan/version): scan completes; artifact produced; no attestation; WebService marks result as unverified.

14) Optional plugins (off by default)

  • Patch-presence detector (signature-based backport checks). Reads curated function-level signatures from advisories; inspects binaries for patched code snippets to lower false positives for backported fixes. Runs as a sidecar analyzer that annotates components; never overrides core identities.
  • Runtime probes (with Zastava): when allowed, compare /proc/<pid>/maps (DSOs actually loaded) with the static Usage view for precision.

15) DevOps & operations

  • HA: WebService horizontal scale; Workers autoscale by queue depth & CPU; distributed locks on layers.
  • Retention: ILM rules per artifact class (short, default, compliance); Object Lock for compliance artifacts (reports, signed SBOMs).
  • Upgrades: bump cache schema when analyzer outputs change; WebService triggers refresh of dependent artifacts.
  • Backups: Mongo (daily dumps); MinIO (versioned buckets, replication); Rekor v2 DB snapshots.

16) CLI & UI touch points

  • CLI: stellaops scan <ref>, stellaops diff --old --new, stellaops export, stellaops verify attestation <bundle|url>.
  • UI: Scan detail shows Inventory/Usage toggles, Diff by Layer, Attestation badge (verified/unverified), Rekor link, and EntryTrace chain with file:line breadcrumbs.

17) Roadmap (Scanner)

  • M2: Windows containers (MSI/SxS/GAC analyzers), PE/Mach-O native analyzer, deeper Rust metadata.
  • M2: Buildx generator GA (certified external registries), cross-registry trust policies.
  • M3: Patch-presence plugin GA (opt-in), cross-image corpus clustering (evidence-only; not identity).
  • M3: Advanced EntryTrace (POSIX shell features breadth, busybox detection).

Appendix A — EntryTrace resolution (pseudo)

ResolveEntrypoint(ImageConfig cfg, RootFs fs):
  cmd = Normalize(cfg.ENTRYPOINT, cfg.CMD)
  stack = [ Script(cmd, path=FindOnPath(cmd[0], fs)) ]
  visited = set()

  while stack not empty and |visited| < MAX_DEPTH:
    cur = stack.pop()
    if cur in visited: continue
    visited.add(cur)

    if IsShellScript(cur.path):
       ast = ParseShell(cur.path)
       foreach directive in ast:
         if directive is Source include:
            p = ResolveInclude(include.path, cur.env, fs)
            stack.push(Script(p))
         if directive is Exec call:
            p = ResolveExec(call.argv[0], cur.env, fs)
            stack.push(Program(p, argv=call.argv))
         if directive is Interpreter (python -m / node / java -jar):
            term = ResolveInterpreterTarget(call, fs)
            stack.push(Program(term))
    else:
       return Terminal(cur.path)

  return Unknown(reason)

Appendix B — BOMIndex sidecar

struct Header { magic, version, imageDigest, createdAt }
vector<string> purls
map<purlIndex, roaring_bitmap> components
optional map<purlIndex, roaring_bitmap> usedByEntrypoint
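
A sketch of how a backend join might consume the decoded sidecar, with RoaringBitmap standing in for whichever bitmap implementation the binary encodes (its member names are assumptions):

using System;
using System.Collections.Generic;

// purls: string[]; components/usedByEntrypoint: Dictionary<int, RoaringBitmap> — decoded from the sidecar.
int idx = Array.IndexOf(purls, "pkg:npm/lodash@4.17.21");   // example purl lookup in the table
if (idx >= 0 && components.TryGetValue(idx, out var files))
{
    bool used = usedByEntrypoint is not null && usedByEntrypoint.ContainsKey(idx);
    Console.WriteLine($"{files.Count} file(s); usedByEntrypoint={used}");   // Count assumed on the wrapper
}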