feat: Implement Runtime Facts ingestion service and NDJSON reader

- Added RuntimeFactsNdjsonReader for reading NDJSON formatted runtime facts.
- Introduced IRuntimeFactsIngestionService interface and its implementation.
- Enhanced Program.cs to register new services and endpoints for runtime facts.
- Updated CallgraphIngestionService to include CAS URI in stored artifacts.
- Created RuntimeFactsValidationException for validation errors during ingestion.
- Added tests for RuntimeFactsIngestionService and RuntimeFactsNdjsonReader.
- Implemented SignalsSealedModeMonitor for compliance checks in sealed mode.
- Updated project dependencies for testing utilities.

2025-11-10 07:56:15 +02:00

18 KiB

Sprint 137 - Scanner & Surface

Phase focus: Scanner.VIII — Analyzer gap design & readiness.

Depends on: Sprint 136 · Scanner.VII (Surface env/fs/secrets) to ensure shared primitives exist.
Feeds: Sprint 138 (Ruby parity) and Sprint 139 (language-specific analyzers) by locking designs + policy hooks.

| Task ID | State | Summary | Owner / Source | Depends On |
| --- | --- | --- | --- | --- |
| `SCANNER-ENG-0002` | DONE (2025-11-09) | Design the Node.js lockfile collector + CLI validator per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`, capturing Surface + policy requirements before implementation. | Scanner Guild, CLI Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0003` | DONE (2025-11-09) | Design Python lockfile + editable-install parity checks with policy predicates and CLI workflow coverage as outlined in the gap analysis. | Python Analyzer Guild, CLI Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0004` | DONE (2025-11-09) | Design Java lockfile ingestion/validation (Gradle/SBT collectors, CLI verb, policy hooks) to close comparison gaps. | Java Analyzer Guild, CLI Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0005` | DONE (2025-11-09) | Enhance Go stripped-binary fallback inference design, including inferred module metadata + policy integration, per the gap analysis. | Go Analyzer Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0006` | DONE (2025-11-09) | Expand Rust fingerprint coverage design (enriched fingerprint catalogue + policy controls) per the comparison matrix. | Rust Analyzer Guild (docs/modules/scanner) | — |
| `SCANNER-ENG-0007` | DONE (2025-11-09) | Design the deterministic secret leak detection pipeline covering rule packaging, Policy Engine integration, and CLI workflow. | Scanner Guild, Policy Guild (docs/modules/scanner) | — |

2025-11-09: The gap designs below capture the analyzer, Surface, CLI, and policy contracts for SCANNER-ENG-0002…0007; following this review, the tasks moved from DOING to DONE.

Implementation progress (2025-11-09)

Gradle/Maven lock ingestion is now wired into JavaLanguageAnalyzer: JavaLockFileCollector sorts lock metadata deterministically, merges it with archive findings (lockConfiguration, lockRepository, lockResolved), and emits declared-only components (with declaredOnly=true, lockSource, lockLocator) whenever jars are missing. CLI/Surface telemetry tags were updated to carry per-language declared/missing counters.
The stella java lock-validate verb shares the HandleLanguageLockValidateAsync helper with the Node and Python verbs, offers the same table/JSON output, and is documented alongside the scanner README and CLI guide (including the new metric stellaops.cli.java.lock_validate.count). CommandHandlersTests now covers the Ruby, Node, and Java lock workflows end-to-end.
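As a rough illustration of the merge described above, the Python sketch below folds lock entries into archive findings and emits a declared-only component whenever no jar was found. It is not the JavaLanguageAnalyzer code: the function name merge_java_locks, the input dict shapes, and the assumption of one lock entry per coordinate are hypothetical; only the field names (lockConfiguration, lockRepository, lockResolved, declaredOnly, lockSource, lockLocator) come from the text.

```python
from typing import Dict, List


def merge_java_locks(lock_entries: List[Dict], archives: Dict[str, Dict]) -> List[Dict]:
    """Merge lock metadata with archive findings, both keyed by group:artifact:version.

    Hypothetical sketch: assumes one lock entry per coordinate.
    """
    components = []
    seen = set()
    for entry in lock_entries:
        coordinate = entry["coordinate"]
        seen.add(coordinate)
        lock_meta = {
            "lockConfiguration": entry["configuration"],
            "lockRepository": entry.get("repository"),
            "lockResolved": entry.get("resolved"),
        }
        archive = archives.get(coordinate)
        if archive is not None:
            # A jar was found: enrich the installed component with lock metadata.
            components.append({**archive, **lock_meta, "declaredOnly": False})
        else:
            # No jar on disk: keep the declaration and point back at the lockfile.
            components.append({
                "coordinate": coordinate,
                **lock_meta,
                "declaredOnly": True,
                "lockSource": entry["source"],
                "lockLocator": entry["locator"],
            })
    # Jars that no lockfile mentions are still reported as installed components.
    for coordinate in sorted(set(archives) - seen):
        components.append({**archives[coordinate], "declaredOnly": False})
    # Final sort makes output independent of lockfile and archive traversal order.
    return sorted(components, key=lambda c: c["coordinate"])
```

The final sort by coordinate is what keeps snapshots stable regardless of the order in which lockfiles and archives were discovered.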

Design outcomes

SCANNER-ENG-0002 — Node.js lockfile collector + CLI validator

Scope & goals

Provide deterministic ingestion of pnpm-lock.yaml, package-lock.json, and yarn.lock so declared dependencies are preserved even when node_modules is absent.
Offer a CLI validator that runs without scheduling a scan, reusing the same collector and Surface safety rails.

Design decisions

Add NodeLockfileCollector under StellaOps.Scanner.Analyzers.Lang.Node. The collector normalises manifests into a shared model (package name, version, resolved, integrity, registry, workspace path) and emits DeclaredOnly = true components stored beside installed fragments (LayerComponentFragment.DeclaredSources).
Reuse LanguageAnalyzerContext merge rules so installed packages supersede declared-only entries while retaining discrepancies for policy.
Gate execution through Surface.Validation (the scanner.lockfiles.node.* knobs), which enforces a maximum lockfile size, workspace limits, and registry allowlists; violations fail fast with deterministic error IDs.
Private registries referenced in lockfiles must use secret:// handles. Surface.Secrets resolves these handles before validation and the resolved metadata (never the secret) is attached to the collector context for auditing.
EntryTrace usage hints annotate runtime packages; when a package is used at runtime but missing from the lockfile, the merge step tags it with UsageWithoutDeclaration.
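A minimal sketch of the collector's normalisation and validation step for npm's package-lock.json (the `packages` map used by lockfileVersion 2 and 3), assuming a byte-size limit and a host-based registry allowlist. The function name, the error IDs NODE_LOCK_TOO_LARGE and NODE_LOCK_REGISTRY_DISALLOWED, and the record shape are hypothetical stand-ins for NodeLockfileCollector and the scanner.lockfiles.node.* knobs; pnpm/yarn parsing and secret:// resolution are omitted.

```python
import json
from typing import Dict, List, Set
from urllib.parse import urlparse


class LockfileValidationError(Exception):
    """Validation failure carrying a stable, machine-readable error ID (hypothetical IDs)."""

    def __init__(self, error_id: str, detail: str) -> None:
        super().__init__(f"{error_id}: {detail}")
        self.error_id = error_id


def collect_package_lock(raw: bytes, max_bytes: int, allowed_registries: Set[str]) -> List[Dict]:
    # Fail fast on oversized input before parsing anything.
    if len(raw) > max_bytes:
        raise LockfileValidationError("NODE_LOCK_TOO_LARGE", f"{len(raw)} bytes > {max_bytes}")
    packages = json.loads(raw.decode("utf-8")).get("packages", {})
    records = []
    # Iterate in sorted order so the first reported violation is always the same one.
    for path, entry in sorted(packages.items()):
        if path == "":
            continue  # the root project entry, not a dependency
        resolved = entry.get("resolved")
        if resolved:
            parsed = urlparse(resolved)
            # Only remote tarballs are checked; workspace links and file: specs are local.
            if parsed.scheme in ("http", "https") and parsed.hostname not in allowed_registries:
                raise LockfileValidationError(
                    "NODE_LOCK_REGISTRY_DISALLOWED", f"{path} resolved from {parsed.hostname}")
        records.append({
            # Nested installs look like node_modules/a/node_modules/b; keep the last segment.
            "name": entry.get("name") or path.rsplit("node_modules/", 1)[-1],
            "version": entry.get("version"),
            "resolved": resolved,
            "integrity": entry.get("integrity"),
            "workspacePath": path,
            "declaredOnly": True,
        })
    return sorted(records, key=lambda r: (r["name"], r["workspacePath"]))
```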

CLI, policy, docs

Add stella node lock-validate [path] --format {auto|pnpm|npm|yarn} that runs locally, reuses Surface controls, and returns canonical JSON + table summaries. The CLI inherits --surface-config so air-gapped configs stay consistent.
Scanner/WebService gains a --node-lockfiles flag and a SCANNER__NODE__LOCKFILES__ENABLED environment toggle to control ingestion during full scans.
Policy Engine receives predicates: node.lock.declaredMissing, node.lock.registryDisallowed, node.lock.declarationOnly. Templates show how to fail on disallowed registries while only warning on declared-only findings that never reach runtime.
Update docs/modules/scanner/architecture.md and policy DSL appendices with the new evidence flags and CLI workflow.
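To illustrate what "canonical JSON + table summaries" could mean in practice, here is a sketch that renders declared-only and missing-lock package lists with sorted keys, sorted lists, and compact separators, so the same input always yields byte-identical output. The function render_lock_summary and its output shape are assumptions, not the actual stella CLI format.

```python
import json
from typing import Dict, List


def render_lock_summary(declared_only: List[str], missing_lock: List[str], fmt: str = "json") -> str:
    """Hypothetical renderer: deterministic JSON or a fixed-width text table."""
    summary: Dict[str, object] = {
        "declaredOnly": sorted(declared_only),
        "missingLock": sorted(missing_lock),
        "counts": {"declaredOnly": len(declared_only), "missingLock": len(missing_lock)},
    }
    if fmt == "json":
        # sort_keys + compact separators give a canonical byte sequence for snapshots.
        return json.dumps(summary, sort_keys=True, separators=(",", ":"))
    if fmt == "table":
        rows = [("Status", "Package")]
        rows += [("declared-only", name) for name in summary["declaredOnly"]]
        rows += [("missing-lock", name) for name in summary["missingLock"]]
        width = max(len(status) for status, _ in rows)
        return "\n".join(f"{status.ljust(width)}  {name}" for status, name in rows)
    raise ValueError(f"unknown format: {fmt}")
```

Because input order never leaks into the output, golden files for the CLI can be compared byte-for-byte.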

Testing, telemetry, rollout

Golden fixtures for pnpm v8, npm v9, and yarn berry lockfiles live under tests/Scanner.Analyzers.Node/__fixtures__/lockfiles. Deterministic snapshots are asserted in both analyzer and CLI tests.
Add integration coverage in tests/Scanner.Cli.Node verifying exit codes and explain output for mismatched packages/registries.
Emit counters (scanner.node.lock.declared, scanner.node.lock.mismatch, scanner.node.lock.registry_blocked) plus structured logs keyed by lockfile digest.
Offline Kit ships the parser tables and CLI binary help under offline/scanner/node-lockfiles/README.md.

Implementation status (2025-11-09)

Lockfile declarations now emit DeclaredOnly components in StellaOps.Scanner.Analyzers.Lang.Node with lock source/locator metadata and deterministic evidence for policy use.
CLI verb stella node lock-validate inspects lockfiles locally, rendering declared-only/missing-lock summaries and emitting stellaops.cli.node.lock_validate.count telemetry.
Node analyzer determinism fixtures updated with declared-only coverage; CLI unit suite exercises the new handler.
Python analyzer ingests requirements*.txt, Pipfile.lock, and poetry.lock, tagging installed distributions with lockSource metadata and creating declared-only components. stella python lock-validate mirrors the workflow for offline validation and records stellaops.cli.python.lock_validate.count.

SCANNER-ENG-0003 — Python lockfile + editable-install parity

Scope & goals

Parse Python lockfiles (poetry.lock, Pipfile.lock, hashed requirements*.txt) to capture declared graphs pre-install.
Detect editable installs and local path references so policy can assert parity between lockfiles and runtime contents.

Design decisions

Introduce PythonLockfileCollector in StellaOps.Scanner.Analyzers.Lang.Python, capable of reading Poetry, Pipenv, pip-tools, and raw requirements syntax (including environment markers, extras, hashes, VCS refs).
Extend the collector with an EditableResolver that inspects lockfile entries (path =, editable = true, -e ./pkg) and consults Surface.FS to normalise the referenced directory, capturing EditablePath, SourceDigest, and VcsRef metadata.
Merge results with installed *.dist-info data using LanguageAnalyzerContext. Installed evidence overrides declared-only components; editable packages missing from the artifact layer are tagged EditableMissing.
Surface.Validation adds the knobs scanner.lockfiles.python.maxBytes and scanner.lockfiles.python.allowedIndexes, and ensures hashes are present when policy mandates repeatable environments. Private index credentials are provided via Surface.Secrets and are never persisted.
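The sketch below shows one way to split a single requirements*.txt line into the pieces the collector needs (name, extras, specifier, environment marker, hashes, editable target). It is a simplified, hypothetical helper, not PythonLockfileCollector: it handles `-e`/`--editable`, `--hash=` options, `;` markers, and `[extras]`, normalises names per PEP 503, and does not cover backslash continuations, `-r` includes, or full PEP 508 URL syntax.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

_HASH_OPTION = re.compile(r"--hash=(\S+)")
_NAME_EXTRAS_SPEC = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)(?:\[(?P<extras>[^\]]*)\])?(?P<spec>.*)$")


@dataclass
class Requirement:
    name: Optional[str] = None
    extras: List[str] = field(default_factory=list)
    specifier: str = ""
    marker: Optional[str] = None
    hashes: List[str] = field(default_factory=list)
    editable_target: Optional[str] = None  # local path or VCS URL given after -e

    @property
    def pinned(self) -> bool:
        # Only an exact "==" specifier counts as pinned; editables never do.
        return self.editable_target is None and self.specifier.startswith("==")


def parse_requirement_line(line: str) -> Optional[Requirement]:
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or full-line comment
    hashes = sorted(_HASH_OPTION.findall(line))
    line = _HASH_OPTION.sub("", line).strip()
    for prefix in ("-e ", "--editable "):
        if line.startswith(prefix):
            return Requirement(editable_target=line[len(prefix):].strip(), hashes=hashes)
    requirement, _, marker = line.partition(";")
    match = _NAME_EXTRAS_SPEC.match(requirement.strip())
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    extras = [e.strip() for e in (match.group("extras") or "").split(",") if e.strip()]
    return Requirement(
        # PEP 503 normalisation: lowercase, runs of "-", "_", "." become "-".
        name=re.sub(r"[-_.]+", "-", match.group("name")).lower(),
        extras=sorted(extras),
        specifier=match.group("spec").strip(),
        marker=marker.strip() or None,
        hashes=hashes,
    )
```

An EditableResolver would then take `editable_target` and resolve it through Surface.FS; that step is not modelled here.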

CLI, policy, docs

New CLI verb stella python lock-validate mirrors the Node workflow, validates editable references resolve within the checked-out tree, and emits parity diagnostics.
Scanner runs accept --python-lockfiles to toggle ingestion per tenant.
Policy predicates: python.lock.declaredMissing, python.lock.editableUnpinned, python.lock.indexDisallowed. Editable packages missing from the filesystem can be configured either to fail builds or to require a waiver.
Document the workflow in docs/modules/scanner/architecture.md and the policy cookbook, including guidance on handling build-system backends.

Testing, telemetry, rollout

Fixtures covering Poetry 1.6, Pipenv 2024.x, requirements.txt with markers, and mixed editable/VCS entries live beside the analyzer tests.
CLI golden output asserts deterministic ordering and masking of secrets in URLs.
Metrics: scanner.python.lock.declared, scanner.python.lock.editable, scanner.python.lock.failures.
Offline Kit bundles include parser definitions and sample policies to keep air-gapped tenants aligned.

SCANNER-ENG-0004 — Java/Gradle/SBT lockfile ingestion & validation

Scope & goals

Capture Gradle, Maven, and SBT dependency locks before artifacts are built, along with repository provenance and configuration scopes.
Provide CLI validation and policy predicates enforcing repository allowlists and declared/runtime parity.

Design decisions

Add collectors: GradleLockfileCollector (reads gradle.lockfile and gradle/dependency-locks/*.lock), MavenLockfileCollector (parses pom.xml/pom.lock + dependencyManagement overrides), and SbtLockfileCollector (reads Ivy resolution outputs or dependencies.lock).
Each collector emits normalized records keyed by groupId:artifactId:version plus config scope (compileClasspath, runtimeClasspath, etc.), repository URI, checksum, and optional classifier. Records are stored as DeclaredOnly fragments associated with their workspace path.
Surface.Validation enforces file-size limits, repository allowlists (scanner.lockfiles.java.allowedRepos), and optional checksum requirements. Private Maven credentials flow through Surface.Secrets.
JavaLanguageAnalyzer merges declared entries with installed archives. Runtime usage from EntryTrace is attached so policies can prioritize gaps that reach runtime.
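For Gradle, gradle.lockfile lists one `group:artifact:version=config1,config2` entry per line, plus an `empty=` line naming configurations that resolved nothing. A minimal parser producing keyed, scope-tagged declared-only records might look like the following; the function name and record fields are illustrative, and repository URI, checksum, and classifier would have to come from other sources because gradle.lockfile does not carry them.

```python
from typing import Dict, List


def parse_gradle_lockfile(text: str, workspace_path: str) -> List[Dict]:
    """Hypothetical parser for gradle.lockfile content into declared-only records."""
    records = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Gradle writes a comment header at the top of the file
        coordinate, sep, configurations = line.partition("=")
        if not sep:
            raise ValueError(f"{workspace_path}:{line_no}: expected 'coordinate=configs'")
        if coordinate == "empty":
            continue  # configurations that resolved no dependencies
        parts = coordinate.split(":")
        if len(parts) != 3:
            raise ValueError(f"{workspace_path}:{line_no}: expected group:artifact:version")
        group_id, artifact_id, version = parts
        records.append({
            "key": coordinate,
            "groupId": group_id,
            "artifactId": artifact_id,
            "version": version,
            "scopes": sorted(c for c in configurations.split(",") if c),
            "workspacePath": workspace_path,
            "declaredOnly": True,
        })
    # Sorting by key keeps records stable across file orderings.
    return sorted(records, key=lambda r: r["key"])
```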

CLI, policy, docs

CLI verb stella java lock-validate supports Gradle/Maven/SBT modes, prints mismatched dependencies, and checks repository policy.
The scanner flag --java-lockfiles (or the SCANNER__JAVA__LOCKFILES__ENABLED environment variable) gates ingestion. Lockfile artifacts are uploaded to Surface.FS for evidence replay.
Policy predicates: java.lock.declaredMissing, java.lock.repoDisallowed, java.lock.unpinned (no checksum). Explain traces cite repository + config scope for each discrepancy.
Docs: update scanner module dossier and policy template library with repository governance examples.

Testing, telemetry, rollout

Fixtures derived from sample Gradle multi-projects, Maven BOM hierarchies, and SBT builds validate parser coverage and CLI messaging.
Metrics scanner.java.lock.declared, scanner.java.lock.missing, scanner.java.lock.repo_blocked feed the observability dashboards.
Offline kits include parser grammars and CLI docs so air-gapped tenants can enforce repo policies without SaaS dependencies.

SCANNER-ENG-0005 — Go stripped-binary fallback inference

Scope & goals

Enrich the stripped-binary fallback so Go modules remain explainable even without embedded buildinfo, and give Policy Engine knobs to treat inferred evidence differently.

Design decisions

Extend GoBinaryScanner with an inference pipeline that, when build info is absent, parses ELF/Mach-O symbol tables and DWARF data using the existing ElfSharp bindings. Symbols feed into a new GoSymbolInferenceEngine that matches against a signed GoFingerprintCatalog under StellaOps.Scanner.Analyzers.Lang.Go.Fingerprints.
Inferred results carry Confidence (0–1), matched symbol counts, and reasons (BuildInfoMissing, SymbolMatches, PkgPathFallback). Records are emitted as InferredModule metadata alongside hashed fallback components.
Update fragment schemas so DSSE-composed BOMs include both the hashed fallback and the inference summary, enabling deterministic replay.
Surface.Validation exposes scanner.analyzers.go.fallback.enabled and scanner.analyzers.go.fallback.maxSymbolBytes so workloads can opt out of the fallback or cap its processing time.
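A toy version of the symbol-matching step, assuming the fingerprint catalog maps a module path to a set of characteristic symbol names and that confidence is simply the fraction of a module's fingerprint symbols found in the binary. The scoring rule, the min_matches cut-off, and the function name are assumptions for illustration; the real GoSymbolInferenceEngine may weight symbols differently. Only the BuildInfoMissing and SymbolMatches reasons are modelled.

```python
from typing import Dict, FrozenSet, List, Set


def infer_go_modules(symbols: Set[str], catalog: Dict[str, FrozenSet[str]],
                     has_buildinfo: bool, min_matches: int = 2) -> List[Dict]:
    if has_buildinfo:
        return []  # embedded build info is authoritative; nothing to infer
    inferred = []
    for module_path in sorted(catalog):
        fingerprint = catalog[module_path]
        matched = symbols & fingerprint
        if not fingerprint or len(matched) < min_matches:
            continue
        inferred.append({
            "module": module_path,
            # Confidence in [0, 1]: share of the module's fingerprint symbols present.
            "confidence": round(len(matched) / len(fingerprint), 2),
            "matchedSymbols": len(matched),
            "reasons": ["BuildInfoMissing", "SymbolMatches"],
        })
    # Highest confidence first; module path breaks ties deterministically.
    return sorted(inferred, key=lambda m: (-m["confidence"], m["module"]))
```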

Policy, CLI, docs

Policy predicates go.module.inferenceConfidence and go.module.hashOnly let tenants fail when only hashed provenance exists, or warn when inference confidence falls below a configured threshold.
CLI flag --go-fallback-detail (and corresponding API query) prints hashed vs inferred modules, confidence, and remediation hints (e.g., rebuild with -buildvcs).
Documentation updates cover inference details, how confidence feeds lattice weights, and how to author waivers.
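Policy evaluation over those results could be as simple as the following sketch, in which a component with no inference confidence is treated as hash-only. The thresholds, verdict strings, input shape, and function name are hypothetical; only the predicate names go.module.hashOnly and go.module.inferenceConfidence come from the text.

```python
from typing import Dict, List


def evaluate_go_modules(modules: List[Dict], min_confidence: float = 0.8,
                        fail_on_hash_only: bool = True) -> List[str]:
    """Hypothetical evaluator returning one verdict string per non-passing component."""
    verdicts = []
    for m in sorted(modules, key=lambda m: m["id"]):
        confidence = m.get("confidence")
        if confidence is None:
            # No inference result: only the hashed fallback component exists.
            level = "fail" if fail_on_hash_only else "warn"
            verdicts.append(f"{level}: {m['id']} go.module.hashOnly")
        elif confidence < min_confidence:
            verdicts.append(f"warn: {m['id']} go.module.inferenceConfidence={confidence} < {min_confidence}")
    return verdicts
```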

Testing, telemetry, rollout

Add stripped binary fixtures (Linux, macOS) plus intentionally obfuscated samples. Tests assert deterministic inference and hashing.
The metrics scanner.go.inference.count and scanner.go.inference.confidence_bucket provide observability; logs include imageDigest, binaryPath, and confidence.
Offline Kit bundles the fingerprint catalog and inference changelog so air-gapped tenants can audit provenance.

SCANNER-ENG-0006 — Rust fingerprint coverage expansion

Scope & goals

Improve Rust evidence for stripped binaries by expanding fingerprint sources, symbol parsing, and policy controls over heuristic findings.

Design decisions

Build a new signed, versioned RustFingerprintCatalog fed by Cargo crate metadata, community hash contributions, and curated fingerprints from StellaOps scans. The catalog lives under StellaOps.Scanner.Analyzers.Lang.Rust.Fingerprints with deterministic ordering.
Extend RustAnalyzerCollector with symbol parsing (DWARF, ELF build IDs) via SymbolGraphResolver. Resolver correlates crate sections, monomorphized symbol prefixes, and #[panic_handler] markers to infer crate names and versions.
Emit inference metadata (fingerprintId, confidence, symbolEvidence[]) alongside hashed fallbacks. Authoritative Cargo.lock data (when present) still wins in merges.
Surface.Validation adds toggles for fingerprint freshness and maximum catalog size per tenant. Offline bundles deliver catalog updates signed via DSSE.
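The precedence rule (Cargo.lock beats fingerprints, and among inferred hits the highest-confidence fingerprint wins) can be sketched as below. The input shapes and the function name are assumptions, and keying Cargo.lock by crate name alone is a simplification, since a real Cargo.lock may list several versions of the same crate.

```python
from typing import Dict, List


def merge_rust_evidence(cargo_lock: Dict[str, str], inferred: List[Dict]) -> List[Dict]:
    """Hypothetical merge: authoritative Cargo.lock entries first, then best fingerprint per crate."""
    # Authoritative entries: crate name -> version from Cargo.lock.
    merged = {
        name: {"crate": name, "version": version, "source": "Cargo.lock"}
        for name, version in cargo_lock.items()
    }
    # Best fingerprint hit per crate first (highest confidence, then fingerprintId).
    for hit in sorted(inferred, key=lambda h: (h["crate"], -h["confidence"], h["fingerprintId"])):
        if hit["crate"] in merged:
            continue  # Cargo.lock, or a stronger fingerprint, already claimed this crate
        merged[hit["crate"]] = {
            "crate": hit["crate"],
            "version": hit["version"],
            "source": "fingerprint",
            "fingerprintId": hit["fingerprintId"],
            "confidence": hit["confidence"],
            "symbolEvidence": sorted(hit["symbolEvidence"]),
        }
    return [merged[name] for name in sorted(merged)]
```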

Policy, CLI, docs

Policy predicates: rust.fingerprint.confidence, rust.fingerprint.catalogAgeDays. Templates show how to warn when only heuristic data exists, or fail if catalog updates are stale.
CLI flag --rust-fingerprint-detail prints authoritative vs inferred crates, symbol samples, and guidance.
Documentation (scanner module + policy guide) explains how inference is stored, how catalog publishing works, and how to tune policy weights.
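A sketch of how rust.fingerprint.catalogAgeDays and rust.fingerprint.confidence might be evaluated. The scan date is passed in explicitly rather than read from the wall clock, which keeps replayed evaluations deterministic. Thresholds, verdict strings, and the function name are hypothetical.

```python
from datetime import date
from typing import Dict, List


def evaluate_rust_evidence(components: List[Dict], catalog_published: date, scan_date: date,
                           max_catalog_age_days: int = 30, min_confidence: float = 0.7) -> List[str]:
    findings = []
    # Age is measured against the scan date, not today, so replays give the same answer.
    age = (scan_date - catalog_published).days
    if age > max_catalog_age_days:
        findings.append(f"fail: rust.fingerprint.catalogAgeDays={age} exceeds {max_catalog_age_days}")
    for c in sorted(components, key=lambda c: c["crate"]):
        conf = c.get("confidence")  # None for authoritative Cargo.lock entries
        if conf is not None and conf < min_confidence:
            findings.append(f"warn: {c['crate']} rust.fingerprint.confidence={conf} below {min_confidence}")
    return findings
```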

Testing, telemetry, rollout

Add fixtures for stripped Rust binaries across editions (2018–2024) and with/without LTO. Determinism tests compare catalog revisions and inference outputs.
Metrics scanner.rust.fingerprint.authoritative, scanner.rust.fingerprint.inferred, scanner.rust.fingerprint.catalog_version feed dashboards and alerts.
Offline kit updates include catalog packages, verification instructions, and waiver templates tied to predicate names.

SCANNER-ENG-0007 — Deterministic secret leak detection pipeline

Scope & goals

Provide first-party secret leak detection that matches competitor capabilities while preserving deterministic, offline-friendly execution and explainability.

Design decisions

Introduce StellaOps.Scanner.Analyzers.Secrets, a restart-time plug-in that consumes rule bundles (ruleset.tgz) signed with DSSE and versioned (semantic version + hash). Bundles live under plugins/scanner/secrets/rules/<version>.
Rule bundles contain deterministic regex/entropy definitions, context windows, and masking directives. A rule index is generated at build time to guarantee deterministic ordering.
The analyzer runs after Surface validation of each file/layer. Files pass through a streaming matcher that outputs SecretLeakEvidence (rule ID, severity, confidence, file path, byte ranges, masking applied). Findings persist in ScanAnalysisStore and align with DSSE exports.
Surface.Validation introduces scanner.secrets.rules.bundle, scanner.secrets.maxFileBytes, and scanner.secrets.targetGlobs. Surface.Secrets supplies allowlist tokens (e.g., approved test keys) without exposing plaintext to analyzers.
Events/attestations: findings can optionally be published via the existing Redis events, and Export Center bundles include masked evidence plus rule metadata.
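To make the matching model concrete, here is a simplified, non-streaming sketch of regex-plus-entropy rules with masking, UTF-8 byte ranges, and hash-based allowlisting. It holds the whole file in memory (a streaming matcher would scan in chunks with overlapping windows), and the Rule type, the entropy-threshold semantics, and the SHA-256 allowlist are assumptions rather than the bundle format inside ruleset.tgz.

```python
import hashlib
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: str        # regular expression for candidate secrets
    min_entropy: float  # Shannon entropy (bits per character) a match must reach
    severity: str


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask(value: str, keep: int = 4) -> str:
    """Keep a short prefix for triage and replace the rest with '*'."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def scan_text(path: str, text: str, rules: List[Rule],
              allowlisted_sha256: FrozenSet[str] = frozenset()) -> List[dict]:
    findings = []
    # Evaluate rules in a fixed order (a stand-in for the build-time rule index).
    for rule in sorted(rules, key=lambda r: r.rule_id):
        for match in re.finditer(rule.pattern, text):
            value = match.group(0)
            if shannon_entropy(value) < rule.min_entropy:
                continue  # low-entropy placeholder values are skipped
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            if digest in allowlisted_sha256:
                continue  # approved test key, identified without its plaintext
            # Convert character offsets to UTF-8 byte offsets.
            start = len(text[:match.start()].encode("utf-8"))
            end = start + len(value.encode("utf-8"))
            findings.append({
                "ruleId": rule.rule_id,
                "severity": rule.severity,
                "path": path,
                "byteRange": (start, end),
                "masked": mask(value),
                "sha256": digest,
            })
    return sorted(findings, key=lambda f: (f["path"], f["byteRange"], f["ruleId"]))
```

Note that a plain SHA-256 of a short or low-entropy secret can be brute-forced, so a keyed hash would be the safer choice for anything persisted.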

CLI, policy, docs

Add stella secrets scan [path|image], plus a --secrets flag on stella scan to run the analyzer inline. CLI output redacts payloads and shows rule IDs, severity, and remediation hints.
Policy Engine ingests secret.leak evidence, including ruleId, confidence, masking.applied, enabling predicates like secret.leak.highConfidence, secret.leak.ruleDisabled. Templates cover severities, approvals, and ticket automation.
Documentation updates: scanner module dossier (new analyzer), policy cookbook (rule management), and Offline Kit guide (bundling rule updates).

Testing, telemetry, rollout

Rule-pack regression tests ensure deterministic matching and masking; analyzer unit tests cover regex + entropy combos, while integration tests run across sample repositories and OCI layers.
Metrics: scanner.secrets.ruleset.version, scanner.secrets.findings.total, scanner.secrets.findings.high_confidence. Logs include rule ID, masked hash, and file digests for auditing.
Offline Kit delivers the signed ruleset catalog, upgrade guide, and policy defaults so fully air-gapped tenants can keep pace without internet access.