Sprint 137 - Scanner & Surface
Phase focus: Scanner.VIII — Analyzer gap design & readiness.
- Depends on: Sprint 136 · Scanner.VII (Surface env/fs/secrets) to ensure shared primitives exist.
- Feeds: Sprint 138 (Ruby parity) and Sprint 139 (language-specific analyzers) by locking designs + policy hooks.
| Task ID | State | Summary | Owner / Source | Depends On |
|---|---|---|---|---|
| SCANNER-ENG-0002 | DONE (2025-11-09) | Design the Node.js lockfile collector + CLI validator per docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md, capturing Surface + policy requirements before implementation. | Scanner Guild, CLI Guild (docs/modules/scanner) | — |
| SCANNER-ENG-0003 | DONE (2025-11-09) | Design Python lockfile + editable-install parity checks with policy predicates and CLI workflow coverage as outlined in the gap analysis. | Python Analyzer Guild, CLI Guild (docs/modules/scanner) | — |
| SCANNER-ENG-0004 | DONE (2025-11-09) | Design Java lockfile ingestion/validation (Gradle/SBT collectors, CLI verb, policy hooks) to close comparison gaps. | Java Analyzer Guild, CLI Guild (docs/modules/scanner) | — |
| SCANNER-ENG-0005 | DONE (2025-11-09) | Enhance Go stripped-binary fallback inference design, including inferred module metadata + policy integration, per the gap analysis. | Go Analyzer Guild (docs/modules/scanner) | — |
| SCANNER-ENG-0006 | DONE (2025-11-09) | Expand Rust fingerprint coverage design (enriched fingerprint catalogue + policy controls) per the comparison matrix. | Rust Analyzer Guild (docs/modules/scanner) | — |
| SCANNER-ENG-0007 | DONE (2025-11-09) | Design the deterministic secret leak detection pipeline covering rule packaging, Policy Engine integration, and CLI workflow. | Scanner Guild, Policy Guild (docs/modules/scanner) | — |
2025-11-09: The gap designs below capture analyzer, Surface, CLI, and policy contracts for SCANNER-ENG-0002…0007; tasks were taken DOING → DONE after this review.
Implementation progress (2025-11-09)
- Gradle/Maven lock ingestion is now wired into `JavaLanguageAnalyzer`: `JavaLockFileCollector` sorts lock metadata deterministically, merges it with archive findings (`lockConfiguration`, `lockRepository`, `lockResolved`), and emits declared-only components (with `declaredOnly=true`, `lockSource`, `lockLocator`) whenever jars are missing. CLI/Surface telemetry tags were updated to carry per-language declared/missing counters.
- `stella java lock-validate` shares the `HandleLanguageLockValidateAsync` helper with Node/Python, has table/JSON output parity, and is documented alongside the scanner README + CLI guide (including the new metric `stellaops.cli.java.lock_validate.count`). Tests now cover the Ruby/Node/Java lock workflows end-to-end via `CommandHandlersTests`.
Design outcomes
SCANNER-ENG-0002 — Node.js lockfile collector + CLI validator
Scope & goals
- Provide deterministic ingestion of `pnpm-lock.yaml`, `package-lock.json`, and `yarn.lock` so declared dependencies are preserved even when `node_modules` is absent.
- Offer a CLI validator that runs without scheduling a scan, reusing the same collector and Surface safety rails.
Design decisions
- Add `NodeLockfileCollector` under `StellaOps.Scanner.Analyzers.Lang.Node`. The collector normalises manifests into a shared model (`package name`, `version`, `resolved`, `integrity`, `registry`, `workspace path`) and emits `DeclaredOnly = true` components stored beside installed fragments (`LayerComponentFragment.DeclaredSources`).
- Reuse `LanguageAnalyzerContext` merge rules so installed packages supersede declared-only entries while retaining discrepancies for policy.
- Gate execution through `Surface.Validation` (`scanner.lockfiles.node.*` knobs) that enforce max lockfile size, workspace limits, and registry allowlists; violations fail fast with deterministic error IDs.
- Private registries referenced in lockfiles must use `secret://` handles. `Surface.Secrets` resolves these handles before validation and the resolved metadata (never the secret) is attached to the collector context for auditing.
- EntryTrace usage hints annotate runtime packages; when a package is used at runtime but missing from the lockfile, the merge step tags it with `UsageWithoutDeclaration`.
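The merge rule described above (installed packages supersede declared-only entries, discrepancies are retained, undeclared runtime usage is tagged) can be illustrated with a minimal Python sketch. This is not the `LanguageAnalyzerContext` implementation: `DeclaredPackage`, `merge_components`, and the `declaredVersion` field are hypothetical names chosen for illustration, and only the `UsageWithoutDeclaration` tag and the declared-only concept come from the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredPackage:
    """One lockfile entry (hypothetical shape, not the real collector model)."""
    name: str
    version: str
    lock_source: str  # e.g. "package-lock.json"


def merge_components(installed, declared, runtime_used):
    """Merge installed packages (name -> version) with lockfile declarations.

    Installed evidence wins; the declared version is kept next to it so a
    policy can still see the discrepancy. Output is sorted by name so the
    result is deterministic.
    """
    merged = {}
    # Start from the lockfile view: everything is declared-only until installed.
    for pkg in declared:
        merged[pkg.name] = {
            "name": pkg.name, "version": pkg.version, "declaredOnly": True,
            "declaredVersion": pkg.version, "lockSource": pkg.lock_source,
            "flags": [],
        }
    for name, version in installed.items():
        entry = merged.get(name)
        flags = []
        # Used at runtime but absent from the lockfile.
        if entry is None and name in runtime_used:
            flags.append("UsageWithoutDeclaration")
        merged[name] = {
            "name": name, "version": version, "declaredOnly": False,
            "declaredVersion": entry["declaredVersion"] if entry else None,
            "lockSource": entry["lockSource"] if entry else None,
            "flags": flags,
        }
    return [merged[name] for name in sorted(merged)]
```

Keeping `declaredVersion` on the installed record is one possible way to retain discrepancies; the actual fragment schema may represent them differently.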
CLI, policy, docs
- Add `stella node lock-validate [path] --format {auto|pnpm|npm|yarn}` that runs locally, reuses Surface controls, and returns canonical JSON + table summaries. The CLI inherits `--surface-config` so air-gapped configs stay consistent.
- Scanner/WebService gains `--node-lockfiles` / `SCANNER__NODE__LOCKFILES__ENABLED` toggles to control ingestion during full scans.
- Policy Engine receives predicates: `node.lock.declaredMissing`, `node.lock.registryDisallowed`, `node.lock.declarationOnly`. Templates show how to fail on disallowed registries while only warning on declared-only findings that never reach runtime.
- Update `docs/modules/scanner/architecture.md` and policy DSL appendices with the new evidence flags and CLI workflow.
Testing, telemetry, rollout
- Golden fixtures for pnpm v8, npm v9, and yarn berry lockfiles live under `tests/Scanner.Analyzers.Node/__fixtures__/lockfiles`. Deterministic snapshots are asserted in both analyzer and CLI tests.
- Add integration coverage in `tests/Scanner.Cli.Node` verifying exit codes and explain output for mismatched packages/registries.
- Emit counters (`scanner.node.lock.declared`, `scanner.node.lock.mismatch`, `scanner.node.lock.registry_blocked`) plus structured logs keyed by lockfile digest.
- Offline Kit ships the parser tables and CLI binary help under `offline/scanner/node-lockfiles/README.md`.
Implementation status (2025-11-09)
- Lockfile declarations now emit `DeclaredOnly` components in `StellaOps.Scanner.Analyzers.Lang.Node` with lock source/locator metadata and deterministic evidence for policy use.
- CLI verb `stella node lock-validate` inspects lockfiles locally, rendering declared-only/missing-lock summaries and emitting `stellaops.cli.node.lock_validate.count` telemetry.
- Node analyzer determinism fixtures updated with declared-only coverage; CLI unit suite exercises the new handler.
- Python analyzer ingests `requirements*.txt`, `Pipfile.lock`, and `poetry.lock`, tagging installed distributions with `lockSource` metadata and creating declared-only components. `stella python lock-validate` mirrors the workflow for offline validation and records `stellaops.cli.python.lock_validate.count`.
SCANNER-ENG-0003 — Python lockfile + editable-install parity
Scope & goals
- Parse Python lockfiles (`poetry.lock`, `Pipfile.lock`, hashed `requirements*.txt`) to capture declared graphs pre-install.
- Detect editable installs and local path references so policy can assert parity between lockfiles and runtime contents.
Design decisions
- Introduce `PythonLockfileCollector` in `StellaOps.Scanner.Analyzers.Lang.Python`, capable of reading Poetry, Pipenv, pip-tools, and raw requirements syntax (including environment markers, extras, hashes, VCS refs).
- Extend the collector with an `EditableResolver` that inspects lockfile entries (`path =`, `editable = true`, `-e ./pkg`) and consults `Surface.FS` to normalise the referenced directory, capturing `EditablePath`, `SourceDigest`, and `VcsRef` metadata.
- Merge results with installed `*.dist-info` data using `LanguageAnalyzerContext`. Installed evidence overrides declared-only components; editable packages missing from the artifact layer are tagged `EditableMissing`.
- `Surface.Validation` adds knobs `scanner.lockfiles.python.maxBytes`, `scanner.lockfiles.python.allowedIndexes`, and ensures hashes are present when policy mandates repeatable environments. Private index credentials are provided via `Surface.Secrets` and never persisted.
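To make the editable-install handling concrete, the following Python sketch scans requirements-style text for `-e` / `--editable` entries and checks whether local targets normalise to a path inside the checked-out tree. It is an illustration only: `classify_editables`, `resolve_within_root`, and the `kind` labels are hypothetical, the real `EditableResolver` goes through `Surface.FS`, and Poetry/Pipenv entries (`path =`, `editable = true`) are not covered here.

```python
import posixpath


def resolve_within_root(root, target):
    """Return the normalised path if it stays inside root, else None."""
    root_norm = posixpath.normpath(root)
    candidate = posixpath.normpath(posixpath.join(root_norm, target))
    prefix = root_norm.rstrip("/") + "/"
    if candidate == root_norm or candidate.startswith(prefix):
        return candidate
    return None


def classify_editables(requirements_text, root):
    """Collect editable entries and classify them as vcs, local, or outside-root."""
    entries = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Drop trailing comments (whitespace followed by '#').
        line = line.split(" #", 1)[0].rstrip()
        for flag in ("-e ", "--editable "):
            if line.startswith(flag):
                target = line[len(flag):].strip()
                break
        else:
            continue  # not an editable entry
        if "://" in target:
            # VCS or URL reference, e.g. git+https://...#egg=name
            entries.append({"target": target, "kind": "vcs", "resolved": None})
            continue
        resolved = resolve_within_root(root, target)
        kind = "local" if resolved else "outside-root"
        entries.append({"target": target, "kind": kind, "resolved": resolved})
    return entries
```

A validator built along these lines could report `outside-root` entries as parity failures, since they cannot be checked against the scanned tree.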
CLI, policy, docs
- New CLI verb `stella python lock-validate` mirrors the Node workflow, validates that editable references resolve within the checked-out tree, and emits parity diagnostics.
- Scanner runs accept `--python-lockfiles` to toggle ingestion per tenant.
- Policy predicates: `python.lock.declaredMissing`, `python.lock.editableUnpinned`, `python.lock.indexDisallowed`. Editable packages missing from the filesystem can be set to fail builds or raise waivers.
- Document the workflow in `docs/modules/scanner/architecture.md` and the policy cookbook, including guidance on handling build-system backends.
Testing, telemetry, rollout
- Fixtures covering Poetry 1.6, Pipenv 2024.x, `requirements.txt` with markers, and mixed editable/VCS entries live beside the analyzer tests.
- CLI golden output asserts deterministic ordering and masking of secrets in URLs.
- Metrics: `scanner.python.lock.declared`, `scanner.python.lock.editable`, `scanner.python.lock.failures`.
- Offline Kit bundles include parser definitions and sample policies to keep air-gapped tenants aligned.
SCANNER-ENG-0004 — Java/Gradle/SBT lockfile ingestion & validation
Scope & goals
- Capture Gradle, Maven, and SBT dependency locks before artifacts are built, along with repository provenance and configuration scopes.
- Provide CLI validation and policy predicates enforcing repository allowlists and declared/runtime parity.
Design decisions
- Add collectors: `GradleLockfileCollector` (reads `gradle.lockfile` and `gradle/dependency-locks/*.lock`), `MavenLockfileCollector` (parses `pom.xml`/`pom.lock` + dependencyManagement overrides), and `SbtLockfileCollector` (reads Ivy resolution outputs or `dependencies.lock`).
- Each collector emits normalized records keyed by `groupId:artifactId:version` plus config scope (`compileClasspath`, `runtimeClasspath`, etc.), repository URI, checksum, and optional classifier. Records are stored as `DeclaredOnly` fragments associated with their workspace path.
- `Surface.Validation` enforces file-size limits, repository allowlists (`scanner.lockfiles.java.allowedRepos`), and optional checksum requirements. Private Maven credentials flow through `Surface.Secrets`.
- `JavaLanguageAnalyzer` merges declared entries with installed archives. Runtime usage from EntryTrace is attached so policies can prioritize gaps that reach runtime.
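As an illustration of the normalized record shape, the sketch below parses the single-file `gradle.lockfile` format (lines of the form `group:artifact:version=configuration,configuration`, `#` comments, and an `empty=` marker line) into records keyed by `groupId:artifactId:version`. The function name and the record dictionary are assumptions made for this example; repository URI, checksum, and classifier are omitted because this lockfile format does not carry them.

```python
def parse_gradle_lockfile(text):
    """Parse gradle.lockfile content into sorted, declared-only records."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the generated-file header comments
        coordinate, _, configs = line.partition("=")
        if coordinate == "empty":
            continue  # lists configurations that have no locked dependencies
        parts = coordinate.split(":")
        if len(parts) != 3:
            raise ValueError(f"unexpected lock entry: {line!r}")
        group_id, artifact_id, version = parts
        records.append({
            "key": coordinate,
            "groupId": group_id,
            "artifactId": artifact_id,
            "version": version,
            # Sort scopes so output does not depend on the order in the file.
            "scopes": sorted(c for c in configs.split(",") if c),
            "declaredOnly": True,
        })
    # Deterministic ordering by coordinate.
    records.sort(key=lambda r: r["key"])
    return records
```

Sorting both the records and their scopes keeps snapshots stable across Gradle versions that may write entries in a different order.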
CLI, policy, docs
- CLI verb `stella java lock-validate` supports Gradle/Maven/SBT modes, prints mismatched dependencies, and checks repository policy.
- Scanner flags `--java-lockfiles` or env `SCANNER__JAVA__LOCKFILES__ENABLED` gate ingestion. Lockfile artifacts are uploaded to Surface.FS for evidence replay.
- Policy predicates: `java.lock.declaredMissing`, `java.lock.repoDisallowed`, `java.lock.unpinned` (no checksum). Explain traces cite repository + config scope for each discrepancy.
- Docs: update scanner module dossier and policy template library with repository governance examples.
Testing, telemetry, rollout
- Fixtures derived from sample Gradle multi-projects, Maven BOM hierarchies, and SBT builds validate parser coverage and CLI messaging.
- Metrics `scanner.java.lock.declared`, `scanner.java.lock.missing`, `scanner.java.lock.repo_blocked` feed the observability dashboards.
- Offline kits include parser grammars and CLI docs so air-gapped tenants can enforce repo policies without SaaS dependencies.
SCANNER-ENG-0005 — Go stripped-binary fallback inference
Scope & goals
- Enrich the stripped-binary fallback so Go modules remain explainable even without embedded `buildinfo`, and give Policy Engine knobs to treat inferred evidence differently.
Design decisions
- Extend `GoBinaryScanner` with an inference pipeline that, when build info is absent, parses ELF/Mach-O symbol tables and DWARF data using the existing `ElfSharp` bindings. Symbols feed into a new `GoSymbolInferenceEngine` that matches against a signed `GoFingerprintCatalog` under `StellaOps.Scanner.Analyzers.Lang.Go.Fingerprints`.
- Inferred results carry `Confidence` (0–1), matched symbol counts, and reasons (`BuildInfoMissing`, `SymbolMatches`, `PkgPathFallback`). Records are emitted as `InferredModule` metadata alongside hashed fallback components.
- Update fragment schemas so DSSE-composed BOMs include both the hashed fallback and the inference summary, enabling deterministic replay.
- `Surface.Validation` exposes `scanner.analyzers.go.fallback.enabled`, `scanner.analyzers.go.fallback.maxSymbolBytes`, ensuring workloads can opt out or constrain processing time.
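The design does not specify how `Confidence` is computed. As one possible illustration, the Python sketch below scores each catalog module by the fraction of its fingerprint symbols observed in the binary and keeps matches above a threshold. The scoring formula, the `infer_modules` name, the catalog shape, and the default threshold are all assumptions for this example, not the `GoSymbolInferenceEngine` algorithm.

```python
def infer_modules(symbols, catalog, min_confidence=0.6):
    """Match observed symbols against a fingerprint catalog.

    catalog maps module path -> iterable of fingerprint symbols (hypothetical
    shape). Confidence here is simply matched / total fingerprint symbols.
    """
    observed = set(symbols)
    results = []
    # Iterate modules in sorted order so the output is deterministic.
    for module in sorted(catalog):
        fingerprint = set(catalog[module])
        if not fingerprint:
            continue
        matched = observed & fingerprint
        confidence = len(matched) / len(fingerprint)
        if confidence >= min_confidence:
            results.append({
                "module": module,
                "confidence": round(confidence, 2),
                "matchedSymbols": len(matched),
                "reasons": ["BuildInfoMissing", "SymbolMatches"],
            })
    return results
```

A ratio-based score is easy to replay deterministically, but it ignores symbol uniqueness; a production catalog would likely weight distinctive symbols more heavily.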
Policy, CLI, docs
- Policy predicates `go.module.inferenceConfidence` and `go.module.hashOnly` let tenants fail when only hashed provenance exists or warn when inference confidence falls below a threshold.
- CLI flag `--go-fallback-detail` (and corresponding API query) prints hashed vs inferred modules, confidence, and remediation hints (e.g., rebuild with `-buildvcs`).
- Documentation updates cover inference details, how confidence feeds lattice weights, and how to author waivers.
Testing, telemetry, rollout
- Add stripped binary fixtures (Linux, macOS) plus intentionally obfuscated samples. Tests assert deterministic inference and hashing.
- Metrics `scanner.go.inference.count`, `scanner.go.inference.confidence_bucket` ensure observability; logs include `imageDigest`, `binaryPath`, `confidence`.
- Offline Kit bundles the fingerprint catalog and inference changelog so air-gapped tenants can audit provenance.
SCANNER-ENG-0006 — Rust fingerprint coverage expansion
Scope & goals
- Improve Rust evidence for stripped binaries by expanding fingerprint sources, symbol parsing, and policy controls over heuristic findings.
Design decisions
- Build a new `RustFingerprintCatalog`, signed and versioned, fed by Cargo crate metadata, community hash contributions, and curated fingerprints from StellaOps scans. Catalog lives under `StellaOps.Scanner.Analyzers.Lang.Rust.Fingerprints` with deterministic ordering.
- Extend `RustAnalyzerCollector` with symbol parsing (DWARF, ELF build IDs) via `SymbolGraphResolver`. Resolver correlates crate sections, monomorphized symbol prefixes, and `#[panic_handler]` markers to infer crate names and versions.
- Emit inference metadata (`fingerprintId`, `confidence`, `symbolEvidence[]`) alongside hashed fallbacks. Authoritative Cargo.lock data (when present) still wins in merges.
- `Surface.Validation` adds toggles for fingerprint freshness and maximum catalog size per tenant. Offline bundles deliver catalog updates signed via DSSE.
Policy, CLI, docs
- Policy predicates: `rust.fingerprint.confidence`, `rust.fingerprint.catalogAgeDays`. Templates show how to warn when only heuristic data exists, or fail if catalog updates are stale.
- CLI flag `--rust-fingerprint-detail` prints authoritative vs inferred crates, symbol samples, and guidance.
- Documentation (scanner module + policy guide) explains how inference is stored, how catalog publishing works, and how to tune policy weights.
Testing, telemetry, rollout
- Add fixtures for stripped Rust binaries across editions (2018–2024) and with/without LTO. Determinism tests compare catalog revisions and inference outputs.
- Metrics `scanner.rust.fingerprint.authoritative`, `scanner.rust.fingerprint.inferred`, `scanner.rust.fingerprint.catalog_version` feed dashboards and alerts.
- Offline kit updates include catalog packages, verification instructions, and waiver templates tied to predicate names.
SCANNER-ENG-0007 — Deterministic secret leak detection pipeline
Scope & goals
- Provide first-party secret leak detection that matches competitor capabilities while preserving deterministic, offline-friendly execution and explainability.
Design decisions
- Introduce `StellaOps.Scanner.Analyzers.Secrets`, a restart-time plug-in that consumes rule bundles (`ruleset.tgz`) signed with DSSE and versioned (semantic version + hash). Bundles live under `plugins/scanner/secrets/rules/<version>`.
- Rule bundles contain deterministic regex/entropy definitions, context windows, and masking directives. A rule index is generated at build time to guarantee deterministic ordering.
- Analyzer executes after Surface validation of each file/layer. Files pass through a streaming matcher that outputs `SecretLeakEvidence` (rule id, severity, confidence, file path, byte ranges, masking applied). Findings persist in `ScanAnalysisStore` and align with DSSE exports.
- `Surface.Validation` introduces `scanner.secrets.rules.bundle`, `scanner.secrets.maxFileBytes`, and `scanner.secrets.targetGlobs`. `Surface.Secrets` supplies allowlist tokens (e.g., approved test keys) without exposing plaintext to analyzers.
- Events/attestations: findings are optionally published via the existing Redis events, and Export Center bundles include masked evidence plus rule metadata.
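The combination of regex rules, an entropy gate, and masking can be sketched in a few lines of Python. This is a simplified, in-memory illustration rather than the streaming matcher: the rule shape, the `demo.generic-token` rule, the entropy threshold, and the masking style are all hypothetical, and real rule bundles would be loaded from the signed `ruleset.tgz`.

```python
import math
import re
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical rule for illustration; group 1 captures the candidate secret.
DEMO_RULES = [
    {
        "id": "demo.generic-token",
        "severity": "high",
        "pattern": re.compile(rb"token=([A-Za-z0-9]{16,})"),
        "min_entropy": 3.0,
    },
]


def scan_bytes(path, data, rules):
    """Return masked findings for one file, in a deterministic order."""
    findings = []
    for rule in sorted(rules, key=lambda r: r["id"]):
        for match in rule["pattern"].finditer(data):
            secret = match.group(1).decode("ascii", "replace")
            # Entropy gate: skip low-randomness values such as placeholders.
            if shannon_entropy(secret) < rule["min_entropy"]:
                continue
            findings.append({
                "ruleId": rule["id"],
                "severity": rule["severity"],
                "path": path,
                "byteRange": (match.start(1), match.end(1)),
                # Keep only a two-character prefix; never emit the payload.
                "masked": secret[:2] + "*" * (len(secret) - 2),
            })
    findings.sort(key=lambda f: (f["path"], f["byteRange"], f["ruleId"]))
    return findings
```

Sorting rules by ID before matching and sorting findings afterwards keeps output identical across runs, which is the property the design asks of the rule index.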
CLI, policy, docs
- Add `stella secrets scan [path|image]` plus `--secrets` flag on `stella scan` to run the analyzer inline. CLI output redacts payloads, shows rule IDs, severity, and remediation hints.
- Policy Engine ingests `secret.leak` evidence, including `ruleId`, `confidence`, `masking.applied`, enabling predicates like `secret.leak.highConfidence`, `secret.leak.ruleDisabled`. Templates cover severities, approvals, and ticket automation.
- Documentation updates: scanner module dossier (new analyzer), policy cookbook (rule management), and Offline Kit guide (bundling rule updates).
Testing, telemetry, rollout
- Rule-pack regression tests ensure deterministic matching and masking; analyzer unit tests cover regex + entropy combos, while integration tests run across sample repositories and OCI layers.
- Metrics: `scanner.secrets.ruleset.version`, `scanner.secrets.findings.total`, `scanner.secrets.findings.high_confidence`. Logs include rule ID, masked hash, and file digests for auditing.
- Offline Kit delivers the signed ruleset catalog, upgrade guide, and policy defaults so fully air-gapped tenants can keep pace without internet access.