Ruby Analyzer Parity Design (SCANNER-ENG-0009)
Status: Implemented • Owner: Ruby Analyzer Guild • Updated: 2025-11-10
1. Goals & Non-Goals
- Goals
  - Deterministically catalogue Ruby application dependencies (Gemfile/Gemfile.lock, vendored specs, .gem archives) for container layers and local workspaces.
  - Build runtime usage graphs (require/require_relative, Zeitwerk autoloads, Rack boot chains, Sidekiq/ActiveJob schedulers).
  - Emit capability signals (exec/fs/net/serialization, framework fingerprints, job schedulers) consumable by Policy Engine and explain traces.
  - Provide CLI verbs (`stella ruby inspect`, `stella ruby resolve`) and Offline Kit parity for air-gapped deployments.
- Non-Goals
  - Shipping dynamic runtime profilers (log-based or APM) in this iteration.
  - Implementing UI changes beyond exposing explain traces the Policy/UI guilds already support.
2. Scope & Inputs
| Input | Location | Notes |
|---|---|---|
| Gemfile / Gemfile.lock | Source tree, layer filesystem | Handle multiple apps per repo; honour Bundler groups. |
| Vendor bundles (`vendor/bundle`, `.bundle/config`) | Layer filesystem | Needed for offline/built images; avoid double-counting platform-specific gems. |
| `.gemspec` files / cached specs | `~/.bundle/cache`, `vendor/cache`, gems in layers | Support deterministic parsing without executing gem metadata. |
| Framework configs | `config/application.rb`, `config/routes.rb`, `config/sidekiq.yml`, etc. | Feed framework surface mapper. |
| Container metadata | Layer digests via RustFS CAS | Support incremental composition per layer. |
3. High-Level Architecture
┌─────────────────────────┐        ┌────────────────────┐
│ Bundler Lock Collector  │───────▶│ Package Graph      │
└─────────────────────────┘        │ Aggregator         │
                                   └─────────┬──────────┘
┌─────────────────────────┐                  │
│ Gemspec Inspector       │─────────────────▶│
└─────────────────────────┘                  │
                                             ▼
                                   ┌────────────────────┐
┌─────────────────────────┐        │ Runtime Graph      │
│ Require/Autoload Scan   │───────▶│ Builder (Zeitwerk) │
└─────────────────────────┘        └─────────┬──────────┘
                                             │
                                             ▼
                                   ┌────────────────────┐
                                   │ Capability Emitter │
                                   └─────────┬──────────┘
                                             │
                                             ▼
                                   ┌────────────────────┐
                                   │ SBOM Writer        │
                                   │ + Policy Signals   │
                                   └────────────────────┘
4. Detailed Components
4.1 Bundler Lock Collector
- Parse `Gemfile.lock` deterministically (no network) using new `RubyLockCollector` under `StellaOps.Scanner.Analyzers.Lang.Ruby`.
- Support alternative manifests (`gems.rb`, `gems.locked`) and workspace overrides.
- Emit package nodes with fields: `name`, `version`, `source` (path/git/rubygems), `bundlerGroup[]`, `platform`, `declaredOnly` flag.
- Implementation:
  - Reuse parsing strategy from Trivy (`pkg/fanal/analyzer/language/ruby/bundler`) but port to C# with streaming reader and stable ordering.
  - Integrate with Surface.Validation to enforce size limits and tenant allowlists for git/path sources.
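The collector's core idea, reading the lockfile as plain text with no Bundler invocation and emitting packages in a stable order, can be sketched as follows. This is an illustrative Python sketch, not the C# `RubyLockCollector`; `LockedGem` and `parse_lockfile` are hypothetical names, and platform suffixes, bundler groups, and the `declaredOnly` flag are not modeled.

```python
import re
from typing import Iterable, List, NamedTuple


class LockedGem(NamedTuple):
    name: str
    version: str
    source: str  # "rubygems", "git", or "path"


# Resolved specs sit at exactly four spaces of indentation: "    rack (2.2.8)".
# Their dependency constraints are indented six spaces and are skipped here.
SPEC_LINE = re.compile(r"^ {4}(?P<name>[^\s(]+) \((?P<version>[^)]+)\)$")
SECTION_SOURCES = {"GEM": "rubygems", "GIT": "git", "PATH": "path"}


def parse_lockfile(lines: Iterable[str]) -> List[LockedGem]:
    gems = []
    source = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line and not line.startswith(" "):
            # A flush-left line opens a new section (GEM, GIT, PATH, PLATFORMS, ...).
            source = SECTION_SOURCES.get(line)
            continue
        match = SPEC_LINE.match(line)
        if source and match:
            gems.append(LockedGem(match["name"], match["version"], source))
    return sorted(gems)  # stable ordering regardless of lockfile layout


SAMPLE = """\
GIT
  remote: https://example.invalid/acme/widgets.git
  revision: 0123abc
  specs:
    widgets (0.3.0)

GEM
  remote: https://rubygems.org/
  specs:
    sinatra (3.1.0)
      rack (~> 2.2, >= 2.2.4)
    rack (2.2.8)

PLATFORMS
  ruby

DEPENDENCIES
  sinatra
  widgets!

BUNDLED WITH
   2.4.10
"""

gems = parse_lockfile(SAMPLE.splitlines())
for gem in gems:
    print(gem.name, gem.version, gem.source)
```

Because the function accepts any iterable of lines, it can be fed from a streaming reader, which is the property the design asks of the C# port.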
4.2 Gemspec Inspector
- Scan cached specs under `vendor/cache`, `.bundle/cache`, and gem directories to pick up transitive packages when lockfiles are missing.
- Parse without executing Ruby by using a deterministic DSL subset (similar to Trivy gemspec parser).
- Link results to lockfile entries by `<name, version, platform>`; create new records flagged `InferredFromSpec` when the lockfile is absent.
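A minimal sketch of the "parse without executing" approach, assuming the subset only recognises literal string assignments to `name` and `version`. It is written in Python for brevity; `read_gemspec` and `link_to_lockfile` are hypothetical helpers, and defaulting the platform to `ruby` is an assumption of this sketch.

```python
import re
from typing import Dict, Optional, Set, Tuple

PackageKey = Tuple[str, str, str]  # (name, version, platform)

# Only literal assignments such as `spec.version = "0.3.0"` are recognised.
ASSIGN = re.compile(r"""^\s*\w+\.(name|version)\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_gemspec(text: str) -> Optional[PackageKey]:
    fields: Dict[str, str] = {}
    for field, value in ASSIGN.findall(text):
        fields.setdefault(field, value)  # first assignment wins
    if "name" not in fields or "version" not in fields:
        return None  # computed values (e.g. Widgets::VERSION) need real evaluation
    return (fields["name"], fields["version"], "ruby")


def link_to_lockfile(key: PackageKey, lock_keys: Set[PackageKey]) -> dict:
    # A spec with no matching lock entry becomes a new, flagged record.
    return {"key": key, "inferredFromSpec": key not in lock_keys}


GEMSPEC = '''
Gem::Specification.new do |spec|
  spec.name = "widgets"
  spec.version = "0.3.0"
  spec.required_ruby_version = ">= 2.7"
  spec.add_dependency "rack", ">= 2.2"
end
'''

key = read_gemspec(GEMSPEC)
print(link_to_lockfile(key, lock_keys=set()))
print(link_to_lockfile(key, lock_keys={("widgets", "0.3.0", "ruby")}))
```

Returning `None` for non-literal values keeps the parser deterministic: it never guesses at a version it cannot read directly from the file.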
4.3 Package Aggregator
- New orchestrator `RubyPackageAggregator` merges lock and gemspec data with installed gems from container layers (once runtime analyzer ships).
- Precedence: Installed > Lockfile > Gemspec.
- Deduplicate by package key (name+version+platform) and attach provenance bits for Policy Engine.
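The precedence and deduplication rules above can be sketched as below. This is a Python illustration rather than the `RubyPackageAggregator` itself; the `Observation` type and the shape of the `provenance` list are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Lower rank wins: Installed > Lockfile > Gemspec.
PRECEDENCE = {"installed": 0, "lockfile": 1, "gemspec": 2}


@dataclass(frozen=True)
class Observation:
    name: str
    version: str
    platform: str
    origin: str  # one of the PRECEDENCE keys


def aggregate(observations: Iterable[Observation]) -> List[dict]:
    best: Dict[Tuple[str, str, str], Observation] = {}
    origins: Dict[Tuple[str, str, str], Set[str]] = {}
    for obs in observations:
        key = (obs.name, obs.version, obs.platform)
        origins.setdefault(key, set()).add(obs.origin)
        current = best.get(key)
        if current is None or PRECEDENCE[obs.origin] < PRECEDENCE[current.origin]:
            best[key] = obs
    return [
        {
            "name": key[0],
            "version": key[1],
            "platform": key[2],
            "origin": best[key].origin,
            # Every source that saw the package, strongest first.
            "provenance": sorted(origins[key], key=PRECEDENCE.__getitem__),
        }
        for key in sorted(best)  # deterministic output order
    ]


merged = aggregate([
    Observation("rack", "2.2.8", "ruby", "gemspec"),
    Observation("rack", "2.2.8", "ruby", "lockfile"),
    Observation("nokogiri", "1.15.4", "x86_64-linux", "lockfile"),
    Observation("nokogiri", "1.15.4", "x86_64-linux", "installed"),
])
for package in merged:
    print(package)
```

Keeping the losing sources in `provenance` instead of discarding them lets a downstream consumer see that a package was, for example, both locked and installed.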
4.4 Runtime Graph Builder
- Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
- Implementation phases:
  - MVP (shipped in Sprint 138): perform lightweight scanning using deterministic regex patterns scoped to Ruby sources. Captures explicit `require*` and `autoload` statements, records referencing files, and links back to packages when a matching lock entry exists.
  - Planned follow-up: integrate tree-sitter Ruby under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` for full AST coverage (Zeitwerk constants, conditional requires, dynamic module loading). This phase remains tracked under SCANNER-ANALYZERS-RUBY-28-003.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine. Entrypoint detection currently keys off file location plus usage hints; richer framework-aware mapping will accompany the tree-sitter phase.
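A rough Python sketch of the MVP-style regex scan follows. The patterns, the `scan_source` helper, and the rule that maps the first path segment of a `require` to a locked gem name are illustrative assumptions, not the shipped patterns; dynamic forms such as `require name` are deliberately left unmatched, as they belong to the tree-sitter phase.

```python
import re
from typing import Dict, List, Optional, Set

REQUIRE = re.compile(r"""^\s*(require_relative|require)\s*\(?\s*['"]([^'"]+)['"]""")
AUTOLOAD = re.compile(r"""^\s*autoload\s*\(?\s*:(\w+)\s*,\s*['"]([^'"]+)['"]""")


def scan_source(path: str, text: str, locked: Set[str]) -> List[Dict[str, Optional[str]]]:
    edges = []
    for line in text.splitlines():
        found = REQUIRE.match(line)
        if found:
            reason, target = found.group(1), found.group(2)
        else:
            found = AUTOLOAD.match(line)
            if not found:
                continue
            reason, target = "autoload", found.group(2)
        # Hypothetical linking rule: the first path segment of a plain `require`
        # is treated as the gem name when the lockfile knows it.
        head = target.split("/")[0]
        package = head if reason == "require" and head in locked else None
        edges.append({"from": path, "to": target, "reason": reason, "package": package})
    return edges


SOURCE = '''require "sinatra/base"
require_relative "lib/app"
# require "ignored"
autoload :Worker, "app/worker"
require name
'''

edges = scan_source("app.rb", SOURCE, locked={"sinatra"})
for edge in edges:
    print(edge)
```

Anchoring each pattern at the start of the line means commented-out requires are skipped without a separate comment filter.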
4.5 Capability & Surface Signals
- Emit evidence documents for:
  - Process/exec usage (`Kernel.system`, `` `cmd` ``, `Open3`).
  - Network clients (`Net::HTTP`, `Faraday`, `Redis`, `ActiveRecord::Base.establish_connection`).
  - Serialization sinks (`Marshal.load`, `YAML.load`, `Oj.load`).
  - Job schedulers (Sidekiq, Resque, ActiveJob, Whenever, Clockwork) with schedule metadata.
- Capability records flow to Policy Engine under `capability.ruby.*` namespaces to allow gating on dangerous constructs.
4.6 CLI & Offline Integration
- Add CLI verbs:
  - `stella ruby inspect <path>` – runs collector locally, outputs JSON summary with provenance.
  - `stella ruby resolve --image <ref>` – fetches scan artifacts, prints dependency graph grouped by bundler group/platform.
- Ship analyzer DLLs and rules in Offline Kit manifest; include autoload/zeitwerk fingerprints and heuristics hashed for determinism.
5. Data Contracts
| Artifact | Shape | Consumer |
|---|---|---|
| `ruby_packages.json` | `RubyPackageInventory { scanId, imageDigest, generatedAt, packages[] }` where each package mirrors `{id, name, version, source, provenance, groups[], platform, runtime.*}` | SBOM Composer, Policy Engine |
| `ruby_runtime_edges.json` | Edges `{from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
| `ruby_capabilities.json` | Capability `{kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |
| `ruby_observation.json` | Summary document (packages, runtime edges, capability flags) | Surface manifest, Policy explain traces |

`ruby_packages.json` records are persisted in Mongo’s `ruby.packages` collection via the `RubyPackageInventoryStore`. `Scanner.WebService` exposes the same payload through `GET /api/scans/{scanId}/ruby-packages` so Policy, CLI, and Offline Kit consumers can reuse the canonical inventory without re-running the analyzer. Each document is keyed by `scanId` and includes the resolved `imageDigest` plus the UTC timestamp recorded by the Worker.

All records follow AOC appender rules (immutable, tenant-scoped) and include `hash`, `layerDigest`, and `timestamp` normalized to UTC ISO-8601.
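To illustrate the last rule, here is a hedged Python sketch that normalizes a timestamp to UTC ISO-8601 and derives a content hash from a canonical JSON form. The `seal_record` helper, the choice of SHA-256, and the decision to hash the payload together with `layerDigest` and `timestamp` are assumptions of this example, not the documented AOC behaviour.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def to_utc_iso(ts: datetime) -> str:
    if ts.tzinfo is None:
        raise ValueError("naive timestamps are ambiguous; attach a timezone first")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def seal_record(payload: dict, layer_digest: str, ts: datetime) -> dict:
    body = dict(payload, layerDigest=layer_digest, timestamp=to_utc_iso(ts))
    # Sorted keys and fixed separators give one canonical byte sequence per record.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return dict(body, hash=f"sha256:{digest}")


local = datetime(2025, 11, 10, 9, 30, tzinfo=timezone(timedelta(hours=2)))
record = seal_record({"name": "rack", "version": "2.2.8"}, "sha256:layer-placeholder", local)
print(record["timestamp"])
print(record["hash"])
```

Because the hash is computed over a key-sorted serialization, two producers that build the same record with different field order arrive at the same `hash`.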
6. Testing Strategy
- Fixtures: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
- Fixtures: Added `git-sources` scenario covering git/path dependencies, bundler groups, and vendor cache evidence for declared-only toggling.
- Determinism: Golden snapshots for package lists and capability outputs across repeated runs.
- Integration: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
- Policy: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.
7. Rollout Plan & Dependencies
- Implement collectors and aggregators (SCANNER-ANALYZERS-RUBY-28-001..004).
- Add capability analyzer and observations (SCANNER-ANALYZERS-RUBY-28-005..008).
- Wire CLI commands and Offline Kit packaging (SCANNER-ANALYZERS-RUBY-28-011).
- Update docs (DOCS-SCANNER-BENCH-62-009 follow-up) once the analyzer alpha is ready.
Dependencies
- Tree-sitter Ruby grammar inclusion (needs Offline Kit packaging and licensing check).
- Policy Engine support for new predicates and capability schemas.
- Surface.Validation updates for git/path gem sources and secret resolution.
8. Open Questions
- Do we require dynamic runtime logs (e.g., `ActiveSupport::Notifications`) for confidence boosts? (defer to future iteration)
- Should we enforce signed gem provenance in MVP? Pending Product decision.
- Need alignment with Export Center on Ruby-specific manifest emissions.
9. Licensing & Offline Packaging (SCANNER-LIC-0001)
- License: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-10).
- Obligations:
  - Keep MIT license texts in `/third-party-licenses/` and ship them with Offline Kits (fulfilled via `build_offline_kit.py` copying the directory into staging).
  - Track acknowledgements in `NOTICE.md` (completed).
  - Record grammar provenance in build metadata once native parsers ship; current MVP uses regex-only parsing and does not bundle tree-sitter artifacts yet, so no generated sources are redistributed.
  - When tree-sitter integration lands, ensure `tree-sitter-cli` remains a build-time tool only.
- Deliverables:
  - SCANNER-LIC-0001 tracks Legal sign-off; Offline Kit packaging now mirrors `third-party-licenses/`.
  - Export centre recipe inherits the copied directory with deterministic hashing.
References:
- Trivy: `pkg/fanal/analyzer/language/ruby/bundler`, `pkg/fanal/analyzer/language/ruby/gemspec`
- Gap analysis: `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md#ruby-analyzer-parity-trivy-grype-snyk`