- Introduced global usings for Ruby analyzer.
- Implemented RubyLockData, RubyLockEntry, and RubyLockParser for handling Gemfile.lock files.
- Created RubyPackage and RubyPackageCollector to manage Ruby packages and vendor cache.
- Developed RubyAnalyzerPlugin and RubyLanguageAnalyzer for analyzing Ruby projects.
- Added tests for Ruby language analyzer with sample Gemfile.lock and expected output.
- Included necessary project files and references for the Ruby analyzer.
- Added third-party licenses for tree-sitter dependencies.
Ruby Analyzer Parity Design (SCANNER-ENG-0009)
Status: Draft • Owner: Ruby Analyzer Guild • Updated: 2025-11-02
1. Goals & Non-Goals
- Goals
  - Deterministically catalogue Ruby application dependencies (Gemfile/Gemfile.lock, vendored specs, .gem archives) for container layers and local workspaces.
  - Build runtime usage graphs (require/require_relative, Zeitwerk autoloads, Rack boot chains, Sidekiq/ActiveJob schedulers).
  - Emit capability signals (exec/fs/net/serialization, framework fingerprints, job schedulers) consumable by Policy Engine and explain traces.
  - Provide CLI verbs (`stella ruby inspect`, `stella ruby resolve`) and Offline Kit parity for air-gapped deployments.
- Non-Goals
  - Shipping dynamic runtime profilers (log-based or APM) in this iteration.
  - Implementing UI changes beyond exposing explain traces the Policy/UI guilds already support.
 
 
2. Scope & Inputs
| Input | Location | Notes |
|---|---|---|
| Gemfile / Gemfile.lock | Source tree, layer filesystem | Handle multiple apps per repo; honour Bundler groups. |
| Vendor bundles (`vendor/bundle`, `.bundle/config`) | Layer filesystem | Needed for offline/built images; avoid double-counting platform-specific gems. |
| `.gemspec` files / cached specs | `~/.bundle/cache`, `vendor/cache`, gems in layers | Support deterministic parsing without executing gem metadata. |
| Framework configs | `config/application.rb`, `config/routes.rb`, `config/sidekiq.yml`, etc. | Feed framework surface mapper. |
| Container metadata | Layer digests via RustFS CAS | Support incremental composition per layer. |
3. High-Level Architecture
┌─────────────────────────┐        ┌────────────────────┐
│  Bundler Lock Collector │───────▶│  Package Graph     │
└─────────────────────────┘        │  Aggregator        │
                                   └─────────┬──────────┘
┌─────────────────────────┐                │
│  Gemspec Inspector      │───────────────▶│
└─────────────────────────┘                │
                                           ▼
                                   ┌────────────────────┐
┌─────────────────────────┐        │ Runtime Graph      │
│  Require/Autoload Scan  │───────▶│ Builder (Zeitwerk) │
└─────────────────────────┘        └─────────┬──────────┘
                                           │
                                           ▼
                                   ┌────────────────────┐
                                   │ Capability Emitter │
                                   └─────────┬──────────┘
                                           │
                                           ▼
                                   ┌────────────────────┐
                                   │ SBOM Writer        │
                                   │ + Policy Signals   │
                                   └────────────────────┘
4. Detailed Components
4.1 Bundler Lock Collector
- Parse `Gemfile.lock` deterministically (no network) using the new `RubyLockCollector` under `StellaOps.Scanner.Analyzers.Lang.Ruby`.
- Support alternative manifests (`gems.rb`, `gems.locked`) and workspace overrides.
- Emit package nodes with fields: `name`, `version`, `source` (path/git/rubygems), `bundlerGroup[]`, `platform`, and a `declaredOnly` flag.
- Implementation:
  - Reuse the parsing strategy from Trivy (`pkg/fanal/analyzer/language/ruby/bundler`), ported to C# with a streaming reader and stable ordering.
  - Integrate with Surface.Validation to enforce size limits and tenant allowlists for git/path sources.
 
4.2 Gemspec Inspector
- Scan cached specs under `vendor/cache`, `.bundle/cache`, and gem directories to pick up transitive packages when lockfiles are missing.
- Parse without executing Ruby by using a deterministic DSL subset (similar to Trivy's gemspec parser).
- Link results to lockfile entries by `<name, version, platform>`; create new records flagged `InferredFromSpec` when the lockfile is absent.
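As a rough illustration of parsing a gemspec without executing it, the sketch below recognises only literal string assignments. The function name `inspect_gemspec` and the "literal assignments only" subset are assumptions made for this example; the DSL subset the analyzer actually supports is not defined in this document.

```python
import re
from typing import Optional, Tuple

# Matches lines such as `s.name = "rack"` or `spec.version = '2.2.8'`.
# Only quoted literals are accepted; anything computed is ignored.
_ASSIGN = re.compile(
    r"""^\s*\w+\.(name|version|platform)\s*=\s*(['"])([^'"#]+)\2""",
    re.MULTILINE,
)


def inspect_gemspec(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (name, version, platform), or None when a field is not a literal."""
    fields = {}
    for key, _quote, value in _ASSIGN.findall(text):
        fields.setdefault(key, value)  # First assignment wins.
    if "name" not in fields or "version" not in fields:
        # e.g. `s.version = Foo::VERSION` would require evaluating Ruby.
        return None
    return fields["name"], fields["version"], fields.get("platform", "ruby")
```

Returning `None` for computed values keeps the result deterministic: the sketch reports nothing rather than guessing at a version it cannot read statically.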
4.3 Package Aggregator
- New orchestrator `RubyPackageAggregator` merges lock and gemspec data with installed gems from container layers (once runtime analyzer ships).
- Precedence: Installed > Lockfile > Gemspec.
- Deduplicate by package key (name+version+platform) and attach provenance bits for Policy Engine.
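The precedence and deduplication rules can be sketched as follows. This is a hedged illustration in Python, not the `RubyPackageAggregator` itself; `Observation`, `aggregate`, and the `primary`/`provenance` field names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Lower rank wins: Installed > Lockfile > Gemspec.
PRECEDENCE = {"installed": 0, "lockfile": 1, "gemspec": 2}


class Observation(NamedTuple):
    """One sighting of a package; `origin` is a key of PRECEDENCE."""
    name: str
    version: str
    platform: str
    origin: str


def aggregate(observations: Iterable[Observation]) -> List[dict]:
    """Merge sightings by (name, version, platform) and rank their origins."""
    merged: Dict[Tuple[str, str, str], List[str]] = {}
    for obs in observations:
        origins = merged.setdefault((obs.name, obs.version, obs.platform), [])
        if obs.origin not in origins:
            origins.append(obs.origin)
    records = []
    for name, version, platform in sorted(merged):
        ranked = sorted(merged[(name, version, platform)], key=PRECEDENCE.__getitem__)
        records.append({
            "name": name,
            "version": version,
            "platform": platform,
            "primary": ranked[0],   # Highest-precedence origin.
            "provenance": ranked,   # Every origin that saw this package.
        })
    return records
```

Keeping every origin in `provenance`, rather than only the winner, is one way to preserve the provenance bits the Policy Engine is expected to consume.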
 
4.4 Runtime Graph Builder
- Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
- Implementation phases:
  - Parse AST using tree-sitter Ruby embedded under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` with deterministic bindings.
  - Generate edges `entrypoint -> file` and `file -> package` with reason codes (`require-static`, `autoload-zeitwerk`, `autoload-const_missing`).
  - Identify framework entrypoints (Rails controllers, Rack middleware, Sidekiq workers) via heuristics defined in `SCANNER-ANALYZERS-RUBY-28-*` tasks.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine.
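To make the edge shape concrete, here is a small Python sketch that emits `require-static` edges. It deliberately swaps in a line-based regular expression for the tree-sitter AST named above, so it misses multi-line and dynamic requires; `Edge` and `scan_requires` are hypothetical names, and using `require-static` for `require_relative` as well is an assumption.

```python
import re
from typing import List, NamedTuple


class Edge(NamedTuple):
    """Hypothetical runtime edge; mirrors the from/to/reason idea only."""
    source: str
    target: str
    reason: str


# Regex stand-in for an AST query: a require call with a literal string
# argument at the start of a line. Commented-out requires do not match.
_REQUIRE = re.compile(r"""^\s*(?:require_relative|require)\b\s*\(?\s*['"]([^'"]+)['"]""")


def scan_requires(path: str, text: str) -> List[Edge]:
    """Return one edge per statically visible require in source order."""
    edges = []
    for line in text.splitlines():
        match = _REQUIRE.match(line)
        if match:
            edges.append(Edge(path, match.group(1), "require-static"))
    return edges
```

Resolving each target to a file or package, and the Zeitwerk and `const_missing` reason codes, are out of scope for this sketch.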
 
4.5 Capability & Surface Signals
- Emit evidence documents for:
  - Process/exec usage (`Kernel.system`, `` `cmd` ``, `Open3`).
  - Network clients (`Net::HTTP`, `Faraday`, `Redis`, `ActiveRecord::Base.establish_connection`).
  - Serialization sinks (`Marshal.load`, `YAML.load`, `Oj.load`).
  - Job schedulers (Sidekiq, Resque, ActiveJob, Whenever, Clockwork) with schedule metadata.
- Capability records flow to Policy Engine under `capability.ruby.*` namespaces to allow gating on dangerous constructs.
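A capability scan of this kind could look roughly like the Python sketch below. The pattern table covers only a few of the constructs listed above, matching is textual rather than AST-based (so comments and strings can produce false positives), and `find_capabilities` plus the exact record layout are assumptions for illustration.

```python
import hashlib
import re
from typing import List

# Illustrative subset of the constructs named in this section.
_PATTERNS = [
    ("exec", re.compile(r"\b(?:Kernel\.)?system\b|\bOpen3\.")),
    ("net", re.compile(r"\bNet::HTTP\b|\bFaraday\b")),
    ("serialization", re.compile(r"\b(?:Marshal|YAML|Oj)\.load\b")),
]


def find_capabilities(path: str, text: str) -> List[dict]:
    """Emit one record per (line, capability kind) hit, in file order."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in _PATTERNS:
            match = pattern.search(line)
            if match is None:
                continue
            records.append({
                "kind": kind,
                "location": f"{path}:{lineno}",
                # Hash of the trimmed source line serves as evidence identity.
                "evidenceHash": hashlib.sha256(line.strip().encode("utf-8")).hexdigest(),
                "params": {"match": match.group(0)},
            })
    return records
```

Hashing the trimmed line rather than storing it is one possible way to reference evidence without copying source text into the record.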
4.6 CLI & Offline Integration
- Add CLI verbs:
  - `stella ruby inspect <path>` – runs collector locally, outputs JSON summary with provenance.
  - `stella ruby resolve --image <ref>` – fetches scan artifacts, prints dependency graph grouped by Bundler group/platform.
- Ship analyzer DLLs and rules in Offline Kit manifest; include autoload/Zeitwerk fingerprints and heuristics hashed for determinism.
 
5. Data Contracts
| Artifact | Shape | Consumer |
|---|---|---|
| `ruby_packages.json` | `Array {id, name, version, source, provenance, groups[], platform}` | SBOM Composer, Policy Engine |
| `ruby_runtime_edges.json` | `Edges {from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
| `ruby_capabilities.json` | `Capability {kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |
All records follow AOC appender rules (immutable, tenant-scoped) and include hash, layerDigest, and timestamp normalized to UTC ISO-8601.
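One way to attach the `hash`, `layerDigest`, and `timestamp` fields is sketched below in Python. The function name `seal_record`, the canonical-JSON hashing scheme, and the `sha256:` prefix are assumptions; the actual AOC appender rules are defined elsewhere and may differ.

```python
import hashlib
import json
from datetime import datetime, timezone


def seal_record(payload: dict, layer_digest: str, observed_at: datetime) -> dict:
    """Return a copy of `payload` with hash, layerDigest, and UTC timestamp."""
    if observed_at.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing.
        raise ValueError("observed_at must be timezone-aware")
    # Canonical form: sorted keys, no insignificant whitespace.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        **payload,
        "layerDigest": layer_digest,
        "timestamp": observed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "hash": "sha256:" + digest,
    }
```

Because the hash is computed over the sorted-key serialisation of the payload alone, two records with the same content produce the same hash regardless of key order or observation time.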
6. Testing Strategy
- Fixtures: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
- Determinism: Golden snapshots for package lists and capability outputs across repeated runs.
- Integration: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
- Policy: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.
7. Rollout Plan & Dependencies
- Implement collectors and aggregators (SCANNER-ANALYZERS-RUBY-28-001..004).
- Add capability analyzer and observations (SCANNER-ANALYZERS-RUBY-28-005..008).
- Wire CLI commands and Offline Kit packaging (SCANNER-ANALYZERS-RUBY-28-011).
- Update docs (DOCS-SCANNER-BENCH-62-009 follow-up) once the analyzer alpha is ready.
 
Dependencies
- Tree-sitter Ruby grammar inclusion (needs Offline Kit packaging and licensing check).
- Policy Engine support for new predicates and capability schemas.
- Surface.Validation updates for git/path gem sources and secret resolution.
 
8. Open Questions
- Do we require dynamic runtime logs (e.g., `ActiveSupport::Notifications`) for confidence boosts? (Deferred to a future iteration.)
- Should we enforce signed gem provenance in MVP? Pending Product decision.
- Need alignment with Export Center on Ruby-specific manifest emissions.
 
9. Licensing & Offline Packaging (SCANNER-LIC-0001)
- License: tree-sitter core and the `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-02).
- Obligations:
  - Include both MIT license texts in `/third-party-licenses/` and in Offline Kit manifests.
  - Update `NOTICE.md` to acknowledge embedded grammars per company policy.
  - Record the grammar commit hashes in build metadata; regenerate generated C/WASM artifacts deterministically.
  - Ensure build pipeline uses `tree-sitter-cli` only as a build-time tool (not redistributed) to avoid extra licensing obligations.
- Deliverables:
  - SCANNER-LIC-0001 to capture Legal sign-off and update packaging scripts.
  - Export Center to mirror license files into Offline Kit bundle.
 
 
References:
- Trivy: `pkg/fanal/analyzer/language/ruby/bundler`, `pkg/fanal/analyzer/language/ruby/gemspec`
- Gap analysis: `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md#ruby-analyzer-parity-trivy-grype-snyk`