Add Ruby language analyzer and related functionality
- Introduced global usings for Ruby analyzer.
- Implemented RubyLockData, RubyLockEntry, and RubyLockParser for handling Gemfile.lock files.
- Created RubyPackage and RubyPackageCollector to manage Ruby packages and vendor cache.
- Developed RubyAnalyzerPlugin and RubyLanguageAnalyzer for analyzing Ruby projects.
- Added tests for Ruby language analyzer with sample Gemfile.lock and expected output.
- Included necessary project files and references for the Ruby analyzer.
- Added third-party licenses for tree-sitter dependencies.
@@ -8,13 +8,20 @@
| SCANNER-DOCS-0002 | DONE (2025-11-02) | Docs Guild | Keep scanner benchmark comparisons (Trivy/Grype/Snyk) and deep-dive matrix current with source references. | Coordinate with docs/benchmarks owners |
| SCANNER-DOCS-0003 | TODO | Docs Guild, Product Guild | Gather Windows/macOS analyzer demand signals and record findings in `docs/benchmarks/scanner/windows-macos-demand.md`. | Coordinate with Product Marketing & Sales enablement |
| SCANNER-ENG-0008 | TODO | EntryTrace Guild, QA Guild | Maintain EntryTrace heuristic cadence per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`. | Include quarterly pattern review + explain trace updates |
| SCANNER-ENG-0009 | TODO | Ruby Analyzer Guild | SCANNER-ANALYZERS-RUBY-28-001..012 | Deliver Ruby analyzer parity and observation pipeline per gap doc (lockfiles, runtime graph, policy signals). | Design complete; fixtures published; CLI/Offline docs updated. |
| SCANNER-ENG-0009 | DOING (2025-11-02) | Ruby Analyzer Guild | SCANNER-ANALYZERS-RUBY-28-001..012 | Deliver Ruby analyzer parity and observation pipeline per gap doc (lockfiles, runtime graph, policy signals). | Design complete; fixtures published; CLI/Offline docs updated. |
| SCANNER-ENG-0010 | TODO | PHP Analyzer Guild | SCANNER-ANALYZERS-PHP-27-001..012 | Ship PHP analyzer pipeline (composer lock, autoload graph, capability signals) to close comparison gaps. | Analyzer + policy integration merged; fixtures + docs aligned. |
| SCANNER-ENG-0011 | TODO | Language Analyzer Guild | — | Scope Deno runtime analyzer (lockfile resolver, import graphs) based on competitor techniques. | Design doc approved; backlog split into analyzer/runtime work. |
| SCANNER-ENG-0012 | TODO | Language Analyzer Guild | — | Evaluate Dart analyzer requirements (pubspec parsing, AOT artifacts) to restore parity. | Investigation summary + task split filed with Dart guild. |
| SCANNER-ENG-0013 | TODO | Swift Analyzer Guild | — | Plan Swift Package Manager coverage (Package.resolved, xcframeworks, runtime hints) with policy hooks. | Design brief approved; backlog seeded with analyzer tasks. |
| SCANNER-ENG-0014 | TODO | Runtime Guild, Zastava Guild | — | Align Kubernetes/VM target coverage roadmap between Scanner and Zastava per comparison findings. | Joint roadmap doc approved; cross-guild tasks opened. |
| SCANNER-ENG-0015 | TODO | Export Center Guild, Scanner Guild | — | Document DSSE/Rekor operator enablement guidance and rollout levers surfaced in gap analysis. | Playbook drafted; Export Center backlog updated. |
| SCANNER-ENG-0016 | DOING (2025-11-02) | Ruby Analyzer Guild (Lockfile Squad) | Implement `RubyLockCollector` and vendor cache ingestion per design §4.1–4.3. | Coordinate fixtures under `fixtures/lang/ruby/lockfiles`; target alpha by Sprint 21. |
| SCANNER-ENG-0017 | TODO | Ruby Analyzer Guild (Runtime Squad) | Build runtime require/autoload graph builder with tree-sitter Ruby per design §4.4. | Deliver edges with reason codes and integrate EntryTrace hints. |
| SCANNER-ENG-0018 | TODO | Ruby Analyzer Guild (Capability Squad) | Emit Ruby capability and framework surface signals as defined in design §4.5. | Policy predicates prototyped; capability records available in SBOM overlays. |
| SCANNER-ENG-0019 | TODO | Ruby Analyzer Guild, CLI Guild | Ship Ruby CLI verbs (`stella ruby inspect`, `stella ruby resolve`) and Offline Kit packaging per design §4.6. | CLI commands documented; offline manifest updated; e2e tests pass. |
| SCANNER-LIC-0001 | DOING (2025-11-02) | Scanner Guild, Legal Guild | Vet tree-sitter Ruby licensing and Offline Kit packaging requirements. | SPDX review complete; packaging plan approved. |
| SCANNER-POLICY-0001 | TODO | Policy Guild, Ruby Analyzer Guild | Define Policy Engine predicates for Ruby groups/capabilities and align lattice weights. | Policy schema merged; tests cover new predicates. |
| SCANNER-CLI-0001 | TODO | CLI Guild, Ruby Analyzer Guild | Coordinate CLI UX/help text for new Ruby verbs and update CLI docs. | CLI help + docs updated; golden outputs recorded. |
| SCANNER-ENG-0002 | TODO | Scanner Guild, CLI Guild | Design Node.js lockfile collector/CLI validator per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`. | Capture Surface & policy requirements before implementation |
| SCANNER-ENG-0003 | TODO | Python Analyzer Guild, CLI Guild | Design Python lockfile/editable install parity checks per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`. | Include policy predicates & CLI story in design |
| SCANNER-ENG-0004 | TODO | Java Analyzer Guild, CLI Guild | Design Java lockfile ingestion & validation per `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`. | Cover Gradle/SBT collectors, CLI verb, policy hooks |

docs/modules/scanner/design/ruby-analyzer.md
@@ -0,0 +1,137 @@
# Ruby Analyzer Parity Design (SCANNER-ENG-0009)

**Status:** Draft • Owner: Ruby Analyzer Guild • Updated: 2025-11-02

## 1. Goals & Non-Goals
- **Goals**
  - Deterministically catalogue Ruby application dependencies (Gemfile/Gemfile.lock, vendored specs, .gem archives) for container layers and local workspaces.
  - Build runtime usage graphs (require/require_relative, Zeitwerk autoloads, Rack boot chains, Sidekiq/ActiveJob schedulers).
  - Emit capability signals (exec/fs/net/serialization, framework fingerprints, job schedulers) consumable by Policy Engine and explain traces.
  - Provide CLI verbs (`stella ruby inspect`, `stella ruby resolve`) and Offline Kit parity for air-gapped deployments.
- **Non-Goals**
  - Shipping dynamic runtime profilers (log-based or APM) in this iteration.
  - Implementing UI changes beyond exposing explain traces the Policy/UI guilds already support.

## 2. Scope & Inputs
| Input | Location | Notes |
|-------|----------|-------|
| Gemfile / Gemfile.lock | Source tree, layer filesystem | Handle multiple apps per repo; honour Bundler groups. |
| Vendor bundles (`vendor/bundle`, `.bundle/config`) | Layer filesystem | Needed for offline/built images; avoid double-counting platform-specific gems. |
| `.gemspec` files / cached specs | `~/.bundle/cache`, `vendor/cache`, gems in layers | Support deterministic parsing without executing gem metadata. |
| Framework configs | `config/application.rb`, `config/routes.rb`, `config/sidekiq.yml`, etc. | Feed framework surface mapper. |
| Container metadata | Layer digests via RustFS CAS | Support incremental composition per layer. |

## 3. High-Level Architecture
```
┌─────────────────────────┐        ┌────────────────────┐
│ Bundler Lock Collector  │───────▶│ Package Graph      │
└─────────────────────────┘        │ Aggregator         │
                                   └─────────┬──────────┘
┌─────────────────────────┐                  │
│ Gemspec Inspector       │─────────────────▶│
└─────────────────────────┘                  │
                                             ▼
                                   ┌────────────────────┐
┌─────────────────────────┐        │ Runtime Graph      │
│ Require/Autoload Scan   │───────▶│ Builder (Zeitwerk) │
└─────────────────────────┘        └─────────┬──────────┘
                                             │
                                             ▼
                                   ┌────────────────────┐
                                   │ Capability Emitter │
                                   └─────────┬──────────┘
                                             │
                                             ▼
                                   ┌────────────────────┐
                                   │ SBOM Writer        │
                                   │ + Policy Signals   │
                                   └────────────────────┘
```

## 4. Detailed Components
### 4.1 Bundler Lock Collector
- Parse `Gemfile.lock` deterministically (no network) using a new `RubyLockCollector` under `StellaOps.Scanner.Analyzers.Lang.Ruby`.
- Support alternative manifests (`gems.rb`, `gems.locked`) and workspace overrides.
- Emit package nodes with fields: `name`, `version`, `source` (path/git/rubygems), `bundlerGroup[]`, `platform`, `declaredOnly` flag.
- Implementation:
  - Reuse the parsing strategy from Trivy (`pkg/fanal/analyzer/language/ruby/bundler`), ported to C# with a streaming reader and stable ordering.
  - Integrate with Surface.Validation to enforce size limits and tenant allowlists for git/path sources.

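The lock parsing described in §4.1 can be sketched in a few lines. This is an illustrative Python sketch only, not the planned C# `RubyLockCollector`: the names `LockEntry` and `parse_lock`, the default platform value `ruby`, and the rule that a `-` inside the parenthesised version introduces a platform suffix are all assumptions made for the example. Bundler groups and the `declaredOnly` flag are left out.

```python
import re
from typing import NamedTuple


class LockEntry(NamedTuple):
    """Hypothetical package node; field names follow the list in 4.1."""
    name: str
    version: str
    platform: str
    source: str


# Top-level Gemfile.lock section headers mapped to the design's `source` values.
SOURCES = {"GEM": "rubygems", "GIT": "git", "PATH": "path"}

# Resolved specs sit at exactly four spaces of indent: "    name (version)".
# Their dependency constraints are indented six spaces and are skipped.
SPEC = re.compile(r"^ {4}(?P<name>\S+) \((?P<version>[^)]+)\)$")


def parse_lock(text: str) -> list[LockEntry]:
    entries = set()
    source = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line and not line.startswith(" "):
            # New section; PLATFORMS, DEPENDENCIES, etc. reset the source to None.
            source = SOURCES.get(line)
            continue
        match = SPEC.match(line)
        if source and match:
            # Assumption: "1.15.4-x86_64-linux" means version 1.15.4 on x86_64-linux.
            version, _, platform = match.group("version").partition("-")
            entries.add(LockEntry(match.group("name"), version, platform or "ruby", source))
    # Sorting the tuples gives a stable order regardless of input order.
    return sorted(entries)
```

Collecting into a set and sorting at the end is one simple way to get the stable ordering the design asks for; a streaming reader could instead emit entries in file order and sort downstream.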
### 4.2 Gemspec Inspector
- Scan cached specs under `vendor/cache`, `.bundle/cache`, and gem directories to pick up transitive packages when lockfiles are missing.
- Parse without executing Ruby by using a deterministic DSL subset (similar to Trivy gemspec parser).
- Link results to lockfile entries by `<name, version, platform>`; create new records flagged `InferredFromSpec` when the lockfile is absent.

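To illustrate what "parse without executing Ruby" can mean in practice, the following Python sketch extracts only literal string assignments from a gemspec. It is an assumption-laden example, not the design's DSL subset: `inspect_gemspec`, the `FIELD` pattern, and the record layout are hypothetical, and any value that is not a quoted literal (for example `s.version = Rack::VERSION`) is deliberately left unresolved rather than evaluated.

```python
import re

# Literal assignments such as `s.name = "rack"` or `spec.version = '2.2.8'`.
# Anything computed at runtime does not match and is therefore never executed.
FIELD = re.compile(
    r"""^\s*\w+\.(name|version|platform)\s*=\s*(["'])(.*?)\2""",
    re.MULTILINE,
)


def inspect_gemspec(text: str) -> dict:
    # Hypothetical record shape; "InferredFromSpec" is the flag named in 4.2.
    record = {
        "name": None,
        "version": None,
        "platform": "ruby",  # assumed default when no literal platform is set
        "provenance": "InferredFromSpec",
    }
    for key, _quote, value in FIELD.findall(text):
        record[key] = value
    return record
```

A record whose `name` or `version` is still `None` would need another evidence source before it can be linked by `<name, version, platform>`.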
### 4.3 Package Aggregator
- A new orchestrator, `RubyPackageAggregator`, merges lock and gemspec data with installed gems from container layers (once runtime analyzer ships).
- Precedence: Installed > Lockfile > Gemspec.
- Deduplicate by package key (name+version+platform) and attach provenance bits for Policy Engine.

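The precedence and deduplication rules in §4.3 are easy to show with a small merge function. The Python below is a sketch under stated assumptions: the `origin` field, the `aggregate` function, and the choice to keep every contributing origin in a `provenance` list are hypothetical and stand in for whatever `RubyPackageAggregator` ends up doing in C#.

```python
# Lower rank wins: Installed > Lockfile > Gemspec, as stated in 4.3.
PRECEDENCE = {"installed": 0, "lockfile": 1, "gemspec": 2}


def aggregate(records: list[dict]) -> list[dict]:
    merged = {}
    for rec in records:
        # Package key from 4.3: name + version + platform.
        key = (rec["name"], rec["version"], rec.get("platform", "ruby"))
        slot = merged.setdefault(key, {"winner": rec, "provenance": set()})
        slot["provenance"].add(rec["origin"])
        if PRECEDENCE[rec["origin"]] < PRECEDENCE[slot["winner"]["origin"]]:
            slot["winner"] = rec
    # Keys are unique, so sorting the items orders by key and gives stable output.
    return [
        {**slot["winner"], "provenance": sorted(slot["provenance"], key=PRECEDENCE.get)}
        for _key, slot in sorted(merged.items())
    ]
```

The highest-precedence record supplies the fields, while the `provenance` list preserves the fact that weaker sources also saw the package.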
### 4.4 Runtime Graph Builder
- Static analysis for `require`, `require_relative`, `autoload`, Zeitwerk conventions, and Rails initialisers.
- Implementation phases:
  1. Parse AST using tree-sitter Ruby embedded under `StellaOps.Scanner.Analyzers.Lang.Ruby.Syntax` with deterministic bindings.
  2. Generate edges `entrypoint -> file` and `file -> package` with reason codes (`require-static`, `autoload-zeitwerk`, `autoload-const_missing`).
  3. Identify framework entrypoints (Rails controllers, Rack middleware, Sidekiq workers) via heuristics defined in `SCANNER-ANALYZERS-RUBY-28-*` tasks.
- Output merges with EntryTrace usage hints to support runtime filtering in Policy Engine.

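The edge shape from phase 2 can be illustrated without the full AST pipeline. The Python sketch below swaps tree-sitter for a line-based regular expression, so it is a stand-in for the technique rather than an instance of it: it only recognises `require` and `require_relative` calls with a literal string argument, tags both with the `require-static` reason code, and omits `confidence`. The function name `scan_requires` and the extra `call` field are hypothetical.

```python
import re

# Literal-argument forms only: require "x", require('x'), require_relative "x".
# Interpolated strings ("#{...}") and commented-out lines do not match.
REQUIRE = re.compile(r"""^\s*(require_relative|require)\s*\(?\s*(["'])([^"'#]+)\2""")


def scan_requires(path: str, source: str) -> list[dict]:
    edges = []
    for line in source.splitlines():
        match = REQUIRE.match(line)
        if match:
            edges.append({
                "from": path,
                "to": match.group(3),
                "reason": "require-static",  # reason code listed in 4.4
                "call": match.group(1),      # hypothetical field, for debugging
            })
    # Stable ordering so repeated runs produce identical output.
    return sorted(edges, key=lambda edge: (edge["to"], edge["call"]))
```

A real implementation would resolve each `to` value to a file or package and would need the AST to handle multi-line calls, conditionals, and `autoload`.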
### 4.5 Capability & Surface Signals
- Emit evidence documents for:
  - Process/exec usage (`Kernel.system`, `` `cmd` ``, `Open3`).
  - Network clients (`Net::HTTP`, `Faraday`, `Redis`, `ActiveRecord::Base.establish_connection`).
  - Serialization sinks (`Marshal.load`, `YAML.load`, `Oj.load`).
  - Job schedulers (Sidekiq, Resque, ActiveJob, Whenever, Clockwork) with schedule metadata.
- Capability records flow to Policy Engine under `capability.ruby.*` namespaces to allow gating on dangerous constructs.

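As a rough illustration of how capability evidence might be produced, the Python sketch below matches a handful of the constructs listed in §4.5 and emits records in the `{kind, location, evidenceHash, params}` shape. The rule table, the kind suffixes (`exec`, `net`, `serialization`), the `path:line` location format, and the inputs to the hash are all assumptions; the real emitter is expected to work from the AST rather than from regular expressions.

```python
import hashlib
import re

# Hypothetical rule table covering a subset of the constructs named in 4.5.
RULES = [
    ("exec", re.compile(r"\bKernel\.system\b|\bOpen3\.")),
    ("net", re.compile(r"\bNet::HTTP\b|\bFaraday\b")),
    ("serialization", re.compile(r"\b(?:Marshal|YAML|Oj)\.load\b")),
]


def scan_capabilities(path: str, source: str) -> list[dict]:
    records = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in RULES:
            match = pattern.search(line)
            if not match:
                continue
            # Assumption: the evidence hash covers file, line number, and trimmed text.
            evidence = f"{path}:{lineno}:{line.strip()}"
            records.append({
                "kind": f"capability.ruby.{kind}",
                "location": f"{path}:{lineno}",
                "evidenceHash": hashlib.sha256(evidence.encode("utf-8")).hexdigest(),
                "params": {"match": match.group(0)},
            })
    return records
```

Because the hash depends only on the file path, line number, and line text, scanning the same input twice yields identical records, which is the property the determinism tests rely on.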
### 4.6 CLI & Offline Integration
- Add CLI verbs:
  - `stella ruby inspect <path>` – runs collector locally, outputs JSON summary with provenance.
  - `stella ruby resolve --image <ref>` – fetches scan artifacts, prints dependency graph grouped by bundler group/platform.
- Ship analyzer DLLs and rules in the Offline Kit manifest; include autoload/Zeitwerk fingerprints and heuristics, hashed for determinism.

## 5. Data Contracts
| Artifact | Shape | Consumer |
|----------|-------|----------|
| `ruby_packages.json` | Array `{id, name, version, source, provenance, groups[], platform}` | SBOM Composer, Policy Engine |
| `ruby_runtime_edges.json` | Edges `{from, to, reason, confidence}` | EntryTrace overlay, Policy explain traces |
| `ruby_capabilities.json` | Capability `{kind, location, evidenceHash, params}` | Policy Engine (capability predicates) |

All records follow AOC appender rules (immutable, tenant-scoped) and include `hash`, `layerDigest`, and `timestamp` normalized to UTC ISO-8601.

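One way to attach the common `hash`, `layerDigest`, and `timestamp` fields is sketched below in Python. The function name `seal`, the `sha256:` prefix, the canonical JSON form (sorted keys, compact separators), and the decision to hash the timestamp and layer digest together with the payload are assumptions for illustration; the AOC appender rules may specify something different.

```python
import hashlib
import json
from datetime import datetime, timezone


def seal(record: dict, layer_digest: str, observed_at: datetime) -> dict:
    """Return a copy of `record` with layerDigest, UTC timestamp, and content hash."""
    if observed_at.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing a zone.
        raise ValueError("observed_at must be timezone-aware")
    body = dict(
        record,
        layerDigest=layer_digest,
        timestamp=observed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    )
    # Canonical JSON: key order and whitespace cannot influence the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return dict(body, hash="sha256:" + digest)
```

Hashing a canonical form means two records with the same content but different key order get the same `hash`, which keeps golden snapshots stable.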
## 6. Testing Strategy
- **Fixtures**: Extend `fixtures/lang/ruby` with Rails, Sinatra, Sidekiq, Rack, container images (with/without vendor cache).
- **Determinism**: Golden snapshots for package lists and capability outputs across repeated runs.
- **Integration**: Worker e2e to ensure per-layer aggregation; CLI golden outputs (`stella ruby inspect`).
- **Policy**: Unit tests verifying new predicates (`ruby.group`, `ruby.capability.exec`, etc.) in Policy Engine test suite.

## 7. Rollout Plan & Dependencies
1. Implement collectors and aggregators (SCANNER-ANALYZERS-RUBY-28-001..004).
2. Add capability analyzer and observations (SCANNER-ANALYZERS-RUBY-28-005..008).
3. Wire CLI commands and Offline Kit packaging (SCANNER-ANALYZERS-RUBY-28-011).
4. Update docs (DOCS-SCANNER-BENCH-62-009 follow-up) once the analyzer alpha is ready.

**Dependencies**
- Tree-sitter Ruby grammar inclusion (needs Offline Kit packaging and licensing check).
- Policy Engine support for new predicates and capability schemas.
- Surface.Validation updates for git/path gem sources and secret resolution.

## 8. Open Questions
- Do we require dynamic runtime logs (e.g., `ActiveSupport::Notifications`) for confidence boosts? Deferred to a future iteration.
- Should we enforce signed gem provenance in MVP? Pending Product decision.
- Alignment with Export Center on Ruby-specific manifest emissions is still needed.

## 9. Licensing & Offline Packaging (SCANNER-LIC-0001)
- **License**: tree-sitter core and `tree-sitter-ruby` grammar are MIT licensed (confirmed via upstream LICENSE files retrieved 2025-11-02).
- **Obligations**:
  1. Include both MIT license texts in `/third-party-licenses/` and in Offline Kit manifests.
  2. Update `NOTICE.md` to acknowledge embedded grammars per company policy.
  3. Record the grammar commit hashes in build metadata; regenerate generated C/WASM artifacts deterministically.
  4. Ensure build pipeline uses `tree-sitter-cli` only as a build-time tool (not redistributed) to avoid extra licensing obligations.
- **Deliverables**:
  - SCANNER-LIC-0001 to capture Legal sign-off and update packaging scripts.
  - Export Center to mirror license files into Offline Kit bundle.

---
*References:*
- Trivy: `pkg/fanal/analyzer/language/ruby/bundler`, `pkg/fanal/analyzer/language/ruby/gemspec`
- Gap analysis: `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md#ruby-analyzer-parity-trivy-grype-snyk`