Add Ruby language analyzer and related functionality

- Introduced global usings for Ruby analyzer.
- Implemented RubyLockData, RubyLockEntry, and RubyLockParser for handling Gemfile.lock files.
- Created RubyPackage and RubyPackageCollector to manage Ruby packages and vendor cache.
- Developed RubyAnalyzerPlugin and RubyLanguageAnalyzer for analyzing Ruby projects.
- Added tests for Ruby language analyzer with sample Gemfile.lock and expected output.
- Included necessary project files and references for the Ruby analyzer.
- Added third-party licenses for tree-sitter dependencies.
2025-11-03 01:15:43 +02:00
parent ff0eca3a51
commit bf2bf4b395
88 changed files with 6557 additions and 1568 deletions
--- a/docs/modules/advisory-ai/architecture.md
+++ b/docs/modules/advisory-ai/architecture.md
@@ -1,100 +1,115 @@
-# Advisory AI architecture
-
-> Captures the retrieval, guardrail, and inference packaging requirements defined in the Advisory AI implementation plan and related module guides.
-
-## 1) Goals
-
-- Summarise advisories/VEX evidence into operator-ready briefs with citations.
-- Explain conflicting statements with provenance and trust weights (using VEX Lens & Excititor data).
-- Suggest remediation plans aligned with Offline Kit deployment models and scheduler follow-ups.
-- Operate deterministically where possible; cache generated artefacts with digests for audit.
-
-## 2) Pipeline overview
-
-```
-                       +---------------------+
-   Concelier/VEX Lens  |  Evidence Retriever |
-   Policy Engine ----> |  (vector + keyword) | ---> Context Pack (JSON)
-   Zastava runtime     +---------------------+
-                               |
-                               v
-                        +-------------+
-                        | Prompt      |
-                        | Assembler   |
-                        +-------------+
-                               |
-                               v
-                        +-------------+
-                        | Guarded LLM |
-                        | (local/host)|
-                        +-------------+
-                               |
-                               v
-                        +-----------------+
-                        | Citation &     |
-                        | Validation      |
-                        +-----------------+
-                               |
-                               v
-                        +----------------+
-                        | Output cache   |
-                        | (hash, bundle) |
-                        +----------------+
-```
-
-## 3) Retrieval & context
-
-- Hybrid search: vector embeddings (SBERT-compatible) + keyword filters for advisory IDs, PURLs, CVEs.
-- Context packs include:
-  - Advisory raw excerpts with highlighted sections and source URLs.
-  - VEX statements (normalized tuples + trust metadata).
-  - Policy explain traces for the affected finding.
-  - Runtime/impact hints from Zastava (exposure, entrypoints).
-  - Export-ready remediation data (fixed versions, patches).
-
-All context references include `content_hash` and `source_id` enabling verifiable citations.
-
-## 4) Guardrails
-
-- Prompt templates enforce structure: summary, conflicts, remediation, references.
-- Response validator ensures:
-  - No hallucinated advisories (every fact must map to input context).
-  - Citations follow `[n]` indexing referencing actual sources.
-  - Remediation suggestions only cite policy-approved sources (fixed versions, vendor hotfixes).
-- Moderation/PII filters prevent leaking secrets; responses failing validation are rejected and logged.
-
-## 5) Output persistence
-
-- Cached artefacts stored in `advisory_ai_outputs` with fields:
-  - `output_hash` (sha256 of JSON response).
-  - `input_digest` (hash of context pack).
-  - `summary`, `conflicts`, `remediation`, `citations`.
-  - `generated_at`, `model_id`, `profile` (Sovereign/FIPS etc.).
-  - `signatures` (optional DSSE if run in deterministic mode).
-- Offline bundle format contains `summary.md`, `citations.json`, `context_manifest.json`, `signatures/`.
-
-## 6) Profiles & sovereignty
-
-- **Profiles:** `default`, `fips-local` (FIPS-compliant local model), `gost-local`, `cloud-openai` (optional, disabled by default). Each profile defines allowed models, key management, and telemetry endpoints.
-- **CryptoProfile/RootPack integration:** generated artefacts can be signed using configured CryptoProfile to satisfy procurement/trust requirements.
-
-## 7) APIs
-
-- `POST /v1/advisory-ai/summaries` — generate (or retrieve cached) summary for `{advisoryKey, artifactId, policyVersion}`.
-- `POST /v1/advisory-ai/conflicts` — explain conflicting VEX statements with trust ranking.
-- `POST /v1/advisory-ai/remediation` — fetch remediation plan with target fix versions, prerequisites, verification steps.
-- `GET /v1/advisory-ai/outputs/{hash}` — retrieve cached artefact (used by CLI/Console/Export Center).
-
-All endpoints accept `profile` parameter (default `fips-local`) and return `output_hash`, `input_digest`, and `citations` for verification.
-
-## 8) Observability
-
-- Metrics: `advisory_ai_requests_total{profile,type}`, `advisory_ai_latency_seconds`, `advisory_ai_validation_failures_total`.
-- Logs: include `output_hash`, `input_digest`, `profile`, `model_id`, `tenant`, `artifacts`. Sensitive context is not logged.
-- Traces: spans for retrieval, prompt assembly, model inference, validation, cache write.
-
-## 9) Operational controls
-
-- Feature flags per tenant (`ai.summary.enabled`, `ai.remediation.enabled`).
-- Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage.
-- Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests.
+# Advisory AI architecture
+
+> Captures the retrieval, guardrail, and inference packaging requirements defined in the Advisory AI implementation plan and related module guides.
+
+## 1) Goals
+
+- Summarise advisories/VEX evidence into operator-ready briefs with citations.
+- Explain conflicting statements with provenance and trust weights (using VEX Lens & Excititor data).
+- Suggest remediation plans aligned with Offline Kit deployment models and scheduler follow-ups.
+- Operate deterministically where possible; cache generated artefacts with digests for audit.
+
+## 2) Pipeline overview
+
+```
+                       +---------------------+
+   Concelier/VEX Lens  |  Evidence Retriever |
+   Policy Engine ----> |  (vector + keyword) | ---> Context Pack (JSON)
+   Zastava runtime     +---------------------+
+                               |
+                               v
+                        +-------------+
+                        | Prompt      |
+                        | Assembler   |
+                        +-------------+
+                               |
+                               v
+                        +-------------+
+                        | Guarded LLM |
+                        | (local/host)|
+                        +-------------+
+                               |
+                               v
+                        +-----------------+
+                        | Citation &      |
+                        | Validation      |
+                        +-----------------+
+                               |
+                               v
+                        +----------------+
+                        | Output cache   |
+                        | (hash, bundle) |
+                        +----------------+
+```
+
+## 3) Retrieval & context
+
+- Hybrid search: vector embeddings (SBERT-compatible) + keyword filters for advisory IDs, PURLs, CVEs.
+- Context packs include:
+  - Advisory raw excerpts with highlighted sections and source URLs.
+  - VEX statements (normalized tuples + trust metadata).
+  - Policy explain traces for the affected finding.
+  - Runtime/impact hints from Zastava (exposure, entrypoints).
+  - Export-ready remediation data (fixed versions, patches).
+- **SBOM context retriever** (AIAI-31-002) hydrates:
+  - Version timelines (first/last observed, status, fix availability).
+  - Dependency paths (runtime vs build/test, deduped by coordinate chain).
+  - Tenant environment flags (prod/stage toggles) with optional blast radius summary.
+  - Service-side clamps: at most 500 timeline entries and 200 dependency paths; clients can toggle environment and blast-radius data.
+
+Retriever requests and results are trimmed and normalized before hashing; metadata (counts, provenance keys) is returned for downstream guardrails. Unit tests cover deterministic ordering and flag handling.
+
+All context references include `content_hash` and `source_id`, enabling verifiable citations.
+
+## 4) Guardrails
+
+- Prompt templates enforce structure: summary, conflicts, remediation, references.
+- Response validator ensures:
+  - No hallucinated advisories (every fact must map to input context).
+  - Citations follow `[n]` indexing referencing actual sources.
+  - Remediation suggestions only cite policy-approved sources (fixed versions, vendor hotfixes).
+- Moderation/PII filters prevent leaking secrets; responses failing validation are rejected and logged.
+
+## 5) Deterministic tooling
+
+- **Version comparators** — offline semantic version and RPM EVR parsers with range evaluators. They support chained constraints (`>=`, `<=`, `!=`) used by remediation advice and blast-radius calculations.
+  - Registered via `AddAdvisoryDeterministicToolset` for reuse across orchestrator, CLI, and services.
+- **Orchestration pipeline** — see `orchestration-pipeline.md` for prerequisites, task breakdown, and cross-guild responsibilities before wiring the execution flows.
+- **Planned extensions** — NEVRA/EVR comparators, ecosystem-specific normalisers, dependency chain scorers (AIAI-31-003 scope).
+- Exposed via internal interfaces to allow orchestrator/toolchain reuse; all helpers stay side-effect free and deterministic for golden testing.
+
+## 6) Output persistence
+
+- Cached artefacts stored in `advisory_ai_outputs` with fields:
+  - `output_hash` (sha256 of JSON response).
+  - `input_digest` (hash of context pack).
+  - `summary`, `conflicts`, `remediation`, `citations`.
+  - `generated_at`, `model_id`, `profile` (Sovereign/FIPS etc.).
+  - `signatures` (optional DSSE if run in deterministic mode).
+- Offline bundle format contains `summary.md`, `citations.json`, `context_manifest.json`, `signatures/`.
+
+## 7) Profiles & sovereignty
+
+- **Profiles:** `default`, `fips-local` (FIPS-compliant local model), `gost-local`, `cloud-openai` (optional, disabled by default). Each profile defines allowed models, key management, and telemetry endpoints.
+- **CryptoProfile/RootPack integration:** generated artefacts can be signed using configured CryptoProfile to satisfy procurement/trust requirements.
+
+## 8) APIs
+
+- `POST /v1/advisory-ai/summaries` — generate (or retrieve cached) summary for `{advisoryKey, artifactId, policyVersion}`.
+- `POST /v1/advisory-ai/conflicts` — explain conflicting VEX statements with trust ranking.
+- `POST /v1/advisory-ai/remediation` — fetch remediation plan with target fix versions, prerequisites, verification steps.
+- `GET /v1/advisory-ai/outputs/{hash}` — retrieve cached artefact (used by CLI/Console/Export Center).
+
+All endpoints accept a `profile` parameter (default `fips-local`) and return `output_hash`, `input_digest`, and `citations` for verification.
+
+## 9) Observability
+
+- Metrics: `advisory_ai_requests_total{profile,type}`, `advisory_ai_latency_seconds`, `advisory_ai_validation_failures_total`.
+- Logs: include `output_hash`, `input_digest`, `profile`, `model_id`, `tenant`, `artifacts`. Sensitive context is not logged.
+- Traces: spans for retrieval, prompt assembly, model inference, validation, cache write.
+
+## 10) Operational controls
+
+- Feature flags per tenant (`ai.summary.enabled`, `ai.remediation.enabled`).
+- Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage.
+- Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests.
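The chained constraints described under "Deterministic tooling" in `architecture.md` can be illustrated with a minimal Python sketch. It assumes plain three-part `MAJOR.MINOR.PATCH` versions with no pre-release or build suffixes and no RPM EVR handling. The names `parse_version` and `satisfies` are hypothetical and chosen for illustration only; they are not the project's comparator API, which the document says is registered through `AddAdvisoryDeterministicToolset`.

```python
import operator
import re

# Hypothetical helpers for illustration; not the project's comparator API.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_CONSTRAINT = re.compile(r"^\s*(>=|<=|!=|==|>|<)\s*(\d+\.\d+\.\d+)\s*$")


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    major, minor, patch = text.strip().split(".")
    return int(major), int(minor), int(patch)


def satisfies(version: str, constraints: str) -> bool:
    """Return True when the version meets every comma-separated constraint."""
    candidate = parse_version(version)
    for clause in constraints.split(","):
        match = _CONSTRAINT.match(clause)
        if match is None:
            raise ValueError(f"unsupported constraint: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](candidate, parse_version(bound)):
            return False
    return True
```

Because the helper has no side effects and depends only on its arguments, a call such as `satisfies("1.4.2", ">=1.2.0, <=2.0.0, !=1.5.0")` gives the same answer on every run, which is the property golden tests rely on.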
--- a/docs/modules/advisory-ai/orchestration-pipeline.md
+++ b/docs/modules/advisory-ai/orchestration-pipeline.md
@@ -0,0 +1,82 @@
+# Advisory AI Orchestration Pipeline (Planning Notes)
+
+> **Status:** Draft – prerequisite design for AIAI-31-004 integration work.  
+> **Audience:** Advisory AI guild, WebService/Worker guilds, CLI guild, Docs/QA support teams.
+
+## 1. Goal
+
+Wire the deterministic pipeline (Summary / Conflict / Remediation flows) into the Advisory AI service, workers, and CLI with deterministic caching, prompt preparation, and guardrail fallback. This document captures the pre-integration checklist and task breakdown so each guild understands their responsibilities before coding begins.
+
+## 2. Prerequisites
+
+| Area | Requirement | Owner | Status |
+|------|-------------|-------|--------|
+| **Toolset** | Deterministic comparators, dependency analyzer (`IDeterministicToolset`, `AdvisoryPipelineOrchestrator`) | Advisory AI | ✅ landed (AIAI-31-003) |
+| **SBOM context** | Real SBOM context client delivering timelines + dependency paths | SBOM Service Guild | ⏳ pending (AIAI-31-002) |
+| **Prompt artifacts** | Liquid/Handlebars prompt templates for summary/conflict/remediation | Advisory AI Docs Guild | ⏳ authoring needed |
+| **Cache strategy** | Decision on DSSE or hash-only cache entries, TTLs, and eviction policy | Advisory AI + Platform | 🔲 define |
+| **Auth scopes** | Confirm service account scopes for new API endpoints/worker-to-service calls | Authority Guild | 🔲 define |
+
+**Blocking risk:** SBOM client and prompt templates must exist (even stubbed) before the orchestrator can produce stable plans.
+
+## 3. Integration plan (high-level)
+
+1. **Service layer (WebService / Worker)**
+   - Inject `IAdvisoryPipelineOrchestrator` via `AddAdvisoryPipeline`.
+   - Define REST endpoint `POST /v1/advisories/{key}/pipeline/{task}`, where `task` is one of `summary`, `conflict`, or `remediation`.
+   - Worker consumes `advisory.pipeline.execute` queue messages, fetches the plan, executes the prompt, and persists the output and provenance.
+   - Add metrics: `advisory_pipeline_requests_total`, `advisory_pipeline_plan_cache_hits_total`, `advisory_pipeline_latency_seconds`.
+2. **CLI**
+   - New command `stella advise run <task>` with flags for artifact id, profile, policy version, `--force-refresh`.
+   - Render JSON/Markdown outputs; handle caching hints (print cache key, status).
+3. **Caching / storage**
+   - Choose storage (Mongo collection vs existing DSSE output store).
+   - Persist `AdvisoryTaskPlan` metadata + generated output keyed by cache key + policy version.
+   - Expose TTL/force-refresh semantics.
+4. **Docs & QA**
+   - Publish API spec (`docs/advisory-ai/api.md`) + CLI docs.
+   - Add golden outputs for deterministic runs; property tests for cache key stability.
+
+## 4. Task breakdown
+
+### AIAI-31-004A (Service orchestration wiring)
+
+- **Scope:** WebService/Worker injection, REST/queue plumbing, metrics counters, basic cache stub.
+- **Dependencies:** `AddAdvisoryPipeline`, SBOM client stub.
+- **Exit:** API responds with plan metadata + queue message; worker logs execution attempt; metrics emitted.
+
+### AIAI-31-004B (Prompt assembly & cache persistence)
+
+- **Scope:** Implement prompt assembler, connect to guardrails, persist cache entries with DSSE metadata.
+- **Dependencies:** Prompt templates, cache storage decision, guardrail interface.
+- **Exit:** Deterministic outputs stored; force-refresh honoured; tests cover prompt assembly + caching.
+
+### AIAI-31-004C (CLI integration & docs)
+
+- **Scope:** CLI command + output renderer, docs updates, CLI tests (golden outputs).
+- **Dependencies:** Service endpoints stable, caching semantics documented.
+- **Exit:** CLI command produces deterministic output, docs updated, smoke tests recorded.
+
+### Supporting tasks (other guilds)
+
+- **AUTH-AIAI-31-004** – Update scopes and DSSE policy (Authority guild).
+- **DOCS-AIAI-31-003** – Publish API documentation, CLI guide updates (Docs guild).
+- **QA-AIAI-31-004** – Golden, property, and performance test suites for the pipeline (QA guild).
+
+## 5. Acceptance checklist (per task)
+
+| Item | Notes |
+|------|-------|
+| Cache key stability | The cache-key hash produced by `AdvisoryPipelineOrchestrator` must remain stable when identical inputs are re-run. |
+| Metrics & logging | Request id, cache key, task type, profile, latency; guardrail results logged without sensitive prompt data. |
+| Offline readiness | All prompt templates bundled with Offline Kit; CLI works in air-gapped mode with cached data. |
+| Policy awareness | Plans encode policy version used; outputs reference policy digest for audit. |
+| Testing | Unit tests (plan generation, cache keys, DI), integration (service endpoint, worker, CLI), deterministic golden outputs. |
+
+## 6. Next steps
+
+1. Finalize SBOM context client (AIAI-31-002) and prompt templates.
+2. Create queue schema spec (`docs/modules/advisory-ai/queue-contracts.md`) if not already available.
+3. Schedule cross-guild kickoff to agree on cache store & DSSE policy.
+
+_Last updated: 2025-11-02_
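The "cache key stability" item in the acceptance checklist of `orchestration-pipeline.md` can be sketched as follows. This is a minimal Python illustration, assuming the key is a SHA-256 digest over a canonical JSON encoding of the trimmed request fields. The function name `cache_key` and the payload field names are hypothetical. The document says only that `AdvisoryPipelineOrchestrator` produces the hash, not how it is computed.

```python
import hashlib
import json


def cache_key(task: str, advisory_key: str, artifact_id: str,
              policy_version: str, profile: str) -> str:
    """Derive a deterministic key from trimmed, canonically ordered inputs.

    Hypothetical sketch: the real orchestrator may hash different fields.
    """
    payload = {
        "task": task.strip().lower(),
        "advisoryKey": advisory_key.strip(),
        "artifactId": artifact_id.strip(),
        "policyVersion": policy_version.strip(),
        "profile": profile.strip(),
    }
    # Sorted keys and fixed separators keep the byte stream identical
    # regardless of how the dictionary was built.
    canonical = json.dumps(payload, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A property test for stability would call `cache_key` repeatedly with the same (or whitespace-padded) inputs and assert the digests match, then change the policy version and assert the digest differs, so that outputs cached under one policy are not served for another.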