Add unit tests and implementations for MongoDB index models and OpenAPI metadata

- Implemented `MongoIndexModelTests` to verify index models for various stores.
- Created `OpenApiMetadataFactory` with methods to generate OpenAPI metadata.
- Added tests for `OpenApiMetadataFactory` to ensure expected defaults and URL overrides.
- Introduced `ObserverSurfaceSecrets` and `WebhookSurfaceSecrets` for managing secrets.
- Developed `RuntimeSurfaceFsClient` and `WebhookSurfaceFsClient` for manifest retrieval.
- Added dependency injection tests for `SurfaceEnvironmentRegistration` in both Observer and Webhook contexts.
- Implemented tests for secret resolution in `ObserverSurfaceSecretsTests` and `WebhookSurfaceSecretsTests`.
- Created `EnsureLinkNotMergeCollectionsMigrationTests` to validate MongoDB migration logic.
- Added project files for MongoDB tests and NuGet package mirroring.
2025-11-17 21:21:56 +02:00
parent d3128aec24
commit 9075bad2d9
146 changed files with 152183 additions and 82 deletions
--- a/docs/modules/cli/guides/cli-reference.md
+++ b/docs/modules/cli/guides/cli-reference.md
@@ -519,3 +519,27 @@ The Attestor response prints verification status, Rekor UUID (when available), a
 ---

 *Last updated: 2025-11-05 (Sprint 101).*
+
+## 3 · `stella scan entrytrace --stream-ndjson`
+
+### 3.1 Synopsis
+```bash
+stella scan entrytrace \
+  --scan-id <scanId> \
+  [--stream-ndjson] \
+  [--include-ndjson] \
+  [--verbose]
+```
+
+### 3.2 Description
+Streams the EntryTrace NDJSON produced by a completed scan. When `--stream-ndjson` is set, the CLI sends `Accept: application/x-ndjson` and writes the raw lines to stdout in order, suitable for piping into AOC/ETL tools. Without the flag, the command returns the JSON envelope (`scanId`, `imageDigest`, the graph, and the NDJSON array); add `--include-ndjson` to also print the NDJSON lines.
+
+### 3.3 Examples
+- Stream raw NDJSON for further processing:
+  ```bash
+  stella scan entrytrace --scan-id scan-123 --stream-ndjson > entrytrace.ndjson
+  ```
+- Retrieve the JSON envelope (default behaviour):
+  ```bash
+  stella scan entrytrace --scan-id scan-123
+  ```
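
A downstream consumer only needs to read stdout line by line and decode each line as its own JSON value. The following Python sketch is illustrative and not part of the CLI. It assumes `stella` is on `PATH` and treats every record as an opaque JSON value, so no EntryTrace field names are assumed. The helper names `iter_ndjson`, `entrytrace_argv`, and `stream_entrytrace` are hypothetical.

```python
import json
import subprocess
from typing import Iterable, Iterator, List, Optional


def iter_ndjson(lines: Iterable[str]) -> Iterator[object]:
    """Decode one JSON value per non-blank line, preserving stream order."""
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # tolerate blank lines such as a trailing newline
        try:
            yield json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid NDJSON on line {line_no}: {exc}") from exc


def entrytrace_argv(scan_id: str) -> List[str]:
    """Build the documented streaming invocation (assumes `stella` is on PATH)."""
    return ["stella", "scan", "entrytrace", "--scan-id", scan_id, "--stream-ndjson"]


def stream_entrytrace(scan_id: str, argv: Optional[List[str]] = None) -> Iterator[object]:
    """Run the CLI and decode stdout incrementally instead of buffering it all."""
    cmd = argv if argv is not None else entrytrace_argv(scan_id)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None  # guaranteed by stdout=PIPE
        yield from iter_ndjson(proc.stdout)
    # Popen.__exit__ has waited for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}")
```

Because the `argv` parameter is overridable, the same reader can wrap a remote or containerized invocation or a test double. Decoding incrementally also avoids holding a large trace in memory.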
--- a/docs/modules/concelier/link-not-merge-schema.md
+++ b/docs/modules/concelier/link-not-merge-schema.md
@@ -0,0 +1,125 @@
+# Link-Not-Merge (LNM) Observation & Linkset Schema
+
+_Draft for approval — authored 2025-11-16 to unblock CONCELIER-LNM tracks._
+
+## Goals
+- Immutable storage of raw advisory observations per source/tenant.
+- Deterministic linksets built from observations without merging or mutating originals.
+- Stable across online/offline deployments; replayable from raw inputs.
+
+## Observation document (Mongo JSON Schema excerpt)
+```json
+{
+  "bsonType": "object",
+  "required": ["_id","tenantId","source","advisoryId","affected","provenance","ingestedAt"],
+  "properties": {
+    "_id": {"bsonType": "objectId"},
+    "tenantId": {"bsonType": "string"},
+    "source": {"bsonType": "string", "description": "Adapter id, e.g., ghsa, nvd, cert-bund"},
+    "advisoryId": {"bsonType": "string"},
+    "title": {"bsonType": "string"},
+    "summary": {"bsonType": "string"},
+    "severities": {
+      "bsonType": "array",
+      "items": {"bsonType": "object", "required": ["system","score"],
+        "properties": {"system":{"bsonType":"string"},"score":{"bsonType":"double"},"vector":{"bsonType":"string"}}}
+    },
+    "affected": {
+      "bsonType": "array",
+      "items": {"bsonType":"object","required":["purl"],
+        "properties": {
+          "purl": {"bsonType":"string"},
+          "package": {"bsonType":"string"},
+          "versions": {"bsonType":"array","items":{"bsonType":"string"}},
+          "ranges": {"bsonType":"array","items":{"bsonType":"object",
+            "required":["type","events"],
+            "properties": {"type":{"bsonType":"string"},"events":{"bsonType":"array","items":{"bsonType":"object"}}}}},
+          "ecosystem": {"bsonType":"string"},
+          "cpe": {"bsonType":"array","items":{"bsonType":"string"}},
+          "cpes": {"bsonType":"array","items":{"bsonType":"string"}}
+        }
+      }
+    },
+    "references": {"bsonType": "array", "items": {"bsonType":"string"}},
+    "weaknesses": {"bsonType":"array","items":{"bsonType":"string"}},
+    "published": {"bsonType": "date"},
+    "modified": {"bsonType": "date"},
+    "provenance": {
+      "bsonType": "object",
+      "required": ["sourceArtifactSha","fetchedAt"],
+      "properties": {
+        "sourceArtifactSha": {"bsonType":"string"},
+        "fetchedAt": {"bsonType":"date"},
+        "ingestJobId": {"bsonType":"string"},
+        "signature": {"bsonType":"object"}
+      }
+    },
+    "ingestedAt": {"bsonType": "date"}
+  }
+}
+```
+
+### Observation invariants
+- **Immutable:** no in-place updates; a new revision is written as a new document that may carry an optional `supersedesId` pointer to the revision it replaces.
+- **Deterministic keying:** `_id` is derived from `hash(tenantId|source|advisoryId|provenance.sourceArtifactSha)` so that inserts stay idempotent during replay.
+- **Normalization guardrails:** version ranges are stored exactly as the source provides them; no merges are inferred.
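
The deterministic-keying rule can be illustrated with the following Python sketch, which rests on several assumptions. The schema does not name the hash function or the byte encoding, and it does not say how the digest maps onto the `objectId`-typed `_id`. The sketch therefore uses SHA-256 over a UTF-8, `|`-joined key, rendered as hex, as a stand-in. It also lowercases `source`, following the string-normalization rule below. `observation_id` and `insert_observation` are hypothetical helpers, and the in-memory dict stands in for `advisory_observations`.

```python
import hashlib
from typing import Any, Dict


def observation_id(tenant_id: str, source: str, advisory_id: str, source_artifact_sha: str) -> str:
    """Derive a stable key from hash(tenantId|source|advisoryId|provenance.sourceArtifactSha)."""
    parts = [tenant_id, source.strip().lower(), advisory_id, source_artifact_sha]
    if any("|" in part for part in parts):
        # A separator inside a component would let two different tuples share one key.
        raise ValueError("key components must not contain '|'")
    key = "|".join(parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def insert_observation(store: Dict[str, Dict[str, Any]], doc: Dict[str, Any]) -> bool:
    """Insert-only write: a replayed observation is a no-op and never overwrites the stored copy."""
    key = observation_id(
        doc["tenantId"], doc["source"], doc["advisoryId"], doc["provenance"]["sourceArtifactSha"]
    )
    if key in store:
        return False  # immutable: keep the first stored document untouched
    store[key] = doc
    return True
```

Against MongoDB, a duplicate-key error on `_id` plays the same role as the `key in store` check. Treating that error as "already ingested" keeps replays idempotent without ever updating a stored document.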
+
+## Linkset document
+```json
+{
+  "bsonType":"object",
+  "required":["_id","tenantId","advisoryId","source","observations","createdAt"],
+  "properties":{
+    "_id":{"bsonType":"objectId"},
+    "tenantId":{"bsonType":"string"},
+    "advisoryId":{"bsonType":"string"},
+    "source":{"bsonType":"string"},
+    "observations":{"bsonType":"array","items":{"bsonType":"objectId"}},
+    "normalized": {
+      "bsonType":"object",
+      "properties":{
+        "purls":{"bsonType":"array","items":{"bsonType":"string"}},
+        "versions":{"bsonType":"array","items":{"bsonType":"string"}},
+        "ranges": {"bsonType":"array","items":{"bsonType":"object"}},
+        "severities": {"bsonType":"array","items":{"bsonType":"object"}}
+      }
+    },
+    "createdAt":{"bsonType":"date"},
+    "builtByJobId":{"bsonType":"string"},
+    "provenance": {"bsonType":"object","properties":{
+      "observationHashes":{"bsonType":"array","items":{"bsonType":"string"}},
+      "toolVersion" : {"bsonType":"string"},
+      "policyHash" : {"bsonType":"string"}
+    }}
+  }
+}
+```
+
+### Linkset invariants
+- Built from a set of observation IDs; never overwrites observations.
+- Carries the hash list of source observations for audit/replay.
+- Deterministic sort: observations are sorted by `source`, `advisoryId`, then `provenance.fetchedAt` before hashing.
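
The ordering and hash-list rules above could be sketched in Python as follows, under three assumptions. First, the document does not define how an individual observation is hashed, so SHA-256 over compact, key-sorted JSON is used here. Second, `fetchedAt` is assumed to arrive as an ISO-8601 UTC string. Third, the final tie-break on the hash itself is an addition of this sketch: it ensures that observations with equal sort keys still come out in the same order regardless of input order.

```python
import hashlib
import json
from datetime import datetime
from typing import Any, Dict, List, Tuple

Observation = Dict[str, Any]


def observation_hash(obs: Observation) -> str:
    """SHA-256 of a canonical JSON rendering (sorted keys, no insignificant whitespace)."""
    canonical = json.dumps(obs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _sort_key(obs: Observation) -> Tuple[str, str, datetime, str]:
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so spell out the UTC offset.
    fetched_at = datetime.fromisoformat(obs["provenance"]["fetchedAt"].replace("Z", "+00:00"))
    return (obs["source"], obs["advisoryId"], fetched_at, observation_hash(obs))


def linkset_observation_hashes(observations: List[Observation]) -> List[str]:
    """Hash list for linkset provenance, in source/advisoryId/fetchedAt order."""
    return [observation_hash(obs) for obs in sorted(observations, key=_sort_key)]
```

The resulting list is what `provenance.observationHashes` would carry. Rebuilding it from the same observations during replay should reproduce it exactly.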
+
+## Indexes (Mongo)
+- Observations: `{ tenantId: 1, source: 1, advisoryId: 1, "provenance.fetchedAt": -1 }` (compound, for ingest); `{ "provenance.sourceArtifactSha": 1 }` unique, to prevent duplicate writes.
+- Linksets: `{ tenantId: 1, advisoryId: 1, source: 1 }` unique; `{ observations: 1 }` sparse, for reverse lookups.
+
+## Collections
+- `advisory_observations` — raw per-source docs (immutable).
+- `advisory_linksets` — derived normalized aggregates with observation pointers and hashes.
+
+## Determinism & replay
+- Replay rebuild: order observations by `provenance.fetchedAt`, recompute the linkset hash list, and verify that the rebuilt linkset JSON is byte-identical to the original.
+- All timestamps UTC ISO-8601; no server-local time.
+- String normalization: lowercase `source`, trim/normalize PURLs, stable sort arrays.
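
A rebuild can only be byte-identical if every serializer applies the same normalization. The Python sketch below shows one way to apply the rules above to a linkset before serializing it. Several details are assumptions: millisecond timestamp precision, the compact key-sorted JSON layout, and the helper names. PURL handling is limited to trimming whitespace; full PURL canonicalization is not attempted. The `observations` array is left untouched because its order comes from the linkset sort rule.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def utc_iso(ts: datetime) -> str:
    """Render an aware datetime as UTC ISO-8601 with a 'Z' suffix (millisecond precision)."""
    if ts.tzinfo is None:
        raise ValueError("naive datetime: refusing to guess server-local time")
    return ts.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def _canonical(value: Any) -> str:
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def normalize_linkset(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a normalized copy; the input document is not modified."""
    out = dict(doc)
    out["source"] = doc["source"].strip().lower()
    if "normalized" in doc:
        norm = dict(doc["normalized"])
        if "purls" in norm:
            norm["purls"] = sorted(p.strip() for p in norm["purls"])
        if "versions" in norm:
            norm["versions"] = sorted(norm["versions"])
        for key in ("ranges", "severities"):
            if key in norm:
                # Objects have no natural order, so sort them by their canonical JSON text.
                norm[key] = sorted(norm[key], key=_canonical)
        out["normalized"] = norm
    if isinstance(doc.get("createdAt"), datetime):
        out["createdAt"] = utc_iso(doc["createdAt"])
    return out


def linkset_bytes(doc: Dict[str, Any]) -> bytes:
    """Bytes to compare when checking that a replayed rebuild is byte-identical."""
    return _canonical(normalize_linkset(doc)).encode("utf-8")
```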
+
+## Sample documents
+See `docs/samples/lnm/observation-ghsa.json` and `docs/samples/lnm/linkset-ghsa.json` (added with this draft) for concrete payloads.
+
+## Approval path
+1) Architecture + Concelier Core review this document.
+2) If accepted, freeze JSON Schema and roll into `src/Concelier/__Libraries/StellaOps.Concelier.Storage.Mongo` migrations.
+3) Update consumers (policy/CLI/export) to read from linksets only; deprecate Merge endpoints.
+
+---
+Tracking: CONCELIER-LNM-21-001/002/101; Sprint 110 blockers (Concelier/Excititor waves).
--- a/docs/modules/excititor/operations/evidence-api.md
+++ b/docs/modules/excititor/operations/evidence-api.md
@@ -0,0 +1,66 @@
+# Excititor Advisory-AI evidence APIs (projection + chunks)
+
+> Covers the read-only evidence surfaces shipped in Sprints 119–120: `/v1/vex/observations/{vulnerabilityId}/{productKey}` and `/v1/vex/evidence/chunks`. The attestation lookup `/v1/vex/attestations/{attestationId}` is also described below.
+
+## Scope and determinism
+
+- **Aggregation-only**: no consensus, severity merging, or reachability. Responses carry raw statements plus provenance/signature metadata.
+- **Stable ordering**: both endpoints sort by `lastSeen` DESC and cap results at `limit`, so pagination is deterministic.
+- **Limits**: observation projection default `limit=200`, max `500`; chunk stream default `limit=500`, max `2000`.
+- **Tenancy**: reads use `X-Stella-Tenant` when it is provided and otherwise fall back to the configured `DefaultTenant`.
+- **Auth**: a bearer token with the `vex.read` scope is required.
+
+## `/v1/vex/observations/{vulnerabilityId}/{productKey}`
+
+- **Response**: JSON object with `vulnerabilityId`, `productKey`, `generatedAt`, `totalCount`, `truncated`, `statements[]`.
+- **Statement fields**: `observationId`, `providerId`, `status`, `justification`, `detail`, `firstSeen`, `lastSeen`, `scope{key,name,version,purl,cpe,componentIdentifiers[]}`, `anchors[]`, `document{digest,format,revision,sourceUri}`, `signature{type,keyId,issuer,verifiedAt}`.
+- **Filters**:
+  - `providerId` (multi-valued, comma-separated)
+  - `status` (values in `VexClaimStatus`)
+  - `since` (ISO-8601, UTC)
+  - `limit` (integer within the limits above)
+- **Mapping back to storage**:
+  - `observationId` = `{providerId}:{document.digest}`
+  - `document.digest` locates the raw record in `vex_raw`.
+  - `anchors` contain JSON pointers/paragraph locators from source metadata.
+
+Headers:
+- `Excititor-Results-Truncated: true|false`
+- `Excititor-Results-Total: <int>`
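
The following Python sketch uses only the standard library to exercise the filters, the `observationId` mapping, and the headers above from a client. It makes several assumptions. `base_url` is wherever the Excititor web service is exposed. The lower `limit` bound of 1 is inferred, because this page gives only the default and maximum. `status` is sent as a single value. Finally, `observationId` is split at its first `:` on the assumption that provider IDs never contain a colon, since the digest itself may contain one. Whether the service's router accepts a percent-encoded `/` inside `productKey` is not stated here, so verify that against the deployed service.

```python
from typing import List, Mapping, Optional, Sequence, Tuple
from urllib.parse import quote, urlencode

PROJECTION_MAX_LIMIT = 500  # documented maximum; the default is 200


def projection_url(
    base_url: str,
    vulnerability_id: str,
    product_key: str,
    provider_ids: Sequence[str] = (),
    status: Optional[str] = None,
    since: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build /v1/vex/observations/{vulnerabilityId}/{productKey} with optional filters."""
    # Percent-encode every reserved character: product keys such as PURLs contain '/' and ':'.
    path = "/v1/vex/observations/{}/{}".format(
        quote(vulnerability_id, safe=""), quote(product_key, safe="")
    )
    params: List[Tuple[str, str]] = []
    if provider_ids:
        params.append(("providerId", ",".join(provider_ids)))  # comma-separated multi-value
    if status:
        params.append(("status", status))
    if since:
        params.append(("since", since))  # ISO-8601, UTC
    if limit is not None:
        if not 1 <= limit <= PROJECTION_MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {PROJECTION_MAX_LIMIT}")
        params.append(("limit", str(limit)))
    query = f"?{urlencode(params)}" if params else ""
    return base_url.rstrip("/") + path + query


def split_observation_id(observation_id: str) -> Tuple[str, str]:
    """Split `{providerId}:{document.digest}`, assuming providerId contains no ':'."""
    provider_id, sep, digest = observation_id.partition(":")
    if not sep or not provider_id or not digest:
        raise ValueError(f"unexpected observationId: {observation_id!r}")
    return provider_id, digest


def result_headers(headers: Mapping[str, str]) -> Tuple[bool, Optional[int]]:
    """Read the truncation/total headers; header names compare case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    truncated = lowered.get("excititor-results-truncated", "false").strip().lower() == "true"
    total = lowered.get("excititor-results-total")
    return truncated, int(total) if total is not None else None
```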
+
+## `/v1/vex/evidence/chunks`
+
+- **Query params**: `vulnerabilityId` (required), `productKey` (required), optional `providerId`, `status`, `since`, `limit`.
+- **Response**: **NDJSON** stream; each line is a `VexEvidenceChunkResponse`.
+- **Chunk fields**: `observationId`, `linksetId`, `vulnerabilityId`, `productKey`, `providerId`, `status`, `justification`, `detail`, `scopeScore` (from confidence or signals), `firstSeen`, `lastSeen`, `scope{...}`, `document{digest,format,sourceUri,revision}`, `signature{type,subject,issuer,keyId,verifiedAt,transparencyRef}`, `metadata` (flattened `additionalMetadata`).
+- **Headers**: same truncation/total headers as projection API.
+- **Streaming guidance (SDK/clients)**:
+  - Use an HTTP client that supports response streaming; read the body line by line and deserialize each line as a separate JSON object.
+  - Treat the stream as a sequence of at most `limit` records, not a JSON array; there are no enclosing brackets.
+  - To back off or page through results, adjust `since` or narrow the query by provider or status.
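
A minimal consumer that follows this guidance can be built with Python's standard library. `urlopen` returns a file-like response, and iterating over it yields one line at a time, so the body is never parsed as a single JSON document. The function names are hypothetical. The `limit` check mirrors the documented maximum, and the lower bound of 1 is an assumption. Error handling beyond the `HTTPError` that `urlopen` raises for non-2xx responses is omitted.

```python
import json
from typing import IO, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

CHUNKS_MAX_LIMIT = 2000  # documented maximum; the default is 500


def chunks_request(
    base_url: str,
    token: str,
    vulnerability_id: str,
    product_key: str,
    tenant: Optional[str] = None,
    limit: Optional[int] = None,
) -> Request:
    """Build a GET for /v1/vex/evidence/chunks with the two required query parameters."""
    if not vulnerability_id or not product_key:
        raise ValueError("vulnerabilityId and productKey are required")
    params = {"vulnerabilityId": vulnerability_id, "productKey": product_key}
    if limit is not None:
        if not 1 <= limit <= CHUNKS_MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {CHUNKS_MAX_LIMIT}")
        params["limit"] = str(limit)
    headers = {"Authorization": f"Bearer {token}"}  # token must carry the vex.read scope
    if tenant:
        headers["X-Stella-Tenant"] = tenant  # optional; the service falls back to DefaultTenant
    url = f"{base_url.rstrip('/')}/v1/vex/evidence/chunks?{urlencode(params)}"
    return Request(url, headers=headers)


def iter_chunks(body: IO[bytes]) -> Iterator[dict]:
    """Decode each NDJSON line independently; there are no enclosing array brackets."""
    for raw in body:
        line = raw.strip()
        if line:
            yield json.loads(line)


def stream_chunks(request: Request, timeout: float = 30.0) -> Iterator[dict]:
    """Open the request and yield chunks while the response is still arriving."""
    with urlopen(request, timeout=timeout) as response:
        yield from iter_chunks(response)
```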
+
+## `/v1/vex/attestations/{attestationId}`
+
+- **Purpose**: Lookup attestation provenance (supplier ↔ observation/linkset ↔ product/vulnerability) without touching consensus.
+- **Response**: `VexAttestationPayload` with fields:
+  - `attestationId`, `supplierId`, `observationId`, `linksetId`, `vulnerabilityId`, `productKey`, `justificationSummary`, `issuedAt`, `metadata{}`.
+- **Semantics**:
+  - `attestationId` matches the export/attestation ID used when signing (Resolve/Worker flows).
+  - `observationId`/`linksetId` map back to evidence identifiers; clients can stitch provenance for citations.
+- **Auth**: `vex.read` scope; tenant header optional (payloads are tenant-agnostic).
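
Stitching provenance for citations amounts to a join on `observationId`. The Python sketch below is illustrative only. Field names come from the lists on this page, but nullability is not documented, so `justificationSummary` and `metadata` are treated as optional and every other field as required. The `Citation` shape and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class VexAttestationPayload:
    attestation_id: str
    supplier_id: str
    observation_id: str
    linkset_id: str
    vulnerability_id: str
    product_key: str
    issued_at: str
    justification_summary: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, doc: Dict[str, Any]) -> "VexAttestationPayload":
        return cls(
            attestation_id=doc["attestationId"],
            supplier_id=doc["supplierId"],
            observation_id=doc["observationId"],
            linkset_id=doc["linksetId"],
            vulnerability_id=doc["vulnerabilityId"],
            product_key=doc["productKey"],
            issued_at=doc["issuedAt"],
            justification_summary=doc.get("justificationSummary"),
            metadata=dict(doc.get("metadata") or {}),
        )


@dataclass
class Citation:
    attestation_id: str
    supplier_id: str
    observation_id: str
    document_digest: Optional[str]  # None when no matching evidence chunk was fetched


def stitch(attestations: Iterable[VexAttestationPayload], chunks: Iterable[Dict[str, Any]]) -> List[Citation]:
    """Join attestations to evidence chunks on observationId to recover the source document."""
    by_observation = {chunk["observationId"]: chunk for chunk in chunks}
    citations = []
    for att in attestations:
        chunk = by_observation.get(att.observation_id)
        digest = chunk.get("document", {}).get("digest") if chunk else None
        citations.append(Citation(att.attestation_id, att.supplier_id, att.observation_id, digest))
    return citations
```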
+
+## Error model
+
+- Standard API envelope with `ValidationProblem` for missing required params.
+- Scope failures (for example, a token without `vex.read`) return `403` with problem details.
+- Tenancy parse failures return `400`.
+
+## Backwards compatibility
+
+- No legacy routes are deprecated by these endpoints; they are additive and remain aggregation-only.
+
+## References
+
+- Implementation: `src/Excititor/StellaOps.Excititor.WebService/Program.cs` (`/v1/vex/observations/**`, `/v1/vex/evidence/chunks`).
+- Telemetry: `src/Excititor/StellaOps.Excititor.WebService/Telemetry/EvidenceTelemetry.cs` (`excititor.vex.observation.*`, `excititor.vex.chunks.*`).
+- Data model: `src/Excititor/StellaOps.Excititor.WebService/Contracts/VexObservationContracts.cs`, `Contracts/VexEvidenceChunkContracts.cs`.
--- a/docs/modules/scanner/operations/entrytrace-cadence.md
+++ b/docs/modules/scanner/operations/entrytrace-cadence.md
@@ -0,0 +1,40 @@
+# EntryTrace Heuristic Review Cadence
+
+EntryTrace heuristics must stay aligned with competitor techniques and new runtime behaviours. This cadence makes updates predictable and deterministic.
+
+## Objectives
+- Refresh shell/launcher heuristics quarterly using the latest gap analysis in `docs/benchmarks/scanner/scanning-gaps-stella-misses-from-competitors.md`.
+- Re-run explain-trace fixtures to confirm deterministic outputs and document any newly unsupported constructs.
+- Ensure operator-facing explainability stays in sync with emitted diagnostics and metrics.
+
+## Cadence
+- **Frequency:** Quarterly (Jan, Apr, Jul, Oct) or sooner when critical regressions are discovered.
+- **Owners:** EntryTrace Guild with QA Guild pairing.
+- **Inputs:** Gap benchmark doc, new runtime samples from support channels, and anonymised customer repros (when permitted).
+- **Outputs:**
+  - Updated heuristics/diagnostics in `StellaOps.Scanner.EntryTrace` with deterministic fixtures.
+  - Changelog entry in `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/TASKS.md`.
+  - Sprint log updates under the active `SPRINT_0138_0000_0001_scanner_ruby_parity.md` when cadence items land.
+
+## Workflow
+1) **Collect & triage signals**
+   - Parse new gaps from the benchmark doc; map each to an EntryTrace detector area (shell parser, interpreter tracer, PATH resolver).
+   - Classify as _coverage gap_, _precision issue_, or _observability gap_.
+2) **Fixture-first update**
+   - Add/extend fixtures in `StellaOps.Scanner.EntryTrace.Tests/Fixtures` before modifying code.
+   - Use deterministic serializers to keep fixture outputs byte-stable.
+3) **Implement & validate**
+   - Update analyzers/diagnostics; run `dotnet test src/Scanner/__Tests/StellaOps.Scanner.EntryTrace.Tests/StellaOps.Scanner.EntryTrace.Tests.csproj --nologo --verbosity minimal`.
+   - Confirm metrics counters (`entrytrace_*`) and explain-trace text stay consistent.
+4) **Record explainability**
+   - Update explain-trace catalog (diagnostic enum descriptions) when new reasons are introduced.
+   - Add operator notes to sprint log if remediation guidance changes.
+5) **Publish**
+   - Attach a brief summary, with the date and scope, to the sprint Execution Log and to `TASKS.md`.
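
The fixture-first and byte-stability steps depend on comparing each new output against a stored baseline byte for byte. The EntryTrace tests themselves are .NET, so the Python sketch below only illustrates the golden-file check. The helper name and the opt-in `update` flag are hypothetical.

```python
import difflib
from pathlib import Path


def check_fixture(actual: bytes, fixture: Path, update: bool = False) -> None:
    """Fail unless `actual` matches the stored fixture byte for byte."""
    if update:
        # Deliberate re-baselining only; the previous baseline stays recoverable from version control.
        fixture.write_bytes(actual)
        return
    expected = fixture.read_bytes()
    if actual == expected:
        return
    # Byte mismatch: show a readable line diff to help triage the regression.
    diff = difflib.unified_diff(
        expected.decode("utf-8", errors="replace").splitlines(keepends=True),
        actual.decode("utf-8", errors="replace").splitlines(keepends=True),
        fromfile=f"{fixture} (baseline)",
        tofile="actual",
    )
    raise AssertionError("fixture drift detected:\n" + "".join(diff))
```

Keeping `update` opt-in supports the rollback guidance below. An overly broad heuristic shows up as a diff against the previous baseline instead of silently replacing it.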
+
+## Fail-safe & rollback
+- Keep previous fixture baselines; if a heuristic change proves too broad, revert to the prior fixture set to restore determinism.
+- Prefer additive diagnostics over behavioural regressions; when behaviour must change, document it in the sprint log and `TASKS.md`.
+
+## Ownership transitions
+- If the cadence cannot run on schedule, mark the relevant sprint task `BLOCKED` with the reason and hand off to the Project Manager to re-staff before the next window.