feat: Enhance MongoDB storage with event publishing and outbox support

- Added `MongoAdvisoryObservationEventPublisher` and `NatsAdvisoryObservationEventPublisher` for event publishing.
- Registered `IAdvisoryObservationEventPublisher` to choose between NATS and MongoDB based on configuration.
- Introduced `MongoAdvisoryObservationEventOutbox` for outbox pattern implementation.
- Updated service collection to include new event publishers and outbox.
- Added a new hosted service `AdvisoryObservationTransportWorker` for processing events.

feat: Update project dependencies

- Added `NATS.Client.Core` package to the project for NATS integration.

test: Add unit tests for AdvisoryLinkset normalization

- Created `AdvisoryLinksetNormalizationConfidenceTests` to validate confidence score calculations.

fix: Adjust confidence assertion in `AdvisoryObservationAggregationTests`

- Updated confidence assertion to allow a range instead of a fixed value.

test: Implement tests for AdvisoryObservationEventFactory

- Added `AdvisoryObservationEventFactoryTests` to ensure correct mapping and hashing of observation events.

chore: Configure test project for Findings Ledger

- Created `Directory.Build.props` for test project configuration.
- Added `StellaOps.Findings.Ledger.Exports.Unit.csproj` for unit tests related to findings ledger exports.

feat: Implement export contracts for findings ledger

- Defined export request and response contracts in `ExportContracts.cs`.
- Created various export item records for findings, VEX, advisories, and SBOMs.

feat: Add export functionality to Findings Ledger Web Service

- Implemented endpoints for exporting findings, VEX, advisories, and SBOMs.
- Integrated `ExportQueryService` for handling export logic and pagination.

test: Add tests for Node language analyzer phase 22

- Implemented `NodePhase22SampleLoaderTests` to validate loading of NDJSON fixtures.
- Created sample NDJSON file for testing.

chore: Set up isolated test environment for Node tests

- Added `node-isolated.runsettings` for isolated test execution.
- Created `node-tests-isolated.sh` script for running tests in isolation.
This commit is contained in:
master
2025-11-20 23:08:45 +02:00
parent f0e74d2ee8
commit 2e276d6676
49 changed files with 1996 additions and 113 deletions

@@ -0,0 +1,27 @@
{
"eventId": "8c5e9d4e-54c0-4fb3-9e0c-7c4cdbf74c6a",
"tenantId": "urn:tenant:123e4567-e89b-12d3-a456-426614174000",
"observationId": "6560606df3c5d6ad3b5a1234",
"advisoryId": "CVE-2024-99999",
"source": {
"vendor": "ghsa",
"stream": "advisories",
"api": "https://api.github.com/advisories",
"collectorVersion": "1.12.0"
},
"linksetSummary": {
"aliases": ["CVE-2024-99999", "GHSA-xxxx-yyyy-zzzz"],
"purls": ["pkg:npm/lodash@4.17.21"],
"cpes": ["cpe:/a:lodash:lodash:4.17.21"],
"scopes": ["runtime"],
"relationships": [
{"type": "contains", "source": "pkg:npm/lodash@4.17.21", "target": "file://dist/lodash.js", "provenance": "ghsa"}
]
},
"supersedesId": "65605fdaf3c5d6ad3b5a0fff",
"documentSha": "2f8f568cc1ed3474f0a4564ddb8c64f4b4d176fbe0a2a98a02b88e822a4f5b6d",
"observationHash": "10f4fc0b5c1a1d4c266fafd2b4f45618f6a0a4b86087c3e67e4c1a2c8f38e990",
"ingestedAt": "2025-11-20T14:35:12Z",
"traceId": "trace-4f29d7f6f1f147da",
"replayCursor": "cs-0000000172-0001"
}

@@ -0,0 +1,44 @@
# advisory.observation.updated@1 · Event contract
Purpose: unblock CONCELIER-GRAPH-21-002 by freezing the platform event shape for observation changes emitted by Concelier. This is the only supported event for observation churn; downstreams subscribe for evidence fan-out and replay bundles.
## Envelope & transport
- Subject: `concelier.advisory.observation.updated.v1`
- Type/version: `advisory.observation.updated@1`
- Transport: NATS (primary), Redis Stream `concelier:advisory.observation.updated:v1` (fallback). Both carry the same DSSE envelope.
- DSSE payloadType: `application/vnd.stellaops.advisory.observation.updated.v1+json`.
- Signature: Ed25519 via Platform Events signer; attach Rekor UUID when available. Offline kits treat the envelope as the source of truth.
## Payload (JSON)
| Field | Type | Rules |
| --- | --- | --- |
| `eventId` | string (uuid) | Generated by publisher; idempotency key. |
| `tenantId` | string | `urn:tenant:{uuid}`; required for multi-tenant routing. |
| `observationId` | string (ObjectId) | Mongo `_id` of the observation document. |
| `advisoryId` | string | Upstream advisory identifier (e.g., CVE, GHSA, vendor id). |
| `source` | object | `{ vendor, stream, api, collectorVersion }`; lowercase vendor, non-empty. |
| `linksetSummary` | object | `{ aliases: string[], purls: string[], cpes?: string[], scopes?: string[], relationships?: object[] }`; all arrays pre-sorted ASCII. |
| `supersedesId` | string (ObjectId, optional) | Previous observation `_id` if this is a new revision; omitted otherwise. |
| `documentSha` | string | SHA-256 of raw upstream document. |
| `observationHash` | string | Stable hash over canonicalized observation JSON (tenant, source, advisoryId, documentSha, fetchedAt). |
| `ingestedAt` | string (ISO-8601 UTC) | Timestamp when appended. |
| `traceId` | string (optional) | Propagated from ingest job/request; aids join with logs/metrics. |
| `replayCursor` | string | Monotone cursor for offline bundle ordering (tick from change stream resume token). |
### Determinism & ordering
- Arrays sorted ASCII; objects field-sorted when hashing.
- `eventId` + `replayCursor` provide exactly-once consumer handling; consumers must ignore duplicates when `observationHash` is unchanged.
- No judgments: only raw facts and hash pointers; any derived severity/merge content is forbidden.
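
The determinism rules above can be illustrated with a minimal Python sketch. The contract names the hash inputs (tenant, source, advisoryId, documentSha, fetchedAt) but not the exact canonical encoding, so the key names, the compact separators, and the helper names `observation_hash`, `sorted_ascii`, and `DedupingConsumer` are assumptions for illustration only.

```python
import hashlib
import json


def sorted_ascii(values):
    """Sort strings by code point (ASCII order for ASCII input) and drop duplicates."""
    return sorted(set(values))


def observation_hash(tenant_id, source, advisory_id, document_sha, fetched_at):
    """SHA-256 over a field-sorted, whitespace-free JSON rendering of the inputs.

    Assumption: the real canonical form may use different key names or encoding.
    """
    canonical = json.dumps(
        {
            "tenant": tenant_id,
            "source": source,
            "advisoryId": advisory_id,
            "documentSha": document_sha,
            "fetchedAt": fetched_at,
        },
        sort_keys=True,          # objects are field-sorted, including nested ones
        separators=(",", ":"),   # no insignificant whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupingConsumer:
    """One possible reading of the duplicate rule: skip replays of an eventId,
    and skip events whose observationHash matches the last one seen for that
    observation."""

    def __init__(self):
        self._seen_event_ids = set()
        self._last_hash = {}  # observationId -> observationHash

    def accept(self, event):
        """Return True if the event should be processed, False if it is a duplicate."""
        if event["eventId"] in self._seen_event_ids:
            return False
        self._seen_event_ids.add(event["eventId"])
        if self._last_hash.get(event["observationId"]) == event["observationHash"]:
            return False
        self._last_hash[event["observationId"]] = event["observationHash"]
        return True
```

Because keys are sorted before hashing, two publishers that build the `source` object in a different field order would still produce the same `observationHash` under this sketch.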
### Error contracts for Scheduler
- Retryable NATS/Redis failures use backoff capped at 30s; after 5 attempts, emit `concelier.events.dlq` with the same envelope and `error` field describing transport failure.
- Consumers must NACK on schema validation failure; publisher logs `ERR_EVENT_SCHEMA` and quarantines the offending observation id.
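
The retry rule above (backoff capped at 30s, dead-letter after 5 attempts) might look roughly like the following Python sketch. The contract fixes only the cap and the attempt count; the 1s base delay, the doubling factor, the placement of the `error` field, and the callables `publish` and `publish_dlq` are hypothetical stand-ins, not the actual Concelier publisher API.

```python
import time


def backoff_delays(max_attempts=5, base_seconds=1.0, cap_seconds=30.0):
    """Delays between attempts: exponential growth, capped at cap_seconds.

    There is no wait before the first attempt, so N attempts have N - 1 delays.
    """
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(max_attempts - 1)]


def publish_with_retry(publish, publish_dlq, envelope, max_attempts=5, sleep=time.sleep):
    """Try to publish; after max_attempts failures, send the envelope to the DLQ.

    Returns True on success, False if the envelope was dead-lettered.
    """
    delays = backoff_delays(max_attempts)
    last_error = None
    for attempt in range(max_attempts):
        try:
            publish(envelope)
            return True
        except ConnectionError as exc:  # treated here as the retryable failure type
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt])
    # Same envelope plus an `error` field describing the transport failure.
    publish_dlq({**envelope, "error": f"transport failure: {last_error}"})
    return False
```

With these assumed defaults, five attempts wait 1s, 2s, 4s, and 8s between tries; the 30s cap only takes effect if the attempt count is raised.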
## Sample payload
See `advisory.observation.updated@1.sample.json` (canonical field order, ASCII sorted arrays). Hash values are illustrative placeholders; replace with real values in tests.
## Schema
`advisory.observation.updated@1.schema.json` provides a JSON Schema (draft 2020-12) for runtime validation; any additional fields are rejected.
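
For a quick feel of what the schema enforces at the top level, here is a standard-library Python sketch. It hand-checks only the required fields, the rejection of unknown fields, and the string patterns copied from the schema; it is not a substitute for a real JSON Schema validator and does not descend into `source` or `linksetSummary`. The function name `validate_event` is hypothetical.

```python
import re

REQUIRED = (
    "eventId", "tenantId", "observationId", "advisoryId", "source",
    "linksetSummary", "documentSha", "observationHash", "ingestedAt", "replayCursor",
)
OPTIONAL = ("supersedesId", "traceId")

# Patterns copied from the schema's top-level string properties.
PATTERNS = {
    "tenantId": r"^urn:tenant:[0-9a-fA-F-]{36}$",
    "observationId": r"^[a-f0-9]{24}$",
    "supersedesId": r"^[a-f0-9]{24}$",
    "documentSha": r"^[a-f0-9]{64}$",
    "observationHash": r"^[a-f0-9]{64}$",
}


def validate_event(event):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    errors = []
    for name in REQUIRED:
        if name not in event:
            errors.append(f"missing required field: {name}")
    for name in event:
        if name not in REQUIRED and name not in OPTIONAL:
            errors.append(f"unexpected field: {name}")
    for name, pattern in PATTERNS.items():
        value = event.get(name)
        if value is None:
            continue
        if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
            errors.append(f"field does not match pattern: {name}")
    return errors
```

A consumer following the error contract would NACK whenever this list is non-empty.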
## Compatibility note
Sprint tracker referenced `sbom.observation.updated`; this contract standardises on `advisory.observation.updated@1`. If a legacy alias is required for interim consumers, mirror the envelope on subject `surface.sbom.observation.updated.v1` with identical payload.

@@ -0,0 +1,27 @@
{
"eventId": "8c5e9d4e-54c0-4fb3-9e0c-7c4cdbf74c6a",
"tenantId": "urn:tenant:123e4567-e89b-12d3-a456-426614174000",
"observationId": "6560606df3c5d6ad3b5a1234",
"advisoryId": "CVE-2024-99999",
"source": {
"vendor": "ghsa",
"stream": "advisories",
"api": "https://api.github.com/advisories",
"collectorVersion": "1.12.0"
},
"linksetSummary": {
"aliases": ["CVE-2024-99999", "GHSA-xxxx-yyyy-zzzz"],
"purls": ["pkg:npm/lodash@4.17.21"],
"cpes": ["cpe:/a:lodash:lodash:4.17.21"],
"scopes": ["runtime"],
"relationships": [
{"type": "contains", "source": "pkg:npm/lodash@4.17.21", "target": "file://dist/lodash.js", "provenance": "ghsa"}
]
},
"supersedesId": "65605fdaf3c5d6ad3b5a0fff",
"documentSha": "2f8f568cc1ed3474f0a4564ddb8c64f4b4d176fbe0a2a98a02b88e822a4f5b6d",
"observationHash": "10f4fc0b5c1a1d4c266fafd2b4f45618f6a0a4b86087c3e67e4c1a2c8f38e990",
"ingestedAt": "2025-11-20T14:35:12Z",
"traceId": "trace-4f29d7f6f1f147da",
"replayCursor": "cs-0000000172-0001"
}

@@ -0,0 +1,68 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://stellaops.org/concelier/advisory.observation.updated@1.schema.json",
"title": "advisory.observation.updated@1",
"type": "object",
"required": [
"eventId",
"tenantId",
"observationId",
"advisoryId",
"source",
"linksetSummary",
"documentSha",
"observationHash",
"ingestedAt",
"replayCursor"
],
"additionalProperties": false,
"properties": {
"eventId": { "type": "string", "format": "uuid" },
"tenantId": { "type": "string", "pattern": "^urn:tenant:[0-9a-fA-F-]{36}$" },
"observationId": { "type": "string", "pattern": "^[a-f0-9]{24}$" },
"advisoryId": { "type": "string", "minLength": 1 },
"source": {
"type": "object",
"required": ["vendor", "stream", "api", "collectorVersion"],
"additionalProperties": false,
"properties": {
"vendor": { "type": "string", "minLength": 1 },
"stream": { "type": "string", "minLength": 1 },
"api": { "type": "string", "minLength": 1 },
"collectorVersion": { "type": "string", "minLength": 1 }
}
},
"linksetSummary": {
"type": "object",
"required": ["aliases", "purls"],
"additionalProperties": false,
"properties": {
"aliases": { "type": "array", "items": { "type": "string" }, "uniqueItems": true },
"purls": { "type": "array", "items": { "type": "string" }, "uniqueItems": true },
"cpes": { "type": "array", "items": { "type": "string" }, "uniqueItems": true },
"scopes": { "type": "array", "items": { "type": "string" }, "uniqueItems": true },
"relationships": {
"type": "array",
"items": {
"type": "object",
"required": ["type", "source", "target"],
"additionalProperties": false,
"properties": {
"type": { "type": "string" },
"source": { "type": "string" },
"target": { "type": "string" },
"provenance": { "type": "string" }
}
},
"uniqueItems": false
}
}
},
"supersedesId": { "type": "string", "pattern": "^[a-f0-9]{24}$" },
"documentSha": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"observationHash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"ingestedAt": { "type": "string", "format": "date-time" },
"traceId": { "type": "string" },
"replayCursor": { "type": "string", "minLength": 1 }
}
}

@@ -0,0 +1,56 @@
# CONCELIER-LNM-21-002 · Linkset correlation rules (v1)
Purpose: unblock CONCELIER-LNM-21-002 by freezing correlation/precedence rules and providing fixtures so builders and downstream consumers can proceed.
## Scope
- Applies to linksets produced from `advisory_observations` (LNM v1).
- Correlation is aggregation-only: no value synthesis or merge; emit conflicts instead of collapsing fields.
- Output persists in `advisory_linksets` and drives `advisory.linkset.updated@1` events.
## Deterministic confidence calculation (0–1)
```
confidence = clamp(
0.40 * alias_score +
0.25 * purl_overlap_score +
0.15 * cpe_overlap_score +
0.10 * severity_agreement +
0.05 * reference_overlap +
0.05 * freshness_score
)
```
- `alias_score`: 1 if any alias exact-match across observations; 0.5 if vendor ID prefixes match; else 0.
- `purl_overlap_score`: 1 if same pkg+version range intersects; 0.6 if same pkg family but disjoint ranges; 0 otherwise. Use semver/rpm/deb comparers as in LNM v1.
- `cpe_overlap_score`: 1 if any CPE exact-match; 0.5 if same vendor/product, any version; else 0.
- `severity_agreement`: 1 if CVSS base score delta ≤ 0.1; 0.5 if ≤ 1.0; else 0. Use max of available CVSS per observation.
- `reference_overlap`: fraction of shared reference URLs (case-normalized) between the pair with the highest overlap across the set.
- `freshness_score`: 1 when `fetchedAt` spread ≤ 48h; linearly decays to 0 at 14 days.
- Sort observations before scoring by `(source.vendor, advisoryId, fetchedAt)`; reuse that order for hashing and for output arrays.
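
As a worked illustration of the formula and two of its component scores, here is a short Python sketch. The weights and thresholds come from the rules above; rounding the CVSS delta to one decimal place and starting the linear freshness decay at exactly 48h are assumptions, and the function names are hypothetical.

```python
WEIGHTS = {
    "alias_score": 0.40,
    "purl_overlap_score": 0.25,
    "cpe_overlap_score": 0.15,
    "severity_agreement": 0.10,
    "reference_overlap": 0.05,
    "freshness_score": 0.05,
}


def clamp01(value):
    """Clamp a number into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def severity_agreement(cvss_a, cvss_b):
    """1 if the base scores differ by at most 0.1, 0.5 if by at most 1.0, else 0."""
    # CVSS scores carry one decimal; rounding avoids float noise such as 0.10000000000000009.
    delta = round(abs(cvss_a - cvss_b), 1)
    if delta <= 0.1:
        return 1.0
    if delta <= 1.0:
        return 0.5
    return 0.0


def freshness_score(spread_hours):
    """1 up to a 48h spread, then a straight line down to 0 at 14 days."""
    full_decay_hours = 14 * 24
    if spread_hours <= 48:
        return 1.0
    if spread_hours >= full_decay_hours:
        return 0.0
    return (full_decay_hours - spread_hours) / (full_decay_hours - 48)


def confidence(scores):
    """Weighted sum of the six component scores, clamped to [0, 1]."""
    return clamp01(sum(weight * scores[name] for name, weight in WEIGHTS.items()))
```

For example, an exact alias match (1), same package family with disjoint ranges (0.6), same CPE vendor/product (0.5), a CVSS delta of 1.0 (0.5), a quarter of references shared (0.25), and a spread under 48h (1) gives 0.40 + 0.15 + 0.075 + 0.05 + 0.0125 + 0.05 = 0.7375.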
## Conflict emission (add-only)
Emit a conflict entry per divergent field group:
- `severity-mismatch`: CVSS base score delta > 1.0 or vector differs.
- `affected-range-divergence`: version ranges do not intersect.
- `reference-clash`: no shared references and source vendors differ.
- `alias-inconsistency`: aliases disjoint across observations.
- `metadata-gap`: required fields missing on any observation.
Each conflict includes `field`, `reason`, and `values` (array of `source: value` strings) and is stable-sorted by `field` then `reason`.
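
A minimal Python sketch of conflict emission for two of the codes follows. It assumes the code (for example `severity-mismatch`) is carried in `reason` and that `field` names the affected field group; it also reads "aliases disjoint" as "no alias is shared by every observation". The observation dictionary keys used here (`vendor`, `aliases`, `cvss_score`, `cvss_vector`) are hypothetical.

```python
def detect_conflicts(observations):
    """Return conflict entries for severity and alias disagreements, sorted by field then reason."""
    conflicts = []

    # severity-mismatch: CVSS base score delta > 1.0 or vector differs.
    scored = [o for o in observations if o.get("cvss_score") is not None]
    if len(scored) >= 2:
        scores = [o["cvss_score"] for o in scored]
        vectors = {o.get("cvss_vector") for o in scored}
        if max(scores) - min(scores) > 1.0 or len(vectors) > 1:
            conflicts.append({
                "field": "severity",
                "reason": "severity-mismatch",
                "values": sorted(f'{o["vendor"]}: {o["cvss_score"]}' for o in scored),
            })

    # alias-inconsistency: no alias common to all observations (assumed reading).
    alias_sets = [set(o["aliases"]) for o in observations]
    if len(alias_sets) >= 2 and not set.intersection(*alias_sets):
        conflicts.append({
            "field": "aliases",
            "reason": "alias-inconsistency",
            "values": sorted(
                f'{o["vendor"]}: {",".join(sorted(o["aliases"]))}' for o in observations
            ),
        })

    # list.sort is stable, so entries with equal keys keep their emission order.
    conflicts.sort(key=lambda c: (c["field"], c["reason"]))
    return conflicts
```

Nothing is merged or dropped here: the function only reports divergence, which matches the add-only intent of the rules.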
## Linkset output shape additions
- `key.confidence`: populated from formula above.
- `conflicts[]`: as defined; may be empty but never null.
- `normalized` retains add-only fields from `link-not-merge-schema.md`; do not drop raw ranges even when disjoint.
- `provenance.hashes`: sorted list of `observationHash` values; used by replay bundles.
## Fixtures
- `docs/samples/lnm/linkset-lnm-21-002-sample.json`: two-source agreement (high confidence, no conflicts).
- `docs/samples/lnm/linkset-lnm-21-002-conflict.json`: three-source disagreement showing conflict records and confidence < 0.7.
All fixtures use ASCII ordering and ISO-8601 UTC timestamps and may be used as golden outputs in tests.
## Implementation checklist
- Builder must refuse to overwrite existing linkset when incoming hash list unchanged.
- Correlation job idempotency key: `hash(tenantId|aliasSet|purlSet|fetchedAtBucket)`.
- Telemetry: counter `concelier.linkset.builder.conflict_total{field,reason}` and histogram `concelier.linkset.builder.confidence` (0–1 buckets).
- Event emission: include `confidence` and `conflicts` summary in `advisory.linkset.updated@1`; keep arrays sorted as above.
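
The first two checklist items can be sketched in Python as follows. The checklist gives the key as `hash(tenantId|aliasSet|purlSet|fetchedAtBucket)` without naming the hash function, the set serialization, or the bucket size, so SHA-256, comma-joined sorted sets, and a caller-supplied bucket string are all assumptions; both function names are hypothetical.

```python
import hashlib


def correlation_idempotency_key(tenant_id, aliases, purls, fetched_at_bucket):
    """Hash of tenantId|aliasSet|purlSet|fetchedAtBucket with sets sorted for stability."""
    parts = [
        tenant_id,
        ",".join(sorted(set(aliases))),
        ",".join(sorted(set(purls))),
        fetched_at_bucket,  # e.g. a timestamp truncated to the chosen bucket size
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def should_write_linkset(existing_hashes, incoming_hashes):
    """Refuse to overwrite when the sorted observationHash list is unchanged."""
    return sorted(existing_hashes) != sorted(incoming_hashes)
```

Sorting before joining means the key does not depend on the order in which aliases or purls were collected.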
## Change control
- Add-only. Adjusting weights or conflict codes requires new version `advisory.linkset.updated@2` and a sprint note.

@@ -0,0 +1,31 @@
# Observation Event Transport (advisory.observation.updated@1)
Purpose: document how to emit `advisory.observation.updated@1` events via Mongo outbox with optional NATS JetStream transport.
## Configuration (appsettings.yaml / config)
```yaml
advisoryObservationEvents:
  enabled: false # set true to publish beyond Mongo outbox
  transport: "mongo" # "mongo" (no-op publisher) or "nats"
  natsUrl: "nats://127.0.0.1:4222"
  subject: "concelier.advisory.observation.updated.v1"
  deadLetterSubject: "concelier.advisory.observation.updated.dead.v1"
  stream: "CONCELIER_OBS"
```
Defaults: disabled, transport `mongo`; subject/stream as above.
## Flow
1) Observation sink writes event to `advisory_observation_events` (idempotent on `observationHash`).
2) Background worker dequeues unpublished rows, publishes via configured transport, then stamps `publishedAt`.
3) If transport disabled/unavailable, outbox accumulates safely; re-enabling resumes publishing.
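
The three steps above can be sketched with an in-memory stand-in for the `advisory_observation_events` collection. This is a simplified Python model of the outbox pattern, not the actual `MongoAdvisoryObservationEventOutbox` or `AdvisoryObservationTransportWorker` code; the class and function names are hypothetical.

```python
from datetime import datetime, timezone


class InMemoryOutbox:
    """Toy outbox keyed on observationHash so repeated writes are idempotent."""

    def __init__(self):
        self._rows = {}  # observationHash -> {"event": ..., "publishedAt": ...}

    def enqueue(self, event):
        """Step 1: store the event once; return False if the hash was already present."""
        key = event["observationHash"]
        if key in self._rows:
            return False
        self._rows[key] = {"event": event, "publishedAt": None}
        return True

    def unpublished(self):
        """Rows still waiting for a transport (publishedAt is None)."""
        return [row for row in self._rows.values() if row["publishedAt"] is None]

    def backlog(self):
        """Equivalent of counting documents with publishedAt: null."""
        return len(self.unpublished())


def drain(outbox, publish, enabled=True):
    """Steps 2 and 3: publish pending rows and stamp them; leave them queued on failure."""
    if not enabled:
        return 0
    published = 0
    for row in outbox.unpublished():
        try:
            publish(row["event"])
        except ConnectionError:
            break  # transport unavailable: the row stays in the outbox for a later pass
        row["publishedAt"] = datetime.now(timezone.utc)
        published += 1
    return published
```

Rows are stamped only after a successful publish, so a crash between the two steps would republish the event on the next pass; consumers are expected to deduplicate, as the event contract requires.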
## Operational notes
- Ensure NATS JetStream is reachable before enabling `transport: nats` to avoid retry noise.
- If the stream is missing, it is auto-created with the current subject; each message is capped at 512 KiB.
- The dead-letter subject is reserved but not yet wired; keep it for future schema validation failures.
- Backlog monitoring: count documents in `advisory_observation_events` with `publishedAt: null`.
## Testing
- Without NATS: leave `enabled=false`; app continues writing outbox only.
- With NATS: run a local `nats-server -js` and set `enabled=true transport=nats`. Verify published messages on subject via `nats sub concelier.advisory.observation.updated.v1`.