# Concelier Advisory Ingestion Model

**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical

This advisory defines the product rationale, ingestion semantics, and implementation strategy for the Concelier module, covering the Link-Not-Merge model, connector pipelines, observation storage, and deterministic exports.

---

## 1. Executive Summary

Concelier is the **advisory ingestion engine** that acquires, normalizes, and correlates vulnerability advisories from authoritative sources. Key capabilities:

- **Aggregation-Only Contract** - No derived semantics in ingestion
- **Link-Not-Merge** - Observations correlated, never merged
- **Multi-Source Connectors** - Vendor PSIRTs, distros, OSS ecosystems
- **Deterministic Exports** - Reproducible JSON, Trivy DB bundles
- **Conflict Detection** - Structured payloads for divergent claims

---

## 2. Market Drivers

### 2.1 Target Segments

| Segment | Ingestion Requirements | Use Case |
|---------|------------------------|----------|
| **Security Teams** | Authoritative data | Accurate vulnerability assessment |
| **Compliance** | Provenance tracking | Audit trail for advisory sources |
| **DevSecOps** | Fast updates | CI/CD pipeline integration |
| **Air-Gap Ops** | Offline bundles | Disconnected environment support |

### 2.2 Competitive Positioning

Most vulnerability databases merge data, losing provenance. Stella Ops differentiates with:

- **Link-Not-Merge** preserving all source claims
- **Conflict visibility** showing where sources disagree
- **Deterministic exports** enabling reproducible builds
- **Multi-format support** (CSAF, OSV, GHSA, vendor-specific)
- **Signature verification** for upstream integrity

---

## 3. Aggregation-Only Contract (AOC)

### 3.1 Core Principles

The AOC ensures ingestion purity:

1. **No derived semantics** - No severity consensus, merged status, or fix hints
2. **Immutable raw docs** - Append-only with version chains
3. **Mandatory provenance** - Source, timestamp, signature status
4. **Linkset only** - Joins stored separately, never mutate content
5. **Deterministic canonicalization** - Stable JSON output
6. **Idempotent upserts** - Same hash = no new record
7. **CI verification** - AOCVerifier enforces at runtime

### 3.2 Enforcement

```csharp
// AOCWriteGuard checks before every write
public class AOCWriteGuard
{
    public Task GuardAsync(AdvisoryObservation obs)
    {
        // Verify no forbidden properties
        // Validate provenance completeness
        // Check tenant claims
        // Normalize timestamps
        // Compute content hash
        return Task.CompletedTask; // placeholder: the checks above are elided in this outline
    }
}
```

Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors at build time to prevent forbidden property usage.

---

## 4. Advisory Observation Model

### 4.1 Observation Structure

```json
{
  "_id": "tenant:vendor:upstreamId:revision",
  "tenant": "acme-corp",
  "source": {
    "vendor": "OSV",
    "stream": "github",
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collectorVersion": "concelier/1.7.3"
  },
  "upstream": {
    "upstreamId": "GHSA-xxxx-....",
    "documentVersion": "2025-09-01T12:13:14Z",
    "fetchedAt": "2025-09-01T13:04:05Z",
    "receivedAt": "2025-09-01T13:04:06Z",
    "contentHash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "keyId": "rekor:.../key/abc"
    }
  },
  "content": {
    "format": "OSV",
    "specVersion": "1.6",
    "raw": { /* unmodified upstream document */ }
  },
  "identifiers": {
    "primary": "GHSA-xxxx-....",
    "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21"],
    "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"],
    "references": [
      {"type": "advisory", "url": "https://..."},
      {"type": "fix", "url": "https://..."}
    ]
  },
  "supersedes": "tenant:vendor:upstreamId:prev-revision",
  "createdAt": "2025-09-01T13:04:06Z"
}
```

### 4.2 Linkset Correlation

```json
{
  "_id": "sha256:...",
  "tenant": "acme-corp",
  "key": {
    "vulnerabilityId": "CVE-2025-12345",
    "productKey": "pkg:npm/lodash@4.17.21",
    "confidence": "high"
  },
  "observations": [
    {
      "observationId": "tenant:osv:GHSA-...:v1",
      "sourceVendor": "OSV",
      "statement": {
        "severity": "high"
      },
      "collectedAt": "2025-09-01T13:04:06Z"
    },
    {
      "observationId": "tenant:nvd:CVE-2025-12345:v2",
      "sourceVendor": "NVD",
      "statement": {
        "severity": "critical"
      },
      "collectedAt": "2025-09-01T14:00:00Z"
    }
  ],
  "conflicts": [
    {
      "conflictId": "sha256:...",
      "type": "severity-mismatch",
      "observations": [
        { "source": "OSV", "value": "high" },
        { "source": "NVD", "value": "critical" }
      ],
      "confidence": "medium",
      "detectedAt": "2025-09-01T14:00:01Z"
    }
  ]
}
```

---

## 5. Source Connectors

### 5.1 Source Families

| Family | Examples | Format |
|--------|----------|--------|
| **Vendor PSIRTs** | Microsoft, Oracle, Cisco, Adobe | CSAF, proprietary |
| **Linux Distros** | Red Hat, SUSE, Ubuntu, Debian, Alpine | CSAF, JSON, XML |
| **OSS Ecosystems** | OSV, GHSA, npm, PyPI, Maven | OSV, GraphQL |
| **CERTs** | CISA (KEV), JVN, CERT-FR | JSON, XML |

### 5.2 Connector Contract

```csharp
public interface IFeedConnector
{
    string SourceName { get; }

    // Fetch signed feeds or offline mirrors
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);

    // Normalize to strongly-typed DTOs
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);

    // Build canonical records with provenance
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}
```

### 5.3 Connector Lifecycle

1. **Snapshot** - Fetch with cursor, ETag, rate limiting
2. **Parse** - Schema validation, normalization
3. **Guard** - AOCWriteGuard enforcement
4. **Write** - Append-only insert
5. **Event** - Emit `advisory.observation.updated`

---

## 6. Version Semantics

### 6.1 Ecosystem Normalization

| Ecosystem | Format | Normalization |
|-----------|--------|---------------|
| npm, PyPI, Maven | SemVer | Intervals with `<`, `>=`, `~`, `^` |
| RPM | EVR | `epoch:version-release` with order keys |
| DEB | dpkg | Version comparison with order keys |
| APK | Alpine | Computed order keys |

### 6.2 CVSS Handling

- Normalize CVSS v2/v3/v4 where available
- Track all source CVSS values
- Effective severity = max (configurable)
- Store KEV evidence with source and date

---

## 7. Conflict Detection

### 7.1 Conflict Types

| Type | Description | Resolution |
|------|-------------|------------|
| `severity-mismatch` | Different severity ratings | Policy decides |
| `affected-range-divergence` | Different version ranges | Most specific wins |
| `reference-clash` | Contradictory references | Surface all |
| `alias-inconsistency` | Different alias mappings | Union with provenance |
| `metadata-gap` | Missing information | Flag for review |

### 7.2 Conflict Visibility

Conflicts are never hidden - they are:

- Stored in linkset documents
- Surfaced in API responses
- Included in exports
- Displayed in Console UI

---

## 8. Deterministic Exports

### 8.1 JSON Export

```
exports/json/
├── CVE/
│   ├── 20/
│   │   └── CVE-2025-12345.json
│   └── ...
├── manifest.json
└── export-digest.sha256
```

- Deterministic folder structure
- Canonical JSON (sorted keys, stable timestamps)
- Manifest with SHA-256 per file
- Reproducible across runs

### 8.2 Trivy DB Export

```
exports/trivy/
├── db.tar.gz
├── metadata.json
└── manifest.json
```

- Bolt DB compatible with Trivy
- Full and delta modes
- ORAS push to registries
- Mirror manifests for domains

### 8.3 Export Determinism

Running the same export against the same data must produce:

- Identical file contents
- Identical manifest hashes
- Identical export digests

---

## 9. Implementation Strategy

### 9.1 Phase 1: Core Pipeline (Complete)

- [x] AOCWriteGuard implementation
- [x] Observation storage
- [x] Basic connectors (Red Hat, SUSE, OSV)
- [x] JSON export

### 9.2 Phase 2: Link-Not-Merge (Complete)

- [x] Linkset correlation engine
- [x] Conflict detection
- [x] Event emission
- [x] API surface

### 9.3 Phase 3: Expanded Sources (In Progress)

- [x] GHSA GraphQL connector
- [x] Debian DSA connector
- [ ] Alpine secdb connector (CONCELIER-CONN-50-001)
- [ ] CISA KEV enrichment (CONCELIER-KEV-51-001)

### 9.4 Phase 4: Export Enhancements (Planned)

- [ ] Delta Trivy DB exports
- [ ] ORAS registry push
- [ ] Attestation hand-off
- [ ] Mirror bundle signing

---

## 10. API Surface

### 10.1 Sources & Jobs

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/sources` | GET | `concelier.read` | List sources |
| `/api/v1/concelier/sources/{name}/trigger` | POST | `concelier.admin` | Trigger fetch |
| `/api/v1/concelier/sources/{name}/pause` | POST | `concelier.admin` | Pause source |
| `/api/v1/concelier/jobs/{id}` | GET | `concelier.read` | Job status |

### 10.2 Exports

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/exports/json` | POST | `concelier.export` | Trigger JSON export |
| `/api/v1/concelier/exports/trivy` | POST | `concelier.export` | Trigger Trivy export |
| `/api/v1/concelier/exports/{id}` | GET | `concelier.read` | Export status |

### 10.3 Search

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/advisories/{key}` | GET | `concelier.read` | Get advisory |
| `/api/v1/concelier/observations/{id}` | GET | `concelier.read` | Get observation |
| `/api/v1/concelier/linksets` | GET | `concelier.read` | Query linksets |

---

## 11. Storage Model

### 11.1 Collections

| Collection | Purpose | Key Indexes |
|------------|---------|-------------|
| `sources` | Connector catalog | `{_id}` |
| `source_state` | Run state | `{sourceName}` |
| `documents` | Raw payloads | `{sourceName, uri}` |
| `advisory_observations` | Normalized records | `{tenant, upstream.upstreamId}` |
| `advisory_linksets` | Correlations | `{tenant, key.vulnerabilityId, key.productKey}` |
| `advisory_events` | Change log | `{type, occurredAt}` |
| `export_state` | Export cursors | `{exportKind}` |

### 11.2 GridFS Buckets

- `fs.documents` - Raw payloads (immutable)
- `fs.exports` - Historical archives

---

## 12. Event Model

### 12.1 Events

| Event | Trigger | Content |
|-------|---------|---------|
| `advisory.observation.updated@1` | New/superseded observation | IDs, hash, supersedes |
| `advisory.linkset.updated@1` | Correlation change | Deltas, conflicts |

### 12.2 Event Transport

- Primary: NATS
- Fallback: Redis Stream
- Offline Kit captures for replay

---

## 13. Observability

### 13.1 Metrics

- `concelier.fetch.docs_total{source}`
- `concelier.fetch.bytes_total{source}`
- `concelier.parse.failures_total{source}`
- `concelier.observations.write_total{result}`
- `concelier.linksets.updated_total{result}`
- `concelier.linksets.conflicts_total{type}`
- `concelier.export.duration_seconds{kind}`

### 13.2 Performance Targets

| Operation | Target |
|-----------|--------|
| Ingest throughput | 5k docs/min |
| Observation write | < 5ms p95 |
| Linkset build | < 15ms p95 |
| Export (1M advisories) | < 90 seconds |

---

## 14. Security Considerations

### 14.1 Outbound Security

- Allowlist per connector (domains, protocols)
- Proxy support with TLS pinning
- Rate limiting per source

### 14.2 Signature Verification

- PGP/cosign/x509 verification stored
- Failed verification flagged, not rejected
- Policy can down-weight unsigned sources

### 14.3 Determinism

- Canonical JSON writer
- Stable export digests
- Reproducible across runs

---

## 15. Related Documentation

| Resource | Location |
|----------|----------|
| Concelier architecture | `docs/modules/concelier/architecture.md` |
| Link-Not-Merge schema | `docs/modules/concelier/link-not-merge-schema.md` |
| Event schemas | `docs/modules/concelier/events/` |
| Attestation guide | `docs/modules/concelier/attestation.md` |

---

## 16. Sprint Mapping

- **Primary Sprint:** SPRINT_0115_0001_0004_concelier_iv.md
- **Related Sprints:**
  - SPRINT_0113_0001_0002_concelier_ii.md
  - SPRINT_0114_0001_0003_concelier_iii.md

**Key Task IDs:**

- `CONCELIER-AOC-40-001` - AOC enforcement (DONE)
- `CONCELIER-LNM-41-001` - Link-Not-Merge (DONE)
- `CONCELIER-CONN-50-001` - Alpine connector (IN PROGRESS)
- `CONCELIER-KEV-51-001` - KEV enrichment (TODO)
- `CONCELIER-EXPORT-55-001` - Delta exports (TODO)

---

## 17. Success Metrics

| Metric | Target |
|--------|--------|
| Advisory freshness | < 1 hour from source |
| Ingestion accuracy | 100% provenance retention |
| Export determinism | 100% hash reproducibility |
| Conflict detection | 100% of source divergence |
| Source coverage | 20+ authoritative sources |

---

*Last updated: 2025-11-29*
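
As an illustration of the "Deterministic canonicalization" and "Idempotent upserts" principles from Section 3.1, the following Python sketch shows one way a content hash over canonical JSON could gate append-only writes. It is not Concelier's implementation (the module itself is C#). `canonical_json`, `content_hash`, and `ObservationStore` are hypothetical names, and both hashing the canonicalized document (rather than the fetched bytes) and comparing only against the latest revision are assumptions made for this example.

```python
import hashlib
import json
from typing import Optional


def canonical_json(doc: dict) -> str:
    """Serialize with sorted keys and fixed separators so equal documents yield equal text."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_hash(doc: dict) -> str:
    """SHA-256 over the canonical form, in the `sha256:<hex>` style used by `contentHash`."""
    digest = hashlib.sha256(canonical_json(doc).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


class ObservationStore:
    """Hypothetical in-memory, append-only store with one revision chain per advisory."""

    def __init__(self) -> None:
        self._chains = {}  # (tenant, vendor, upstream_id) -> list of revision records

    def upsert(self, tenant: str, vendor: str, upstream_id: str, raw: dict) -> Optional[dict]:
        """Append a new revision, or return None when the content hash is unchanged."""
        chain = self._chains.setdefault((tenant, vendor, upstream_id), [])
        digest = content_hash(raw)
        if chain and chain[-1]["contentHash"] == digest:
            return None  # same hash = no new record
        record = {
            "_id": f"{tenant}:{vendor}:{upstream_id}:v{len(chain) + 1}",
            "contentHash": digest,
            "raw": raw,
            # Earlier revisions are never mutated; the new one only points back.
            "supersedes": chain[-1]["_id"] if chain else None,
        }
        chain.append(record)
        return record


store = ObservationStore()
first = store.upsert("acme-corp", "osv", "GHSA-example", {"id": "GHSA-example", "severity": "high"})
# Same content with a different key order canonicalizes identically, so nothing is written.
again = store.upsert("acme-corp", "osv", "GHSA-example", {"severity": "high", "id": "GHSA-example"})
assert first is not None and again is None
```

Because the hash is computed over the canonical form, re-fetching an unchanged advisory is a no-op even when the upstream serializer reorders keys, while any real content change appends a new revision whose `supersedes` field points at the previous `_id`.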