Concelier Advisory Ingestion Model

Version: 1.0 Date: 2025-11-29 Status: Canonical

This advisory defines the product rationale, ingestion semantics, and implementation strategy for the Concelier module, covering the Link-Not-Merge model, connector pipelines, observation storage, and deterministic exports.


1. Executive Summary

Concelier is the advisory ingestion engine that acquires, normalizes, and correlates vulnerability advisories from authoritative sources. Key capabilities:

  • Aggregation-Only Contract - No derived semantics in ingestion
  • Link-Not-Merge - Observations correlated, never merged
  • Multi-Source Connectors - Vendor PSIRTs, distros, OSS ecosystems
  • Deterministic Exports - Reproducible JSON, Trivy DB bundles
  • Conflict Detection - Structured payloads for divergent claims

2. Market Drivers

2.1 Target Segments

| Segment | Ingestion Requirements | Use Case |
|---|---|---|
| Security Teams | Authoritative data | Accurate vulnerability assessment |
| Compliance | Provenance tracking | Audit trail for advisory sources |
| DevSecOps | Fast updates | CI/CD pipeline integration |
| Air-Gap Ops | Offline bundles | Disconnected environment support |

2.2 Competitive Positioning

Most vulnerability databases merge data, losing provenance. Stella Ops differentiates with:

  • Link-Not-Merge preserving all source claims
  • Conflict visibility showing where sources disagree
  • Deterministic exports enabling reproducible builds
  • Multi-format support (CSAF, OSV, GHSA, vendor-specific)
  • Signature verification for upstream integrity

3. Aggregation-Only Contract (AOC)

3.1 Core Principles

The AOC ensures ingestion purity:

  1. No derived semantics - No severity consensus, merged status, or fix hints
  2. Immutable raw docs - Append-only with version chains
  3. Mandatory provenance - Source, timestamp, signature status
  4. Linkset only - Joins stored separately, never mutate content
  5. Deterministic canonicalization - Stable JSON output
  6. Idempotent upserts - Same hash = no new record
  7. CI verification - AOCVerifier enforces at runtime
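
Principles 5 and 6 can be illustrated together. The following is a minimal Python sketch (the module itself is C#), assuming that canonicalization means sorted keys with fixed separators and that the store is keyed by content hash; `ObservationStore` and its methods are hypothetical names, not Concelier types.

```python
import hashlib
import json


def canonical_json(doc: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal documents yield equal bytes."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def content_hash(doc: dict) -> str:
    """Hash the canonical bytes; the 'sha256:' prefix mirrors the contentHash field format."""
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()


class ObservationStore:
    """Hypothetical append-only store keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: dict = {}
        self.log: list = []  # append-only order of accepted hashes

    def upsert(self, raw: dict) -> bool:
        """Return True if a new record was appended, False if the hash already exists."""
        digest = content_hash(raw)
        if digest in self._by_hash:
            return False  # same hash = no new record
        self._by_hash[digest] = raw
        self.log.append(digest)
        return True


store = ObservationStore()
first = store.upsert({"id": "GHSA-example", "summary": "example"})
# Same content with a different key order canonicalizes to the same bytes.
second = store.upsert({"summary": "example", "id": "GHSA-example"})
```

Because the hash is taken over canonical bytes rather than the raw input, re-fetching an unchanged upstream document does not grow the store.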

3.2 Enforcement

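// AOCWriteGuard runs before every write (illustrative skeleton).
using System;
using System.Threading;
using System.Threading.Tasks;

public class AOCWriteGuard
{
    public Task GuardAsync(AdvisoryObservation obs, CancellationToken ct = default)
    {
        ArgumentNullException.ThrowIfNull(obs);
        ct.ThrowIfCancellationRequested();

        // 1. Verify no forbidden (derived) properties are present
        // 2. Validate provenance completeness
        // 3. Check tenant claims
        // 4. Normalize timestamps
        // 5. Compute content hash
        return Task.CompletedTask;
    }
}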

Roslyn analyzers (StellaOps.AOC.Analyzers) scan connectors at build time to prevent forbidden property usage.
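
To make the first three guard checks concrete, here is a hedged Python sketch (not the C# implementation). The forbidden key names, the required provenance fields, and the `AocViolation` exception are all assumptions chosen for illustration; the source only states that severity consensus, merged status, and fix hints are disallowed and that provenance is mandatory.

```python
# Hypothetical field names standing in for "derived semantics".
FORBIDDEN_KEYS = {"effectiveSeverity", "mergedStatus", "fixHint"}
# Hypothetical provenance fields: source, timestamp, signature status.
REQUIRED_PROVENANCE = ("vendor", "fetchedAt", "signaturePresent")


class AocViolation(Exception):
    """Raised when an observation would break the Aggregation-Only Contract."""


def guard(observation: dict) -> None:
    """Reject the write by raising; return None when the observation is acceptable."""
    forbidden = FORBIDDEN_KEYS.intersection(observation)
    if forbidden:
        raise AocViolation(f"derived fields are not allowed: {sorted(forbidden)}")

    provenance = observation.get("provenance", {})
    missing = [key for key in REQUIRED_PROVENANCE if key not in provenance]
    if missing:
        raise AocViolation(f"incomplete provenance: {missing}")

    if not observation.get("tenant"):
        raise AocViolation("missing tenant")


good = {
    "tenant": "acme-corp",
    "provenance": {"vendor": "OSV", "fetchedAt": "2025-09-01T13:04:05Z", "signaturePresent": True},
}
bad = dict(good, effectiveSeverity="high")
```

Raising rather than silently stripping the offending field keeps the violation visible to the connector author.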


4. Advisory Observation Model

4.1 Observation Structure

{
  "_id": "tenant:vendor:upstreamId:revision",
  "tenant": "acme-corp",
  "source": {
    "vendor": "OSV",
    "stream": "github",
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collectorVersion": "concelier/1.7.3"
  },
  "upstream": {
    "upstreamId": "GHSA-xxxx-....",
    "documentVersion": "2025-09-01T12:13:14Z",
    "fetchedAt": "2025-09-01T13:04:05Z",
    "receivedAt": "2025-09-01T13:04:06Z",
    "contentHash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "keyId": "rekor:.../key/abc"
    }
  },
  "content": {
    "format": "OSV",
    "specVersion": "1.6",
    "raw": { /* unmodified upstream document */ }
  },
  "identifiers": {
    "primary": "GHSA-xxxx-....",
    "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21"],
    "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"],
    "references": [
      {"type": "advisory", "url": "https://..."},
      {"type": "fix", "url": "https://..."}
    ]
  },
  "supersedes": "tenant:vendor:upstreamId:prev-revision",
  "createdAt": "2025-09-01T13:04:06Z"
}
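
The `_id` and `supersedes` fields form a version chain. The sketch below, in Python, assumes the identifier is the colon-joined tuple shown above with the vendor lower-cased (as in the `tenant:osv:GHSA-...:v1` style used later in this document); `observation_id`, `new_revision`, and `version_chain` are hypothetical helpers.

```python
from typing import Optional


def observation_id(tenant: str, vendor: str, upstream_id: str, revision: str) -> str:
    """Compose tenant:vendor:upstreamId:revision (vendor lower-cased by assumption)."""
    return ":".join([tenant, vendor.lower(), upstream_id, revision])


def new_revision(previous: Optional[dict], tenant: str, vendor: str,
                 upstream_id: str, revision: str) -> dict:
    """Create a new record that points at its predecessor; the predecessor is never modified."""
    return {
        "_id": observation_id(tenant, vendor, upstream_id, revision),
        "tenant": tenant,
        "supersedes": previous["_id"] if previous else None,
    }


def version_chain(latest_id: str, by_id: dict) -> list:
    """Walk supersedes pointers from the newest revision back to the first."""
    chain = []
    current = latest_id
    while current is not None:
        chain.append(current)
        current = by_id[current]["supersedes"]
    return chain


v1 = new_revision(None, "acme-corp", "OSV", "GHSA-example", "v1")
v2 = new_revision(v1, "acme-corp", "OSV", "GHSA-example", "v2")
by_id = {doc["_id"]: doc for doc in (v1, v2)}
```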

4.2 Linkset Correlation

{
  "_id": "sha256:...",
  "tenant": "acme-corp",
  "key": {
    "vulnerabilityId": "CVE-2025-12345",
    "productKey": "pkg:npm/lodash@4.17.21",
    "confidence": "high"
  },
  "observations": [
    {
      "observationId": "tenant:osv:GHSA-...:v1",
      "sourceVendor": "OSV",
      "statement": { "severity": "high" },
      "collectedAt": "2025-09-01T13:04:06Z"
    },
    {
      "observationId": "tenant:nvd:CVE-2025-12345:v2",
      "sourceVendor": "NVD",
      "statement": { "severity": "critical" },
      "collectedAt": "2025-09-01T14:00:00Z"
    }
  ],
  "conflicts": [
    {
      "conflictId": "sha256:...",
      "type": "severity-mismatch",
      "observations": [
        { "source": "OSV", "value": "high" },
        { "source": "NVD", "value": "critical" }
      ],
      "confidence": "medium",
      "detectedAt": "2025-09-01T14:00:01Z"
    }
  ]
}
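
A `severity-mismatch` entry like the one above could be derived from the linkset's observations as follows. This is a Python sketch under assumptions: the source does not say how `conflictId` is computed, so hashing the canonical conflict payload is a guess, and `detect_severity_conflict` is a hypothetical function name.

```python
import hashlib
import json


def detect_severity_conflict(vulnerability_id: str, observations: list):
    """Return a conflict payload if sources disagree on severity, otherwise None."""
    claims = sorted(
        {
            (obs["sourceVendor"], obs["statement"]["severity"])
            for obs in observations
            if "severity" in obs.get("statement", {})
        }
    )
    if len({value for _, value in claims}) < 2:
        return None  # zero or one distinct severity: nothing diverges

    payload = {
        "type": "severity-mismatch",
        "vulnerabilityId": vulnerability_id,
        "observations": [{"source": source, "value": value} for source, value in claims],
    }
    # Assumption: the ID is a hash of the canonical payload, so it is stable across runs.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"conflictId": "sha256:" + hashlib.sha256(canonical).hexdigest(), **payload}


observations = [
    {"sourceVendor": "OSV", "statement": {"severity": "high"}},
    {"sourceVendor": "NVD", "statement": {"severity": "critical"}},
]
conflict = detect_severity_conflict("CVE-2025-12345", observations)
```

Sorting the claims before hashing means the result does not depend on the order in which sources were ingested. Both claims are kept in the payload; nothing is merged.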

5. Source Connectors

5.1 Source Families

| Family | Examples | Format |
|---|---|---|
| Vendor PSIRTs | Microsoft, Oracle, Cisco, Adobe | CSAF, proprietary |
| Linux Distros | Red Hat, SUSE, Ubuntu, Debian, Alpine | CSAF, JSON, XML |
| OSS Ecosystems | OSV, GHSA, npm, PyPI, Maven | OSV, GraphQL |
| CERTs | CISA (KEV), JVN, CERT-FR | JSON, XML |

5.2 Connector Contract

public interface IFeedConnector
{
    string SourceName { get; }

    // Fetch signed feeds or offline mirrors
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);

    // Normalize to strongly-typed DTOs
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);

    // Build canonical records with provenance
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}

5.3 Connector Lifecycle

  1. Snapshot - Fetch with cursor, ETag, rate limiting
  2. Parse - Schema validation, normalization
  3. Guard - AOCWriteGuard enforcement
  4. Write - Append-only insert
  5. Event - Emit advisory.observation.updated
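
The five lifecycle steps can be sketched as a single loop. This Python outline uses in-memory lists in place of the real store and event bus; `run_connector` and `require_tenant` are hypothetical names, and the event payload shape is an assumption.

```python
import json
from typing import Callable, Iterable


def run_connector(
    snapshot: Callable[[], Iterable[bytes]],
    parse: Callable[[bytes], dict],
    guard: Callable[[dict], None],
    store: list,
    events: list,
) -> int:
    """Drive one pass through the five lifecycle steps; returns the number of records written."""
    written = 0
    for raw in snapshot():                # 1. Snapshot
        observation = parse(raw)          # 2. Parse
        guard(observation)                # 3. Guard: raises to block the write
        store.append(observation)         # 4. Write: append-only
        events.append(                    # 5. Event
            {"type": "advisory.observation.updated", "observationId": observation["_id"]}
        )
        written += 1
    return written


def require_tenant(observation: dict) -> None:
    """Stand-in guard that only checks the tenant field."""
    if not observation.get("tenant"):
        raise ValueError("missing tenant")


store: list = []
events: list = []
payloads = [b'{"_id": "acme-corp:osv:GHSA-example:v1", "tenant": "acme-corp"}']
count = run_connector(lambda: payloads, json.loads, require_tenant, store, events)
```

Placing the guard between parse and write means a rejected document never reaches storage and never produces an event.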

6. Version Semantics

6.1 Ecosystem Normalization

| Ecosystem | Format | Normalization |
|---|---|---|
| npm, PyPI, Maven | SemVer | Intervals with <, >=, ~, ^ |
| RPM | EVR | epoch:version-release with order keys |
| DEB | dpkg | Version comparison with order keys |
| APK | Alpine | Computed order keys |
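
As an illustration of turning `~` and `^` into intervals, the Python sketch below follows npm-style range semantics for plain three-part versions. It is an assumption that Concelier uses these exact rules; pre-release tags, build metadata, and partial versions are deliberately out of scope, and `expand_range` and `satisfies` are hypothetical helpers.

```python
def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def expand_range(spec: str) -> tuple:
    """Return (lower_inclusive, upper_exclusive) for a '~x.y.z' or '^x.y.z' spec."""
    operator, version = spec[0], parse_version(spec[1:])
    major, minor, patch = version
    if operator == "~":
        return version, (major, minor + 1, 0)   # patch-level changes only
    if operator == "^":
        if major > 0:
            return version, (major + 1, 0, 0)   # keep the left-most non-zero part fixed
        if minor > 0:
            return version, (0, minor + 1, 0)
        return version, (0, 0, patch + 1)
    raise ValueError(f"unsupported operator in {spec!r}")


def satisfies(version: str, spec: str) -> bool:
    """True when lower <= version < upper, using tuple comparison."""
    lower, upper = expand_range(spec)
    return lower <= parse_version(version) < upper
```

For example, `satisfies("4.17.21", "^4.17.0")` holds because the expanded interval is `>=4.17.0, <5.0.0`.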

6.2 CVSS Handling

  • Normalize CVSS v2/v3/v4 where available
  • Track all source CVSS values
  • Effective severity = maximum across source values by default (configurable)
  • Store KEV evidence with source and date
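
The "max" rule can be sketched as below. This Python example assumes a five-level qualitative scale and models "configurable" as a `strategy` argument; both are illustrative choices, not documented behavior.

```python
# Assumed ordering of qualitative severities, lowest to highest.
SEVERITY_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def effective_severity(claims: dict, strategy: str = "max") -> str:
    """Pick one severity from per-source claims; the claims themselves are left untouched."""
    if not claims:
        raise ValueError("no severity claims")
    pick = {"max": max, "min": min}[strategy]  # hypothetical strategies
    return pick(claims.values(), key=SEVERITY_RANK.__getitem__)


claims = {"OSV": "high", "NVD": "critical"}
chosen = effective_severity(claims)
```

The input mapping still carries every source value after the call, which matches the requirement to track all source CVSS values.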

7. Conflict Detection

7.1 Conflict Types

| Type | Description | Resolution |
|---|---|---|
| severity-mismatch | Different severity ratings | Policy decides |
| affected-range-divergence | Different version ranges | Most specific wins |
| reference-clash | Contradictory references | Surface all |
| alias-inconsistency | Different alias mappings | Union with provenance |
| metadata-gap | Missing information | Flag for review |

7.2 Conflict Visibility

Conflicts are never hidden - they are:

  • Stored in linkset documents
  • Surfaced in API responses
  • Included in exports
  • Displayed in Console UI

8. Deterministic Exports

8.1 JSON Export

exports/json/
├── CVE/
│   ├── 20/
│   │   └── CVE-2025-12345.json
│   └── ...
├── manifest.json
└── export-digest.sha256

  • Deterministic folder structure
  • Canonical JSON (sorted keys, stable timestamps)
  • Manifest with SHA-256 per file
  • Reproducible across runs

8.2 Trivy DB Export

exports/trivy/
├── db.tar.gz
├── metadata.json
└── manifest.json

  • Bolt DB compatible with Trivy
  • Full and delta modes
  • ORAS push to registries
  • Mirror manifests for domains

8.3 Export Determinism

Running the same export against the same data must produce:

  • Identical file contents
  • Identical manifest hashes
  • Identical export digests
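
One way to obtain these properties is to hash files in sorted path order and then hash the canonical manifest. The Python sketch below assumes a simple manifest layout (`files` list with `path` and `sha256`); the real `manifest.json` schema and the exact input to `export-digest.sha256` are not specified here.

```python
import hashlib
import json


def build_manifest(files: dict) -> dict:
    """Map of path -> bytes in, manifest with one SHA-256 per file out (paths sorted)."""
    return {
        "files": [
            {"path": path, "sha256": hashlib.sha256(files[path]).hexdigest()}
            for path in sorted(files)
        ]
    }


def export_digest(manifest: dict) -> str:
    """Digest of the canonical manifest bytes (sorted keys, fixed separators)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


run_a = {"CVE/20/CVE-2025-12345.json": b'{"id":"CVE-2025-12345"}', "CVE/20/CVE-2025-00001.json": b"{}"}
# Same data supplied in a different order, as might happen across runs.
run_b = {"CVE/20/CVE-2025-00001.json": b"{}", "CVE/20/CVE-2025-12345.json": b'{"id":"CVE-2025-12345"}'}

digest_a = export_digest(build_manifest(run_a))
digest_b = export_digest(build_manifest(run_b))
```

Any wall-clock timestamp written into the manifest would break this equality, which is why the export uses stable timestamps.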

9. Implementation Strategy

9.1 Phase 1: Core Pipeline (Complete)

  • AOCWriteGuard implementation
  • Observation storage
  • Basic connectors (Red Hat, SUSE, OSV)
  • JSON export
  • Linkset correlation engine
  • Conflict detection
  • Event emission
  • API surface

9.3 Phase 3: Expanded Sources (In Progress)

  • GHSA GraphQL connector
  • Debian DSA connector
  • Alpine secdb connector (CONCELIER-CONN-50-001)
  • CISA KEV enrichment (CONCELIER-KEV-51-001)

9.4 Phase 4: Export Enhancements (Planned)

  • Delta Trivy DB exports
  • ORAS registry push
  • Attestation hand-off
  • Mirror bundle signing

10. API Surface

10.1 Sources & Jobs

| Endpoint | Method | Scope | Description |
|---|---|---|---|
| /api/v1/concelier/sources | GET | concelier.read | List sources |
| /api/v1/concelier/sources/{name}/trigger | POST | concelier.admin | Trigger fetch |
| /api/v1/concelier/sources/{name}/pause | POST | concelier.admin | Pause source |
| /api/v1/concelier/jobs/{id} | GET | concelier.read | Job status |

10.2 Exports

| Endpoint | Method | Scope | Description |
|---|---|---|---|
| /api/v1/concelier/exports/json | POST | concelier.export | Trigger JSON export |
| /api/v1/concelier/exports/trivy | POST | concelier.export | Trigger Trivy export |
| /api/v1/concelier/exports/{id} | GET | concelier.read | Export status |

| Endpoint | Method | Scope | Description |
|---|---|---|---|
| /api/v1/concelier/advisories/{key} | GET | concelier.read | Get advisory |
| /api/v1/concelier/observations/{id} | GET | concelier.read | Get observation |
| /api/v1/concelier/linksets | GET | concelier.read | Query linksets |
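
A client call against these endpoints might be assembled as follows. This Python sketch only builds requests and does not send them. The host name, the bearer-token header, the source name `osv`, and the job ID are assumptions for illustration; the source lists paths, methods, and scopes but not the authentication scheme or request bodies.

```python
import json
import urllib.request

BASE_URL = "https://concelier.example.internal"  # hypothetical host


def build_request(path: str, method: str = "GET", token: str = "REPLACE_ME", body=None):
    """Build (but do not send) a request; a bearer token carrying the scope is assumed."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Needs concelier.admin: trigger a fetch for a source named "osv" (hypothetical name).
trigger = build_request("/api/v1/concelier/sources/osv/trigger", method="POST")
# Needs concelier.read: poll a job by ID (hypothetical ID).
status = build_request("/api/v1/concelier/jobs/job-123")
```

Passing either object to `urllib.request.urlopen` would perform the call against a reachable deployment.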

11. Storage Model

11.1 Collections

| Collection | Purpose | Key Indexes |
|---|---|---|
| sources | Connector catalog | {_id} |
| source_state | Run state | {sourceName} |
| documents | Raw payloads | {sourceName, uri} |
| advisory_observations | Normalized records | {tenant, upstream.upstreamId} |
| advisory_linksets | Correlations | {tenant, key.vulnerabilityId, key.productKey} |
| advisory_events | Change log | {type, occurredAt} |
| export_state | Export cursors | {exportKind} |

11.2 GridFS Buckets

  • fs.documents - Raw payloads (immutable)
  • fs.exports - Historical archives

12. Event Model

12.1 Events

| Event | Trigger | Content |
|---|---|---|
| advisory.observation.updated@1 | New/superseded observation | IDs, hash, supersedes |
| advisory.linkset.updated@1 | Correlation change | Deltas, conflicts |

12.2 Event Transport

  • Primary: NATS
  • Fallback: Redis Stream
  • Offline Kit captures events for replay
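
The primary/fallback arrangement can be sketched without committing to any client library. In this Python outline the publishers are plain callables standing in for NATS and Redis Stream clients, and capturing every event before publishing is an assumption about how the Offline Kit obtains its replay data; `publish_with_fallback` is a hypothetical name.

```python
from typing import Callable

Publisher = Callable[[dict], None]


def publish_with_fallback(event: dict, primary: Publisher, fallback: Publisher, capture: list) -> str:
    """Try the primary transport first; on a connection error, use the fallback."""
    capture.append(event)  # assumption: events are captured for later replay regardless of route
    try:
        primary(event)
        return "primary"
    except ConnectionError:
        fallback(event)
        return "fallback"


def broken_primary(event: dict) -> None:
    """Simulates an unreachable primary transport."""
    raise ConnectionError("primary transport unavailable")


fallback_log: list = []
captured: list = []
route = publish_with_fallback(
    {"type": "advisory.linkset.updated@1"}, broken_primary, fallback_log.append, captured
)
```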

13. Observability

13.1 Metrics

  • concelier.fetch.docs_total{source}
  • concelier.fetch.bytes_total{source}
  • concelier.parse.failures_total{source}
  • concelier.observations.write_total{result}
  • concelier.linksets.updated_total{result}
  • concelier.linksets.conflicts_total{type}
  • concelier.export.duration_seconds{kind}

13.2 Performance Targets

| Operation | Target |
|---|---|
| Ingest throughput | 5k docs/min |
| Observation write | < 5ms p95 |
| Linkset build | < 15ms p95 |
| Export (1M advisories) | < 90 seconds |
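
A quick back-of-envelope check shows how these targets relate. The Python arithmetic below treats the p95 bounds as if every document hit them, which is a deliberately pessimistic assumption, and it ignores fetch and parse time.

```python
INGEST_TARGET_DOCS_PER_MIN = 5_000
WRITE_P95_MS = 5        # upper bound from the table
LINKSET_P95_MS = 15     # upper bound from the table
EXPORT_ADVISORIES = 1_000_000
EXPORT_BUDGET_S = 90

docs_per_second = INGEST_TARGET_DOCS_PER_MIN / 60        # about 83.3 docs/s
per_doc_budget_ms = 60_000 / INGEST_TARGET_DOCS_PER_MIN  # 12 ms per doc for a single worker
serial_worst_case_ms = WRITE_P95_MS + LINKSET_P95_MS     # 20 ms if both steps run back to back
serial_docs_per_min = 60_000 / serial_worst_case_ms      # 3000 docs/min for one serial worker
export_rate = EXPORT_ADVISORIES / EXPORT_BUDGET_S        # about 11,111 advisories/s
```

Under that pessimistic reading a single serial worker would reach about 3,000 docs/min, so the 5k docs/min target implies either some parallelism or typical latencies well below the p95 bounds.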

14. Security Considerations

14.1 Outbound Security

  • Allowlist per connector (domains, protocols)
  • Proxy support with TLS pinning
  • Rate limiting per source
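
A per-connector allowlist check might look like the following Python sketch. The `ALLOWLIST` structure and `is_allowed` are hypothetical; only the `api.osv.dev` host comes from the observation example earlier in this document.

```python
from urllib.parse import urlsplit

# Hypothetical configuration: permitted schemes and exact hosts per connector.
ALLOWLIST = {
    "osv": {"schemes": {"https"}, "hosts": {"api.osv.dev"}},
}


def is_allowed(connector: str, url: str) -> bool:
    """Allow a fetch only when both scheme and exact host are listed for the connector."""
    rule = ALLOWLIST.get(connector)
    if rule is None:
        return False  # unknown connectors get no outbound access
    parts = urlsplit(url)
    return parts.scheme in rule["schemes"] and parts.hostname in rule["hosts"]
```

Matching on the exact host, rather than on a suffix or substring, keeps lookalike domains such as `api.osv.dev.example.com` out.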

14.2 Signature Verification

  • PGP, cosign, and X.509 verification results are recorded
  • Documents that fail verification are flagged, not rejected
  • Policy can down-weight unsigned sources
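
The "flag, do not reject" behavior combined with policy down-weighting can be sketched as below. This is a Python illustration only: the flag names, the `unsigned_factor` value, and the idea of a numeric weight are assumptions, since the source does not describe how policy expresses trust.

```python
def assess_signature(base_weight: float, signature_present: bool, verified: bool,
                     unsigned_factor: float = 0.5) -> dict:
    """Always accept the document; record flags and an adjusted weight for policy to use."""
    flags = []
    weight = base_weight
    if not signature_present:
        flags.append("unsigned")
        weight *= unsigned_factor  # hypothetical down-weighting of unsigned sources
    elif not verified:
        flags.append("signature-verification-failed")  # flagged, still ingested
    return {"accepted": True, "flags": flags, "weight": weight}


unsigned = assess_signature(1.0, signature_present=False, verified=False)
failed = assess_signature(1.0, signature_present=True, verified=False)
signed = assess_signature(1.0, signature_present=True, verified=True)
```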

14.3 Determinism

  • Canonical JSON writer
  • Stable export digests
  • Reproducible across runs

15. Related Documentation

| Resource | Location |
|---|---|
| Concelier architecture | docs/modules/concelier/architecture.md |
| Link-Not-Merge schema | docs/modules/concelier/link-not-merge-schema.md |
| Event schemas | docs/modules/concelier/events/ |
| Attestation guide | docs/modules/concelier/attestation.md |

16. Sprint Mapping

  • Primary Sprint: SPRINT_0115_0001_0004_concelier_iv.md
  • Related Sprints:
    • SPRINT_0113_0001_0002_concelier_ii.md
    • SPRINT_0114_0001_0003_concelier_iii.md

Key Task IDs:

  • CONCELIER-AOC-40-001 - AOC enforcement (DONE)
  • CONCELIER-LNM-41-001 - Link-Not-Merge (DONE)
  • CONCELIER-CONN-50-001 - Alpine connector (IN PROGRESS)
  • CONCELIER-KEV-51-001 - KEV enrichment (TODO)
  • CONCELIER-EXPORT-55-001 - Delta exports (TODO)

17. Success Metrics

| Metric | Target |
|---|---|
| Advisory freshness | < 1 hour from source |
| Ingestion accuracy | 100% provenance retention |
| Export determinism | 100% hash reproducibility |
| Conflict detection | 100% of source divergence |
| Source coverage | 20+ authoritative sources |

Last updated: 2025-11-29