true the date
StellaOps Bot
2025-11-30 19:23:21 +02:00
parent 71e9a56cfd
commit 0bef705bcc
14 changed files with 0 additions and 0 deletions

@@ -0,0 +1,476 @@
# Concelier Advisory Ingestion Model
**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical
This advisory defines the product rationale, ingestion semantics, and implementation strategy for the Concelier module, covering the Link-Not-Merge model, connector pipelines, observation storage, and deterministic exports.
---
## 1. Executive Summary
Concelier is the **advisory ingestion engine** that acquires, normalizes, and correlates vulnerability advisories from authoritative sources. Key capabilities:
- **Aggregation-Only Contract** - No derived semantics in ingestion
- **Link-Not-Merge** - Observations correlated, never merged
- **Multi-Source Connectors** - Vendor PSIRTs, distros, OSS ecosystems
- **Deterministic Exports** - Reproducible JSON, Trivy DB bundles
- **Conflict Detection** - Structured payloads for divergent claims
---
## 2. Market Drivers
### 2.1 Target Segments
| Segment | Ingestion Requirements | Use Case |
|---------|------------------------|----------|
| **Security Teams** | Authoritative data | Accurate vulnerability assessment |
| **Compliance** | Provenance tracking | Audit trail for advisory sources |
| **DevSecOps** | Fast updates | CI/CD pipeline integration |
| **Air-Gap Ops** | Offline bundles | Disconnected environment support |
### 2.2 Competitive Positioning
Most vulnerability databases merge data, losing provenance. Stella Ops differentiates with:
- **Link-Not-Merge** preserving all source claims
- **Conflict visibility** showing where sources disagree
- **Deterministic exports** enabling reproducible builds
- **Multi-format support** (CSAF, OSV, GHSA, vendor-specific)
- **Signature verification** for upstream integrity
---
## 3. Aggregation-Only Contract (AOC)
### 3.1 Core Principles
The AOC ensures ingestion purity:
1. **No derived semantics** - No severity consensus, merged status, or fix hints
2. **Immutable raw docs** - Append-only with version chains
3. **Mandatory provenance** - Source, timestamp, signature status
4. **Linkset only** - Joins stored separately, never mutate content
5. **Deterministic canonicalization** - Stable JSON output
6. **Idempotent upserts** - Same hash = no new record
7. **CI verification** - AOCVerifier enforces at runtime
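
Principles 5 and 6 can be illustrated together. The following is a minimal sketch, not Concelier's actual writer: `canonical_json`, `content_hash`, and `ObservationStore` are hypothetical names, and the in-memory dictionary merely stands in for an append-only collection.

```python
from __future__ import annotations

import hashlib
import json


def canonical_json(doc: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal documents yield equal bytes."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_hash(doc: dict) -> str:
    """Hash the canonical form, using the same `sha256:` prefix as the examples in this document."""
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()


class ObservationStore:
    """In-memory stand-in for an append-only collection (assumption, for illustration only)."""

    def __init__(self) -> None:
        self._by_hash: dict[str, dict] = {}
        self.log: list[str] = []  # append-only record of inserted hashes

    def upsert(self, doc: dict) -> bool:
        """Insert unless identical content already exists. Returns True when a record was added."""
        digest = content_hash(doc)
        if digest in self._by_hash:
            return False  # same hash = no new record
        self._by_hash[digest] = doc
        self.log.append(digest)
        return True
```

Because the hash is taken over the canonical bytes, two payloads that differ only in key order count as the same document, so replaying a feed does not grow the store.
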
### 3.2 Enforcement
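```csharp
using System.Threading.Tasks;

// AOCWriteGuard checks every observation before it is written.
public class AOCWriteGuard
{
    public Task GuardAsync(AdvisoryObservation obs)
    {
        // 1. Verify no forbidden properties
        // 2. Validate provenance completeness
        // 3. Check tenant claims
        // 4. Normalize timestamps
        // 5. Compute content hash
        return Task.CompletedTask; // outline only: the checks themselves are elided
    }
}
```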
Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors at build time to prevent forbidden property usage.
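
As a rough illustration of the first three guard steps, the sketch below validates an observation expressed as a plain dictionary. It is written in Python for brevity and is not the `AOCWriteGuard` implementation: the `FORBIDDEN_KEYS` set and the `check_observation` helper are assumptions, and the required provenance fields are simply those shown in the observation example in section 4.1.

```python
from __future__ import annotations

# Hypothetical deny-list of derived fields; the real set is defined by the AOC itself.
FORBIDDEN_KEYS = frozenset({"severity", "mergedStatus", "fixHint"})
# Provenance fields taken from the observation example in this document.
REQUIRED_UPSTREAM = ("upstreamId", "fetchedAt", "contentHash", "signature")


def check_observation(obs: dict) -> list[str]:
    """Return a list of violations; an empty list means the write may proceed."""
    violations = [
        f"forbidden property: {key}"
        for key in sorted(FORBIDDEN_KEYS.intersection(obs))
    ]
    if not obs.get("tenant"):
        violations.append("missing tenant")
    if not obs.get("source", {}).get("vendor"):
        violations.append("missing source.vendor")
    upstream = obs.get("upstream", {})
    for field in REQUIRED_UPSTREAM:
        if field not in upstream:
            violations.append(f"missing upstream.{field}")
    return violations
```

Returning every violation at once, rather than failing on the first, tends to make connector bugs easier to diagnose from a single rejected write.
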
---
## 4. Advisory Observation Model
### 4.1 Observation Structure
```json
{
  "_id": "tenant:vendor:upstreamId:revision",
  "tenant": "acme-corp",
  "source": {
    "vendor": "OSV",
    "stream": "github",
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collectorVersion": "concelier/1.7.3"
  },
  "upstream": {
    "upstreamId": "GHSA-xxxx-....",
    "documentVersion": "2025-09-01T12:13:14Z",
    "fetchedAt": "2025-09-01T13:04:05Z",
    "receivedAt": "2025-09-01T13:04:06Z",
    "contentHash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "keyId": "rekor:.../key/abc"
    }
  },
  "content": {
    "format": "OSV",
    "specVersion": "1.6",
    "raw": { /* unmodified upstream document */ }
  },
  "identifiers": {
    "primary": "GHSA-xxxx-....",
    "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21"],
    "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"],
    "references": [
      {"type": "advisory", "url": "https://..."},
      {"type": "fix", "url": "https://..."}
    ]
  },
  "supersedes": "tenant:vendor:upstreamId:prev-revision",
  "createdAt": "2025-09-01T13:04:06Z"
}
```
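
The `_id` and `supersedes` fields together form a version chain. The sketch below shows one way such a chain could be built and walked. It assumes the ID is a plain colon-joined string with a lower-cased vendor (as the `tenant:osv:...` IDs in section 4.2 suggest); `observation_id` and `revision_chain` are hypothetical helpers, and the dictionary stands in for the observation collection.

```python
from __future__ import annotations


def observation_id(tenant: str, vendor: str, upstream_id: str, revision: str) -> str:
    """Compose an ID in the tenant:vendor:upstreamId:revision shape shown above."""
    return ":".join([tenant, vendor.lower(), upstream_id, revision])


def revision_chain(store: dict[str, dict], head_id: str) -> list[str]:
    """Follow `supersedes` links from the newest revision back to the oldest."""
    chain: list[str] = []
    seen: set[str] = set()
    current = head_id
    while current is not None and current in store:
        if current in seen:
            raise ValueError(f"cycle in supersedes chain at {current}")
        seen.add(current)
        chain.append(current)
        current = store[current].get("supersedes")
    return chain
```

Since earlier revisions are never mutated, the full history of an advisory remains reachable from its newest observation.
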
### 4.2 Linkset Correlation
```json
{
  "_id": "sha256:...",
  "tenant": "acme-corp",
  "key": {
    "vulnerabilityId": "CVE-2025-12345",
    "productKey": "pkg:npm/lodash@4.17.21",
    "confidence": "high"
  },
  "observations": [
    {
      "observationId": "tenant:osv:GHSA-...:v1",
      "sourceVendor": "OSV",
      "statement": { "severity": "high" },
      "collectedAt": "2025-09-01T13:04:06Z"
    },
    {
      "observationId": "tenant:nvd:CVE-2025-12345:v2",
      "sourceVendor": "NVD",
      "statement": { "severity": "critical" },
      "collectedAt": "2025-09-01T14:00:00Z"
    }
  ],
  "conflicts": [
    {
      "conflictId": "sha256:...",
      "type": "severity-mismatch",
      "observations": [
        { "source": "OSV", "value": "high" },
        { "source": "NVD", "value": "critical" }
      ],
      "confidence": "medium",
      "detectedAt": "2025-09-01T14:00:01Z"
    }
  ]
}
```
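
A `severity-mismatch` entry like the one above could be derived from the linkset's observations roughly as follows. This is a sketch under assumptions: `detect_severity_conflict` is a hypothetical function, and deriving `conflictId` from a hash of the conflict body is a guess based on the `sha256:` prefix in the example, not a documented rule. The `confidence` and `detectedAt` fields are left out.

```python
from __future__ import annotations

import hashlib
import json


def detect_severity_conflict(observations: list[dict]) -> dict | None:
    """Return a conflict payload when sources report different severities, else None."""
    claims = sorted(
        {
            (obs["sourceVendor"], obs["statement"]["severity"])
            for obs in observations
            if "severity" in obs.get("statement", {})
        }
    )
    if len({value for _, value in claims}) < 2:
        return None  # zero or one distinct severity: nothing to report
    body = {
        "type": "severity-mismatch",
        "observations": [{"source": source, "value": value} for source, value in claims],
    }
    # Assumption: the ID is a hash of the canonical conflict body, so it is order-independent.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"conflictId": "sha256:" + hashlib.sha256(encoded).hexdigest(), **body}
```

Sorting the claims before hashing means the same set of divergent statements yields the same conflict regardless of arrival order, and both source values are retained rather than merged.
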
---
## 5. Source Connectors
### 5.1 Source Families
| Family | Examples | Format |
|--------|----------|--------|
| **Vendor PSIRTs** | Microsoft, Oracle, Cisco, Adobe | CSAF, proprietary |
| **Linux Distros** | Red Hat, SUSE, Ubuntu, Debian, Alpine | CSAF, JSON, XML |
| **OSS Ecosystems** | OSV, GHSA, npm, PyPI, Maven | OSV, GraphQL |
| **CERTs** | CISA (KEV), JVN, CERT-FR | JSON, XML |
### 5.2 Connector Contract
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector
{
    string SourceName { get; }

    // Fetch signed feeds or offline mirrors
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);

    // Normalize to strongly-typed DTOs
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);

    // Build canonical records with provenance
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}
```
### 5.3 Connector Lifecycle
1. **Snapshot** - Fetch with cursor, ETag, rate limiting
2. **Parse** - Schema validation, normalization
3. **Guard** - AOCWriteGuard enforcement
4. **Write** - Append-only insert
5. **Event** - Emit `advisory.observation.updated`
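
The ordering of the five stages matters: a document that fails the guard must produce neither a write nor an event. The sketch below models that ordering in Python with injected callables. `run_connector` and its parameters are hypothetical and do not reflect the `IFeedConnector` signatures above; the snapshot stage is assumed to have already produced `raw_docs`.

```python
from __future__ import annotations

from typing import Callable, Iterable


def run_connector(
    raw_docs: Iterable[dict],
    parse: Callable[[dict], dict],
    guard: Callable[[dict], None],
    store: list[dict],
    emit: Callable[[str, str], None],
) -> int:
    """Process fetched documents in lifecycle order and return the number written."""
    written = 0
    for raw in raw_docs:                 # 1. Snapshot (documents already fetched)
        observation = parse(raw)         # 2. Parse and normalize
        guard(observation)               # 3. Guard: raises before anything is persisted
        store.append(observation)        # 4. Write: append-only insert
        emit("advisory.observation.updated", observation["_id"])  # 5. Event
        written += 1
    return written
```
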
---
## 6. Version Semantics
### 6.1 Ecosystem Normalization
| Ecosystem | Format | Normalization |
|-----------|--------|---------------|
| npm, PyPI, Maven | SemVer | Intervals with `<`, `>=`, `~`, `^` |
| RPM | EVR | `epoch:version-release` with order keys |
| DEB | dpkg | Version comparison with order keys |
| APK | Alpine | Computed order keys |
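
For the SemVer row, normalizing to intervals means rewriting each range operator as an inclusive lower bound and an exclusive upper bound. The sketch below follows npm's caret and tilde rules for plain three-part versions only; pre-release tags, partial versions, and the EVR/dpkg/APK order keys are out of scope. `parse_version`, `to_interval`, and `contains` are hypothetical names.

```python
from __future__ import annotations

Version = tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string; anything else raises ValueError."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def to_interval(spec: str) -> tuple[Version, Version | None]:
    """Return (inclusive lower bound, exclusive upper bound or None for unbounded)."""
    if spec.startswith("^"):
        low = parse_version(spec[1:])
        if low[0] > 0:
            return low, (low[0] + 1, 0, 0)   # ^1.2.3 -> >=1.2.3 <2.0.0
        if low[1] > 0:
            return low, (0, low[1] + 1, 0)   # ^0.2.3 -> >=0.2.3 <0.3.0
        return low, (0, 0, low[2] + 1)       # ^0.0.3 -> >=0.0.3 <0.0.4
    if spec.startswith("~"):
        low = parse_version(spec[1:])
        return low, (low[0], low[1] + 1, 0)  # ~1.2.3 -> >=1.2.3 <1.3.0
    if spec.startswith(">="):
        return parse_version(spec[2:]), None
    if spec.startswith("<"):
        return (0, 0, 0), parse_version(spec[1:])
    raise ValueError(f"unsupported range: {spec}")


def contains(version: str, interval: tuple[Version, Version | None]) -> bool:
    """Check whether a version falls inside a normalized interval."""
    low, high = interval
    parsed = parse_version(version)
    return low <= parsed and (high is None or parsed < high)
```

Once every source's range is in this form, ranges from different advisories can be compared with ordinary tuple comparison.
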
### 6.2 CVSS Handling
- Normalize CVSS v2/v3/v4 where available
- Track all source CVSS values
- Effective severity defaults to the maximum across sources (configurable)
- Store KEV evidence with source and date
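
The maximum rule can be sketched as below. Every per-source value is kept, and the effective value is computed on demand rather than stored. The five-level scale and the `effective_severity` function are assumptions for illustration; the `pick` parameter only hints at how the rule might be made configurable.

```python
from __future__ import annotations

from typing import Callable

# Assumed qualitative scale, ordered from least to most severe.
SEVERITY_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def effective_severity(claims: dict[str, str], pick: Callable = max) -> str:
    """Choose one severity from per-source claims; the claims themselves are left untouched."""
    return pick(claims.values(), key=SEVERITY_RANK.__getitem__)
```

For example, with `{"OSV": "high", "NVD": "critical"}` the default yields `critical`, while passing `pick=min` yields `high`.
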
---
## 7. Conflict Detection
### 7.1 Conflict Types
| Type | Description | Resolution |
|------|-------------|------------|
| `severity-mismatch` | Different severity ratings | Policy decides |
| `affected-range-divergence` | Different version ranges | Most specific wins |
| `reference-clash` | Contradictory references | Surface all |
| `alias-inconsistency` | Different alias mappings | Union with provenance |
| `metadata-gap` | Missing information | Flag for review |
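
The "union with provenance" resolution for `alias-inconsistency` can be read as: keep every alias any source asserts, and remember which sources asserted it. A minimal sketch, with `union_aliases` as a hypothetical helper:

```python
from __future__ import annotations


def union_aliases(claims: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each alias to the sorted list of sources that assert it."""
    merged: dict[str, set[str]] = {}
    for source, aliases in claims.items():
        for alias in aliases:
            merged.setdefault(alias, set()).add(source)
    return {alias: sorted(sources) for alias, sources in sorted(merged.items())}
```

An alias backed by a single source stays visible but is distinguishable from one that several sources agree on.
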
### 7.2 Conflict Visibility
Conflicts are never hidden - they are:
- Stored in linkset documents
- Surfaced in API responses
- Included in exports
- Displayed in Console UI
---
## 8. Deterministic Exports
### 8.1 JSON Export
```
exports/json/
├── CVE/
│   ├── 20/
│   │   └── CVE-2025-12345.json
│   └── ...
├── manifest.json
└── export-digest.sha256
```
- Deterministic folder structure
- Canonical JSON (sorted keys, stable timestamps)
- Manifest with SHA-256 per file
- Reproducible across runs
### 8.2 Trivy DB Export
```
exports/trivy/
├── db.tar.gz
├── metadata.json
└── manifest.json
```
- Bolt DB compatible with Trivy
- Full and delta modes
- ORAS push to registries
- Mirror manifests for domains
### 8.3 Export Determinism
Running the same export against the same data must produce:
- Identical file contents
- Identical manifest hashes
- Identical export digests
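
One way to check this property is to derive the manifest and the export digest purely from file paths and contents, so that traversal order and wall-clock time cannot leak in. The sketch below assumes a simple manifest shape (`files` with `path` and `sha256`); the real `manifest.json` schema is not specified here, and `build_manifest` and `export_digest` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import json


def build_manifest(files: dict[str, bytes]) -> dict:
    """List every file with its SHA-256, ordered by path rather than by insertion order."""
    return {
        "files": [
            {"path": path, "sha256": hashlib.sha256(files[path]).hexdigest()}
            for path in sorted(files)
        ]
    }


def export_digest(manifest: dict) -> str:
    """Digest of the canonical manifest bytes; changes if any file path or content changes."""
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

A CI job could run the export twice against the same snapshot and fail if the two digests differ.
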
---
## 9. Implementation Strategy
### 9.1 Phase 1: Core Pipeline (Complete)
- [x] AOCWriteGuard implementation
- [x] Observation storage
- [x] Basic connectors (Red Hat, SUSE, OSV)
- [x] JSON export
### 9.2 Phase 2: Link-Not-Merge (Complete)
- [x] Linkset correlation engine
- [x] Conflict detection
- [x] Event emission
- [x] API surface
### 9.3 Phase 3: Expanded Sources (In Progress)
- [x] GHSA GraphQL connector
- [x] Debian DSA connector
- [ ] Alpine secdb connector (CONCELIER-CONN-50-001)
- [ ] CISA KEV enrichment (CONCELIER-KEV-51-001)
### 9.4 Phase 4: Export Enhancements (Planned)
- [ ] Delta Trivy DB exports
- [ ] ORAS registry push
- [ ] Attestation hand-off
- [ ] Mirror bundle signing
---
## 10. API Surface
### 10.1 Sources & Jobs
| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/sources` | GET | `concelier.read` | List sources |
| `/api/v1/concelier/sources/{name}/trigger` | POST | `concelier.admin` | Trigger fetch |
| `/api/v1/concelier/sources/{name}/pause` | POST | `concelier.admin` | Pause source |
| `/api/v1/concelier/jobs/{id}` | GET | `concelier.read` | Job status |
### 10.2 Exports
| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/exports/json` | POST | `concelier.export` | Trigger JSON export |
| `/api/v1/concelier/exports/trivy` | POST | `concelier.export` | Trigger Trivy export |
| `/api/v1/concelier/exports/{id}` | GET | `concelier.read` | Export status |
### 10.3 Search
| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/advisories/{key}` | GET | `concelier.read` | Get advisory |
| `/api/v1/concelier/observations/{id}` | GET | `concelier.read` | Get observation |
| `/api/v1/concelier/linksets` | GET | `concelier.read` | Query linksets |
---
## 11. Storage Model
### 11.1 Collections
| Collection | Purpose | Key Indexes |
|------------|---------|-------------|
| `sources` | Connector catalog | `{_id}` |
| `source_state` | Run state | `{sourceName}` |
| `documents` | Raw payloads | `{sourceName, uri}` |
| `advisory_observations` | Normalized records | `{tenant, upstream.upstreamId}` |
| `advisory_linksets` | Correlations | `{tenant, key.vulnerabilityId, key.productKey}` |
| `advisory_events` | Change log | `{type, occurredAt}` |
| `export_state` | Export cursors | `{exportKind}` |
### 11.2 GridFS Buckets
- `fs.documents` - Raw payloads (immutable)
- `fs.exports` - Historical archives
---
## 12. Event Model
### 12.1 Events
| Event | Trigger | Content |
|-------|---------|---------|
| `advisory.observation.updated@1` | New/superseded observation | IDs, hash, supersedes |
| `advisory.linkset.updated@1` | Correlation change | Deltas, conflicts |
### 12.2 Event Transport
- Primary: NATS
- Fallback: Redis Streams
- Offline Kit captures events for replay
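
The primary/fallback arrangement can be sketched as an ordered list of transports tried in turn. No real NATS or Redis client is used here: each transport is an injected callable, `publish` is a hypothetical function, and treating `ConnectionError` as the failover trigger is an assumption.

```python
from __future__ import annotations

from typing import Callable


def publish(event: dict, transports: list[tuple[str, Callable[[dict], None]]]) -> str:
    """Send via the first transport that accepts the event and return its name."""
    errors: list[str] = []
    for name, send in transports:
        try:
            send(event)
            return name
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")  # remember the failure, then try the next one
    raise RuntimeError("all transports failed: " + "; ".join(errors))
```
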
---
## 13. Observability
### 13.1 Metrics
- `concelier.fetch.docs_total{source}`
- `concelier.fetch.bytes_total{source}`
- `concelier.parse.failures_total{source}`
- `concelier.observations.write_total{result}`
- `concelier.linksets.updated_total{result}`
- `concelier.linksets.conflicts_total{type}`
- `concelier.export.duration_seconds{kind}`
### 13.2 Performance Targets
| Operation | Target |
|-----------|--------|
| Ingest throughput | 5k docs/min |
| Observation write | < 5ms p95 |
| Linkset build | < 15ms p95 |
| Export (1M advisories) | < 90 seconds |
---
## 14. Security Considerations
### 14.1 Outbound Security
- Allowlist per connector (domains, protocols)
- Proxy support with TLS pinning
- Rate limiting per source
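
A per-connector allowlist check might look like the sketch below, which compares the parsed scheme and host exactly instead of matching substrings. `is_allowed` and its parameters are hypothetical; the proxy, TLS pinning, and rate limiting items are not covered.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def is_allowed(
    url: str,
    allowed_hosts: frozenset[str],
    allowed_schemes: frozenset[str] = frozenset({"https"}),
) -> bool:
    """Permit a fetch only when both the scheme and the exact host are on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme in allowed_schemes and host in allowed_hosts
```

Matching on the parsed host rejects look-alike URLs such as `https://api.osv.dev.evil.example/` or `https://api.osv.dev@evil.example/`, which a prefix or substring check would accept.
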
### 14.2 Signature Verification
- PGP, cosign, and X.509 verification results are stored
- Documents that fail verification are flagged, not rejected
- Policy can down-weight unsigned sources
### 14.3 Determinism
- Canonical JSON writer
- Stable export digests
- Reproducible across runs
---
## 15. Related Documentation
| Resource | Location |
|----------|----------|
| Concelier architecture | `docs/modules/concelier/architecture.md` |
| Link-Not-Merge schema | `docs/modules/concelier/link-not-merge-schema.md` |
| Event schemas | `docs/modules/concelier/events/` |
| Attestation guide | `docs/modules/concelier/attestation.md` |
---
## 16. Sprint Mapping
- **Primary Sprint:** SPRINT_0115_0001_0004_concelier_iv.md
- **Related Sprints:**
  - SPRINT_0113_0001_0002_concelier_ii.md
  - SPRINT_0114_0001_0003_concelier_iii.md

**Key Task IDs:**
- `CONCELIER-AOC-40-001` - AOC enforcement (DONE)
- `CONCELIER-LNM-41-001` - Link-Not-Merge (DONE)
- `CONCELIER-CONN-50-001` - Alpine connector (IN PROGRESS)
- `CONCELIER-KEV-51-001` - KEV enrichment (TODO)
- `CONCELIER-EXPORT-55-001` - Delta exports (TODO)
---
## 17. Success Metrics
| Metric | Target |
|--------|--------|
| Advisory freshness | < 1 hour from source |
| Ingestion accuracy | 100% provenance retention |
| Export determinism | 100% hash reproducibility |
| Conflict detection | 100% of source divergences detected |
| Source coverage | 20+ authoritative sources |
---
*Last updated: 2025-11-29*