# Concelier Advisory Ingestion Model

**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical

This advisory defines the product rationale, ingestion semantics, and implementation strategy for the Concelier module, covering the Link-Not-Merge model, connector pipelines, observation storage, and deterministic exports.

---

## 1. Executive Summary

Concelier is the **advisory ingestion engine** that acquires, normalizes, and correlates vulnerability advisories from authoritative sources. Key capabilities:

- **Aggregation-Only Contract** - No derived semantics in ingestion
- **Link-Not-Merge** - Observations correlated, never merged
- **Multi-Source Connectors** - Vendor PSIRTs, distros, OSS ecosystems
- **Deterministic Exports** - Reproducible JSON, Trivy DB bundles
- **Conflict Detection** - Structured payloads for divergent claims

---

## 2. Market Drivers

### 2.1 Target Segments

| Segment | Ingestion Requirements | Use Case |
|---------|------------------------|----------|
| **Security Teams** | Authoritative data | Accurate vulnerability assessment |
| **Compliance** | Provenance tracking | Audit trail for advisory sources |
| **DevSecOps** | Fast updates | CI/CD pipeline integration |
| **Air-Gap Ops** | Offline bundles | Disconnected environment support |

### 2.2 Competitive Positioning

Most vulnerability databases merge data, losing provenance. Stella Ops differentiates with:
- **Link-Not-Merge** preserving all source claims
- **Conflict visibility** showing where sources disagree
- **Deterministic exports** enabling reproducible builds
- **Multi-format support** (CSAF, OSV, GHSA, vendor-specific)
- **Signature verification** for upstream integrity

---

## 3. Aggregation-Only Contract (AOC)

### 3.1 Core Principles

The AOC ensures ingestion purity:

1. **No derived semantics** - No severity consensus, merged status, or fix hints
2. **Immutable raw docs** - Append-only with version chains
3. **Mandatory provenance** - Source, timestamp, signature status
4. **Linkset only** - Joins stored separately, never mutate content
5. **Deterministic canonicalization** - Stable JSON output
6. **Idempotent upserts** - Same hash = no new record
7. **CI verification** - AOCVerifier enforces at runtime
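
Principle 6 can be illustrated with a minimal sketch. It assumes the document has already been canonicalized, and it uses an in-memory `HashSet` as a hypothetical stand-in for the observation store. The names `ContentHash` and `TryAppend` are illustrative, not part of Concelier.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory stand-in for the observation store, keyed by content hash.
var seenHashes = new HashSet<string>(StringComparer.Ordinal);

// Returns "sha256:<lowercase hex>" for the canonical document text.
static string ContentHash(string canonicalJson)
{
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

// Appends only when the hash is new; an identical payload creates no new record.
bool TryAppend(string canonicalJson)
{
    return seenHashes.Add(ContentHash(canonicalJson));
}

Console.WriteLine(TryAppend("{\"id\":\"GHSA-example\"}")); // True: first write is stored
Console.WriteLine(TryAppend("{\"id\":\"GHSA-example\"}")); // False: same hash, nothing new
```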

### 3.2 Enforcement

```csharp
using System.Threading.Tasks;

// AOCWriteGuard checks before every write
public class AOCWriteGuard
{
    public Task GuardAsync(AdvisoryObservation obs)
    {
        // Verify no forbidden properties
        // Validate provenance completeness
        // Check tenant claims
        // Normalize timestamps
        // Compute content hash

        // Outline only: the steps above are not implemented here
        return Task.CompletedTask;
    }
}
```
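
The first guard step, rejecting forbidden properties, might look like the sketch below. The deny-list entries are hypothetical names chosen to match the "no derived semantics" principle. The actual list enforced by `AOCWriteGuard` is not given in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical deny-list; the real set of forbidden properties is defined by the AOC.
var forbidden = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "effectiveSeverity", "mergedStatus", "fixHint"
};

// Returns the forbidden top-level property names present in the observation JSON.
List<string> FindForbidden(string observationJson)
{
    JsonObject root = JsonNode.Parse(observationJson)!.AsObject();
    return root.Select(property => property.Key)
               .Where(key => forbidden.Contains(key))
               .ToList();
}

var violations = FindForbidden("{\"tenant\":\"acme-corp\",\"mergedStatus\":\"fixed\"}");
Console.WriteLine(string.Join(",", violations)); // mergedStatus
```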

Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors at build time to prevent forbidden property usage.

---

## 4. Advisory Observation Model

### 4.1 Observation Structure

```json
{
  "_id": "tenant:vendor:upstreamId:revision",
  "tenant": "acme-corp",
  "source": {
    "vendor": "OSV",
    "stream": "github",
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collectorVersion": "concelier/1.7.3"
  },
  "upstream": {
    "upstreamId": "GHSA-xxxx-....",
    "documentVersion": "2025-09-01T12:13:14Z",
    "fetchedAt": "2025-09-01T13:04:05Z",
    "receivedAt": "2025-09-01T13:04:06Z",
    "contentHash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "keyId": "rekor:.../key/abc"
    }
  },
  "content": {
    "format": "OSV",
    "specVersion": "1.6",
    "raw": { /* unmodified upstream document */ }
  },
  "identifiers": {
    "primary": "GHSA-xxxx-....",
    "aliases": ["CVE-2025-12345", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21"],
    "cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"],
    "references": [
      {"type": "advisory", "url": "https://..."},
      {"type": "fix", "url": "https://..."}
    ]
  },
  "supersedes": "tenant:vendor:upstreamId:prev-revision",
  "createdAt": "2025-09-01T13:04:06Z"
}
```
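
The composite `_id` and the `supersedes` pointer can be sketched as follows. The helper name `ObservationId` is hypothetical. Lower-casing the vendor mirrors the `tenant:osv:...` IDs in the linkset example and is an assumption, as is rejecting `:` inside the tenant and vendor parts.

```csharp
using System;

// Builds the composite key shown in "_id": tenant:vendor:upstreamId:revision.
static string ObservationId(string tenant, string vendor, string upstreamId, string revision)
{
    foreach (var part in new[] { tenant, vendor })
    {
        // Assumption: these two parts may not be empty or contain the separator.
        if (string.IsNullOrWhiteSpace(part) || part.Contains(':'))
            throw new ArgumentException("tenant and vendor must be non-empty and must not contain ':'");
    }
    return $"{tenant}:{vendor.ToLowerInvariant()}:{upstreamId}:{revision}";
}

string previousId = ObservationId("acme-corp", "OSV", "GHSA-example", "v1");
string currentId = ObservationId("acme-corp", "OSV", "GHSA-example", "v2");

// A new revision points back at the one it replaces; the old record is never edited.
Console.WriteLine($"{currentId} supersedes {previousId}");
```

Keeping the revision in the key is what lets each version live as its own immutable record while `supersedes` forms the version chain.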

### 4.2 Linkset Correlation

```json
{
  "_id": "sha256:...",
  "tenant": "acme-corp",
  "key": {
    "vulnerabilityId": "CVE-2025-12345",
    "productKey": "pkg:npm/lodash@4.17.21",
    "confidence": "high"
  },
  "observations": [
    {
      "observationId": "tenant:osv:GHSA-...:v1",
      "sourceVendor": "OSV",
      "statement": { "severity": "high" },
      "collectedAt": "2025-09-01T13:04:06Z"
    },
    {
      "observationId": "tenant:nvd:CVE-2025-12345:v2",
      "sourceVendor": "NVD",
      "statement": { "severity": "critical" },
      "collectedAt": "2025-09-01T14:00:00Z"
    }
  ],
  "conflicts": [
    {
      "conflictId": "sha256:...",
      "type": "severity-mismatch",
      "observations": [
        { "source": "OSV", "value": "high" },
        { "source": "NVD", "value": "critical" }
      ],
      "confidence": "medium",
      "detectedAt": "2025-09-01T14:00:01Z"
    }
  ]
}
```
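
A `severity-mismatch` entry like the one above could be derived from the per-source statements roughly as follows. This is a sketch under simple assumptions: severities are compared case-insensitively as plain strings, and `SeverityConflict` is a hypothetical helper, not the correlation engine's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each tuple is one source's severity claim for the same linkset key.
var claims = new List<(string Source, string Severity)>
{
    ("OSV", "high"),
    ("NVD", "critical"),
};

// Returns "source=value" entries when the sources disagree, or an empty list when they agree.
// Claims are ordered by source so the resulting payload is stable across runs.
static List<string> SeverityConflict(IEnumerable<(string Source, string Severity)> severityClaims)
{
    var ordered = severityClaims.OrderBy(c => c.Source, StringComparer.Ordinal).ToList();
    bool diverges = ordered.Select(c => c.Severity)
                           .Distinct(StringComparer.OrdinalIgnoreCase)
                           .Count() > 1;
    return diverges
        ? ordered.Select(c => $"{c.Source}={c.Severity}").ToList()
        : new List<string>();
}

// Both claims are kept in the output; neither is chosen as the winner.
Console.WriteLine("severity-mismatch: " + string.Join(", ", SeverityConflict(claims)));
```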

---

## 5. Source Connectors

### 5.1 Source Families

| Family | Examples | Format |
|--------|----------|--------|
| **Vendor PSIRTs** | Microsoft, Oracle, Cisco, Adobe | CSAF, proprietary |
| **Linux Distros** | Red Hat, SUSE, Ubuntu, Debian, Alpine | CSAF, JSON, XML |
| **OSS Ecosystems** | OSV, GHSA, npm, PyPI, Maven | OSV, GraphQL |
| **CERTs** | CISA (KEV), JVN, CERT-FR | JSON, XML |

### 5.2 Connector Contract

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IFeedConnector
{
    string SourceName { get; }

    // Fetch signed feeds or offline mirrors
    Task FetchAsync(IServiceProvider sp, CancellationToken ct);

    // Normalize to strongly-typed DTOs
    Task ParseAsync(IServiceProvider sp, CancellationToken ct);

    // Build canonical records with provenance
    Task MapAsync(IServiceProvider sp, CancellationToken ct);
}
```

### 5.3 Connector Lifecycle

1. **Snapshot** - Fetch with cursor, ETag, rate limiting
2. **Parse** - Schema validation, normalization
3. **Guard** - AOCWriteGuard enforcement
4. **Write** - Append-only insert
5. **Event** - Emit `advisory.observation.updated`
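
The five lifecycle steps can be sketched end to end with in-memory stand-ins. Everything here is illustrative: the local functions, the trivial guard rule, and the lists standing in for the store and the event bus are assumptions, not the connector runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// In-memory stand-ins; a real connector would talk to the upstream feed and the store.
var store = new List<string>();    // append-only observation log
var events = new List<string>();   // emitted event names

// 1. Snapshot: pretend to fetch one upstream document.
Task<string> SnapshotAsync() => Task.FromResult(" {\"id\":\"GHSA-example\"} ");

// 2. Parse: normalization is reduced to trimming for this sketch.
string Parse(string raw) => raw.Trim();

// 3. Guard: reject documents that fail a (deliberately trivial) check.
void Guard(string document)
{
    if (document.Length == 0) throw new InvalidOperationException("empty document rejected");
}

async Task RunOnceAsync()
{
    string document = Parse(await SnapshotAsync());
    Guard(document);
    store.Add(document);                          // 4. Write: append-only insert
    events.Add("advisory.observation.updated");   // 5. Event: emitted after the write
}

await RunOnceAsync();
Console.WriteLine($"{store.Count} stored, last event: {events[^1]}");
```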

---

## 6. Version Semantics

### 6.1 Ecosystem Normalization

| Ecosystem | Format | Normalization |
|-----------|--------|---------------|
| npm, PyPI, Maven | SemVer | Intervals with `<`, `>=`, `~`, `^` |
| RPM | EVR | `epoch:version-release` with order keys |
| DEB | dpkg | Version comparison with order keys |
| APK | Alpine | Computed order keys |
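
As an illustration of the RPM row, the sketch below splits an `epoch:version-release` string into its parts. It only parses. Ordering (the order keys in the table) is out of scope, and `ParseEvr` is a hypothetical helper name.

```csharp
using System;

// Splits an RPM version string into (epoch, version, release).
// A missing epoch is treated as 0, following RPM convention.
static (int Epoch, string Version, string Release) ParseEvr(string evr)
{
    int epoch = 0;
    int colon = evr.IndexOf(':');
    if (colon >= 0)
    {
        epoch = int.Parse(evr.Substring(0, colon));
        evr = evr.Substring(colon + 1);
    }

    // The release follows the last '-'; an RPM version never contains one.
    int dash = evr.LastIndexOf('-');
    return dash >= 0
        ? (epoch, evr.Substring(0, dash), evr.Substring(dash + 1))
        : (epoch, evr, "");
}

var parsed = ParseEvr("1:3.0.7-27.el9");
Console.WriteLine($"epoch={parsed.Epoch} version={parsed.Version} release={parsed.Release}");
```

Splitting first keeps the comparison logic separate: an order key can then be computed per component rather than on the raw string.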

### 6.2 CVSS Handling

- Normalize CVSS v2/v3/v4 where available
- Track all source CVSS values
- Effective severity defaults to the highest reported value (configurable)
- Store KEV evidence with source and date
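
The "highest value wins" default can be sketched as a pure function over the per-source scores. The scores below are illustrative values, and the tie-break by source name is an assumption added to keep the result deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Base scores reported by each source for one vulnerability (illustrative values).
var scores = new Dictionary<string, double>
{
    ["OSV"] = 7.5,
    ["NVD"] = 9.8,
};

// Picks the highest reported score. Every per-source value stays in `scores`,
// so nothing is discarded when the effective value is computed.
static KeyValuePair<string, double> Effective(IReadOnlyDictionary<string, double> bySource) =>
    bySource.OrderByDescending(entry => entry.Value)
            .ThenBy(entry => entry.Key, StringComparer.Ordinal)   // deterministic tie-break
            .First();

var effective = Effective(scores);
Console.WriteLine($"effective source: {effective.Key}");
```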

---

## 7. Conflict Detection

### 7.1 Conflict Types

| Type | Description | Resolution |
|------|-------------|------------|
| `severity-mismatch` | Different severity ratings | Policy decides |
| `affected-range-divergence` | Different version ranges | Most specific wins |
| `reference-clash` | Contradictory references | Surface all |
| `alias-inconsistency` | Different alias mappings | Union with provenance |
| `metadata-gap` | Missing information | Flag for review |

### 7.2 Conflict Visibility

Conflicts are never hidden - they are:
- Stored in linkset documents
- Surfaced in API responses
- Included in exports
- Displayed in Console UI

---

## 8. Deterministic Exports

### 8.1 JSON Export

```
exports/json/
├── CVE/
│   ├── 20/
│   │   └── CVE-2025-12345.json
│   └── ...
├── manifest.json
└── export-digest.sha256
```

- Deterministic folder structure
- Canonical JSON (sorted keys, stable timestamps)
- Manifest with SHA-256 per file
- Reproducible across runs
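
"Canonical JSON (sorted keys)" can be sketched with `System.Text.Json` as below. This is one possible canonical form (ordinal key order, no insignificant whitespace) and is not necessarily the one Concelier's writer uses.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Re-serializes JSON with object keys in ordinal order and no insignificant whitespace.
static string Canonicalize(string json)
{
    using var document = JsonDocument.Parse(json);
    using var buffer = new MemoryStream();
    using (var writer = new Utf8JsonWriter(buffer))
    {
        WriteSorted(document.RootElement, writer);
    }   // disposing the writer flushes it into the buffer
    return Encoding.UTF8.GetString(buffer.ToArray());
}

static void WriteSorted(JsonElement element, Utf8JsonWriter writer)
{
    switch (element.ValueKind)
    {
        case JsonValueKind.Object:
            writer.WriteStartObject();
            foreach (var property in element.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal))
            {
                writer.WritePropertyName(property.Name);
                WriteSorted(property.Value, writer);
            }
            writer.WriteEndObject();
            break;
        case JsonValueKind.Array:
            writer.WriteStartArray();   // array order is significant and is preserved
            foreach (var item in element.EnumerateArray()) WriteSorted(item, writer);
            writer.WriteEndArray();
            break;
        default:
            element.WriteTo(writer);    // strings, numbers, booleans, null
            break;
    }
}

static string Sha256Hex(string text) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text))).ToLowerInvariant();

string first = Canonicalize("{\"b\": 1, \"a\": [true, null]}");
string second = Canonicalize("{\"a\":[true,null],\"b\":1}");
Console.WriteLine(first);                                    // {"a":[true,null],"b":1}
Console.WriteLine(Sha256Hex(first) == Sha256Hex(second));    // True
```

Because both inputs reduce to the same bytes, the per-file SHA-256 in the manifest no longer depends on the key order or spacing of the upstream document.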

### 8.2 Trivy DB Export

```
exports/trivy/
├── db.tar.gz
├── metadata.json
└── manifest.json
```

- BoltDB database compatible with Trivy
- Full and delta modes
- ORAS push to registries
- Mirror manifests for domains

### 8.3 Export Determinism

Running the same export against the same data must produce:
- Identical file contents
- Identical manifest hashes
- Identical export digests
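
One way to obtain a stable export digest is to hash a sorted listing of per-file hashes. The scheme below is a hypothetical illustration. The actual layout of `manifest.json` and `export-digest.sha256` is not specified in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static string Sha256Hex(byte[] data) =>
    Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant();

// Hypothetical digest scheme: hash each file, list "hash  path" lines sorted by path,
// then hash that listing. Sorting removes any dependence on enumeration order.
static string ExportDigest(IReadOnlyDictionary<string, byte[]> files)
{
    var lines = files.OrderBy(file => file.Key, StringComparer.Ordinal)
                     .Select(file => $"{Sha256Hex(file.Value)}  {file.Key}");
    return Sha256Hex(Encoding.UTF8.GetBytes(string.Join("\n", lines)));
}

var firstRun = new Dictionary<string, byte[]>
{
    ["CVE/20/CVE-2025-12345.json"] = Encoding.UTF8.GetBytes("{\"id\":\"CVE-2025-12345\"}"),
    ["CVE/20/CVE-2025-12346.json"] = Encoding.UTF8.GetBytes("{\"id\":\"CVE-2025-12346\"}"),
};

// Same content inserted in a different order must not change the digest.
var secondRun = new Dictionary<string, byte[]>(firstRun.Reverse());

Console.WriteLine(ExportDigest(firstRun) == ExportDigest(secondRun)); // True
```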

---

## 9. Implementation Strategy

### 9.1 Phase 1: Core Pipeline (Complete)

- [x] AOCWriteGuard implementation
- [x] Observation storage
- [x] Basic connectors (Red Hat, SUSE, OSV)
- [x] JSON export

### 9.2 Phase 2: Link-Not-Merge (Complete)

- [x] Linkset correlation engine
- [x] Conflict detection
- [x] Event emission
- [x] API surface

### 9.3 Phase 3: Expanded Sources (In Progress)

- [x] GHSA GraphQL connector
- [x] Debian DSA connector
- [ ] Alpine secdb connector (CONCELIER-CONN-50-001)
- [ ] CISA KEV enrichment (CONCELIER-KEV-51-001)

### 9.4 Phase 4: Export Enhancements (Planned)

- [ ] Delta Trivy DB exports
- [ ] ORAS registry push
- [ ] Attestation hand-off
- [ ] Mirror bundle signing

---

## 10. API Surface

### 10.1 Sources & Jobs

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/sources` | GET | `concelier.read` | List sources |
| `/api/v1/concelier/sources/{name}/trigger` | POST | `concelier.admin` | Trigger fetch |
| `/api/v1/concelier/sources/{name}/pause` | POST | `concelier.admin` | Pause source |
| `/api/v1/concelier/jobs/{id}` | GET | `concelier.read` | Job status |
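
A client call to the trigger endpoint might be built as in the sketch below. The base address and token are placeholders, and sending the scope-bearing token as a `Bearer` header is an assumption, since the authentication scheme is not stated here. The request is constructed but not sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Builds (but does not send) the request that triggers a fetch for one source.
// Assumption: the token carries the `concelier.admin` scope and is sent as a Bearer token.
static HttpRequestMessage BuildTriggerRequest(Uri baseAddress, string sourceName, string accessToken)
{
    var path = $"/api/v1/concelier/sources/{Uri.EscapeDataString(sourceName)}/trigger";
    var message = new HttpRequestMessage(HttpMethod.Post, new Uri(baseAddress, path));
    message.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
    return message;
}

// Placeholder host and token, for illustration only.
using var trigger = BuildTriggerRequest(new Uri("https://concelier.example.internal"), "osv", "example-token");
Console.WriteLine($"{trigger.Method} {trigger.RequestUri}");
```

Passing the resulting message to `HttpClient.SendAsync` would perform the call. The response body format is not described in this document.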

### 10.2 Exports

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/exports/json` | POST | `concelier.export` | Trigger JSON export |
| `/api/v1/concelier/exports/trivy` | POST | `concelier.export` | Trigger Trivy export |
| `/api/v1/concelier/exports/{id}` | GET | `concelier.read` | Export status |

### 10.3 Search

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/concelier/advisories/{key}` | GET | `concelier.read` | Get advisory |
| `/api/v1/concelier/observations/{id}` | GET | `concelier.read` | Get observation |
| `/api/v1/concelier/linksets` | GET | `concelier.read` | Query linksets |

---

## 11. Storage Model

### 11.1 Collections

| Collection | Purpose | Key Indexes |
|------------|---------|-------------|
| `sources` | Connector catalog | `{_id}` |
| `source_state` | Run state | `{sourceName}` |
| `documents` | Raw payloads | `{sourceName, uri}` |
| `advisory_observations` | Normalized records | `{tenant, upstream.upstreamId}` |
| `advisory_linksets` | Correlations | `{tenant, key.vulnerabilityId, key.productKey}` |
| `advisory_events` | Change log | `{type, occurredAt}` |
| `export_state` | Export cursors | `{exportKind}` |

### 11.2 GridFS Buckets

- `fs.documents` - Raw payloads (immutable)
- `fs.exports` - Historical archives

---

## 12. Event Model

### 12.1 Events

| Event | Trigger | Content |
|-------|---------|---------|
| `advisory.observation.updated@1` | New/superseded observation | IDs, hash, supersedes |
| `advisory.linkset.updated@1` | Correlation change | Deltas, conflicts |

### 12.2 Event Transport

- Primary: NATS
- Fallback: Redis Stream
- Offline Kit captures for replay

---

## 13. Observability

### 13.1 Metrics

- `concelier.fetch.docs_total{source}`
- `concelier.fetch.bytes_total{source}`
- `concelier.parse.failures_total{source}`
- `concelier.observations.write_total{result}`
- `concelier.linksets.updated_total{result}`
- `concelier.linksets.conflicts_total{type}`
- `concelier.export.duration_seconds{kind}`
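
In .NET, counters like these could be recorded with `System.Diagnostics.Metrics`, as sketched below. The meter name is an assumption, and the `MeterListener` only stands in for a real exporter so the sketch can show that measurements arrive.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Meter name is an assumption; instrument names follow the list above.
using var meter = new Meter("StellaOps.Concelier");
Counter<long> docsTotal = meter.CreateCounter<long>("concelier.fetch.docs_total");
Counter<long> parseFailures = meter.CreateCounter<long>("concelier.parse.failures_total");

// A listener stands in for an exporter and sums everything it receives.
long observed = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, meterListener) =>
{
    if (instrument.Meter.Name == "StellaOps.Concelier")
        meterListener.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => observed += value);
listener.Start();

// The {source} label becomes a tag on each measurement.
docsTotal.Add(42, new KeyValuePair<string, object?>("source", "osv"));
parseFailures.Add(1, new KeyValuePair<string, object?>("source", "osv"));

Console.WriteLine(observed);
```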

### 13.2 Performance Targets

| Operation | Target |
|-----------|--------|
| Ingest throughput | 5k docs/min |
| Observation write | < 5ms p95 |
| Linkset build | < 15ms p95 |
| Export (1M advisories) | < 90 seconds |

---

## 14. Security Considerations

### 14.1 Outbound Security

- Allowlist per connector (domains, protocols)
- Proxy support with TLS pinning
- Rate limiting per source
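
A per-connector allowlist check could be as small as the sketch below. The allowlist contents and the exact-host matching rule are assumptions. `api.osv.dev` is used only because it appears in the observation example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-connector allowlist: permitted schemes and exact host names.
var allowedSchemes = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "https" };
var allowedHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "api.osv.dev" };

// Returns true only when the URL is absolute and both scheme and host are listed.
bool IsAllowed(string url) =>
    Uri.TryCreate(url, UriKind.Absolute, out var uri)
    && allowedSchemes.Contains(uri.Scheme)
    && allowedHosts.Contains(uri.Host);

Console.WriteLine(IsAllowed("https://api.osv.dev/v1/vulns/GHSA-example")); // True
Console.WriteLine(IsAllowed("http://api.osv.dev/v1/vulns/GHSA-example"));  // False: scheme not listed
Console.WriteLine(IsAllowed("https://osv.dev.attacker.example/feed"));     // False: host not listed
```

Matching the full host name rather than a prefix or suffix keeps look-alike domains out, at the cost of listing every subdomain explicitly.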

### 14.2 Signature Verification

- PGP, cosign, and X.509 verification results are stored
- Failed verification is flagged, not rejected
- Policy can down-weight unsigned sources

### 14.3 Determinism

- Canonical JSON writer
- Stable export digests
- Reproducible across runs

---

## 15. Related Documentation

| Resource | Location |
|----------|----------|
| Concelier architecture | `docs/modules/concelier/architecture.md` |
| Link-Not-Merge schema | `docs/modules/concelier/link-not-merge-schema.md` |
| Event schemas | `docs/modules/concelier/events/` |
| Attestation guide | `docs/modules/concelier/attestation.md` |

---

## 16. Sprint Mapping

- **Primary Sprint:** SPRINT_0115_0001_0004_concelier_iv.md
- **Related Sprints:**
  - SPRINT_0113_0001_0002_concelier_ii.md
  - SPRINT_0114_0001_0003_concelier_iii.md

**Key Task IDs:**
- `CONCELIER-AOC-40-001` - AOC enforcement (DONE)
- `CONCELIER-LNM-41-001` - Link-Not-Merge (DONE)
- `CONCELIER-CONN-50-001` - Alpine connector (IN PROGRESS)
- `CONCELIER-KEV-51-001` - KEV enrichment (TODO)
- `CONCELIER-EXPORT-55-001` - Delta exports (TODO)

---

## 17. Success Metrics

| Metric | Target |
|--------|--------|
| Advisory freshness | < 1 hour from source |
| Ingestion accuracy | 100% provenance retention |
| Export determinism | 100% hash reproducibility |
| Conflict detection | 100% of source divergence |
| Source coverage | 20+ authoritative sources |

---

*Last updated: 2025-11-29*